On Wednesday 23rd of January I attended the Digital Preservation Coalition briefing day titled ‘Email Preservation: How Hard Can It Be? 2’ with my colleague Iram. As I attended the first briefing day back in July 2017 it was a great opportunity to see what advances and changes had been achieved. This blog post will briefly highlight what I found particularly thought provoking and focus on two of the talks about e-discovery from a lawyers view point.
The day began with an introduction by the co-chair of the report, Chris Prom (@chrisprom), informing us of the work that the task force had been doing. This was followed by a variety of talks about the use of email archives and some of the technologies used for the large scale processing from the perspective of researchers and lawyers. The day was concluded with a panel discussion (for a twist, we the audience were the panel) about the pending report and the next steps.
Update on Task Force on Technical Approaches to Email Archives Report
Chris Prom told us how the report had taken on the comments from the previous briefing day and also from consultation with many other people and organisations. This led to clearer and more concise messages. The report itself does not aim to provide hard rules but to give an overview of the current situation and some recommendations that people or organisations involved with, interested in or are considering email preservation can consider.
Reconstruction of Narrative in e-Discovery Investigations and The Future of Email Archiving: Four Propositions
Simon Attfield (Middlesex university) and Larry Chapin (attorney) spoke about narrative and e-discovery. It was a fascinating insight into a lawyers requirements for use of email archives. Larry used the LIBOR scandal as an example of a project he worked on and the power of emails in bringing people to justice. E-discovery from his perspective was its importance to help create a narrative and tell a story, something at the moment a computer cannot do. Emails ‘capture the stuff of story making’ as they have the ability to reach into the crevasses of things and detail the small. He noted how emails contain slang and interestingly the language of intention and desire. These subtleties show the true meaning of what people are saying and that is important in the quest for the truth. Simon Attfield presented his research on the coding aspect to aid lawyers in assessing and sorting through these vast data sets. The work he described here was too technical for me to truly understand however it was clear that collaboration between archivist, users and the programmers/researchers will be vital for better preservation and use strategies.
Jason Baron (@JasonRBaron1) (attorney) gave a talk on the future of email archiving detailing four propositions.
The general conclusions from this talk was that automation and technology will be playing an even bigger part in the future to help with acquisition, review (filtering out sensitive material) and searching (aiding access to larger collections). As one of the leads of the Capstone project, he told us how that particular project saves all emails for a short time and some forever, removing the misconceptions that all emails are going to be saved forever. Analysis of how successful Capstone has been in reducing signal to noise ratio (so only capturing email records of permanent value) will be important going forward.
The problem of scale, which permeates into most aspects of digital preservation, again arose here. For lawyers, they must review any and all information, which when looking at emails accounts can be colossal. The analogy that was given was of finding a needle in a haystack – lawyers need to find ALL the needles (100% recall).
Current predictive coding for discovery requires human assistance. Users have to tell the program whether the recommendations it produced were correct, the program will learn from this process and hopefully become more accurate. Whilst a program can efficiently and effectively sort personal information such as telephone numbers, date of birth etc it cannot currently sort out textual content that required prior knowledge and non-textual content such as images.
Panel Discussion and Future Direction
The final report is due to be published around May 2018. Email is a complex digital object and the solution to its preservation and archiving will be complex also.
The technical aspects of physically preserving emails are available but we still need to address the effective review and selection of the emails to be made available to the researcher. The tools currently available are not accurate enough for large scale processing, however, as artificial intelligence becomes better and more advanced, it appears this technology will be part of the solution.
Tim Gollins (@timgollins) gave a great overview of the current use of technology within this context, and stressed the point that the current technology is here to ASSIST humans. The tools for selection, appraisal and review need to be tailored for each process and quality test data is needed to train the programs effectively.
The non technical aspects further add to the complexity, and might be more difficult to address, as a community we need to find answers to:
- Who’s email to capture (particularly interesting when an email account is linked to a position rather than a person)
- How much to capture (entire accounts such as in the case of Capstone or allowing the user to choose what is worthy of preservation)
- How to get persons of interest engaged (effectiveness of tools that aid the process e.g. drag and drop into record management systems or integrated preservation tools)
- Legal implications
- How to best present the emails for scholarly research (bespoke software such as ePADD or emulation tools that recreate the original environment or a system that a user is familiar with)
Like most things in the digital sector, this is a fast moving area with ever changing technologies and trends. It might be frustrating there is no hard guidance on email preservation, when the Task Force on Technical Approaches to Email Archives report is published it will be an invaluable resource and a must read for anyone with an interest or actively involved in email preservation. The takeaway message was, and still is, that emails matter!