
#WeMissiPRES: A Bridge from 2019 to 2021

Every year, the international digital preservation community meets for the iPRES conference, an opportunity for practitioners to exchange knowledge and showcase the latest developments in the field. With the 2020 conference unable to take place due to the global pandemic, digital preservation professionals instead gathered online for #WeMissiPRES to ensure that the global community remained connected. Our graduate trainee digital archivist Simon Mackley attended the first day of the event; in this blog post he reflects on some of the highlights of the talks and what they tell us about the state of the field.

How do you keep the global digital preservation community connected when international conferences are not possible? This was the challenge faced by the organisers of #WeMissiPRES, a three-day online event hosted by the Digital Preservation Coalition. Conceived as a festival of digital preservation, the aim was not to try to replicate the regular iPRES conference in an online format, but instead to serve as a bridge for the digital preservation community, connecting the efforts of 2019 with the plans for 2021.

As might be expected, the impact of the pandemic loomed large in many of the talks. Caylin Smith (Cambridge University Library) and Sara Day Thomson (University of Edinburgh) for instance gave a fascinating paper on the challenge of rapidly collecting institutional responses to coronavirus, focusing on the development of new workflows and streamlined processes. The difficulties of working from home, the requirements of remote access to resources, and the need to move training online likewise proved to be recurrent themes throughout the day. As someone whose own experience of digital preservation has been heavily shaped by the pandemic (I began my traineeship at the start of lockdown!) it was really useful to hear how colleagues in other institutions have risen to these challenges.

I was also struck by the different ways in which responses to the crisis have strengthened digital preservation efforts. Lynn Bruce and Eve Wright (National Records of Scotland) noted for instance that the experience of the pandemic has led to increased appreciation of the value of web-archiving from stakeholders, as the need to capture rapidly-changing content has become more apparent. Similarly, Natalie Harrower (Digital Repository of Ireland) made the excellent point that the crisis had not only highlighted the urgent need for the sharing of medical research data, but also the need to preserve it: coronavirus data may one day prove essential to fighting a future pandemic, and so there is a moral imperative for us to ensure that it is preserved.

As our keynote speaker Geert Lovink (Institute of Network Cultures) reminded us, the events of the past year have been momentous quite apart from the pandemic, with issues such as the distorting impacts of social media on society, the climate emergency, and global demands for racial justice all having risen to the forefront of society. It was great therefore to see the role of digital preservation in these challenges being addressed in many of the panel sessions. A personal highlight for me was the presentation by Daniel Steinmeier (KB National Library of the Netherlands) on diversity and digital preservation. Steinmeier stressed that in order for diversity efforts to be successful, institutions needed to commit to continuing programmes of inclusion rather than one-off actions, with the communities concerned actively included in the archiving process.

So what challenges can we expect from the year ahead? Perhaps more than ever, this has been a difficult question to answer this year. Nonetheless, a key theme that struck me from many of the discussions was that the growing challenge of archiving social media platforms was matched only by the increasing need to preserve the content hosted on them. As Zefi Kavvadia (International Institute of Social History) noted, many social media platforms actively resist archiving; even when preservation is possible, curators are faced with a dilemma between capturing user experiences and capturing platform data. Navigating this challenge will surely be a major priority for the profession going forward.

While perhaps no substitute for meeting in person, #WeMissiPRES nonetheless succeeded in bringing the international digital preservation community together in a shared celebration of the progress being made in the field, successfully bridging the gap between 2019 and 2021, and laying the foundations for next year’s conference.

 

#WeMissiPRES was held online from 22nd-24th September 2020. For more information, and for recordings of the talks and panel sessions, see the event page on the DPC website.

The Archives and Records of Humanitarian Organisations

On 20th November the Bodleian Libraries hosted a workshop on ‘The Archives and Records of Humanitarian Organisations: Challenges and Opportunities’. The event was attended by archivists, curators and academics working within the field of humanitarian archives, and I was pleased to be invited along to learn more about their work and write a blog post about some of my observations.

The first talk was given by Chrissie Webb, Project Archivist at the Bodleian Libraries, who discussed her work on the archive of the international charity Oxfam. The archive was donated to the Bodleian in 2012 and constitutes an enormous collection of over 10,000 boxes of material. Chrissie explained that the archive mostly consists of written documents, but also contains objects and ephemera, audio recordings and digital materials. Cataloguing the archive took several years and was funded by a grant from the Wellcome Trust, as the materials are of great interest to those studying the history of health and public policy, humanitarianism and the voluntary sector. Chrissie touched on a number of issues in her talk, particularly highlighting the challenges of appraising and arranging a collection of such size in sufficient detail. As a trainee, the principles of arrangement are still quite new to me, so the idea of working on a collection so big is extremely daunting! The work required robust workflows and proved useful as a case study for the development of the Bodleian Libraries’ appraisal guidelines for future collections. Chrissie also highlighted that the Oxfam catalogue was published on a rolling basis to allow the Libraries to promote the collection and prevent an end-of-project information dump of epic proportions. If you’re curious to learn more, the Oxfam archive can be explored via Bodleian Archives & Manuscripts: https://archives.bodleian.ox.ac.uk/

The second talk was about the Save the Children Fund archive and was given by Matthew Goodwin, Project Archivist at the Cadbury Research Library, University of Birmingham. The Save the Children Fund archive shares some immediate similarities with the Oxfam archive: it was acquired by the University of Birmingham at around the same time (2011) and is being catalogued thanks to a grant from the Wellcome Trust. The archive covers the activities of the charity in the 20th and early 21st centuries and, while it is smaller than the Oxfam archive, it still spans over 2,000 boxes of material. Matthew noted some interesting trends that he came across in the archive, such as the charity’s move away from campaign material that included intense images of child poverty and towards more positive images that highlighted the charity’s life-saving work. This is a trend that is noticeable across the sector, as many humanitarian organisations have chosen to pivot their publicity materials in this way in recent years.

A particularly interesting discussion revolved around the challenges presented by archives that contain graphic or distressing material and how this affects the archivists cataloguing the collections and the readers who access them. Several attendees noted that their work with collections from humanitarian and aid organisations had presented this issue. Possible solutions discussed included inserting warning notices inside boxes containing especially graphic material to warn users in advance of their contents, and seating those using these materials in separate parts of the reading room to prevent other readers from accidentally viewing them. The archival community has shown an increased awareness of these challenges in recent years and in 2017 the Archives and Records Association (ARA) released guidance for professionals working with potentially disturbing materials. Their documents explore the current research around ‘vicarious’ or secondary trauma and compassion fatigue, as well as offering practical techniques for staff and detailing how to access support. Their guidance can be found here: https://www.archives.org.uk/what-we-do/emotional-support-guides.html

Regrettably I wasn’t able to attend the afternoon workshop sessions which discussed the Red Cross Archive and Museum and how the collections of humanitarian organisations factor into the work of NGOs. Hopefully as my traineeship develops I will get a chance to revisit these collections and learn more!

Archives Unleashed – Vancouver Datathon

On the 1st-2nd of November 2018 I was lucky enough to attend the Archives Unleashed Datathon Vancouver, co-hosted by the Archives Unleashed Team and Simon Fraser University Library along with KEY (SFU Big Data Initiative). I was very grateful for the generous travel grant from the Andrew W. Mellon Foundation that made this possible.

The SFU campus at the Harbour Centre was an amazing venue for the Datathon and it was nice to be able to take in some views of the surrounding mountains.

About the Archives Unleashed Project

The Archives Unleashed Project is a three-year project with a focus on making historical internet content easily accessible to scholars and researchers whose interests lie in exploring and researching both the recent past and contemporary history.

After a series of datathons held at a number of international institutions such as the British Library, University of Toronto, Library of Congress and the Internet Archive, the Archives Unleashed Team identified some key areas of development that would help to deliver their aim of making petabytes of valuable web content accessible.

Key Areas of Development
  • Better analytics tools
  • Community infrastructure
  • Accessible web archival interfaces

By engaging and building a community, alongside developing web archive search and data analysis tools the project is successfully enabling a wide range of people including scholars, programmers, archivists and librarians to “access, share and investigate recent history since the early days of the World Wide Web.”

The project has a three-pronged approach:
  1. Build a software toolkit (Archives Unleashed Toolkit)
  2. Deploy the toolkit in a cloud-based environment (Archives Unleashed Cloud)
  3. Build a cohesive user community that is sustainable and inclusive by bringing together the project team members with archivists, librarians and researchers (Datathons)

Archives Unleashed Toolkit

The Archives Unleashed Toolkit (AUT) is an open-source platform for analysing web archives with Apache Spark. I was really impressed by AUT due to its scalability, relative ease of use and the huge number of analytical options it provides. It can work on a laptop (macOS, Linux or Windows), a powerful cluster or a single-node server and, if you wanted to, you could even use a Raspberry Pi to run AUT. The Toolkit allows for a number of search functions across the entirety of a web archive collection. You can filter collections by domain, URL pattern, date, language and more; create lists of URLs to return the top ten in a collection; and extract plain text from the HTML files in an ARC or WARC file, cleaning the data by removing ‘boilerplate’ content such as advertisements. It’s also possible to use the Stanford Named Entity Recognizer (NER) to extract the names of entities such as locations, organisations and persons. I’m looking forward to seeing how this functionality is adapted to localised instances and controlled vocabularies – would it be possible to run a similar programme for automated tagging of web archive collections in the future? Maybe ingest a collection into AUT, run NER and automatically tag up the data, providing richer metadata for web archives and subsequent research.
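
To give a flavour of what a "top sites in a collection" query involves, here is a minimal sketch in plain Python. It is not AUT's own API (AUT runs on Apache Spark and has its own functions); the function name `top_domains` and the sample URLs are hypothetical and only illustrate the idea of counting captures per domain.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=10):
    """Return the n most frequent host names in an iterable of URLs."""
    hosts = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Treat "www.example.org" and "example.org" as the same site.
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.append(host)
    return Counter(hosts).most_common(n)


# Hypothetical sample of captured URLs, for illustration only.
sample_urls = [
    "http://www.example.org/about",
    "http://example.org/news/1",
    "https://archive.example.net/page",
    "http://example.org/news/2",
]
print(top_domains(sample_urls, n=2))
```

On a real collection the list of URLs would come from the web archive files themselves, and the counting would be distributed across a cluster rather than done in a single loop.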

Archives Unleashed Cloud

The Archives Unleashed Cloud (AUK) is a GUI-based front end for working with AUT; it essentially provides an accessible interface for generating research derivatives from web archive files (WARCs). With a few clicks users can ingest and sync Archive-It collections, analyse the collections, create network graphs and visualise connections and nodes. It is currently free to use and runs on AUK central servers.

My experience at the Vancouver Datathon

The datathons bring together a small group of 15-20 people of varied professional backgrounds and experience to work and experiment with the Archives Unleashed Toolkit and the Archives Unleashed Cloud. I really like that the team have chosen to keep the number of attendees small, because it created a close-knit working group that was full of collaboration and the exchange of knowledge and ideas. It was a relaxed, fun and friendly environment to work in.

Day One

After a quick coffee and light breakfast, the Datathon opened with introductory talks from project team members Ian Milligan (Principal Investigator), Nick Ruest (Co-Principal Investigator) and Samantha Fritz (Project Manager), relating to the project – its goals and outcomes, the toolkit, available datasets and event logistics.

Another quick coffee break and it was back to work – participants were asked to think about the datasets that interested them, techniques they might want to use and questions or themes they would like to explore and write these on sticky notes.

Once the notes were placed on the whiteboard, teams naturally formed around datasets, themes and questions. The team I was in consisted of Kathleen Reed and Ben O’Brien and formed around a common interest in exploring the First Nations and Indigenous communities dataset.

Virtual machines were kindly provided by Compute Canada and were available throughout the Datathon to run AUT. Datasets were preloaded onto these VMs and a number of derivative files had already been created. We spent some time brainstorming, sharing ideas and exploring datasets using a number of different tools. The day finished with some informative lightning talks about the work participants had been doing with web archives at their home institutions.

Day Two

On day two we continued to explore datasets by using the full-text derivatives, running some NER and performing keyword searches using the command-line tool grep. We also analysed the text using sentiment analysis with the Natural Language Toolkit. To help visualise the data, we took the new text files produced from the keyword searches and uploaded them into Voyant Tools. This helped by visualising links between words, creating a list of top terms and providing quantitative data such as how many times each word appears. It was here we found that the word ‘letter’ appeared quite frequently and we finalised the dataset we would be using – University of British Columbia – bc-hydro-site-c.
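
The keyword search and top-terms steps can be approximated in a few lines of Python. This is only a rough sketch under stated assumptions: `grep_lines` and `top_terms` are hypothetical helper names, the stopword list is deliberately tiny, and the sample sentences are invented. It stands in for what grep and Voyant Tools did for us, not for how either tool works internally.

```python
import re
from collections import Counter


def grep_lines(lines, keyword):
    """Keep lines containing the keyword, ignoring case (similar to `grep -i`)."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]


def top_terms(lines, n=5, stopwords=frozenset({"the", "a", "of", "to", "and", "in"})):
    """Count word frequencies across lines, skipping a tiny stopword list."""
    words = []
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        words.extend(word for word in tokens if word not in stopwords)
    return Counter(words).most_common(n)


# Invented sample text standing in for a full-text derivative file.
sample_text = [
    "A letter to the minister about the dam",
    "Community meeting notes",
    "Another letter opposing the dam project",
]
matches = grep_lines(sample_text, "letter")
print(matches)
print(top_terms(matches, n=2))
```

Filtering first and counting second mirrors the order we worked in: narrow the text down with a keyword search, then look at which terms dominate what is left.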

We hunted down the site and found it contained a number of letters from people about the BC Hydro Dam Project. The problem was that the letters were in a table and when extracted the data was not clean enough. Ben O’Brien came up with a clever extraction solution utilising the raw HTML files and some script magic. The data was then prepped for geocoding by Kathleen Reed to show the geographical spread of the letter writers, hot-spots and timeline, a useful way of looking at the issue from the perspective of engagement and the community.
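
For anyone facing a similar problem of text trapped in an HTML table, the sketch below shows one generic way to pull cell text out of raw HTML using Python's standard `html.parser` module. It is not the script Ben wrote, which is not reproduced here; the class name `TableRowParser`, the helper `extract_rows` and the sample table are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cleaned cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample table, for illustration only.
sample_html = """
<table>
  <tr><th>Name</th><th>Location</th><th>Letter</th></tr>
  <tr><td>A. Writer</td><td>Vancouver</td><td>I am writing to <b>object</b>
      to the project.</td></tr>
</table>
"""
for row in extract_rows(sample_html):
    print(row)
```

Rows cleaned in this way can be written out to CSV, which is a convenient starting point for geocoding the location column.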

Map of letter writers.

Time Lapse of locations of letter writers. 

At the end of day 2 each team had a chance to present their project to the other teams. You can view the presentation (Exploring Letters of protest for the BC Hydro Dam Site C) we prepared here, as well as the other team projects.

Why Web Archives Matter

How we preserve, collect, share and exchange cultural information has changed dramatically. The act of remembering at national institutions and libraries has altered greatly in terms of scope, speed and scale due to the web. The way in which we provide access to, use and engage with archival material has been disrupted. All current and future historians who want to study the periods after the 1990s will have to use web archives as a resource. Currently, work on accessibility and usability has lagged behind and many students and historians are not ready. Projects like Archives Unleashed will help to equip researchers, historians, students and the community with the necessary tools to combat these problems. I look forward to seeing the next steps the project takes.

Archives Unleashed are currently accepting submissions for the next Datathon in March 2019; I highly recommend it.

Attending the ARA Annual Conference 2018

ARA Annual Conference 2018, Grand Central Hotel, Glasgow

Having been awarded the Diversity Bursary for BME individuals, sponsored by Kevin J Bolton Ltd., I was able to attend the ARA Annual Conference 2018 held in Glasgow in August.

Capitalising on the host city’s existing ubiquitous branding of People Make Glasgow, the Conference Committee set People Make Records as this year’s conference theme. This was then divided into three individual themes, one for each day of the conference:

  • People in Records
  • People Using Records
  • People Looking After Records

Examined through the lens of the above themes over the course of three days, this year’s conference addressed three key areas within the sector: representation, diversity and engagement.

Following an introduction from Kevin Bolton (@kevjbolton), the conference kicked off with Professor Gus John (@Gus_John) delivering the opening keynote address, entitled “Choices of the Living and the Dead”. With People Make Records the theme for the day, Professor John gave a powerful talk discussing how people are impacting the records and recordkeeping of African (and other) diaspora in the UK, enabling the airbrushing of the history of oppressed communities. Professor John noted that, yes, people make records, but we also determine what to record, and what to do with it once it has been recorded.

Noting the ignorance surrounding racial prejudice and violence, and citing the Notting Hill race riots, the Windrush generation and Stephen Lawrence as examples, Professor John illustrated how the commemoration of historical events is selective: while in 2018 the 50th anniversary of the Race Relations Act received much attention, the 500th anniversary of the start of the Transatlantic Slave Trade was largely ignored by the sector and the media alike. This culture of oppression and omission, he said, is leading to ignorance amongst young people about major defining events, contributing to a removal of context for historically oppressed groups.

In response to questions from the audience, Professor John noted that one of the problems facing the sector is the failure to interrogate the ‘business as usual’ climate, and that it may be ‘too difficult to consider what an alternative route might be’. Professor John challenged us to question the status quo: ‘Why is my curriculum white? Why isn’t my lecturer black? What does “de-colonising” the curriculum mean? This is what we must ask ourselves’.

Following Professor John’s keynote and his ultimate call to action, there was a palpable atmosphere of engagement amongst the delegates, with myself and those around me eager to spend the next three days learning from the experiences of others, listening to new perspectives and extracting guidance on the actions we may take to develop and improve our sector, in terms of representation, diversity and engagement.

Various issues relating to these areas were threaded throughout many of the presentations, and as a person of colour at the start of my career in this sector, and recipient of the Diversity Bursary, I was excited to hear more about the challenges facing marginalised communities in archives and records, including some I could relate to on a personal and professional level, and, hopefully, also take away some proposed solutions and recommendations.

I attended an excellent talk by Adele Patrick (@AdelePAtrickGWL), of Glasgow Women’s Library, who discussed the place for feminism within the archive, noting GWL’s history in resistance, and insistence on a plural representation, when women’s work, past and present, is eclipsed. Dr Alan Butler (@AButlerArchive), Coordinator at Plymouth LGBT Community Archive, discussed his experiences of trying to create a sense of community within a group that is inherently quite nebulous. Nevertheless, Butler illustrated the importance of capturing LGBTQIA+ history, as people today are increasingly removed from the struggles that previous generations have had to overcome, echoing a similar point Professor Gus John made earlier.

A presentation which particularly resonated with me came from Kirsty Fife (@DIYarchivist) and Hannah Henthorn (@hanarchovist), on the issue of diversity in the workforce. Fife and Henthorn presented the findings from their research, including their survey of experiences of marginalisation in the UK archive sector, highlighting the structural barriers to diversifying the archive sector workforce. Fife and Henthorn identified several key themes which are experienced by marginalised communities in the sector, including: the feeling of isolation and otherness in both workplaces and universities; difficulties in gaining qualifications, perhaps due to ill health, disability, financial barriers or other commitments; feeling unsafe and underconfident in professional spaces; and frustration at the lack of diversity in leadership roles.

As a Graduate Trainee Digital Archivist, I couldn’t abandon my own focus on digital preservation and digital archiving, and as such attended various digital-related talks, including “Machines make records: the future of archival processing” by Jenny Bunn (@JnyBn1), discussing the impact of taking a computational approach to archival processing, “Using digital preservation and access to build a sustainable future for your archive” led by Ann Keen of Preservica, with presentations given by various Preservica users, as well as a mini-workshop led by Sarah Higgins and William Kilbride, on ethics in digital preservation, asking us to consider if we need our own code of conduct in digital preservation, and what this could look like.

William Kilbride and Sarah Higgins running their workshop “Encoding ethics: professional practice digital preservation”, ARA Annual Conference 2018, Glasgow

I have only been able to touch on a very small amount of what I heard and learnt at the many and varied talks, presentations and workshops at the ARA conference. However, one thing I took away from the conference was the realisation that archivists and recordkeepers have the power to challenge structural inequalities, and must act now, in order to become truly inclusive. As Michelle Caswell (@professorcaz), the second keynote speaker, said, we must act with sensitivity, acknowledge our privileges and, above all, empower, not marginalise. This conference felt like a call to action to the archive and recordkeeping community, in order to include the ‘hard to reach’ communities, or alternatively, as Adele Patrick noted, the ‘easy to ignore’. As William Kilbride (@William Kilbride) said, this is an exciting time to be in archives.

I want to thank Kevin Bolton for sponsoring the Diversity Bursary, which enabled me to attend an enriching, engaging and informative event, which otherwise would have been inaccessible for me.

________________________________________
Because every day is a school day, as homework for us all, I made a note of some of the recommendations made by speakers throughout the conference, compiled into this very brief list which I thought I would share:

Reading list

Celebrating the Life of Clement Attlee

Photograph of Clement Attlee, n.d. [MS. CRA. 99].


Join the Attlee Foundation and Bodleian Libraries on the 25th of October in the Weston Lecture Theatre to celebrate the life and legacy of Clement Attlee.

The event will commence with a lecture given by John Bew on the political thought of Clement Attlee. A Professor of History and Foreign Policy at the War Studies Department at King’s College London, John Bew is also the author of five books including the award-winning biography Citizen Clem: A Life of Attlee (2016), which received the Orwell Prize for Political Writing, the Elizabeth Longford Prize for Historical Biography and the Best Book in the U.K.

A list by Clement Attlee of his “best appointments”, n.d. [post 1951] [MS. CRA. 10].


The lecture will be accompanied by a display of items from Clement Attlee’s personal archive. Covering the years 1945-1951, the display offers viewers a unique insight into the life and work of Attlee, forming a celebration of his achievements in the personal, political and public arenas.

Booking Information:

This event is free but places are limited, so please complete the booking form via our website to reserve tickets in advance. All bookings are subject to a £1 booking fee.

Doors open at 6.15pm. The lecture begins at 6.30pm, and will be followed by a drinks reception.

Sir Oliver Wardrop’s desk diaries donated to the library

Audience members who attended the launch of Nikoloz Aleksidze’s book Georgia: a Cultural Journey through the Wardrop Collection at the Weston Library on June 1st also had the novel experience of witnessing the arrival of a further addition to the Bodleian’s Wardrop holdings. A family descendant of Sir Oliver, who was attending the launch, brought his desk diaries to donate to the collection. The Wardrop collection forms the nucleus of the Bodleian’s rich holdings of Georgian books and the donation of the desk diaries enriches this significant collection still further.

Dating from 1882-1948, the diaries provide details of Sir Oliver’s daily meetings and activities. They will offer scholars an important glimpse into his day-to-day life, particularly during the critical period leading up to and immediately after the formation of the Democratic Republic of Georgia when he served as the British High Commissioner for Transcaucasia.

 

A life in letters: a tribute to Jenny Joseph

Miriam Margolyes

On Sunday 13th May the actress Miriam Margolyes will be in Oxford to perform a public reading of poems by Oxford alumna Jenny Joseph, the author of Warning:

‘When I am an old woman I shall wear purple
With a red hat which doesn’t go, and doesn’t suit me’

The event, hosted by the Bodleian and St Hilda’s College, celebrates the life and work of Jenny Joseph, who died this January, and will include a selection of poetry ranging across her writing career of more than 50 years. She donated her literary archive to the Bodleian in 2017.

The reading will be at the beautiful, seventeenth-century Convocation House in the Old Bodleian Library from 11.30am-1.00pm. Tickets cost £12 (£10 concessions), including tea/coffee and a pastry. You can book tickets online at What’s on, or phone the box office on 01865 278112 (there is a £2 booking fee for phone bookings).

Please note that tickets will not be available on the door.

DPC Email Preservation: How Hard Can It Be? Part 2

Source: https://lu2cspjiis-flywheel.netdna-ssl.com/wp-content/uploads/2015/09/email-marketing.jpg

In July last year my colleague Miten and I attended a DPC Briefing Day titled Email Preservation: How Hard Can It Be?, which introduced me to the work of the Task Force on Technical Approaches to Email Archives. We were lucky enough to attend the second session last week.

Arranging a second session gave Chris Prom (@chrisprom), University of Illinois at Urbana-Champaign, and Kate Murray (@fileformatology), Library of Congress, co-chairs of the Task Force, the opportunity to reflect upon the issues raised in the first session and add them to the Task Force Report. It also provided the event attendees with an update on their overall progress, in anticipation of their final report, scheduled to be published some time in April.

“Using Email Archives in Research”

The first guest presentation was given by Dr. James Baker (@j_w_baker), University of Sussex, who was inspired to write about the use of email archives within research by two key texts: Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives (2016), an article by Victoria Sloyan, and Dust (2001), a book by Carolyn Steedman.

These texts led Dr. Baker to think of the “imagination of the archive”, as he put it: the mystique of archival research, stemming from the imagery of 19th-century research processes. He expanded on this idea, stating “physically and ontologically unique; the manuscript, is no longer what we imagine to be an archive”.

However, despite this new platform for research, Dr. Baker stated that very few people outside of archive professionals know that born-digital archives exist, let alone use them. This is an issue, as archives require evidence of use; we therefore need to encourage use.

To address this, Dr. Baker set up a Born-Digital Access Workshop at the Wellcome Library, in collaboration with their Collections Information Team, where he gathered people who use born-digital archives and the archivists who make them, and provided them with a set of four varying case studies. These four case studies were designed to explore the following:

A) the “original” environment; hard drive files in a Windows OS
B) the view experience; using the Wellcome’s Viewer
C) levels of curation; comparing reformatted and renamed collections with unaltered ones
D) the physical media; asking does the media hold value?

Several interesting observations came out of this workshop, which Dr. Baker organised into three areas:

  1. Levels of description; filenames are important, and are valuable data in themselves to researchers. Users need a balance between curation and an authentic representation of the original order.
  2. “Bog-standard” laptop as access point; using modern technology that is already used by many researchers as the mode of access to email and digital archives creates a sense of familiarity when engaging with the content.
  3. Getting the researcher from desk to archive; there is a substantial amount of work needed to make researchers aware of the resources available to them and how to reach them: can they access material remotely, and how much collection-level description is necessary?

Dr. Baker concluded that even with outreach and awareness events such as the one we were all attending, born-digital archives are not yet accessible to researchers. This has made me realise that the digital preservation community must push for access solutions, and get these out to users, so that researchers can gain the insights our digital collections have to offer.

“Email as a Corporate Record”

The third presentation of the day was given by James Lappin (@JamesLappin), Loughborough University, who discussed the issues involved in applying archival policies to emails in a governmental context.

His main point concerned the routine deletion of email that happens in governments around the world. He said there are no civil servants’ email accounts scheduled to be saved past the next 3–4 years, although they may be available via a different structure: a kind of records management system. However, Lappin pointed out the crux in this scenario: government departments have no budget to move and save many individuals’ email accounts, and no real idea of the numbers: how much to save, and how much can be saved?

“email is the record of our age” – James Lappin

Lappin suggested an alternative: keep the emails of senior staff only. However, this raises the question: how do we filter out sensitive and personal content?

Lappin posits that auto-deletion is the solution, aiming to spare institutions from unmanageable volumes of email and the consequent risk of breaching data protection rules.
Auto-deletion encourages:

  • governments to kickstart email preservation action,
  • the integration of tech for records management solutions,
  • active consideration of the value of emails for long-term preservation.

But how do we transfer emails to an EDRMS? What structures do we use? How do we separate individuals, and how do we enforce the transfer of emails? These issues are still to be worked out, and can be, Lappin argues, if we implement auto-deletion as a tool to make email preservation less daunting. At the end of the day, the current goal is to retain the “important” emails, which will make both government departments and historians happy and, in turn, archivists too. This does indeed seem like a positive scenario for us all!

However, it was particularly interesting when Lappin made his next point: what if the very nature of email, as intimate and immediate, makes governments uncomfortable with the idea of saving and preserving governmental correspondence? Governments must therefore be more active in their selection processes, and save something rather than nothing, which is where the implementation of auto-deletion could, again, prove useful!

To conclude, Lappin presented a list of characteristics which could justify the preservation of an individual's government email account, which included:

  • The role they play is of historic interest
  • They expect their account to be permanently preserved
  • They are given the chance to flag or remove personal correspondence
  • Access to personal correspondence is prevented except in case of overriding legal need

Personally, I feel this is fair and thorough, but only time will tell what route various governments take.

On a side note: Lappin runs an excellent comic-based blog on Records Management which you can see here.

Conclusions
One of the key issues that stood out for me today was, maybe surprisingly, not to do with the technology used in email preservation, but with how to address the myriad issues email preservation brings to light: namely the feasibility of data protection, sensitivity review and appraisal, which become particularly pressing when dealing with such vast quantities of material.

Email can only be preserved once we have defined what constitutes ‘email’ and how to proceed ethically, morally and legally. Then we can move forward with implementing technical frameworks designed to meet those pre-defined requirements, enabling access to historically valuable and information-rich email archives that will yield much in the name of research.

In the tweet below, Evil Archivist succinctly reminds us of the importance of maintaining and managing our digital records…

Email Preservation: How Hard Can It Be? 2 – DPC Briefing Day

On Wednesday 23rd of January I attended the Digital Preservation Coalition briefing day titled ‘Email Preservation: How Hard Can It Be? 2’ with my colleague Iram. As I attended the first briefing day back in July 2017, it was a great opportunity to see what advances and changes had been achieved. This blog post will briefly highlight what I found particularly thought-provoking and focus on two of the talks about e-discovery from a lawyer's viewpoint.

The day began with an introduction by Chris Prom (@chrisprom), co-chair of the task force behind the report, informing us of the work that the task force had been doing. This was followed by a variety of talks about the use of email archives and some of the technologies used for large-scale processing, from the perspective of researchers and lawyers. The day concluded with a panel discussion (for a twist, we the audience were the panel) about the pending report and the next steps.

Update on Task Force on Technical Approaches to Email Archives Report

Chris Prom told us how the report had taken on the comments from the previous briefing day and also from consultation with many other people and organisations. This led to clearer and more concise messages. The report itself does not aim to provide hard rules but to give an overview of the current situation, along with some recommendations for people or organisations that are involved with, interested in or considering email preservation.

Reconstruction of Narrative in e-Discovery Investigations and The Future of Email Archiving: Four Propositions

Simon Attfield (Middlesex University) and Larry Chapin (attorney) spoke about narrative and e-discovery. It was a fascinating insight into a lawyer's requirements for the use of email archives. Larry used the LIBOR scandal, a case he worked on, as an example of the power of emails in bringing people to justice. From his perspective, the importance of e-discovery lies in helping to create a narrative and tell a story, something that at the moment a computer cannot do. Emails ‘capture the stuff of story making’ as they have the ability to reach into the crevices of things and detail the small. He noted how emails contain slang and, interestingly, the language of intention and desire. These subtleties show the true meaning of what people are saying, and that is important in the quest for the truth. Simon Attfield presented his research on the coding aspect to aid lawyers in assessing and sorting through these vast data sets. The work he described was too technical for me to fully understand; however, it was clear that collaboration between archivists, users and programmers/researchers will be vital for better preservation and use strategies.

Jason Baron (@JasonRBaron1) (attorney) gave a talk on the future of email archiving detailing four propositions.

Slide detailing the four propositions for the future of email archives. By Jason R Baron 2018

The general conclusion from this talk was that automation and technology will play an even bigger part in the future to help with acquisition, review (filtering out sensitive material) and searching (aiding access to larger collections). As one of the leads of the Capstone project, he told us how that particular project saves all emails for a short time and some forever, dispelling the misconception that all emails are going to be saved forever. Analysis of how successful Capstone has been in improving the signal-to-noise ratio (so only capturing email records of permanent value) will be important going forward.
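To make the "all emails for a short time, some forever" idea concrete, here is a minimal sketch of a role-based retention rule. It is only an illustration: the retention period, the list of roles and the names `Account` and `disposal_date` are hypothetical assumptions of mine, not details given in the talk or taken from the Capstone project itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical values, chosen purely for illustration.
TEMPORARY_RETENTION = timedelta(days=7 * 365)
PERMANENT_ROLES = {"minister", "permanent secretary", "director general"}


@dataclass
class Account:
    owner: str
    role: str
    closed_on: date


def disposal_date(account: Account) -> Optional[date]:
    """Return the date an account may be deleted, or None if it is kept forever."""
    if account.role.lower() in PERMANENT_ROLES:
        return None  # senior role: preserve permanently
    # Everyone else: keep for a fixed period, then dispose.
    return account.closed_on + TEMPORARY_RETENTION
```

The point of a rule like this is that the decision is made per account, by role, rather than message by message.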

The problem of scale, which permeates most aspects of digital preservation, arose here again. Lawyers must review any and all information, which when looking at email accounts can be colossal. The analogy given was that of finding a needle in a haystack, except that lawyers need to find ALL the needles (100% recall).

Current predictive coding for discovery requires human assistance. Users have to tell the program whether the recommendations it produced were correct; the program learns from this process and hopefully becomes more accurate. Whilst a program can efficiently and effectively pick out personal information such as telephone numbers and dates of birth, it cannot currently sort textual content that requires prior knowledge, or non-textual content such as images.
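As a rough illustration of why things like telephone numbers and dates are the easy part, the sketch below flags them with simple patterns. The patterns and the function name `flag_personal_data` are my own assumptions and are deliberately loose; this is not a vetted detector of personal data, and it does not attempt the predictive coding or reviewer feedback described above.

```python
import re

# Illustrative patterns only (assumptions, not a complete or vetted detector).
# PHONE: an optional "+", then 10 to 15 digits, spaces or hyphens, starting and ending on a digit.
PHONE = re.compile(r"(?<!\d)(?:\+?\d[\d\s-]{8,13}\d)(?!\d)")
# DATE: numeric dates written like 12/03/1975.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def flag_personal_data(text: str) -> dict:
    """Return candidate phone numbers and dates found in a message body."""
    return {"phones": PHONE.findall(text), "dates": DATE.findall(text)}


if __name__ == "__main__":
    print(flag_personal_data("Call me on 01865 277000 before 12/03/1975."))
```

A message that only hints at something sensitive ("see you at the usual place") passes straight through a check like this, which is exactly the kind of content that still needs a human reviewer.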

Panel Discussion and Future Direction

The final report is due to be published around May 2018. Email is a complex digital object and the solution to its preservation and archiving will be complex also.

The technical means of physically preserving emails are available, but we still need to address the effective review and selection of the emails to be made available to the researcher. The tools currently available are not accurate enough for large-scale processing; however, as artificial intelligence becomes better and more advanced, it appears this technology will be part of the solution.

Tim Gollins (@timgollins) gave a great overview of the current use of technology within this context, and stressed the point that the current technology is here to ASSIST humans. The tools for selection, appraisal and review need to be tailored for each process and quality test data is needed to train the programs effectively.

The non-technical aspects add further to the complexity, and might be more difficult to address. As a community we need to find answers to:

  • Whose email to capture (particularly interesting when an email account is linked to a position rather than a person)
  • How much to capture (entire accounts such as in the case of Capstone or allowing the user to choose what is worthy of preservation)
  • How to get persons of interest engaged (effectiveness of tools that aid the process e.g. drag and drop into record management systems or integrated preservation tools)
  • Legal implications
  • How to best present the emails for scholarly research (bespoke software such as ePADD or emulation tools that recreate the original environment or a system that a user is familiar with) 

Like most things in the digital sector, this is a fast-moving area with ever-changing technologies and trends. It might be frustrating that there is no hard guidance on email preservation, but when the Task Force on Technical Approaches to Email Archives report is published it will be an invaluable resource and a must-read for anyone with an interest or actively involved in email preservation. The takeaway message was, and still is, that emails matter!

What I Wish I Knew Before I Started – DPC Student Conference 2018

On January 24th, four Archives Assistants from Archives and Modern Manuscripts visited Senate House, London for the DPC Student Conference. With the 2018 theme being ‘What I Wish I Knew Before I Started’, it was an opportunity for digital archivists to pass on their wealth of knowledge in the field.

Getting started with digital preservation

The day started with a brief introduction to digital preservation by Sharon McMeekin from the Digital Preservation Coalition. This included an outline of the three basic models of digital preservation: OAIS, DCC lifecycle and the three-legged stool. (More information about these models can be found in the DPC handbook.) Aimed at beginners, this introduction was made accessible and easy to understand, whilst also giving us plenty to think about.

Next to take the stage was Steph Taylor, an Information Manager from CoSector, University of London. Steph is a huge advocate for the use of Twitter to find out the latest information and opinion in the world of digital preservation. As someone who has never had a Twitter account, it made me realise the importance of social media for staying up to date in such a fast-moving profession. Needless to say, I signed myself up to Twitter that evening to find out what I had been missing out on. (You can follow what was happening at the conference with the hashtag #dpc_wiwik.)

The final speaker before lunch was Matthew Addis, giving a technologist’s perspective. Matthew broke down the steps that you would need to take should you be faced with the potentially overwhelming job of starting from the beginning with a depository of digital material. He referenced a two-step approach – conceived by Tim Gollins – named ‘Parsimonious Preservation’, which involves firstly understanding what you have, and secondly keeping the bits safe. In the world of digital preservation, the worst thing you can do is do nothing, so by dealing with the simple and usually low-cost files first, you can protect the vast majority of the collection rather than going straight into the technical, time-consuming and costly minority of material. In the long run, the simple material that could have been dealt with initially may become technical and costly – due to software obsolescence, for instance.
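The two steps described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the function names are hypothetical, file extensions stand in for proper format identification, and SHA-256 is simply one common choice of checksum, not something specified in the talk.

```python
import hashlib
from collections import Counter
from pathlib import Path


def inventory(root: Path) -> Counter:
    """Step one: understand what you have (count files per extension)."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in root.rglob("*")
        if p.is_file()
    )


def checksums(root: Path) -> dict:
    """Step two: keep the bits safe (record a SHA-256 digest for each file)."""
    result = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            # read_bytes() loads the whole file; very large files are better hashed in chunks.
            result[str(p.relative_to(root))] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result
```

Running `checksums` again at a later date and comparing the result with the stored digests shows whether any file has changed or gone missing in the meantime.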

That morning, the thought of tackling a simple digital preservation project would have seemed somewhat daunting. But Matthew illustrated the steps very clearly and as we broke for lunch I was left thinking that actually, with a little guidance, it probably wouldn’t be quite so bad.

Speakers on their experiences in the digital preservation field

During the afternoon, speakers gave presentations on their experiences in the digital preservation field. The speakers were Adrian Brown from the Parliamentary Archives, Glenn Cumiskey from the British Museum and Edith Halvarsson from the Bodleian Libraries. It was fascinating to learn how diverse the day-to-day working lives of digital archivists can be, and how often, as Glenn Cumiskey remarked, you may be the first digital archivist there has ever been within a given organisation, providing a unique opportunity for you to pave the way for its digital future.

Adrian Brown on his digital preservation experience at the Parliamentary Archives

The final speaker of the day, Dave Thomson, explained why it is up to students and new professionals to be ‘disruptive change agents’ and further illustrated the point that digital preservation is a relatively new field. We now have a chance to be the change and make digital preservation something that is at the forefront of businesses’ minds, helping them avoid the loss of important information due to complacency.

The conference closed with the speakers taking questions from attendees. There was lively discussion over whether postgraduate university courses in archiving and records management are teaching the skills needed for careers in digital preservation. It was decided that although some universities do teach this subject better than others, digital archivists have to make a commitment to life-long learning – not just one postgraduate course. This is a field where the technology and methods are constantly changing, so we need to be continuously developing our skills in accordance with these changes. The discussion certainly left me with lots to think about when considering postgraduate courses this year.

If you are new to the archiving field and want to gain an insight into digital preservation, I would highly recommend the annual conference. I left London with plenty of information, ideas and resources to further my knowledge of the subject, starting my commitment to life-long learning in the area of digital preservation!