All posts by iramsafdar

Attending the ARA Annual Conference 2018

27 September 201821st century, Event, Modern, Uncategorized#Diversity #ARA, #SkillsForTheFutureiramsafdar

ARA Annual Conference 2018, Grand Central Hotel, Glasgow

Having been awarded the Diversity Bursary for BME individuals, sponsored by Kevin J Bolton Ltd., I was able to attend the ARA Annual Conference 2018 held in Glasgow in August.

Capitalising on the host city’s existing ubiquitous branding of People Make Glasgow, the Conference Committee set People Make Records as this year’s conference theme. This was then divided into three individual themes, one for each day of the conference:

People in Records
People Using Records
People Looking After Records

Examined through the lens of the above themes over the course of three days, this year’s conference addressed three keys areas within the sector: representation, diversity and engagement.

Following an introduction from Kevin Bolton (@kevjbolton), the conference kicked off with Professor Gus John (@Gus_John) delivering the opening keynote address, entitled “Choices of the Living and the Dead”. With People Make Records the theme for the day, Professor John gave a powerful talk discussing how people are impacting the records and recordkeeping of African (and other) diaspora in the UK, enabling the airbrushing of the history of oppressed communities. Professor John noted yes people make records, but we also determine what to record, and what to do with it once it has been recorded.

Noting the ignorance surrounding racial prejudice and violence, citing the Notting Hill race riots, the Windrush generation, and Stephen Lawrence as examples, Professor John illustrated how the commemoration of historical events is selective: while in 2018 the 50th anniversary of the Race Relations Act received much attention, in comparison the 500th anniversary of the start of the Transatlantic Slave Trade was largely ignored, by the sector and the media alike. This culture of oppression, and omission, he said, is leading to ignorance amongst young people about major defining events, contributing to a removal of context to historically oppressed groups.

In response to questions from the audience, Professor John noted that one of the problems facing the sector is the failure to interrogate the ‘business as usual’ climate, and that it may be ‘too difficult to consider what an alternative route might be’. Professor John challenged us to question the status quo: ‘Why is my curriculum white? Why isn’t my lecturer black? What does “de-colonising” the curriculum mean? This is what we must ask ourselves’.

Following Professor John’s keynote and his ultimate call to action, there was a palpable atmosphere of engagement amongst the delegates, with myself and those around me eager to spend the next three days learning from the experiences of others, listening to new perspectives and extracting guidance on the actions we may take to develop and improve our sector, in terms of representation, diversity and engagement.

Various issues relating to these areas were threaded throughout many of the presentations, and as a person of colour at the start of my career in this sector, and recipient of the Diversity Bursary, I was excited to hear more about the challenges facing marginalised communities in archives and records, including some I could relate to on a personal and professional level, and, hopefully, also take away some proposed solutions and recommendations.

I attended an excellent talk by Adele Patrick (@AdelePAtrickGWL), of Glasgow Women’s Library, who discussed the place for feminism within the archive, noting GWL’s history in resistance, and insistence on a plural representation, when women’s work, past and present, is eclipsed. Dr Alan Butler (@AButlerArchive), Coordinator at Plymouth LGBT Community Archive, discussed his experiences of trying to create a sense of community within a group that is inherently quite nebulous. Nevertheless, Butler illustrated the importance of capturing LGBTQIA+ history, as people today are increasingly removed from the struggles that previous generations have had to overcome, echoing a similar point Professor Gus John made earlier.

A presentation which particularly resonated with me came from Kirsty Fife (@DIYarchivist) and Hannah Henthorn (@hanarchovist), on the issue of diversity in the workforce. Fife and Henthorn presented the findings from their research, including their survey of experiences of marginalisation in the UK archive sector, highlighting the structural barriers to diversifying the archive sector workforce. Fife and Henthorn identified several key themes which are experienced by marginalised communities in the sector, including: the feeling of isolation and otherness in both workplace and universities; difficulties in gaining qualifications, perhaps due to ill health/disability/financial barriers/other commitments; feeling unsafe and under confident in professional spaces and a frustration at the lack of diversity in leadership roles.

As a Graduate Trainee Digital Archivist, I couldn’t abandon my own focus on digital preservation and digital archiving, and as such attended various digital-related talks, including “Machines make records: the future of archival processing” by Jenny Bunn (@JnyBn1), discussing the impact of taking a computational approach to archival processing, “Using digital preservation and access to build a sustainable future for your archive” led by Ann Keen of Preservica, with presentations given by various Preservica users, as well as a mini-workshop led by Sarah Higgins and William Kilbride, on ethics in digital preservation, asking us to consider if we need our own code of conduct in digital preservation, and what this could look like.

Image of William Kilbride and Sarah Higgins running their workshop "Encoding ethics: professional practice digital preservation", ARA Annual Conference 2018, Glasgow

William Kilbride and Sarah Higgins running their workshop “Encoding ethics: professional practice digital preservation”, ARA Annual Conference 2018, Glasgow

I have only been able to touch on a very small amount of what I heard and learnt at the many and varied talks, presentations and workshops at the ARA conference, however, one thing I took away from the conference was the realisation that archivists and recordkeepers have the power to challenge structural inequalities, and must act now, in order to become truly inclusive. As Michelle Caswell (@professorcaz), 2nd keynote speaker said, we must act with sensitivity, acknowledge our privileges and, above all empower not marginalise. This conference felt like a call to action to the archive and recordkeeping community, in order to include the ‘hard to reach’ communities, or alternatively as Adele Patrick noted, the ‘easy to ignore’. As William Kilbride (@William Kilbride) said, this is an exciting time to be in archives.

I want to thank Kevin Bolton for sponsoring the Diversity Bursary, which enabled me to attend an enriching, engaging and informative event, which otherwise would have been inaccessible for me.

________________________________________
Because every day is a school day, as homework for us all, I made a note of some of the recommendations made by speakers throughout the conference, compiled into this very brief list which I thought I would share:

Reading list

Keys, David. (2018) Details of horrific first voyages in transatlantic slave trade revealed. 17 August 2018. The Independent.
Cobain, Ian. (2016) The History Thieves: Secrets, Lies and the Shaping of a Modern Nation. ISBN: 9781846275852
Eichhorn, Kate. (2013) The Archival Turn in Feminism: Outrage in Order. ISBN-13: 978-1439909515
McIntosh, Peggy. (1988) White Privilege: Unpacking the Invisible Knapsack. Essay excerpt from ”White Privilege and Male Privilege: A Personal Account of Coming To See Correspondences through Work in Women’s Studies” Working Paper 189.
Greene, M. A. & Meissner, D. (2005) More Product, Less Process: Revamping Traditional Archival Processing. The American Archivist: Fall/Winter 2005, Vol. 68, No. 2, pp. 208-263.
Fife, K. & Henthorn, H. (2017) Decentring Qualification: A Radical Examination of Archival Employment Possibilities. Radical Voices Conference, 3 March 2017.
Brilmyer, Gracen. (2016) Identifying & Dismantling White Supremacy in Archives. Poster produced in Michelle Caswell’s Archives, Records and Memory class, Fall 2016, UCLA.

The UK Web Archive: Mental Health, Social Media and the Internet Collection

9 March 201821st century, Modern, Web Archives#explorearchives, #SkillsForTheFuture, Archives & Modern Manuscripts, digital preservation, Web Archivesiramsafdar

The UK Web Archive hosts several Special Collections, curating material related to a particular theme or subject. One such collection is on Mental Health, Social Media and the Internet.

Since the advent of Web 2.0, people have been using the Internet as a platform to engage and connect, amongst other things, resulting in new forms of communication, and consequently new environments to adapt to – such as social media networks. This collection aims to illustrate how this has affected the UK, in terms of the impact on mental health. This collection will reflect the current attitudes displayed online within the UK towards mental health, and how the Internet and social media are being used in contemporary society.

We began curating material in June 2017, archiving various types of web content, including: research, news pieces, UK based social media initiatives and campaigns, charities and organisations’ websites, blogs and forums.

Material is being collected around several themes, including:

Body Image
Over the past few years, there has been a move towards using social media to discuss body image and mental health. This part of the collection curates material relating to how the Internet and social media affect mental health issues relating to body image. This includes research about developing theory in this area, news articles on various individuals experiences, as well as various material posted on social media accounts discussing this theme.

Cyber-bullying
This theme curates material, such as charities and organisations’ websites and social media accounts, which discuss, raise awareness and tackle this issue. Furthermore, material which examines the impact of social media and Internet use on bullying such as news articles, social media campaigns and blog posts, as well as online resources created to aid with this issue, such as guides and advice, are also collected.

Addiction

This theme collects material around gaming and other Internet-based activities that may become addictive such as social media, pornography and gambling. It includes recent UK based research, studies and online polls, social media campaigns, online resources, blogs and news articles from individuals and organisations. Discourse, discussions, opinion and actions regarding different aspects of Internet addition are all captured and collected in this overarching catchment term of addiction, including social media addiction.

The Mental Health, Social Media and the Internet Special Collection, is available via the new UK Web Archive Beta Interface!

Co authored with Carl Cooper

DPC Email Preservation: How Hard Can It Be? Part 2

30 January 201821st century, Digital archives, Event, Event, Modern, Uncategorized#SkillsForTheFutureiramsafdar

Source: https://lu2cspjiis-flywheel.netdna-ssl.com/wp-content/uploads/2015/09/email-marketing.jpg

In July last year my colleague Miten and I attended a DPC Briefing Day titled Email Preservation: How Hard Can It Be? which introduced me to the work of the Task Force on Technical Approaches to Email Archives and we were lucky enough to attend the second session last week.

Arranging a second session gave Chris Prom (@chrisprom), University of Illinois at Urbana-Champaign and Kate Murray (@fileformatology), Library of Congress, co-chair’s of the Task Force the opportunity to reflect upon and add the issues raised from the first session to the Task Force Report, and provided the event attendees with an update on their progress overall, in anticipation of their final report scheduled to be published some time in April.

“Using Email Archives in Research”

The first guest presentation was given by Dr. James Baker (@j_w_baker), University of Sussex, who was inspired to write about the use of email archives within research by two key texts; Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives (2016), an article by Victoria Sloyan, and Dust (2001) a book by Carolyn Steedman.

These texts led Dr. Baker to think of the “imagination of the archive” as he put it, the mystique of archival research, stemming from the imagery of 19th century research processes. He expanded on this idea, stating “physically and ontologically unique; the manuscript, is no longer what we imagine to be an archive”.

However, despite this new platform for research, Dr. Baker stated that very few people outside of archive professionals know that born-digital archives exist, yet alone use them. This is an issue, as archives require evidence of use, therefore, we need to encourage use.

To address this, Dr. Baker set up a Born-Digital Access Workshop, at the Wellcome Library in collaboration with their Collections Information Team, where he gathered people who use born-digital archives and the archivists who make them, and provided them with a set of 4 varying case-studies. These 4 case-studies were designed to explore the following:

A) the “original” environment; hard drive files in a Windows OS
B) the view experience; using the Wellcome’s Viewer
C) levels of curation; comparing reformatted and renamed collections with unaltered ones
D) the physical media; asking does the media hold value?

Several interesting observations came out of this workshop, which Dr. Baker organised in to three areas:

Levels of description; filenames are important, and are valuable data in themselves to researchers. Users need a balance between curation and an authentic representation of the original order.
“Bog-standard” laptop as access point; using modern technology that is already used by many researchers as the mode of access to email and digital archives creates a sense of familiarity when engaging with the content.
Getting the researcher from desk to archive; there is a substantial amount of work needed to make the researcher aware of the resources available to them and how – can they remote access, how much collection level description is necessary?

Dr. Baker concluded that even with outreach and awareness events such as the one we were all attending, born-digital archives are not yet accessible to researchers, and this has made me realise the digital preservation community must push for access solutions, and get these out to users, to enable researchers to gain the insights they might from our digital collections.

“Email as a Corporate Record”

The third presentation of the day was given by James Lappin (@JamesLappin), Loughborough University, who discussed the issues involved in applying archival policies to emails in a governmental context.

His main point concerned the routine deletion of email that happens in governments around the world. He said there are no civil servants email accounts scheduled to be saved past the next 3 – 4 years – but, they may be available via a different structure; a kind of records management system. However, Lappin pointed out the crux in this scenario: government departments have no budget to move and save many individuals email accounts, and no real idea of the numerics: how much to save, how much can be saved?

“email is the record of our age” – James Lappin

Lappin suggested an alternative: keep the emails of the senior staff only, however, this begs the questions, how do we filter out sensitive and personal content?

Lappin posits that auto-deletion is the solution, aiming to spare institutions from unmanageable volumes of email and the consequential breach of data protection.
Autodeletion encourages:

governments to kickstart email preservation action,
the integration of tech for records management solutions,
actively considering the value of emails for long-term preservation

But how do we transfer emails to a EDRMS, what structures do we use, how do we separate individuals, how do we enforce the transfer of emails? These issues are to be worked out, and can be, Lappin argues, if we implement auto-deletion as tool to make email preservation less daunting , as at the end of the day, the current goal is to retain the “important” emails, which will make both government departments and historians happy, and in turn, this makes archivists happy. This does indeed seem like a positive scenario for us all!

However, it was particularly interesting when Lappin made his next point: what if the very nature of email, as intimate and immediate, makes governments uncomfortable with the idea of saving and preserving governmental correspondence? Therefore, governments must be more active in their selection processes, and save something, rather than nothing – which is where the implementation of auto-deletion, could, again, prove useful!

To conclude, Lappin presented a list of characteristics which could justify the preservation of an individuals government email accounts, which included:

The role they play is of historic interest
They expect their account to be permanently preserved
They are given the chance to flag or remove personal correspondence
Access to personal correspondence is prevented except in case of overriding legal need

I, personally, feel this fair and thorough, but only time will tell what route various governments take.

On a side note: Lappin runs an excellent comic-based blog on Records Management which you can see here.

Conclusions
One of the key issues that stood out for me today was, maybe surprisingly, not to do with the technology used in email preservation, but how to address the myriad issues email preservation brings to light, namely the feasibility of data protection, sensitivity review and appraisal, particularly prevalent when dealing in such vast quantities of material.

Email can only be preserved once we have defined what constitutes ’email’ and how to proceed ethically, morally and legally. Then, we can move forward with the implementation of the technical frameworks, which have been designed to meet our pre-defined requirements, that will enable access to historically valuable, and information rich, email archives, that will yield much in the name of research.

In the tweet below, Evil Archivist succinctly reminds us of the importance of maintaining and managing our digital records…

I didn't realize it was Electronic Records Day. I must have deleted the email. #HailDarkivists #archives #libraries

— Evil Archivist (@EvilArchivist) October 11, 2017

PASIG 2017: “Sharing my loss to protect your data” University of the Balearic Islands

27 September 201721st century, Digital archives, Digitisation, Event, Uncategorized#SkillsForTheFutureiramsafdar

Eduardo del Valle…. we will all be telling your tale of loss for years to come #pasig17 pic.twitter.com/xgCLxiVDeG

— somaya langley (@criticalsenses) September 12, 2017

Last week I was lucky enough to be able to attend the PASIG 2017 (Preservation and Archiving Special Interest Group) conference, held at the Oxford University Museum of Natural History, where over the course of three days the digital preservation community connected to share, experiences, tools, successes and mishaps.

The story of one such mishap came from Eduardo del Valle, Head of the Digitization and Open Access Unit at the University of the Balearic Islands (UIB), in his presentation titled “Sharing my loss to protect your data: A story of unexpected data loss and how to do real preservation”. In 2013 the digitisation and digital preservation workflow pictured below was set up by the IT team at UIB.

2013 Digitisation and Digital Preservation Workflow (Eduardo del Valle, 2017)

Del Valle was told this was a reliable system, with fast retrieval. However, he found this was not the case, with slow retrieval and the only means of organisation consisting of an excel spreadsheet used to contain the storage locations of the data.

In order to assess their situation they used the NDSA Levels of Digital Preservation, a tiered set of recommendations on how organisations should build their digital preservation activities, developed by the National Digital Stewardship Alliance (NDSA) in 2012. The guidelines are organised into five functional areas that lie at the centre of digital preservation:

Storage and geographic location
File fixity and data integrity
Information security
Metadata
File formats

These five areas then have four columns (Levels 1-4) which set tiered recommendations of action, from Level 1 being the least an organisation should do, to Level 4 being the most an organisation can do. You can read the original paper on the NDSA Levels here.

The slide below shows the extent to which the University met the NDSA Levels. They found there was an urgent need for improvement.

NDSA Levels of Preservation UIB compliance (Eduardo del Valle, 2017)

“Anything that can go wrong, will go wrong” – Eduardo del Valle

In 2014 the IT team decided to implement a new back up system. While the installation and configuration of the new backup system (B) was completed, the old system (A) remained operative.

On the 14th and 15th November 2014, a backup was created for the digital material generated during the digitisation of 9 rare books from the 14th century in the Tape Backup System (A) and notably, two confirmation emails were received, verifying the success of the backup. By October 2015, all digital data had been migrated from System (A) to the new System (B), spanning UIB projects from 2008-2014.

However, on 4th November 2014, a loss of data was detected…

The files corresponding to the 9 digitised rare books were lost. This loss was detected a year after the initial back up of the 9 books in System A, and therefore the contract for technical assistance had finished. This meant there was no possibility of obtaining financial compensation, if the loss was due to a hardware or software problem. The loss of these files, unofficially dubbed “the X-files”, meant the loss of three months of work and it’s corresponding economic loss. Furthermore, the rare books were in poor condition, and to digitise them again could cause serious damage. Despite a number of theories, the University is yet to receive an explanation for the loss of data.

The digitised 14th century rare book from UIB collection (Eduardo del Valle, 2017)

To combat issues like this, and to enforce best practice in their digital preservation efforts, the University acquired Libsafe, a digital preservation solution offered by Libnova. Libsafe is OAIS and ISO 14.721:2012 compliant, and encompasses advanced metadata management with a built-in ISAD(g) filter, with the possibility to import any custom metadata schema. Furthermore, Libsafe offers fast delivery, format control, storage of two copies in disparate locations, and a built-in catalogue. With the implementation of a standards compliant workflow, the UIB proceeded to meet all four levels of the 5 areas of the NDSA Levels of Digital Preservation.

The ISO 14.721:2012 Space Data and Information Transfer Systems – Open Archival Information System – Reference Model (OAIS) provides a framework for implementing the archival concepts needed for long-term digital preservation and access, and for describing and comparing architectures and operations of existing and future archives, as well as describing roles, processes and methods for long-term preservation.

The use of these standards facilitates the easy access, discovery and sharing of digital material, as well as their long-term preservation. Del Valle’s story of data loss reminds us of the importance of implementing standards-based practices in our own institutions, to minimise risk and maximise interoperability and access, in order to undertake true digital preservation.

With thanks to Eduardo del Valle, University of the Balearic Islands.

PDF/A: Challenges Meeting the ISO 19005 Standard

14 August 201721st century, Digital archives, Event, Modern#SkillsForTheFuture, Archives & Modern Manuscriptsiramsafdar

Anna Oates (MSLIS Candidate, University of Illinois at Urbana-Champaign and NDNP Coordinator Graduate Assistant, Preservation Services) explaining the differences between PDF and PDF/A

We were excited to attend the recent project presentation entitled: ‘A Case Study on Theses in Oxford’s Institutional Repository: Challenges Meeting the ISO 19005 Standard’ given by Anna Oates, a student involved in the Oxford-Illinois Digital Libraries Placement Programme.

The presentation focused initially on the PDF/A format: PDF/A differs from standard PDF in that it avoids common long term access issues associated with PDF. For example, a PDF created today may look and behave differently in 50 years time. This is because many visual aspects of the PDF are not saved into the file itself, (for example, PDFs may use font linking instead of font embedding) the standardised PDF/A format attempts to remedy this by embedding metadata within the file and restricting certain aspects commonly found in PDF which could inhibit long term preservation.

Aspects excluded from PDF/A include :

Audio and video content
JavaScript executable files
All forms of PDF encryption

PDF/A is better suited therefore for the long term preservation of digital material as it maintains the integrity of the information included in the source files, be this textual or visual. Oates described PDF/A as having multiple ‘flavours’, PDF/A-1 published in 2005 including conformance level A (Accessible – maintains the structure of the file) and B (Basic – maintains the visual aspects only). Versions 2 and 3 published later in 2011 and 2012, were developed to encompass conformance level U (Unicode – enabling the embedding of Unicode information) alongside other features such as JPEG 2000 compression and the embedding of arbitrary file formats within PDF/A documents.

Oates specified that different types of documents benefited from different ‘flavours’ of PDF/A, for example, digitised documents were better suited to conformance level B whereas born digital documents were better suited to level A.

Whilst specifying the benefits of PDF/A, Oates also highlighted the myriad of issues associated with the format. Firstly, while experimenting with creating and conforming PDF/A documents, she noted the conformed documents had slight differences, such as changes to the colour pixels of embedded image files (PDF/A format showed less difference in the colour of pixels with programs like PDF Studio), this showcased a clear alteration of the authenticity of the original source file.

Oates compared source images to PDF/A converted images and found obvious visual differences.

Secondly, Oates noted that when converting files from PDF to PDF/A-1b, smart software would change the decode filter of the image (e.g. changing from JPXDecode used for JPEG2000 to DCTDecode accepted by ISO 19005) in order to ensure it would conform to ISO 19005. However, she noted that despite the positives of avoiding non-conformance the software had increased the file size of the PDF by 65%. The file size increase poses obvious issues in regards to storage and cost considerations for organisations using PDF/A.

Oates’ workflow for creation and conformance checking of PDF/A files using different PDF/A software

Format uptake was also discussed by Oates. She found that PDF/A had not been widely utilised by Universities for long term preservation of dissertations and thesis in the UK. However, Oates provided examples of users of PDF/A for Electronic Theses and Dissertations Repositories that included: Concordia University, Johns Hopkins University, McGill University, Rutgers University, University of Alberta, University of Oulu and Virginia Tech. Alongside this it was mentioned that uptake amongst Research and Cultural Heritage Institutions included: the Archaeology Data Service (ADS), British Library, California Digital Library, Data Archiving and Networked Services (DANS), the Library of Congress and the U.S. National Archives and Records Administration (NARA).

“Adobe Preflight has failed to recognize most of the glyph errors. As such, veraPDF will remain our final tool for validation.” (Anna Oates)

Oates therefore concluded that PDF/A was not the best solution to PDF preservation, she mentioned that the new ISO standard would cause new issues and considerations for PDF/A users.

Following the presentation the audience debated whether PDF/A should still be used. Some considered whether other solutions existed to PDF preservation; an example of a proposed solution was to keep both PDF/A and the original PDFs. However, many still felt that PDF/A provided the best solution available despite its various drawbacks.

Hopefully Oates’ findings will highlight the various areas needed for improvement in both PDF/A conversion/ validation software and conformance aspects of the ISO 19005 Standard used by PDF/A to ensure it is up to the task of digital preservation.

To learn more about PDF/A have a look at PDF Association’s own e-book PDF/A In a Nutshell.

Alice, Ben and Iram (Trainee Digital Archivists)

Email Preservation: How Hard Can it Be? DPC Briefing Day

19 July 201721st century, Digital archives, Event, Modern#SkillsForTheFuture, Archives & Modern Manuscripts, emailiramsafdar

Miten and I outside the National Archives, looking forward to a day of learning and networking

Last week I had the pleasure of attending a Digital Preservation Coalition (DPC) Briefing Day titled Email Preservation: How Hard Can it Be?

In 2016 the DPC, in partnership with the Andrew W. Mellon Foundation, announced the formation of the Task Force on Technical Approaches to Email Archives to address the challenges presented by email as a critical historical source. The Task Force delineated three core aims:

Articulating the technical framework of email
Suggesting how tools fit within this framework
Beginning to identify missing elements.

The aim of the briefing day was two-fold; to introduce and review the work of the task force thus far in identifying emerging technical frameworks for email management, preservation and access; and to discuss more broadly the technical underpinnings of email preservation and the associated challenges, utilising a series of case studies to illustrate good practice frameworks.

The day started with an introductory talk from Kate Murray (Library of Congress) and Chris Prom (University of Illinois Urbana-Champaign), who explained the goals of the task force in the context of emails as cultural documents, which are worthy of preservation. They noted that email is a habitat where we live a large portion of our lives, encompassing both work and personal. Furthermore, when looking at the terminology, they acknowledged email is an object, several objects and a verb – and it’s multi-faceted nature all adds to the complexity of preserving email. Ultimately, it was said email is a transactional process whereby a sender transmits a message to a recipient, and from a technical perspective, a protocol that defines a series of commands and responses that operate in a manner like a computer programming language and which permits email processes to occur.

From this standpoint, several challenges of email preservation were highlighted:

Capture: building trust with donors, aggregating data, creating workflows and using tools
Ensuring authenticity: ensuring no part of the email (envelope, header, and message data etc.) have been tampered with
Working at scale: email
Addressing security concerns: malicious content leading to vulnerability, confidentiality issues
Messages and formats
Preserving attachments and linked/networked documents: can these be saved and do we have the resources?
Tool interoperability

The first case study of the day was presented by Jonathan Pledge from the British Library on “Collecting Email Archives”, who explained born-digital research began at the British Library in 2000, and many of their born-digital archives contain email. The presentation was particularly interesting as it included their workflow for forensic capture, processing and delivery of email for preservation, providing a current and real life insight into how email archives are being handled. The British Library use Aid4Mail Forensic for their processing and delivery, however, are looking into ePADD as a more holistic approach. ePADD is a software package developed by Standford University which supports archival processes around the appraisal, ingest, processing, discovery and delivery of email archives. Some of the challenges they experienced surrounded the issue of email as often containing personal information. A possible solution would be the redaction of offending material, however they noted this could lead to the loss of meaning, as well as being an extremely time-consuming process.

Next we heard from Anthea Seles (The National Archives) and Greg Falconer (UK Government Cabinet Office) who spoke about email and the record of government. Their presentation focused on the question of where the challenge truly lies for email – suggesting that, opposed to issues of preservation, the challenge lies in capture and presentation. They noted that when coming from a government or institutional perspective, the amount of email created increases hugely, leaving large collections of unstructured records. In terms of capture, this leads to the challenge of identifying what is of value and what is sensitive. Following this, the major challenge is how to best present emails to users – discoverability and accessibility. This includes issues of remapping existing relationships between unstructured records, and again, the issue of how to deal with linked and networked content.

The third and final case study was given by Michael Hope, from Preservica; an “Active Preservation” technology, providing a suite of (Open Archival Information System) compliant workflows for ingest, data management, storage, access, administration and preservation for digital archives.

Following the case studies, there was a second talk from Kate Murray and Chris Prom on emerging Email Task Force themes and their Technology Roadmap. In June 2017 the task force released a Consultation Report Draft of their findings so far, to enable review, discussion and feedback, and the remainder of their presentation focused on the contents and gaps of the draft report. They talked about three possible preservation approaches:

Format Migration: copying data from one type of format to another to ensure continued access
Emulation: recreating user experience for both message and attachments in the original context
Bit Level Preservation: preservation of the file, as it was submitted (may be appropriate for closed collections)

They noted that there are many tools within the cultural heritage domain designed for interoperability, scalability, preservation and access in mind, yet these are still developing and improving. Finally, we discussed what the possible gaps of the draft report, and issues such as the authenticity of email collections were raised, as well as a general interest in the differing workflows between institutions. Ultimately, I had a great time at The National Archives for the Email Preservation: How Hard Can it Be? Briefing Day – I learnt a lot about the various challenges of email preservation, and am looking forward to seeing further developments and solutions in the near future.

Archives and Manuscripts at the Bodleian Library

A Bodleian Libraries blog