Like other archives, the Bodleian Libraries has been searching for ways to optimize the conservation of our digital collections. The need to find a solution has become increasingly pressing as the Bodleian Electronic Archives and Manuscripts (BEAM), our digital repository service for the management of born-digital archives and manuscripts acquired by the Special Collections, now contains roughly 13TB worth of digital objects, with much more waiting in the wings.
To help us manage the ingest of digital objects into our collections, the Bodleian Libraries undertook an options review as part of its DPOC project. This led to a decision to conduct a proof of concept of Archivematica. The proof of concept included the installation of a QA and a DEV environment with the help of Artefactual, followed by an extensive testing period and a gap analysis.
In November 2018 we started testing the system to establish whether or not Archivematica met our acceptance criteria. We mainly focussed on three areas:
- Overall performance/functionality: Is the system user-friendly? Can it successfully process all the different file types and sizes that we have in our collection?
- Metadata: Can Archivematica extract the metadata from the Excel sheets that we have created over time? What technical metadata does Archivematica automatically extract from ingested files?
- File extraction and normalization: Are disk images extracted properly? Is the content of a transfer normalized to the right file type?
Whilst testing, we also reached out to and visited other organisations that had already implemented Archivematica, including the International Institute of Social History in Amsterdam, the University of Edinburgh, the National Library of Wales and the Wellcome Trust.
Based on the outcomes of the tests we conducted, and the conversations we had with other institutions, we identified five gap areas:
- Performance: The Archivematica instance we configured for the proof of concept struggled with transfers over 200GB or transfers that contain more than 5,000 files.
- Error reporting: It was often unclear what a particular error code and message meant. The error logs used by system administrators are also verbose, making it hard for them to pinpoint the error.
- Metadata: Here we identified two gaps. Firstly, there is the verbosity of the metadata. Because Archivematica records individual PREMIS events for each digital file, the resulting METS file becomes unwieldy, compromising the system’s performance. Secondly, we require a workflow to migrate our legacy pre-ingest capture metadata and file-level metadata, currently held in spreadsheets, into Archivematica. Because this pre-ingest metadata will continue to be recorded in spreadsheet form for the foreseeable future, the same workflow must also cover future ingests.
- User/access management: Archivematica does not offer a way to manage access to collections or Archive Information Packages, and allows all users to alter the system workflow. We are a multi-user organisation, and wish to have tighter controls on access to collections and workflow configurations.
- General reporting: Archivematica currently does not offer many reports to monitor progress, content and growth of collections.
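To illustrate the performance gap, here is a minimal sketch of a pre-flight check that flags a transfer directory exceeding the sizes our proof-of-concept instance struggled with. It is not part of Archivematica: the function names `summarise_transfer` and `exceeds_limits` are hypothetical, the thresholds simply restate the figures above, and 200GB is assumed to mean decimal gigabytes.

```python
import os

# Thresholds restate the proof-of-concept observations; they are illustrative
# and will differ depending on how an instance is configured.
MAX_BYTES = 200 * 10**9  # 200GB, assuming decimal gigabytes
MAX_FILES = 5000


def summarise_transfer(path):
    """Return (file_count, total_bytes) for every file under path."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return file_count, total_bytes


def exceeds_limits(file_count, total_bytes,
                   max_files=MAX_FILES, max_bytes=MAX_BYTES):
    """Return True if the transfer is larger than either threshold."""
    return file_count > max_files or total_bytes > max_bytes
```

A transfer flagged by a check like this could be split into smaller transfers before ingest, although where to split it remains an archival decision rather than a purely technical one.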
Once we had identified these gaps, we held an intensive two-day workshop with Artefactual to pinpoint possible solutions, which we subsequently presented to the wider Archivematica community at the Archivematica Camp in London in July 2019.
We will use all the input gathered from the proof of concept to inform our initial implementation of Archivematica, which will begin in January 2020. The project will focus on the performance and metadata gaps identified during the proof of concept, allowing us to bring Archivematica into production use in 2021. We are keen to work with the Archivematica community, so do get in touch at firstname.lastname@example.org if you’re interested in finding out more about our work.