UK Web Archive mini-conference 2020

On Wednesday 19th November I attended the UK Web Archive (UKWA) mini-conference 2020, my first conference as a Graduate Trainee Digital Archivist. It was hosted by Jason Webber, Engagement Manager at the UKWA, and, as is normal in these COVID times, it took place on Zoom (my first ever Zoom experience!).

The conference started with an introduction to and demonstration of the UKWA by Jason Webber. Running since 2005, the UKWA has a mission to collect the entire UK webspace at least once per year and to preserve the websites for future generations. As part of my traineeship I have used the UKWA, but it was interesting to hear about the other functions and collections it provides. Along with being able to browse different versions of UK websites, it also includes over 100 curated collections on themes ranging from Food to Brexit to Online Enthusiast Communities in the UK. It also features the SHINE tool, which was developed as part of the ‘Big UK Data Arts and Humanities’ project and contains over 3.5 billion items which have been full-text indexed so that every word is searchable. It allows users to perform searches and trend analysis on subjects over a huge range of websites; all you need to use this tool is a bit of Python knowledge. My Python knowledge is a bit basic, but Caio Mello, during his researcher talk, provided a useful link to online Python tutorials aimed at historians to aid in their research.
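To give a flavour of what trend analysis over archived text can look like, here is a minimal Python sketch. It does not use SHINE or any UKWA interface: the `records` list and the `yearly_term_counts` helper are made up for illustration, and it assumes only that you already have some page text together with the year it was captured.

```python
from collections import Counter


def yearly_term_counts(records, term):
    """Count how many records per year mention `term` (case-insensitive)."""
    counts = Counter()
    needle = term.lower()
    for year, text in records:
        if needle in text.lower():
            counts[year] += 1
    # Return a plain dict ordered by year so the trend is easy to read.
    return dict(sorted(counts.items()))


# Hypothetical sample data: (capture year, page text). Not real archive content.
records = [
    (2012, "The Olympic legacy will transform east London."),
    (2012, "Ticket sales for the Games open today."),
    (2014, "Questions over the Olympic legacy two years on."),
    (2016, "Has the Olympic legacy been delivered?"),
    (2016, "Olympic legacy funding under review."),
]

trend = yearly_term_counts(records, "olympic legacy")
print(trend)
```

A real analysis would of course work on far more text and would probably normalise the counts by the number of pages captured each year, but the basic shape (group by year, count matches) stays the same.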

In his talk, Caio Mello (School of Advanced Study, University of London) discussed how he used the SHINE tool as part of his work for the CLEOPATRA Project. He was specifically looking at the Olympic legacy of the 2012 Olympics, how it was defined and how the view of the legacy changed over time. He explained the process he used to extract the information and the ways the information can be used for analysis, visualisation and context. My background is in mathematics and the concept of ‘Big Data’ came up frequently during my studies so it was fascinating to see how it can be used in a research project and how the UKWA is enabling research to be conducted over such a wide range of subjects.

The next researcher talk by Liam Markey (University of Liverpool and the British Library) showed a different approach to using the UKWA for his research project into how Remembrance in 20th Century Britain has changed. He explained how he conducted an analysis of archived newspaper articles, using specific search terms, to identify articles that focused on commemoration which he could then use to examine how the attitudes changed over time. The UKWA enabled him to find websites that focused on the war and compare these with mainstream newspapers to see how these differ.

The keynote was given by Paul Gooding (University of Glasgow), who spoke about the use and users of Non-Print Legal Deposit Libraries. His research as part of the Digital Library Futures Project, with the Bodleian Libraries and Cambridge University Library as case study partners, looked at how academic deposit libraries were impacted by e-Legal Deposit. It was an interesting discussion of some of the issues with the system, such as balancing commercial rights with access for users, and how highly restrictive access conditions are at odds with more recent legislation, such as the provision for disabled users and the 2014 copyright exception for data and text mining for non-commercial uses.

Being new to the digital archiving world, my first conference was a great introduction to web archiving and provided context to the work I am doing. Thank you to the organisers and speakers for giving me insight into a few of the different ways the web archive is used and I have come away with a greater understanding of the scope and importance of digital archiving (as well as a list of blog posts and tutorials to delve into!)

Some Useful Links:

https://www.webarchive.org.uk/

https://programminghistorian.org/

https://blogs.bl.uk/webarchive/2020/11/how-remembrance-day-has-changed.html

http://cleopatra-project.eu/

 

#WeMissiPRES: Preserving social media and boiling 1.04 x 10^16 kettles

This year the annual iPRES digital preservation conference was understandably postponed and in its place the community hosted a 3-day Zoom conference called #WeMissiPRES. As two of the Bodleian Libraries’ Graduate Trainee Digital Archivists, Simon and I were in attendance and blogged about our experiences. This post contains some of my highlights.

The conference kicked off with a keynote by Geert Lovink. Geert is the founding director of the Institute of Network Cultures and the author of several books on critical Internet studies. His talk was wide-ranging and covered topics from the rise of so-called ‘Zoom fatigue’ (I guarantee you know this feeling by now) to how social media platforms affect all aspects of contemporary life, often in negative ways. Geert highlighted the importance of preserving social media in order to allow future generations to be able to understand the present historical moment. However, this is a complicated area of digital preservation because archiving social media presents a host of ethical and technical challenges. For instance, how do we accurately capture the experience of using social media when the content displayed to you is largely dictated by an algorithm that is not made public for us to replicate?

After the keynote I attended a series of talks about the ARCHIVER project. João Fernandes from CERN explained that the goal of this project is to improve archiving and digital preservation services for scientific and research data. Preservation solutions for this type of data need to be cost-effective, scalable, and capable of ingesting amounts of data within the petabyte range. There were several further talks from companies who are submitting to the design phase of this project, including Matthew Addis from Arkivum. Matthew’s talk focused on the ways that digital preservation can be conducted on the industrial scale required to meet the brief and explained that Arkivum is collaborating with Google to achieve this, because Google’s cloud infrastructure can be leveraged for petabyte-scale storage. He also noted that while the marriage of preserved content with robust metadata is important in any digital preservation context, it is essential for repositories dealing with very complex scientific data.

In the afternoon I attended a range of talks that addressed new standards and technologies in digital preservation. Linas Cepinskas (Data Archiving and Networked Services (DANS)) spoke about a self-assessment tool for the FAIR principles, which is designed to assess whether data is Findable, Accessible, Interoperable and Reusable. Later, Barbara Sierman (DigitalPreservation.nl) and Ingrid Dillo (DANS) spoke about TRUST, a new set of guiding principles that are designed to map well with FAIR and assess the reliability of data repositories. Antonio Guillermo Martinez (LIBNOVA) gave a talk about his research into Artificial Intelligence and machine learning applied to digital preservation. Through case studies, he identified that AI is especially good at tasks such as anomaly detection and automatic metadata generation. However, he found that regardless of how well the AI performs, it needs to generate better explanations for its decisions, because it’s hard for human beings to build trust in automated decisions that we find opaque.

Paul Stokes from Jisc3C gave a talk on calculating the carbon costs of digital curation and unfortunately concluded that not much research has been done in this area. The need to improve the environmental sustainability of all human activity could not be more pressing and digital preservation is no exception, as approximately 3% of the world’s electricity is used by data centres. Paul also offered the statistic that enough power is consumed by data centres worldwide to boil 10,400,000,000,000,000 kettles – which is the most important digital preservation metric I can think of.

This conference was challenging and eye-opening because it gave me an insight into (complicated!) areas of digital preservation that I was not familiar with, particularly surrounding the challenges of preserving large quantities of scientific and research data. I’m very grateful to the speakers for sharing their research and to the organisers, who did a fantastic job of bringing the community together to bridge the gap between 2019 and 2021!

#WeMissiPRES: A Bridge from 2019 to 2021

Every year, the international digital preservation community meets for the iPRES conference, an opportunity for practitioners to exchange knowledge and showcase the latest developments in the field. With the 2020 conference unable to take place due to the global pandemic, digital preservation professionals instead gathered online for #WeMissiPRES to ensure that the global community remained connected. Our graduate trainee digital archivist Simon Mackley attended the first day of the event; in this blog post he reflects on some of the highlights of the talks and what they tell us about the state of the field.

How do you keep the global digital preservation community connected when international conferences are not possible? This was the challenge faced by the organisers of #WeMissiPRES, a three-day online event hosted by the Digital Preservation Coalition. Conceived as a festival of digital preservation, the aim was not to try and replicate the regular iPRES conference in an online format, but instead to serve as a bridge for the digital preservation community, connecting the efforts of 2019 with the plans for 2021.

As might be expected, the impact of the pandemic loomed large in many of the talks. Caylin Smith (Cambridge University Library) and Sara Day Thomson (University of Edinburgh) for instance gave a fascinating paper on the challenge of rapidly collecting institutional responses to coronavirus, focusing on the development of new workflows and streamlined processes. The difficulties of working from home, the requirements of remote access to resources, and the need to move training online likewise proved to be recurrent themes throughout the day. As someone whose own experience of digital preservation has been heavily shaped by the pandemic (I began my traineeship at the start of lockdown!) it was really useful to hear how colleagues in other institutions have risen to these challenges.

I was also struck by the different ways in which responses to the crisis have strengthened digital preservation efforts. Lynn Bruce and Eve Wright (National Records of Scotland) noted for instance that the experience of the pandemic has led to increased appreciation of the value of web-archiving from stakeholders, as the need to capture rapidly-changing content has become more apparent. Similarly, Natalie Harrower (Digital Repository of Ireland) made the excellent point that the crisis had not only highlighted the urgent need for the sharing of medical research data, but also the need to preserve it: coronavirus data may one day prove essential to fighting a future pandemic, and so there is a moral imperative for us to ensure that it is preserved.

As our keynote speaker Geert Lovink (Institute of Network Cultures) reminded us, the events of the past year have been momentous quite apart from the pandemic, with issues such as the distorting impacts of social media on society, the climate emergency, and global demands for racial justice all having risen to the forefront of society. It was great therefore to see the role of digital preservation in these challenges being addressed in many of the panel sessions. A personal highlight for me was the presentation by Daniel Steinmeier (KB National Library of the Netherlands) on diversity and digital preservation. Steinmeier stressed that in order for diversity efforts to be successful, institutions needed to commit to continuing programmes of inclusion rather than one-off actions, with the communities concerned actively included in the archiving process.

So what challenges can we expect from the year ahead? Perhaps more than ever, this has been a difficult question to answer this year. Nonetheless, a key theme that struck me from many of the discussions was that the growing challenge of archiving social media platforms was matched only by the increasing need to preserve the content hosted on them. As Zefi Kavvadia (International Institute of Social History) noted, many social media platforms actively resist archiving; even when preservation is possible, curators are faced with a dilemma between capturing user experiences and capturing platform data. Navigating this challenge will surely be a major priority for the profession going forward.

While perhaps no substitute for meeting in person, #WeMissiPRES nonetheless succeeded in bringing the international digital preservation community together in a shared celebration of the progress being made in the field, successfully bridging the gap between 2019 and 2021, and laying the foundations for next year’s conference.

 

#WeMissiPRES was held online from 22nd-24th September 2020. For more information, and for recordings of the talks and panel sessions, see the event page on the DPC website.

Newly available: Recollecting Oxford Medicine oral history project

Born-digital material from the Recollecting Oxford Medicine oral history project has been donated to the Weston Library since the early 2010s, and the project is still active today with further interviews planned. A selection of interviews from the project is now available to listen to online via University of Oxford podcasts.

The Recollecting Oxford Medicine oral history project comprises interviews with Oxford medics, which provide individual perspectives on both pre-clinical and clinical courses at the Oxford Medical School and on medical careers in Oxford and other locations, and give an insight into the evolution of clinical medicine at Oxford since the mid-1940s.

The interviewees have worked in a range of specialisms and departments including psychiatry, neurology, endocrinology and dermatology, to name a few. Episode 18 comprises an interview with John Ledingham, former Director of Clinical Studies (a position he held twice!), recorded by Peggy Frith and Rosie Fitzherbert Jones in 2012. In episodes 11-12 we can learn about Chris Winearls – a self-proclaimed ‘accidental Rhodes Scholar’ from medical school in Cape Town – his journey into nephrology and how he later became Associate Professor of Medicine for the university.

Listen to the Recollecting Oxford Medicine oral history podcast series online at https://podcasts.ox.ac.uk/series/recollecting-oxford-medicine-oral-histories

In episode 1, John Spalding, interviewed by John Oxbury in 2011, discusses working under Hugh Cairns, initially as a student houseman at the Radcliffe Infirmary during the Second World War. Spalding also recounts his experience of the initial conception of the East Radcliffe Ventilator, first devised for use in the treatment of polio. In episode 13 we can listen to Derek Hockaday’s interview with Joan Trowell, former Deputy Director of Clinical Studies for Oxford Medical School, which amongst other topics covers her experience of roles held at the General Medical Council.

The majority of the interviews were undertaken by Derek Hockaday, former Oxford hospitals consultant physician and Emeritus Fellow of Brasenose College. The cataloguing and preservation of the oral history project is supported by Oxford Medical Alumni. The library acknowledges the donations of material and financial support by Derek Hockaday and OMA respectively.

Listeners may also be interested in the Sir William Dunn School of Pathology Oral Histories, of which the archive masters are also preserved in the Weston Library.

Archiving web content related to the University of Oxford and the coronavirus pandemic

Since March 2020, the scope of collection development at the Bodleian Libraries’ Web Archive has expanded to also focus on the coronavirus pandemic: how the University of Oxford and the wider university community have reacted and responded to the rapidly changing global situation and government guidance. The Bodleian Libraries’ Web Archive team have endeavoured (and will keep working) to capture, quality assess and make publicly available records from the web relating to Oxford and the coronavirus pandemic. Preserving these ephemeral records is important. Just a few months into what is sure to be a long road, what do these records show?

Firstly, records from the Bodleian Libraries’ Web Archive can demonstrate how university divisions and departments are continually adjusting in order to facilitate core activities of learning and research. This could be by moving planned events online or organising and hosting new events relevant to the current climate:

Capture of http://pcmlp.socleg.ox.ac.uk/ 24 May 2020 available through the Bodleian Libraries’ Web Archive. Wayback URL https://wayback.archive-it.org/2502/20200524133907/https://pcmlp.socleg.ox.ac.uk/global-media-policy-seminar-series-victor-pickard-on-media-policy-in-a-time-of-crisis/

Captures of websites also provide an insight into the numerous collaborations of Oxford University with both the UK government and other institutions at this unprecedented time; that is, the role Oxford is playing and how that role is changing and adapting. Much of this can be seen in the ever-evolving news pages of departmental websites, especially those within the Medical Sciences Division, such as the Nuffield Department of Population Health’s collaboration with UK Biobank for the government’s Department of Health and Social Care, announced on 17 May 2020.

The web archive preserves records of how certain groups are contributing to coronavirus (Covid-19) research, front-line work and reviews at an extremely fast pace, which the curators at the Bodleian Libraries’ Web Archive can attempt to capture by crawling more frequently. One example of this is the Centre for Evidence Based Medicine’s Oxford Covid-19 Evidence Service – a platform for rapid data analysis and reviews which is currently updated with several articles daily. Comparing two screenshots of different captures of the site, seven weeks apart, shows us the different themes of data being reviewed, and particularly how the ‘Most Viewed’ questions change (or indeed, don’t change) over time.

Capture of https://www.cebm.net/covid-19/ 14 April 2020 available through the Bodleian Libraries’ Web Archive. Wayback URL https://wayback.archive-it.org/org-467/20200414111731/https://www.cebm.net/covid-19/

Interestingly, the page location has changed slightly: the eagle-eyed among you may have spotted that the article reviews are now under /oxford-covid-19-evidence-service/, which is still in the web crawler’s scope.

Capture of https://www.cebm.net/covid-19/ 05 June 2020 available through the Bodleian Libraries’ Web Archive. Wayback URL https://wayback.archive-it.org/org-467/20200605100737/https://www.cebm.net/oxford-covid-19-evidence-service/
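The Wayback URLs quoted in these captions all appear to follow the same shape: a collection identifier, a 14-digit capture timestamp, then the original address. The Python sketch below pulls those parts out and works out how far apart the two CEBM captures are. The layout is inferred only from the URLs in this post rather than from any Archive-It documentation, and `parse_wayback_url` is a hypothetical helper name.

```python
from datetime import datetime

# Prefix taken from the Wayback URLs quoted in the captions above.
WAYBACK_PREFIX = "https://wayback.archive-it.org/"


def parse_wayback_url(wayback_url):
    """Split a Wayback URL into (collection, capture datetime, original URL).

    Assumes the layout <prefix><collection>/<YYYYMMDDhhmmss>/<original URL>.
    """
    if not wayback_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not an Archive-It Wayback URL")
    rest = wayback_url[len(WAYBACK_PREFIX):]
    # Split only twice so the slashes inside the original URL are preserved.
    collection, timestamp, original = rest.split("/", 2)
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return collection, captured, original


first = parse_wayback_url(
    "https://wayback.archive-it.org/org-467/20200414111731/"
    "https://www.cebm.net/covid-19/"
)
second = parse_wayback_url(
    "https://wayback.archive-it.org/org-467/20200605100737/"
    "https://www.cebm.net/oxford-covid-19-evidence-service/"
)

# Compare calendar dates rather than exact times of day.
days_apart = (second[1].date() - first[1].date()).days
print(first[0], first[1].isoformat(), first[2])
print(days_apart)
```

For the two captures above this gives 52 days, which is a little over seven weeks.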

We welcome recommendations for sites to archive; if you would like to nominate a website for inclusion in the Bodleian Libraries’ Web Archive you can do so here. Meanwhile, the work to capture institutional, departmental and individual responses at this time continues.

New catalogue – Oxford Women in Computing: An Oral History project

The catalogue of the Oxford Women in Computing oral history project is now available online.

This oral history project captures the experiences of 10 pioneering women who were active in computing research, teaching and service provision between the 1950s and 1990s, not only in Oxford, but at national and international levels. The rationale for the project, funded by the Engineering and Physical Sciences Research Council through grants held by Professor Ursula Martin, was that women had participated in the very early stages of computing but, aside from a few exceptions, their stories had not been captured – or indeed told. Among the interviewees are Eleanor Dodson, methods developer for Protein Crystallography and former research technician for Dorothy Hodgkin, and Linda Hayes, former Head of User Services at the Oxford University Computing Service (now University of Oxford IT Services). Leonor Barroca left Portugal in 1982 as a qualified electrical engineer to follow a boyfriend to Oxford – later that year she was one of three women on the university’s MSc in Computing course. Leonor also worked briefly as a COBOL (common business-oriented language) programmer for the Bodleian Libraries.

Themes throughout the interviews, which were conducted in 2018 by author and broadcaster Georgina Ferry, include:

  • career opportunities and early interests in computing
  • gender splits in computing
  • the origins and development of computing teaching and research in Oxford
  • development of the University of Oxford’s Computing Service and the commercial software house the Numerical Algorithms Group (NAG).

The Oxford Women in Computing oral histories serve as a source for insight into nearly half a century of women’s involvement in computing at Oxford and beyond.  The collection will particularly be of use to those interested in gender studies and the history of computing.

The interviews can be listened to online through University of Oxford podcasts here.

Communications programmer Esther White in the early days of the University of Oxford’s Computing Service. © University of Oxford

 

 

Web Archiving & Preservation Working Group: Social Media & Complex Content

On January 16 2020, I had the pleasure of attending the first public meeting of the Digital Preservation Coalition’s Web Archiving and Preservation Working Group. The meeting was held in the beautiful New Records House in Edinburgh.

We were welcomed by Sara Day Thomson, who in her opening talk gave us a very clear overview of the issues and questions we increasingly run into when archiving complex/dynamic web or social media content. For example, how do we preserve apps like Pokémon Go that use a user’s location data or even personal information to individualize the experience? Or where do we draw the line in interactive social media conversations? After all, we cannot capture everything. But how do we even capture this information without infringing the rights of the original creators? These and more musings set the stage perfectly for the rest of the talks during the day.

Although I would love to include every talk held this day, as they were all very interesting, I will only highlight a couple of the presentations to give this blog some pretence at “brevity”.

The first talk I want to highlight was given by Giulia Rossi, Curator of Digital Publications at the British Library, on “Overview of Collecting Approach to Complex Publications”. Rossi introduced us to the emerging formats project, a two-year project by the British Library. The project focusses on three types of content:

  1. Web-based interactive narratives where the user’s interaction with a browser based environment determines how the narrative evolves;
  2. Books as mobile apps (a.k.a. literary apps);
  3. Structured data.

Personally, I found Rossi’s discussion of the collection methods in particular very interesting. The team working on the emerging formats project does not just use heritage crawlers and other web harvesting tools, but also file transfers or direct downloads via access code and password. Most strikingly, in the event that only a partial capture can be made, they try to capture as much contextual information about the digital object as possible including blog posts, screen shots or videos of walkthroughs, so researchers will have a good idea of what the original content would have looked like.

The capture of contextual content and the inclusion of additional contextual metadata about web content is currently not standard practice. Many tools do not even allow for their inclusion. However, considering that many of the web harvesting tools experience issues when attempting to capture dynamic and complex content, this could offer an interesting work-around for most web archives. It is definitely an option that I myself would like to explore going forward.

The second talk that I would like to zoom in on is “Collecting internet art” by Karin de Wild, digital fellow at the University of Leicester. Taking Agent Ruby – a chatbot created by Lynn Hershman Leeson – as her example, de Wild explored questions on how we determine what aspects of internet art need to be preserved and what challenges this poses. In the case of Agent Ruby, the San Francisco Museum of Modern Art initially exhibited the chatbot in a software installation within the museum, thereby taking the artwork out of its original context. They then proceeded to add it to their online Expedition e-space, which has since been taken offline. Only a screenshot of the online artwork is currently accessible through the SFMOMA website, as the museum prioritizes the preservation of the interface over the chat functionality.

This decision raises questions about the right ways to preserve online art. Does the interface indeed suffice or should we attempt to maintain the integrity of the artwork by saving the code as well? And if we do that, should we employ code restitution, which aims to preserve the original artwork’s code, or a significant part of it, whilst adding restoration code to reanimate defunct code to full functionality? Or do we emulate the software as the University of Freiburg is currently exploring? How do we keep track of the provenance of the artwork whilst taking into account the different iterations that digital artworks go through?

De Wild proposed to turn to linked data as a way to keep track of particularly the provenance of an artwork. Together with two other colleagues she has been working on a project called Rhizome in which they are creating a data model that will allow people to track the provenance of internet art.

Although this is not within the scope of the Rhizome project, it would be interesting to see how the finished data model would lend itself to keeping track of changes in the look and feel of regular websites as well. Although the layouts of websites have changed radically over the past number of years, these changes are usually not documented in metadata or data models, even though they can be as much of a reflection of social and cultural changes as the content of the website. Going forward it will be interesting to see how the changes in archiving online artworks will influence the preservation of online content in general.

The final presentation I would like to draw attention to is “Twitter Data for Social Science Research” by Luke Sloan, deputy director of the Social Data Science Lab at the University of Cardiff. He provided us with a demo of COSMOS, an alternative to the Twitter API, which is freely available to academic institutions and not-for-profit organisations.

COSMOS allows you to either target a particular Twitter feed or enter a search term to obtain a 1% sample of the total worldwide Twitter feed. The gathered data can be analysed within the system and is stored in JSON format. The information can subsequently be exported to CSV or Excel format.
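For readers who have not handled these formats before, the step from JSON records to a CSV file can be sketched with Python's standard library alone. This is not how COSMOS performs its export, which the talk did not go into; the field names (`id`, `created_at`, `text`) and the sample records below are invented for illustration.

```python
import csv
import io
import json

# Hypothetical tweet-like records in JSON; not real data and not the COSMOS schema.
raw_json = """
[
  {"id": "1", "created_at": "2020-01-16T10:00:00Z", "text": "Hello from Edinburgh"},
  {"id": "2", "created_at": "2020-01-16T10:05:00Z", "text": "Archiving the web, one crawl at a time"}
]
"""


def tweets_json_to_csv(json_text, fields):
    """Convert a JSON array of objects into CSV text with the given columns."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        # Keep only the requested columns; missing values become empty cells.
        writer.writerow({field: row.get(field, "") for field in fields})
    return buffer.getvalue()


csv_text = tweets_json_to_csv(raw_json, ["id", "created_at", "text"])
print(csv_text)
```

Note that the `csv` module quotes any value containing a comma, so the second tweet's text stays in a single column when the file is opened in a spreadsheet.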

Although the system is only able to capture new (or live) Twitter data, it is possible to upload historical Twitter data into the system if an archive has access to this.

Having given us an explanation of how COSMOS works, Sloan asked us to consider the potential risks that archiving and sharing Twitter data could pose to the original creator. Should we not protect these creators by anonymizing their tweets to a certain extent? If so, what data should we keep? Do we only record the tweet ID and the location? Or would this already make it too easy to identify the creator?
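One way to make these questions concrete is to look at what "keeping less" might mean in practice. The Python sketch below reduces a record to a chosen set of fields and swaps the account name for a salted hash. It is only an illustration of data minimisation, not a method proposed in the talk: the field names, the `minimise_tweet` helper and the salt are all assumptions, and, as Sloan's questions suggest, even a tweet ID and a location may be enough to identify someone.

```python
import hashlib


def minimise_tweet(tweet, keep=("id", "location"), salt="replace-with-a-secret"):
    """Return a reduced copy of `tweet` with only the fields listed in `keep`.

    If the record has a "user" field, it is replaced by a short salted hash so
    that tweets by the same account can still be grouped without storing the name.
    """
    reduced = {key: tweet[key] for key in keep if key in tweet}
    if "user" in tweet:
        digest = hashlib.sha256((salt + tweet["user"]).encode("utf-8")).hexdigest()
        reduced["user_pseudonym"] = digest[:12]
    return reduced


# Hypothetical record; field names are illustrative only.
tweet = {
    "id": "123",
    "user": "@example_account",
    "text": "On my way to the archive",
    "location": "Edinburgh",
}

reduced = minimise_tweet(tweet)
print(sorted(reduced))
```

Pseudonymising in this way is weaker than true anonymisation: anyone holding the salt, or the original tweet ID, can potentially trace the record back to its creator.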

The last part of Sloan’s presentation tied in really well with the discussion about the ethical approaches to archiving social media. During this discussion we were prompted to consider ways in which archives could archive Twitter data, whilst being conscious of the potential risks to the original creators of the tweets. This definitely got me thinking about the way we currently archive some of the Twitter accounts related to the Bodleian Libraries in our very own Bodleian Libraries Web Archive.

All in all, the DPC event definitely gave me more than enough food for thought about the ways in which the Bodleian Libraries and the wider community in general can improve the ways we capture (meta)data related to the online content that we archive and the ethical responsibilities that we have towards the creators of said content.

Because Digital Objects can Decay too: Conducting a Proof of Concept for Archivematica

Like other archives, the Bodleian Libraries has been searching for ways to optimize the conservation of our digital collections. The need to find a solution has become increasingly pressing as the Bodleian Electronic Archives and Manuscripts (BEAM), our digital repository service for the management of born-digital archives and manuscripts acquired by the Special Collections, now contains roughly 13TB worth of digital objects, with much more waiting in the wings.

In order to help us manage the ingest of digital objects within our collections, the Bodleian Libraries undertook an options review as part of its DPOC project. This led to a decision to conduct a proof of concept of Archivematica. This proof of concept included the installation of a QA and DEV environment with the help of Artefactual, followed by an extensive testing period and a gap analysis.

In November 2018 we started testing the system to establish whether or not Archivematica met our acceptance criteria. We mainly focussed on three areas:

  1. Overall performance/ functionality: Is the system user friendly? Can it successfully process all the different file types and sizes that we have in our collection?
  2. Metadata: Can Archivematica extract the metadata from the Excel sheets that we have created over time? What technical metadata does Archivematica automatically extract from ingested files?
  3. File extraction and normalization: Are disk images extracted properly? Is the content of a transfer normalized to the right file type?

Whilst testing, we also reached out to and visited other organisations that had already implemented Archivematica, including the International Institute of Social History in Amsterdam, the University of Edinburgh, the National Library of Wales and the Wellcome Trust.

Based on the outcomes of the tests we conducted, and the conversations we had with other institutions, we identified five gap areas:

  1. Performance: The Archivematica instance we configured for the Proof of Concept struggled with transfers over 200GB or transfers that contain more than 5,000 files.
  2. Error reporting: It was often unclear what a particular error code and message meant. The error logs used by system administrators are also verbose, making it hard for them to pinpoint the error.
  3. Metadata: Here we identified two gaps. Firstly, there is the verbosity of the metadata. Because Archivematica records individual PREMIS events for each digital file, the resulting METS file becomes unwieldy, compromising the system’s performance. Secondly, we require a workflow to migrate our spreadsheet-held legacy pre-ingest capture metadata and file-level metadata into Archivematica, and to go on including this pre-ingest metadata, which will continue to be recorded in spreadsheet form for the foreseeable future, in future ingests.
  4. User/ access management: Archivematica does not offer a way to manage access to collections or Archive Information Packages, and allows all users to alter the system work-flow. We are a multi-user organisation, and wish to have tighter controls on access to collections and workflow configurations.
  5. General reporting: Archivematica currently does not offer many reports to monitor progress, content and growth of collections.
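
To illustrate the performance gap, a pre-flight check along the following lines could flag transfers likely to cause trouble before they are started. This is only a minimal sketch: the function and class names are hypothetical, it uses the Python standard library rather than any Archivematica API, and it assumes that 200GB means decimal gigabytes.

```python
import os
from dataclasses import dataclass

# Thresholds taken from the performance gap described above.
# Treating "200GB" as decimal gigabytes is an assumption.
MAX_BYTES = 200 * 10**9
MAX_FILES = 5000


@dataclass
class TransferSummary:
    """Total size and file count of a candidate transfer directory."""
    total_bytes: int
    file_count: int

    def exceeds(self, max_bytes=MAX_BYTES, max_files=MAX_FILES):
        """Return True if either threshold is exceeded."""
        return self.total_bytes > max_bytes or self.file_count > max_files


def summarise_transfer(root):
    """Walk a directory tree and total up its regular files."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symbolic links rather than follow them
            total_bytes += os.path.getsize(path)
            file_count += 1
    return TransferSummary(total_bytes, file_count)
```

Calling `summarise_transfer("/path/to/transfer").exceeds()` would then indicate whether a transfer should be split into smaller batches first.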

Once we had identified these gaps, we held an intensive two-day workshop with Artefactual to pinpoint possible solutions, which we subsequently presented to the wider Archivematica community during the Archivematica Camp in London in July 2019.

We will use all the input gathered from the proof of concept to inform our initial implementation of Archivematica, which will begin in January 2020. The project will focus on the performance and metadata gaps identified during the proof of concept, allowing us to bring Archivematica into production use in 2021. We are keen to work with the Archivematica community, so do get in touch at beam@bodleian.ox.ac.uk if you’re interested in finding out more about our work.

A new project in Archives and Modern Manuscripts: the conversion of the Bodleian’s Summary Catalogue of Western Manuscripts

The summer of 2019 saw the beginning of an exciting and much anticipated new project in Archives and Modern Manuscripts: the conversion of the Bodleian’s Summary Catalogue of Western Manuscripts into machine-readable format, ready for greater online accessibility through the newly launched Bodleian Archives & Manuscripts website.

What is the Summary Catalogue of Western Manuscripts?


The Summary Catalogue of Western Manuscripts edited by Richard W. Hunt, Falconer Madan and P. D. Record (1915)

The Summary Catalogue of Western Manuscripts is key to accessing our collections. The ten volumes were compiled to list all of the Western manuscripts held by the Library, as a summary of the collection (they are aptly named), and a finding aid for researchers and readers. The first seven volumes, edited by Richard W. Hunt, Falconer Madan and P. D. Record, provide an overview of manuscripts acquired before 1915. The last three volumes, edited by Mary Clapinson and T. D. Rogers, were published in 1991 and describe acquisitions made between 1916 and 1975.

Together, the volumes of the Summary Catalogue of Western Manuscripts describe approximately 56,000 shelfmarks (physical places within our archival storage), and thus a substantial part of our vast and eclectic collection. The material ranges from manuscripts acquired singly, such as an Album of genealogical tables of ruling families of Europe and the Middle East from classical times to the 20th century, to large archives such as the archive of John Locke (full catalogue coming soon).

If you want to learn more about the Summary Catalogue of Western Manuscripts and the acquisition of material at the Bodleian Libraries, alongside our interesting history, we highly recommend William Dunn Macray’s Annals of the Bodleian Library, Oxford, A.D. 1598-A.D. 1867, which you can read online here. William Dunn Macray worked here at the Bodleian during the nineteenth century.

How can you discover what’s in the Summary Catalogue now?

The volumes of the Summary Catalogue of Western Manuscripts are accessible in paper format in the Weston Library and have also been digitised to be accessible remotely. Digitised scans, in PDF form, are available via SOLO: the first seven volumes are accessible here, and the last three volumes there. The first few Summary Catalogue descriptions that we’ve converted since the project began in September have been published in Bodleian Archives & Manuscripts. You can find details of what’s been published so far on our New Additions page.

Meet the team:

We are two archivists working exclusively on the project: Alice Whichelow and Pauline Soum-Paris. Our colleague Kelly Burchmore also devotes some of her time to the project.

Alice Whichelow – Hi! I qualified as an archivist in September 2019, gaining my qualification in Archives and Records Management from University College London. As a history enthusiast, getting to explore some of the lesser known treasures of the Bodleian Libraries’ collection is great, and getting to share them is even better!

Pauline Soum-Paris – After completing my Master of Archives and Records Management at the University of Liverpool, I became a qualified archivist in September 2019. With interests in languages, history and religions, I can only see the collection held by the Bodleian Libraries as a goldmine and I am looking forward to sharing a few of the gems I come across every day!

Kelly Burchmore – As a project archivist who qualified in Digital Curation in March 2019, I work mostly on modern collections. Therefore, through the conversion process I enjoy learning about the physical characteristics of more traditional archive material; it’s interesting to read about the binding of the manuscripts, and see the meticulous methods by which they were catalogued. It’s great to work with Alice and Pauline to share the value of this project, and indeed, the collections and items themselves.

What you can expect from us:

The conversion of these Summary Catalogue descriptions into machine-readable form for online discovery is now well underway, and new descriptions will be added regularly to Bodleian Archives & Manuscripts over the course of 2020 and 2021. We will be using this blog to keep you updated on what we find, sharing blog posts about items and collections from the Summary Catalogue of Western Manuscripts which have sparked our interest. Likewise, if you have used the Summary Catalogue of Western Manuscripts and have suggestions regarding items that fascinated you, do let us know in the comments. So, keep an eye out and enjoy!

“All the kick, the go, the cheese”: Lady Clarendon’s letters in Bodleian Student Editions

This term, the Bodleian Student Editions workshops have entered their fourth year.

Students at the 30 October workshop get acquainted with Lady Clarendon’s diaries

They continue to attract students from across the university, undergraduate and postgraduate, arts and science students. This year we have been editing the letters of Katharine, Countess of Clarendon (1810-1874), to her sister-in-law, [Maria] Theresa Lewis, and these letters are proving to be as fascinating as the very popular Penelope Maitland correspondence.  Some of the letters have been uploaded into our ongoing catalogue on Early Modern Letters Online.

Students working on Lady Clarendon’s letters

Staff and students grapple with tricky handwriting, 6 Nov 2018

These letters fulfil the criteria that we have laid down for suitable material for the workshops – they are in good condition, unpublished, interesting, readable for non-specialists, have no copyright complications, and are in a format that allows the letters to be distributed among the students in the workshop. As the students work in pairs, we require six or seven individual letters in each workshop, with more in reserve should the transcripts be completed quickly. The perfect format is the fascicule which makes the letters much easier to handle – one fascicule can be given to each pair. Inevitably, most of the good runs of letters that fulfil these requirements tend to be in 19th-century collections of papers that were never bound. This allows us to make a virtue of necessity, because there are very large collections of 19th-century letters acquired relatively recently (i.e. post-1970) that are well worth exploring for their historical interest.

Lady Clarendon’s letters in fascicules

The Lady Clarendon letters were selected in early 2018 by me and Balliol student Stephanie Kelley, the Balliol-Bodley scholar, who also provided digital photographs of many of the letters. Though the workshops give access to original papers, digital images are also made available for detailed checking of difficult words.

The letters were purchased by the Bodleian in 1982, to add to the archive of her husband the 4th Earl of Clarendon already deposited here in 1949 (the 4th Earl’s papers were transferred to Library ownership in 2013). The choice of Lady Clarendon as a subject for the workshops is fortunate in that this year we have been joined by Andrew Cusworth, who is placed in the Bodleian in connection with the Prince Albert Digitisation Project. The Earl and Countess of Clarendon were intimate with Queen Victoria and Prince Albert, and court gossip is one of the interesting aspects of the letters.

Lady Clarendon to Theresa Lewis, Vice Regal Lodge, Dublin, 14 Dec 1847

George Villiers, 4th Earl of Clarendon (1800-1870), was a major political figure of the mid-Victorian period, and his wife’s letters are of considerable political interest as she was his confidante in many matters. In the period covered by the letters, Clarendon was Lord Lieutenant of Ireland from 1847 to 1852, and then Foreign Secretary from 1853 to 1858. His career therefore coincided with major events including the Irish Famine, the Young Ireland rebellion of 1848, the Crimean War and the Indian uprising known as the ‘Mutiny’. The recipient of Lady Clarendon’s letters was Maria Theresa Lewis (née Villiers), Clarendon’s sister, and the wife of George Cornewall Lewis (1806-1863), another Liberal politician who served as Under-Secretary of State for Home Affairs from 1847 to 1850, Chancellor of the Exchequer 1855 to 1858, Home Secretary 1859 to 1861, and War Secretary from 1861 to 1863. The letters do not only discuss politics, however. There is a great deal about family matters, the activities, and above all the illnesses of children, parents and other family members. Lady Clarendon’s lively style provides a very accessible glimpse of aristocratic Victorian life and preoccupations, and the student editions will provide a very useful adjunct to the catalogues of the various parts of the extensive Clarendon archives in the Bodleian.

The workshops have been kept entertained by Lady Clarendon’s fascinating take on mid-Victorian life. Here are just a few examples of her inimitable style. All letters are to her sister-in-law Theresa Lewis. Look out for a follow-up blog post with further extracts.

Vice Regal Lodge, 22 Sep 1847 – on the arrival of her mother-in-law in Ireland

Here is Mrs. George sick, tired, but having had a good short passage … she has blue pilled and Speedimanis’d … [Speediman’s pills were a Victorian remedy for stomach complaints]

Vice Regal Lodge, 14 Dec 1847 – on Irish troubles

Lord Clancarty told me … that Bishop Derry the Catholic Bishop of Clonfort had inadvertently let out before Lord Sligo dining out somewhere that the landlords who had been shot deserved it richly!!!! – this Bishop is a Jesuit, I believe a clever and a wily man, but saying this was a great slip…

Vice Regal Lodge, 17 Dec 1847 – forgets to report the birth of her sixth child!

George Lewis’s Board of Controul office, his most excellent début in Parliament, on your side the water, and our dreadful murders and George’s administrative atchievements on this side have been deeply interesting to us both – only think of my not mentioning George Patrick Hyde’s birth too amongst the remarkable events!!

Vice Regal Lodge, 1 Jan 1848 – ‘my unavailing head’

 … George depends upon me for writing to you for him too as tho’ always busy he is particularly overwhelmed to-day and at this moment I hear the murmuring voices of Attorney Generals and Lord Chief Justices in his room settling all sorts of coercive and improvement measures and I don’t venture even to pop my ‘unavailing’ head (as he calls it) in…

[in the same letter] – a present that is ‘all “the kick, the go, the cheese”’

… Mama is leaving us with Robert this afternoon … – they take two small parcels to London. There is a small locket of blue enamel and rose diamonds with George’s and my hair in it, which we present with a joint kiss to you as a little Xmas souvenir– There is a chatelaine in steel which is all “the kick, the go, the cheese” and which I send to Thérèse as my birthday present …

OED chatelaine: ‘an ornamental appendage worn by ladies at their waist … consists of a number of short chains attached to the girdle or belt … bearing articles of household use and ornament, as keys, corkscrews, scissors, penknife, pin-cushion, thimble-case, watch etc …’

OED the kick: the fashion, the newest style

OED the go: the height of fashion; the ‘in’ thing, the ‘rage’.

OED the cheese (colloquial, obsolete): The right, correct, or best thing; something first-rate, genuine, or exemplary.

Students share an amusing anecdote with staff.

Bodleian Student Editions workshops are organised by Helen Brown (DPhil candidate in English), Andrew Cusworth, Chris Fletcher, Miranda Lewis (Cultures of Knowledge), Olivia Thompson (DPhil candidate in Ancient History), and Mike Webb, as a collaboration between the Department of Special Collections, Centre for Digital Scholarship, and Cultures of Knowledge. All photographs by Olivia Thompson.