Category Archives: Digital archives

A new project in Archives and Modern Manuscripts: the conversion of the Bodleian’s Summary Catalogue of Western Manuscripts

The summer of 2019 saw the beginning of an exciting and much anticipated new project in Archives and Modern Manuscripts: the conversion of the Bodleian’s Summary Catalogue of Western Manuscripts into machine-readable format, ready for greater online accessibility through the newly launched Bodleian Archives & Manuscripts website.

What is the Summary Catalogue of Western Manuscripts?


The Summary Catalogue of Western Manuscripts edited by Richard W. Hunt, Falconer Madan and P. D. Record (1915)

The Summary Catalogue of Western Manuscripts is key to accessing our collections. The ten volumes were compiled to list all of the Western manuscripts held by the Library, as a summary of the collection (they are aptly named), and a finding aid for researchers and readers. The first seven volumes, edited by Richard W. Hunt, Falconer Madan and P. D. Record, provide an overview of manuscripts acquired before 1915. The last three volumes, edited by Mary Clapinson and T. D. Rogers, were published in 1991 and describe acquisitions made between 1916 and 1975.

Together, the volumes of the Summary Catalogue of Western Manuscripts describe approximately 56,000 shelfmarks (physical places within our archival storage), and thus a substantial part of our vast and eclectic collection. The material ranges from manuscripts acquired singly, such as an album of genealogical tables of ruling families of Europe and the Middle East from classical times to the 20th century, to large archives such as the archive of John Locke (full catalogue coming soon).

If you want to learn more about the Summary Catalogue of Western Manuscripts, the acquisition of material at the Bodleian Libraries and our history, we highly recommend William Dunn Macray’s Annals of the Bodleian Library, Oxford, A.D. 1598-A.D. 1867, which you can read online here. Macray worked here at the Bodleian during the nineteenth century.

How can you discover what’s in the Summary Catalogue now?

The volumes of the Summary Catalogue of Western Manuscripts are available in paper format in the Weston Library and have also been digitised so that they can be consulted remotely. Digitised scans, in PDF form, are available via SOLO: the first seven volumes are accessible here, and the last three volumes there. The first few Summary Catalogue descriptions that we’ve converted since the project began in September have been published in Bodleian Archives & Manuscripts. You can find details of what’s been published so far on our New Additions page.

Meet the team:

We are two archivists working exclusively on the project: Alice Whichelow and Pauline Soum-Paris. Our colleague Kelly Burchmore also devotes some of her time to the project.

Alice Whichelow – Hi! I qualified as an archivist in September 2019, gaining my qualification in Archives and Records Management from University College London. As a history enthusiast, getting to explore some of the lesser known treasures of the Bodleian Libraries’ collection is great, and getting to share them is even better!

Pauline Soum-Paris – After completing my Master of Archives and Records Management at the University of Liverpool, I became a qualified archivist in September 2019. With interests in languages, history and religions, I can only see the collection held by the Bodleian Libraries as a goldmine and I am looking forward to sharing a few of the gems I come across every day!

Kelly Burchmore – As a project archivist who qualified in Digital Curation in March 2019, I work mostly on modern collections. Therefore, through the conversion process I enjoy learning about the physical characteristics of more traditional archive material; it’s interesting to read about the binding of the manuscripts, and see the meticulous methods by which they were catalogued. It’s great to work with Alice and Pauline to share the value of this project, and indeed, the collections and items themselves.

What you can expect from us:

The conversion of these Summary Catalogue descriptions into machine-readable form for online discovery is now well under way, and new descriptions will be added regularly to Bodleian Archives & Manuscripts over the course of 2020 and 2021. We will be using this blog to keep you updated on what we find, sharing blog posts about items and collections from the Summary Catalogue of Western Manuscripts which have sparked our interest. Likewise, if you have used the Summary Catalogue of Western Manuscripts and have suggestions regarding items that fascinated you, do let us know in the comments. So, keep an eye out and enjoy!

“All the kick, the go, the cheese”: Lady Clarendon’s letters in Bodleian Student Editions

This term, the Bodleian Student Editions workshops have entered their fourth year.

Students at the 30 October workshop get acquainted with Lady Clarendon’s diaries

They continue to attract students from across the university, undergraduate and postgraduate, arts and science students. This year we have been editing the letters of Katharine, Countess of Clarendon (1810-1874), to her sister-in-law, [Maria] Theresa Lewis, and these letters are proving to be as fascinating as the very popular Penelope Maitland correspondence.  Some of the letters have been uploaded into our ongoing catalogue on Early Modern Letters Online.

Students working on Lady Clarendon’s letters

Staff and students grapple with tricky handwriting, 6 Nov 2018

These letters fulfil the criteria that we have laid down for suitable material for the workshops: they are in good condition, unpublished, interesting and readable for non-specialists, have no copyright complications, and are in a format that allows the letters to be distributed among the students in the workshop. As the students work in pairs, we require six or seven individual letters in each workshop, with more in reserve should the transcripts be completed quickly. The perfect format is the fascicule, which makes the letters much easier to handle: one fascicule can be given to each pair. Inevitably, most of the good runs of letters that fulfil these requirements are in 19th-century collections of papers that were never bound. This allows us to make a virtue of necessity, because there are very large collections of 19th-century letters acquired relatively recently (i.e. post-1970) that are well worth exploring for their historical interest.

Lady Clarendon’s letters in fascicules

Selection of the Lady Clarendon letters was undertaken by me and Balliol student Stephanie Kelley, Balliol-Bodley scholar in early 2018, who also provided digital photographs of many of the letters. Though the workshops give access to original papers, digital images are also made available for detailed checking of difficult words.

The letters were purchased by the Bodleian in 1982 to add to the archive of her husband, the 4th Earl of Clarendon, which had been deposited here in 1949 (the 4th Earl’s papers were transferred to Library ownership in 2013). The choice of Lady Clarendon as a subject for the workshops is fortunate in that this year we have been joined by Andrew Cusworth, who is placed in the Bodleian in connection with the Prince Albert Digitisation Project. The Earl and Countess of Clarendon were intimate with Queen Victoria and Prince Albert, and court gossip is one of the interesting aspects of the letters.

Lady Clarendon to Theresa Lewis, Vice Regal Lodge, Dublin, 14 Dec 1847

George Villiers, 4th Earl of Clarendon (1800-1870), was a major political figure of the mid-Victorian period, and his wife’s letters are of considerable political interest as she was his confidante in many matters. In the period covered by the letters, Clarendon was Lord Lieutenant of Ireland from 1847 to 1852, and then Foreign Secretary from 1853 to 1858. His career therefore coincided with major events including the Irish Famine, the Young Ireland rebellion of 1848, the Crimean War and the Indian uprising known as the ‘Mutiny’. The recipient of Lady Clarendon’s letters was Maria Theresa Lewis (née Villiers), Clarendon’s sister and the wife of George Cornewall Lewis (1806-1863), another Liberal politician, who served as Under-Secretary of State for Home Affairs from 1847 to 1850, Chancellor of the Exchequer from 1855 to 1858, Home Secretary from 1859 to 1861, and War Secretary from 1861 to 1863. The letters do not discuss only politics, however. There is a great deal about family matters, activities and, above all, the illnesses of children, parents and other family members. Lady Clarendon’s lively style provides a very accessible glimpse of aristocratic Victorian life and preoccupations, and the student editions will provide a very useful adjunct to the catalogues of the various parts of the extensive Clarendon archives in the Bodleian.

The workshops have been kept entertained by Lady Clarendon’s fascinating take on mid-Victorian life. Here are just a few examples of her inimitable style. All letters are to her sister-in-law Theresa Lewis. More extracts will follow in a future blog post, so watch this space!

Vice Regal Lodge, 22 Sep 1847 – on the arrival of her mother-in-law in Ireland

Here is Mrs. George sick, tired, but having had a good short passage … she has blue pilled and Speedimanis’d … [Speediman’s pills were a Victorian remedy for stomach complaints]

Vice Regal Lodge, 14 Dec 1847 – on Irish troubles

Lord Clancarty told me … that Bishop Derry the Catholic Bishop of Clonfort had inadvertently let out before Lord Sligo dining out somewhere that the landlords who had been shot deserved it richly!!!! – this Bishop is a Jesuit, I believe a clever and a wily man, but saying this was a great slip…

Vice Regal Lodge, 17 Dec 1847 – forgets to report the birth of her sixth child!

George Lewis’s Board of Controul office, his most excellent début in Parliament, on your side the water, and our dreadful murders and George’s administrative atchievements on this side have been deeply interesting to us both – only think of my not mentioning George Patrick Hyde’s birth too amongst the remarkable events!!

Vice Regal Lodge, 1 Jan 1848 – ‘my unavailing head’

 … George depends upon me for writing to you for him too as tho’ always busy he is particularly overwhelmed to-day and at this moment I hear the murmuring voices of Attorney Generals and Lord Chief Justices in his room settling all sorts of coercive and improvement measures and I don’t venture even to pop my ‘unavailing’ head (as he calls it) in…

[in the same letter] – a present that is ‘all “the kick, the go, the cheese”’

… Mama is leaving us with Robert this afternoon … – they take two small parcels to London. There is a small locket of blue enamel and rose diamonds with George’s and my hair in it, which we present with a joint kiss to you as a little Xmas souvenir– There is a chatelaine in steel which is all “the kick, the go, the cheese” and which I send to Thérèse as my birthday present …

OED chatelaine: ‘an ornamental appendage worn by ladies at their waist … consists of a number of short chains attached to the girdle or belt … bearing articles of household use and ornament, as keys, corkscrews, scissors, penknife, pin-cushion, thimble-case, watch etc …’

OED the kick: the fashion, the newest style

OED the go: the height of fashion; the ‘in’ thing, the ‘rage’.

OED the cheese: colloquial. Obsolete. The right, correct, or best thing; something first-rate, genuine, or exemplary.

Students share an amusing anecdote with staff.

Bodleian Student Editions workshops are organised by Helen Brown (DPhil candidate in English), Andrew Cusworth, Chris Fletcher, Miranda Lewis (Cultures of Knowledge), Olivia Thompson (DPhil candidate in Ancient History), and Mike Webb, as a collaboration between the Department of Special Collections, Centre for Digital Scholarship, and Cultures of Knowledge. All photographs by Olivia Thompson.

Old Ideas, New Technologies: Historical and Vintage Festivals in the UK Web Archive

Festivals are wonderful events that can involve thousands of people, united by their shared love for a common activity or theme. The UK Web Archive seeks to capture and record these often colourful demonstrations of human culture and creativity.

Some festivals are very large and well documented, such as Glastonbury, which often attracts over 100,000 people. However, there are also a number of smaller and more specific festivals which are less well known outside their local communities and networks, such as the Shelswell History Festival. The internet has helped level the playing field, giving these smaller festivals an opportunity to publicise their events far beyond their traditional boundaries. This has also allowed archivists such as myself to find and add these festivals to the UK Web Archive.

(The Festivals Icon on the UK Web Archive Website)

Historical and Vintage Festivals

For me personally, one of the most intriguing parts of the UK Web Archive festivals collection is the Historical and Vintage festivals. These festivals rarely attract the level of media attention that a high-profile music festival featuring the world’s biggest pop stars would enjoy. However, the UK Web Archive is about diversity, inclusivity, and finding value in all parts of society. People who attend, organise, and take part in historical and vintage festivals form part of a collective effort which often results in a website that helps chronicle their enthusiasm.

So far we have found forty-eight different historical and vintage festivals that take place in the United Kingdom. These festivals are broad and varied, and celebrate a multitude of things. They include Newport Rising, which celebrates the 1839 Chartist rebellion, the Lupton House Festival of History, which celebrates a historic house, and Frock Me!, a vintage fashion fair. Every one of these festivals is unique in its own way, but they do have something in common: they all celebrate history and the past, and are characterised by a charming sense of nostalgia and remembrance.

While a website is no substitute for attending in person, festival websites often include:

• Basic information about the festival’s time, place, and theme.
• An array of photographs.
• Anecdotes about the events.
• Information about the festival’s donors and supporters.
• Additional information, such as attendance policies and rules.

A notable feature of these websites is how they use relatively new technologies to organise events which celebrate old events, places, and themes. This indicates a fantastic synergy between the heritage sector and modern technology.

Online Enthusiast Communities in the UK Web Archive

There is a saying that ‘variety is the spice of life’, and this is certainly true of the hobbies and interests the UK public engages in. There are the hobbies we have all probably heard of, such as train spotting or metal detecting, and there are more obscure ones such as Poohsticks or hand dryer appreciation. Websites are a useful tool for enthusiasts to communicate and share their passion with the world. At the UK Web Archive (UKWA), the Online Enthusiast Communities collection aims to:

‘Capture how UK based public forums are used to discuss hobbies and activities and serve as a place for enthusiasts to converse with others sharing similar interests.’

This collection includes such a diverse and wonderful selection of websites and forums. I can honestly say that curating this collection has truly been a joy – there are probably very few jobs that allow you to look at The Letter Box Study Group (a website about the history and development of British roadside letter boxes) as part of your tasks for the day.

Differences I have noticed

As a curator you get to explore lots of sites, and you begin to notice differences and similarities between them. It is interesting to see the variety in website design and levels of expertise, and to me this variety is reflected in the websites that are archived.

I have noticed lots of online communities using a variety of website builders. The wide range of tools appears to have made it easier to create professional-looking sites. Compared to older sites, you notice:

  • the increased use of images
  • cleaner feel
  • neutral backgrounds
  • minimal text
  • occasional e-commerce sections

However, it is nostalgic to see some of the older, more ‘blocky’ sites, as I remember the days of dial-up internet access and early websites. To me, forums tend to have a similar feel, and their designs do not deviate greatly from one another.

I have also found it intriguing how often websites are updated. Some are regularly updated, whereas others appear to have been untouched for several years. This may reflect the fact that many websites are run by volunteers balancing other commitments. Regularity of updates is an important factor, as it helps decide how often we capture a site. Judging this is part of a web archivist’s skill, and capture frequencies can be adjusted later.

Some of my Favourite sites

One of the joys of curating this collection is that you get to experience really unique sites that you would not normally explore. I wanted to highlight a few of the sites that particularly caught my attention, specifically from the ‘Miscellaneous’ subsection, as this is my personal favourite.

Pylon of the Month

Pylon of the month (February 2018) from Sweden. Image Credit: Kristin Allardh, 2018

This is a site dedicated to electricity pylons highlighting a monthly winner. These could include current pylons or historic images and entries can come from the UK and beyond. Images are usually accompanied by some interesting history or facts.

Modernist Britain

Odeon cinema Leicester, Leicestershire. Image Credit: Richard Coltman, 2010

This site is beautifully designed and celebrates modernist architecture in Britain. There are fifty illustrated images with accompanying information about the history of the buildings and photographs taken by Richard Coltman.

Cloud Appreciation Society

A Lenticular cloud. Image Credit: © José Ramón Sáez, 2019

This site was launched in 2005 with the aim of ‘bringing together people who love the sky’. It has an international membership, with members submitting images from all over the world. The society also runs events and shares cloud-related news, and in 2019 it is contributing to the non-profit FogQuest project.

The online enthusiast community is also very witty; there are some fantastically named sites and forums, such as:

  • Planet of the Vapes – a forum about vaping
  • DIYnot Forum – a forum about DIY
  • Frit-Happens! – an online community for glass blowing and glass crafting

Curating the online enthusiast collection has been incredibly enjoyable. Having to actively seek new sites has made me more aware of the variety of hobbies and diversity of interests the public engage in.

As this collection develops, more sites relating to the variety of hobbies and interests will be captured and preserved for future generations to explore, enjoy and research. However, due to the size, complexity and technological challenges of archiving all UK websites, some may be missed, or we may simply not know about them. If there is a site that you think should be included, you can nominate it on the ‘Save a UK website’ page of the UKWA.

Developing collections on Gender Equality at the UK Web Archive

The Gender Equality collection

The UK web archive Gender Equality collection and its themed subsections provide a rich insight into attitudes and approaches towards gender equality in contemporary UK society and culture. This was previously discussed in my last blog post about the collection, which you can read here.

Curating the collection

A great deal of the discussion and activity relating to gender equality takes place online. This means that for a curator of the Gender Equality collection, the harvest is plentiful! The types of content being collected by the UK Web Archive include:

Of course there is some crossover, not only regarding the type of content but also within subsections of the gender equality collection.

This image is made available and reproduced by CC-BY-NC-SA 2.0. [https://creativecommons.org/licenses/by-nc-sa/2.0/legalcode]

Specifically, I find the event sites in the collection really interesting. As well as documenting that the events existed and happened in the first place, they can give us a snapshot of who organised each event and who the intended audience was. The collection also shows how websites related to gender equality evolve over time (which can be very speedy indeed when it comes to sites like Twitter accounts!), along with the changing priorities, trends and initiatives that can tell us about attitudes towards gender equality in the UK. These websites are being created and engaged with by people right now.

Nominate a website!

The endeavour of the UK Web Archive never stops – if you would like to help grow the Gender Equality collection (or indeed, any other collections) click here to nominate a website to save. Go on…whilst you’re at it, you can explore the UK Web Archive’s funky new interface!

 

Image reference: Workers Solidarity Movement (2012) March for Choice

 

Festivals in the UK Web Archive

Live events are funny things: can their spirit be captured, or do you have to “be there to get it”? Personally, I don’t think it can be captured, so why are we archiving festival websites?

Running throughout the year, though most are clustered around the short UK summer, festivals form a huge part of the UK’s contemporary cultural scene. While it’s often the big music festivals such as Glastonbury and Reading that come to mind, or perhaps the more local CAMRA-sponsored beer and cider festivals, these days there is a festival for pretty much everything under the sun.

UK Web Archive topics and themes

In part this explosion of festivals from the very local and niche to the mainstream and brand sponsored has been helped by the internet. You can now find festivals dedicated to anything from bird watching to meat grilling to vintage motors.

With the number of tools and platforms available for website creation and for event and booking management, and with the rise of social media, it seems anyone with an idea can put on a festival. More importantly, with the increasing connectedness the web gives us, the reach of these home-grown festivals has become potentially global.

Of course, most will remain small local events that continue until the organisers lose interest or money, such as Blissfields in Winchester, which had to cancel its 2018 event due to poor ticket sales. But some will make it big, like Neverworld, which started in 2006 in Lee Denny’s back garden while his parents were away for the week and, 10+ years on, has sold out the 5,000-capacity festival venue it has relocated to.

The UK Web Archive‘s Festivals collection attempts to capture the huge variety of UK festivals taking place each year and currently has around 1200 events being archived that are loosely categorised based around 15 common themes, though of course there is a great deal of crossover as they can be found combining themes such as:

In this collection of UK festivals sites, while we cannot capture the spirit of a live event we can still try to capture their transient nature. Here you can see their rise and fall, the photographs and comments left in their wake, and their impact on local communities over time. Hopefully these sites and their contents can still give future researchers a sometimes surprising and often candid snapshot of contemporary British culture.

Emily Chen

Oxford LibGuides: Web Archives

Web archives are becoming more and more prevalent and are being increasingly used for research purposes. They are fundamental to the preservation of our cultural heritage in the interconnected digital age. With the continuing collection development on the Bodleian Libraries Web Archive and the recent launch of the new UK Web Archive site, the web archiving team at the Bodleian have produced a new guide to web archives. The new Web Archives LibGuide includes useful information for anyone wanting to learn more about web archives.

It focuses on the following areas:

  • The Internet Archive, The UK Web Archive and the Bodleian Libraries Web Archive.
  • Other web archives.
  • Web archive use cases.
  • Web archive citation information.

Check out the new look for the Web Archives LibGuide.

 

 

Introducing the new UK Web Archive website

Until recently, if you wanted to search the vast UK Legal Deposit Web Archive (containing the whole UK Web space), then you would need to travel to the reading room of a UK Legal Deposit Library to see if what you needed was there. For the first time, the new UK Web Archive website offers:

  • The ability to search the Legal Deposit web archive from anywhere.
  • The ability to search the Legal Deposit web archive alongside the ‘Open’ UK Web Archive (15,000 or so publicly available websites collected since 2005).
  • The opportunity to browse over 100 curated collections on a wide range of topics.

Who is the UK Web Archive?
UKWA is a partnership of all the UK Legal Deposit Libraries – The British Library, National Library of Scotland, National Library of Wales, the Bodleian Libraries, Cambridge University Libraries, and Trinity College, Dublin. The Legal Deposit Web Archive is available in the reading rooms of all the Libraries.

How much is available now?
At the time of writing, everything that a human (curators and collaborators) has selected since 2005 is searchable. This constitutes many thousands of websites and millions of individual web pages. The huge yearly Legal Deposit domain crawls will be added over the coming year.

This includes over 100 curated collections of websites on a wide range of topics and themes. Recent collections curated by the Bodleian Libraries include:

Do the websites look and work as they did originally?
Yes and no. Every effort is made so that websites look how they did originally and internal links should work. However, for a variety of technical reasons many websites will look different or some elements may be missing. As a minimum, all of the text in the collection is searchable and most images should be there. Whilst we collect a considerable amount of video, much of this will not play back.

Is every UK website available?
We aim to collect every website made or owned by a UK resident; however, in reality it is extremely difficult to be comprehensive! Our annual Legal Deposit collections include every .uk (and .london, .scot, .wales and .cymru) website, plus any website on a server located in the UK. Of course, many UK websites are .com, .info etc. and on servers in other countries.
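The domain part of that scope can be illustrated with a short check. The sketch below is a simplification, not the crawler’s actual configuration: the function name `in_uk_tld_scope` is hypothetical, it looks only at the last label of a URL’s hostname, and it cannot test the second criterion (a server located in the UK), which would need IP geolocation data.

```python
from urllib.parse import urlparse

# Top-level domains named in the text as part of the annual Legal Deposit crawl.
UK_TLDS = ("uk", "london", "scot", "wales", "cymru")


def in_uk_tld_scope(url: str) -> bool:
    """Return True if the URL's hostname ends in one of the UK TLDs above.

    Hypothetical helper for illustration only: it does not check where the
    server is located, which is the other Legal Deposit criterion.
    """
    host = urlparse(url).hostname or ""  # urlparse lower-cases the hostname
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    return tld in UK_TLDS


for url in ["https://www.bodleian.ox.ac.uk/", "https://example.scot/page",
            "https://example.com/"]:
    print(url, in_uk_tld_scope(url))
```

A site such as `example.com` returns False here even if it is run by a UK resident, which is exactly the gap the nomination form below helps to fill.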

If you have or know of a UK website that should be in the archive, we encourage you to nominate it via the website.

Another version of this post was first published on the UK Web Archive blog.

Archives Unleashed – Vancouver Datathon

On the 1st-2nd of November 2018 I was lucky enough to attend the Archives Unleashed Datathon Vancouver, co-hosted by the Archives Unleashed Team and Simon Fraser University Library along with KEY (SFU Big Data Initiative). I was very grateful for the generous travel grant from the Andrew W. Mellon Foundation that made this possible.

The SFU campus at the Harbour Centre was an amazing venue for the Datathon, and it was nice to be able to take in some views of the surrounding mountains.

About the Archives Unleashed Project

The Archives Unleashed Project is a three-year project with a focus on making historical internet content easily accessible to scholars and researchers whose interests lie in exploring and researching both the recent past and contemporary history.

After a series of datathons held at a number of international institutions such as the British Library, University of Toronto, Library of Congress and the Internet Archive, the Archives Unleashed Team identified some key areas of development that would help to deliver their aim of making petabytes of valuable web content accessible.

Key Areas of Development
  • Better analytics tools
  • Community infrastructure
  • Accessible web archival interfaces

By engaging and building a community, alongside developing web archive search and data analysis tools the project is successfully enabling a wide range of people including scholars, programmers, archivists and librarians to “access, share and investigate recent history since the early days of the World Wide Web.”

The project has a three-pronged approach:
  1. Build a software toolkit (Archives Unleashed Toolkit)
  2. Deploy the toolkit in a cloud-based environment (Archives Unleashed Cloud)
  3. Build a cohesive user community that is sustainable and inclusive by bringing together the project team members with archivists, librarians and researchers (Datathons)

Archives Unleashed Toolkit

The Archives Unleashed Toolkit (AUT) is an open-source platform for analysing web archives with Apache Spark. I was really impressed by AUT because of its scalability, relative ease of use and the large number of analytical options it provides. It can run on a laptop (Mac OS, Linux or Windows), a single-node server or a powerful cluster, and if you wanted to, you could even use a Raspberry Pi to run AUT. The Toolkit allows for a number of search functions across the entirety of a web archive collection. You can filter collections by domain, URL pattern, date, language and more, and create lists of URLs to return the top ten in a collection. You can also extract plain text from the HTML files in ARC or WARC files and clean the data by removing ‘boilerplate’ content such as advertisements. It’s also possible to use the Stanford Named Entity Recognizer (NER) to extract named entities such as locations, organisations and persons. I’m looking forward to seeing how this functionality is adapted to localised instances and controlled vocabularies: would it be possible to run a similar programme for automated tagging of web archive collections in the future? Perhaps one could ingest a collection into AUT, run NER and automatically tag the data, providing richer metadata for web archives and subsequent research.
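AUT performs this kind of filtering at scale with Apache Spark. The sketch below is not AUT code and does not use its API; it is a plain-Python illustration, on a few made-up records, of the same idea: filter crawled URLs by domain, URL pattern and date, then count the ten most frequent domains. The record layout and the helper names (`domain`, `filter_records`, `top_domains`) are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample records: (crawl date as YYYYMMDD, URL).
records = [
    ("20180102", "http://www.example.ca/news/1"),
    ("20180103", "http://www.example.ca/news/2"),
    ("20180215", "http://blog.example.org/post"),
    ("20180301", "http://www.example.ca/about"),
    ("20190101", "http://other.example.net/"),
]


def domain(url: str) -> str:
    """Hostname without a leading 'www.' (a deliberate simplification)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def filter_records(recs, domains=None, url_substring=None, year=None):
    """Keep only records that match every filter that is not None."""
    out = []
    for date, url in recs:
        if domains is not None and domain(url) not in domains:
            continue
        if url_substring is not None and url_substring not in url:
            continue
        if year is not None and not date.startswith(year):
            continue
        out.append((date, url))
    return out


def top_domains(recs, n=10):
    """Count pages per domain and return the n most common."""
    return Counter(domain(url) for _, url in recs).most_common(n)


print(top_domains(filter_records(records, year="2018")))
print(filter_records(records, domains={"example.ca"}, url_substring="/news/"))
```

On a real collection the same steps run over millions of records, which is why AUT distributes them with Spark rather than looping in a single process.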

Archives Unleashed Cloud

The Archives Unleashed Cloud (AUK) is a GUI-based front end for working with AUT; it essentially provides an accessible interface for generating research derivatives from web archive files (WARCs). With a few clicks, users can ingest and sync Archive-It collections, analyse the collections, create network graphs and visualise connections and nodes. It is currently free to use and runs on AUK’s central servers.
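Network graphs of this kind are typically built by aggregating page-to-page links into domain-to-domain edges. As a hedged illustration, and not AUK’s implementation, the following plain-Python sketch does that for a few hypothetical link pairs; the helper names `host` and `domain_graph` are made up for this example, and the resulting weighted edge list is the sort of input a graph visualisation tool could load.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical (source page, link target) pairs extracted from archived pages.
links = [
    ("http://a.example.org/1", "http://b.example.org/x"),
    ("http://a.example.org/2", "http://b.example.org/y"),
    ("http://b.example.org/x", "http://a.example.org/1"),
    ("http://a.example.org/3", "http://a.example.org/1"),  # internal link
]


def host(url: str) -> str:
    return urlparse(url).hostname or ""


def domain_graph(pairs, include_self_links=False):
    """Aggregate page-level links into weighted domain-to-domain edges."""
    edges = Counter()
    for src, dst in pairs:
        s, d = host(src), host(dst)
        if s == d and not include_self_links:
            continue  # skip links that stay within one site
        edges[(s, d)] += 1
    return edges


for (src, dst), weight in domain_graph(links).items():
    print(f"{src} -> {dst}: {weight}")
```

Dropping internal links by default keeps the graph focused on connections between sites; passing `include_self_links=True` keeps them if within-site structure matters.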

My experience at the Vancouver Datathon

The datathons bring together a small group of 15-20 people of varied professional backgrounds and experience to work and experiment with the Archives Unleashed Toolkit and the Archives Unleashed Cloud. I really like that the team have chosen to minimise the numbers that attend because it created a close knit working group that was full of collaboration, knowledge and idea exchange. It was a relaxed, fun and friendly environment to work in.

Day One

After a quick coffee and light breakfast, the Datathon opened with introductory talks from project team members Ian Milligan (Principal Investigator), Nick Ruest (Co-Principal Investigator) and Samantha Fritz (Project Manager), relating to the project – its goals and outcomes, the toolkit, available datasets and event logistics.

Another quick coffee break and it was back to work – participants were asked to think about the datasets that interested them, techniques they might want to use and questions or themes they would like to explore and write these on sticky notes.

Once the notes were placed on the whiteboard, teams formed naturally around datasets, themes and questions. My team, made up of Kathleen Reed, Ben O’Brien and me, formed around a shared interest in exploring the First Nations and Indigenous communities dataset.

Compute Canada kindly provided virtual machines (VMs) for running AUT throughout the Datathon; the datasets were preloaded onto these VMs, and a number of derivative files had already been created. We spent some time brainstorming, sharing ideas and exploring the datasets with a number of different tools. The day finished with some informative lightning talks about participants' work with web archives at their home institutions.

Day Two

On day two we continued to explore the datasets using the full-text derivatives, running some NER and performing keyword searches with the command-line tool grep. We also ran sentiment analysis on the text with the Natural Language Toolkit (NLTK). To help visualise the data, we uploaded the new text files produced by the keyword searches into Voyant Tools. This helped by visualising links between words and creating a list of top terms, and it provided quantitative data such as how many times each word appears. It was here that we found the word ‘letter’ appeared quite frequently, and we settled on the dataset we would use: University of British Columbia – bc-hydro-site-c.
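We used grep and Voyant Tools for this, not custom code, but the two steps are easy to picture in a few lines. The sketch below is a hypothetical Python illustration: it assumes each full-text record sits on one line, roughly mirrors a case-insensitive whole-word `grep -iw` search, and then counts top terms the way a Voyant-style word list does. The tiny stopword list and sample records are made up for the example.

```python
import re
from collections import Counter

# A deliberately tiny stopword list for illustration; real analyses use fuller lists
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "that", "this", "with"}

# Words are runs of letters, optionally with one internal apostrophe (e.g. "don't")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def grep_lines(lines, keyword):
    """Return lines containing keyword as a whole word, ignoring case (roughly `grep -iw`)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


def top_terms(lines, n=10):
    """Count word frequencies across lines, lower-cased and minus stopwords."""
    counts = Counter()
    for line in lines:
        for word in WORD.findall(line.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical full-text records, one page's extracted text per line
    records = [
        "A letter about the dam and the river valley",
        "Letters to the editor",
        "Another LETTER opposing the dam",
        "Opening hours and contact details",
    ]
    matches = grep_lines(records, "letter")
    print(len(matches), "matching lines")
    print(top_terms(matches, n=3))
```

Whole-word matching means "Letters" is not counted as a hit for "letter"; removing the `\b` anchors would make the search behave more like a plain substring grep.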

We tracked down the site and found it contained a number of letters from people about the BC Hydro Dam Project. The problem was that the letters were in a table, and the extracted data was not clean enough to use. Ben O’Brien came up with a clever extraction solution using the raw HTML files and some script magic. Kathleen Reed then prepared the data for geocoding to show the geographical spread of the letter writers, hot-spots and a timeline: a useful way of looking at the issue from the perspective of community engagement.
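The sketch below is not Ben's script. It is a hypothetical illustration of pulling table rows out of raw HTML with only Python's standard-library `html.parser`, assuming each letter occupies one `<tr>` row with its fields in `<td>` or `<th>` cells; the real site's markup may well differ. The `TableExtractor` class, `extract_rows` helper and sample table are invented for the example.

```python
import csv
import sys
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into rows by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None outside <tr>
        self._cell = None  # text fragments of the current cell, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace and newlines into single spaces
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._close_row()  # flush anything still open at the end of input


def extract_rows(html):
    """Return the table rows found in an HTML string as lists of cleaned cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Hypothetical snippet of a page listing letters in a table
    sample = """
    <table>
      <tr><th>Name</th><th>Location</th></tr>
      <tr><td>A. Writer</td><td>Example
          Town</td></tr>
    </table>
    """
    csv.writer(sys.stdout).writerows(extract_rows(sample))
```

Writing the rows out as CSV gives a tidy table that a geocoding or mapping tool could then take as input.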

Map of letter writers.

Time-lapse of letter writers' locations.

At the end of day two, each team had a chance to present its project to the others. You can view the presentation we prepared (Exploring Letters of protest for the BC Hydro Dam Site C) here, as well as the other teams' projects.

Why Web Archives Matter

How we preserve, collect, share and exchange cultural information has changed dramatically. The web has greatly altered the scope, speed and scale of remembering at national institutions and libraries, and it has disrupted the ways in which we provide access to, use and engage with archival material. Any current or future historian who wants to study the period after the 1990s will have to use web archives as a resource. However, accessibility and usability have lagged behind, and many students and historians are not yet ready. Projects like Archives Unleashed will help equip researchers, historians, students and the wider community with the tools they need to address these problems. I look forward to seeing the project's next steps.

Archives Unleashed are currently accepting submissions for the next Datathon in March 2019; I highly recommend it.

Higher Education Archive Programme Network Meeting on Research Data Management

On 22nd June 2018 I attended the Higher Education Archive Programme (#HEAP) network meeting on Research Data Management (RDM) at The National Archives at Kew Gardens. It was a chance to learn about current thinking in research data management from colleagues and peers working in this area, by hearing about their own experiences.

The day consisted of a series of talks from presenters with a variety of backgrounds (archivists, managers, PhD students), who shared their experiences of RDM from different perspectives (designing and implementing systems, and using them). The talks were followed by a question and answer session, and the day concluded with a workshop run by John Kaye from JISC. I will aim to briefly summarise the main message of a few of the talks.

As I have had very little exposure to RDM in my career, the meeting was a great way for me to understand what it is and what is being done in the sector. I undertook quantitative research during my PhD, so I understand how research data is created, but until my recent move into the archival profession I rather foolishly gave little thought to how this data is managed. Events like this help make people aware of the challenges that archivists, information professionals and researchers face.

What is HEAP?

The Higher Education Archive Programme (#HEAP) is part of The National Archives’ continuing programme of engagement and sector support with particular archival constituencies. It is a mixture of strategic and practical work encompassing activity across The National Archives and the wider sector, including guidance and training, pilot projects and advocacy. The programme also runs network meetings on a variety of themes for anyone involved in university archives, special collections and libraries.

What is Research Data Management?

Susan Worrall, from the University of Birmingham, started the day by explaining what research data management is and why it is of interest to archivists. Put simply, it is the organisation, structuring, storage, care and use of data generated by research. It matters to archivists because these are all common themes of digital archiving and digital preservation, and RDM therefore suffers from similar issues, such as:

  • Skills gap in the sector
  • Fear of the unknown
  • Funding issues
  • Training

She presented a case study of a brain-imaging experiment, which highlighted the challenges of obtaining consent and managing huge amounts of highly specialised data. There are, however, opportunities for archivists: RDM and digital archiving are two sides of the same coin. Digital archivists already carry out many RDM processes, so they have many transferable skills. Online training is also available: the University of Edinburgh and the University of North Carolina at Chapel Hill collaborated to create a course on Coursera.

A Digital Archivist’s Perspective

Jenny Mitcham, from the University of York, gave us an insight into RDM from her experience as a digital archivist. She highlighted how RDM requires skills from the library, archives and IT sectors. A department may have all of these skills, but the roles and responsibilities are not always clear, which can cause issues. She described a fantastic project called ‘Filling the Digital Preservation Gap’, which explored the potential of Archivematica for RDM. It was a finalist in the 2016 Digital Preservation Awards, and more information about the project can be found on the blog.

Planning, Designing and Implementing an RDM system

Laurain Williamson, from the University of Leicester, spoke about how to plan and implement a research data management service. First, she described the current situation within the university and what the project brief involved. Any large-scale project requires a great deal of preparation and planning, and she noted that certain elements, such as considering all viable technical solutions, were incredibly time-consuming but essential for getting the best fit for the institution. Through interviews and case studies, the team analysed the views and needs of a variety of stakeholders.

Their research community wanted:

  • Expertise
  • Knowledge about copyright/publishing
  • Bespoke advice and a flexible service

Challenges faced by the RDM team were:

  • Managing expectations (they will never be able to do everything, so they must collaborate and prioritise their resources)
  • Last-minute requests from researchers
  • The need to liaise with researchers at an early stage of each project (for example, helping them think about file formats early on to aid the preservation process)

Conclusion

To a layperson, RDM may seem simple at first (save it to the cloud or a hard drive), but once you delve into the archival theory of good digital preservation, that view turns out to be absurdly simplified. Managing large amounts of data from highly specialised experiments, which produce niche file formats, requires a great deal of knowledge, collaboration and expertise.

(CC BY 4.0) Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2018. Incentives for Building University RDM Services. The Realities of Research Data Management, Part 3. Dublin, OH: OCLC Research. doi:10.25333/C3S62F.

Data produced by universities can be seen as a commodity. Growing scholarly norms of open science and data sharing place greater emphasis on RDM. Good RDM matters to the institutions and individuals creating the data (where there is potential future scholarly or financial gain) and to scientific integrity (allowing others in the community to review and confirm results). But not everyone will want to make their data open, and not all of it can or should be open; creating a system and workflow that accounts for both is vital.

An OCLC research report recently stated ‘It would be a mistake to imagine that there is a single, best model of RDM service capacity, or a simple roadmap to acquiring it’. As with most things in the digital sector, this is a fast moving area and new technologies and theories are continually being developed. It will be exciting to see how these will be implemented in the future.