Engaging with Digitisation.

By Duncan Jones, Hannah Hickman and Sarah Arkle

The world we live in is changing or, really, has changed because of the internet. This has had an astounding impact on libraries, from the advent of online catalogues (although the EFL still have their card catalogue from 1989 if you want to get really old school) right the way through to having fully digitised resources like e-journals, e-books and digitised manuscripts. As a result, libraries have had to adapt to these changes in order to keep up with the demand for easy and instantaneous access to content that the internet has afforded us all.

Aquiles Alencar-Brayner from the British Library visited Oxford last week to talk about ‘Widening Access to Collections and Services’, giving examples of digital projects the British Library has embarked upon using their vast collections. Expanding access and facilitating collaborative research were two of the key driving factors behind the British Library’s digitisation work, and both motivations run through the Bodleian’s digital work as well. This was an interesting experience for the trainees in attendance, as we had recently had a session with some of the Bodleian Digital Library Systems & Services (BDLSS) team regarding E-Developments within the Bodleian Libraries.

Getting down and digital at the Bodleian Libraries  

In our acronym-filled E-Developments training session a few weeks ago, we were talked through a number of different projects, including the Digital Manuscript Toolkit, EEBO-TCP, and Sprint for Shakespeare, by some of the team from the BDLSS.

The Digital Manuscript Toolkit (DMT for short) project aims to create new ways to use digital manuscripts; instead of having a static catalogue of images to simply look at, the intention is to build a program that would allow the manuscripts to be used, developed, and repurposed. The DMT looks to be an incredibly exciting and rich way to approach digitised material – we were given a run-through of some of the things it hopes to make possible: comparing different editions of the same text or different manuscripts from the same workshop, bringing together dispersed or fragmented manuscripts across international collections, or changing the sequence of the leaves… Importantly, the DMT project will involve its users from the beginning, offering mini-grants to scholars to discover the desiderata before developing functionality. The toolkit works to the standard of the International Image Interoperability Framework, which promotes access to digital resources through the use of Linked Data to share information across collections.

Another project, focused on the digitisation of more recent, ephemeral material, is the John Johnson Archive. Again, the delivery focused on the users, with the ability to create a lightbox allowing you to store images in a personal collection, but the site also has a curated side, with six themes drawn out from the items, ranging from “popular print” to “crimes, murders, and executions”. The accompanying blogs are here and well worth reading.

 Not part of the DMT project, but from a Bodleian manuscript (Bodley 764). From the BodLibs Pinterest page


EEBO-TCP (or Early English Books Online Text Creation Partnership) is a joint venture between the University of Oxford, the University of Michigan, and ProQuest, which aims to produce a fully transcribed and searchable database of every unique title in the English-language early modern corpus, using TEI (Text Encoding Initiative) guidelines to tag certain structural features: stage directions, epilogues, letters, and so on. It’s kind of a mind-blowingly huge project – especially since OCR (Optical Character Recognition) could not cope with the variation and complexity of the texts, so human transcribers had to do the work – and it will have a huge impact on the usefulness of an already highly used resource. The first phase of texts (a whopping 25,363 of them) will become freely available to the public as of January 2015.

Sprint for Shakespeare was a 2013 crowdfunding project to raise money for the digitisation of an unused First Folio. Unused? I hear you cry. Well, unstudied. The First Folio in this case is the only Folio extant with its original binding – which is so fragile that it cannot be given to readers or even sent to conservation. So, the target of Sprint for Shakespeare was to raise enough money to digitise the text and make it accessible online in order for it to get the scholarly attention it deserves, and they managed it. The digitisation itself was publicly funded, and the release of the corresponding full text was funded privately. You can check it out here. As for Pip’s ‘Elephant’ – that is a pun that would lose its comedy in the explanation. But trust me, it’s solid library humour.

Imaging technology in use -- not an instrument of torture (from Sprint for Shakespeare) http://shakespeare.bodleian.ox.ac.uk/files/2012/07/Grazer.reduced.jpg

What the British Library did next…

The British Library projects were also about encouraging collaborative research, but not necessarily with such an academic thrust. Where the Bodleian E-Developments were very much working toward improving scholarly research opportunities, the British Library’s projects seemed more focused on creating and encouraging collaborative research communities, without an obvious agenda for the participants they crowdsourced. With one project, they released maps on Twitter to be georeferenced by anybody who wanted to be involved – in less than a week the maps were complete. Crowdsourcing is fast becoming a solid means of making things happen, and it’s amazing to see how many volunteers you can get for digitisation projects if you merely ask. Another successful use of crowdsourcing was for the British Library’s Europeana 1914-1918 exhibition, where people brought their own untold stories of wartime to life through artefacts, letters and other ephemera, which the British Library have turned into a digital collection.

Another really quite delightful project is the Mechanical Curator, which uses code to pick images at random from early modern British Library holdings and posts them online with metadata to contextualise exactly what it is the ‘curator’ has decided to spit up at us. There is no obvious scholarly agenda behind this website. It’s a Tumblr site – hardly something to be referenced in an academic context – but it reaches out and brings libraries forward into the 21st century by connecting with a new generation of potential library users: the all-tweeting, Tumblr-Pinterest-post-MySpace generation, who might learn something interesting from following something like the curator online. It is really great to see an institution such as the British Library working on this, because it shows the value that libraries still maintain in society, and it’s also just very impressive in terms of innovation. The Mechanical Curator may seem gimmicky to some, but it’s improving access to collections that people would otherwise never be able to see – a share here, a retweet there and, hypothetically, before you know it more people have seen a digitised image from an 18th-century book in 24 hours than have seen it in real life in 10 years. An interesting side effect of this sort of crowdsourcing is the curation which users undertake themselves when they assign tags in order to group relevant images together. In this way, digital collections spring up and increase the opportunities for discovery and exploration.

An image from the Mechanical Curator - Not an acceptable way to treat a book! (http://41.media.tumblr.com/c80cbf35db05df103820a910995ef78e/tumblr_nfx68gpxOa1sjzy3lo1_1280.jpg)

Back to the Bod – the innovative use of Crowdsourcing to widen access to collections. 

The Bodleian is also using crowdsourcing as a tool to widen access to collections, which is very much in sync with what the British Library projects are trying to achieve. Using the ‘citizen science web portal’ known as Zooniverse, the Bodleian has been finding volunteers to help transcribe its vast collection of music scores and make them more accessible to the public. So far, 38,127 sheets of music have been transcribed, which amounts to 44% of the collection.

Where do we go from here? Issues with digital-born content and the future…

However, when working with digital content, certain issues do arise that are not so evident with physical items. One of the topics which Aquiles raised during his talk was the challenge that information services face in handling so-called ‘digital-born’ content in an archival context. Historically, a writer’s papers might contain multiple drafts of a piece, giving a scholar the opportunity to analyse the creative process. Word selections and alterations could be seen in scored-out lines, with arrows denoting the rearrangement of sentences or paragraphs. A copy of a digital-born document, by contrast, would only show the polished product and not the processes of collaborative revision (although perhaps Google has all of that filed away somewhere?). That sort of data can be extracted from computers using forensic software, but there are privacy implications in its analysis.

Furthermore, it is easy to imagine that websites and digital data have a sort of timeless permanence on the internet, but this is not the case. Hosting all of this data requires physical computer servers, and the reality is that something like 75% of web pages are deleted within a year of creation. Many others will alter hugely in style and content over that time period. The Internet Archive is a project which takes snapshots of web pages at intervals and offers, amongst other things, opportunities to see how media coverage changes on a day-by-day basis around a significant event, or simply to access material on a site which is no longer actively hosted. The Bodleian has a focused project called the Bodleian Libraries Web Archive, which is concerned with archiving the University’s pages and also anything that could be considered useful to research. These sorts of projects and technologies are really still in their infancy, and there is little idea yet of how useful they will be in later research. It would be interesting to know if future generations consider that we have succeeded in preserving records of our digital lives.

It’s really great to be given the opportunity to learn about these projects and see that the work that academic libraries do is not always with the view of keeping things locked away strictly for those privileged enough to go to university. There appears to be one goal underpinning all these projects, and that’s access – making information freely available to as many people as possible, utilising as many people as possible to do this along the way, connecting people, collections and information, creating a huge research community, and kind of just making the world a better place as a result. Moreover, from a career perspective it is also very interesting, as few of this year’s trainees can really remember a time when they didn’t have access to the internet. It is therefore easy enough to feel like we live in a society where we can’t really go that much further with digital developments, but this is far from the case. It is an exciting time to be coming into librarianship as a career, as there’s no real certainty regarding where we can go in the future – the possibilities may well be endless.

More information…

Listen to Aquiles Alencar-Brayner’s talk for yourself here

More about digitisation at the British Library

BDLSS website – general enquiries to digitalsupport@bodleian.ox.ac.uk
