Engaging with Digitisation

By Duncan Jones, Hannah Hickman and Sarah Arkle


The world we live in is changing or, really, has changed because of the internet. This has had an astounding impact on libraries, from the advent of online catalogues (although the EFL still have their card catalogue from 1989 if you want to get really old school) right the way through to having fully digitised resources like e-journals, e-books and digitised manuscripts. As a result, libraries have had to adapt to these changes in order to keep up with the demand for easy and instantaneous access to content that the internet has afforded us all.

Aquiles Alencar-Brayner from the British Library visited Oxford last week to talk about ‘Widening Access to Collections and Services’, giving examples of digital projects the British Library has embarked upon using its vast collections. Expanding access and facilitating collaborative research were two of the key factors driving the British Library’s digitisation work, and the same two motivations run through the Bodleian’s digital work as well. The talk was particularly interesting for the trainees in attendance, as we had recently had a session with some of the Bodleian Digital Library Systems & Services (BDLSS) team on E-Developments within the Bodleian Libraries.

Getting down and digital at the Bodleian Libraries  

In our acronym-filled E-Developments training session a few weeks ago, members of the BDLSS team talked us through a number of different projects, including the Digital Manuscript Toolkit, EEBO-TCP, and Sprint for Shakespeare.

The Digital Manuscript Toolkit (DMT for short) project aims to create new ways to use digital manuscripts: instead of offering a static catalogue of images simply to look at, the intention is to build a program that allows the manuscripts to be used, developed, and repurposed. The DMT looks to be an incredibly exciting and rich way to approach digitised material. We were given a run-through of some of the ways it could be used: to compare different editions of the same text or different manuscripts from the same workshop, to bring together dispersed or fragmented manuscripts across international collections, or to change the sequence of the leaves… Importantly, the DMT project will involve its users from the beginning, offering mini-grants to scholars to discover the desiderata before developing functionality. The toolkit works to the standard of the International Image Interoperability Framework (IIIF), which promotes access to digital resources by using Linked Data to share information across collections.

Another project, focused on the digitisation of more recent, ephemeral material, is the John Johnson Archive. Again, the delivery focused on the users, with the ability to create a lightbox that stores images in a personal collection, but the site also has a curated side, with six themes drawn out from the items, ranging from “popular print” to “crimes, murders, and executions”. The accompanying blogs are here and well worth reading.

Not part of the DMT project, but from a Bodleian manuscript (Bodley 764). From the BodLibs Pinterest page
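The DMT's reliance on IIIF is what makes comparing or reuniting manuscripts held in different places practical: any compliant image server answers the same URL pattern. As a rough sketch, assuming version 3.0 of the IIIF Image API, here is how such a request URL is put together. The two server addresses and the identifier are made up for illustration and do not come from the DMT project; the Linked Data side (the IIIF Presentation API) is not covered here.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: int = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API 3.0 request URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be URI-encoded; safe="" also encodes any "/" they contain.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical servers at two different institutions: the same call works for both.
for server in ("https://iiif.library-a.example/iiif", "https://iiif.library-b.example/iiif/"):
    print(iiif_image_url(server, "manuscript-001"))

# A 500x400 pixel detail starting at (100, 200), scaled to 250 pixels wide.
print(iiif_image_url("https://iiif.library-a.example/iiif", "manuscript-001",
                     region="100,200,500,400", size="250,"))
```

Because every compliant server parses the same path segments, a viewer can place a leaf from one library next to a leaf from another without knowing anything else about either system.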

EEBO-TCP (Early English Books Online Text Creation Partnership) is a joint venture between the University of Oxford, the University of Michigan, and ProQuest, which aims to produce a fully transcribed and searchable database of every unique title in the early modern English-language corpus, using TEI (Text Encoding Initiative) guidelines to tag certain structural features: stage directions, epilogues, letters, and so on. It’s kind of a mind-blowingly huge project, especially since OCR (Optical Character Recognition) could not be used because of the variation and complexity of the texts, so everything had to be transcribed by hand. It will have a huge impact on the use value of an already heavily used resource. The first phase of texts (a whopping 25,363 of them) will become freely available to the public from January 2015.
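To give a flavour of why that tagging matters, here is a minimal Python sketch. The play fragment is invented and heavily simplified, and the code assumes TEI P5 markup in the standard TEI namespace; real EEBO-TCP files are much richer and may be encoded differently. Once stage directions are tagged as `<stage>` elements, pulling out every one of them becomes a short query rather than a manual search.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumed here, real TCP files may differ.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, simplified fragment of a play, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="scene">
        <stage>Enter a Messenger.</stage>
        <sp><speaker>Messenger</speaker><l>News, my lord, from the north.</l></sp>
        <stage>Exeunt.</stage>
      </div>
      <div type="epilogue">
        <p>Thus ends our play.</p>
      </div>
    </body>
  </text>
</TEI>"""


def stage_directions(tei_xml: str) -> list[str]:
    """Return the text of every <stage> element, in document order."""
    root = ET.fromstring(tei_xml)
    return [el.text.strip() for el in root.iterfind(".//tei:stage", TEI_NS) if el.text]


def count_divisions(tei_xml: str, div_type: str) -> int:
    """Count <div> elements whose type attribute equals div_type."""
    root = ET.fromstring(tei_xml)
    return sum(1 for div in root.iterfind(".//tei:div", TEI_NS) if div.get("type") == div_type)


print(stage_directions(SAMPLE))
print(count_divisions(SAMPLE, "epilogue"))
```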

Sprint for Shakespeare was a 2013 crowdfunding campaign to raise money for the digitisation of an unused First Folio. Unused? I hear you cry. Well, unstudied. The First Folio in this case is the only extant Folio in its original binding, which is so fragile that the book cannot be given to readers or even sent to conservation. So the target of Sprint for Shakespeare was to raise enough money to digitise the text and make it accessible online, so that it could get the scholarly attention it deserves, and they managed it. The digitisation itself was publicly funded, and the release of the corresponding full text was funded privately. You can check it out here. As for Pip’s ‘Elephant’: that is a pun that would lose its comedy in the explanation. But trust me, it’s solid library humour.

Imaging technology in use — not an instrument of torture (from Sprint for Shakespeare)

What the British Library did next…

In comparison, although the British Library projects were about encouraging collaborative research, they did not necessarily have such an academic thrust. Where the Bodleian E-Developments were very much working toward improving scholarly research opportunities, the British Library’s projects seemed more focused on creating and encouraging collaborative research communities, without an obvious agenda for the participants they crowdsourced. In one project, they released maps on Twitter to be georeferenced by anybody who wanted to be involved, and in less than a week the maps were complete. Crowdsourcing is fast becoming a solid means of making things happen, and it’s amazing to see how many volunteers you can get for digitisation projects if you merely ask. Another successful use of crowdsourcing was the British Library’s Europeana 1914-1918 exhibition, where people brought their own untold stories of wartime to life through artefacts, letters and other ephemera, which the British Library has turned into a digital collection.

Another really quite delightful project is the Mechanical Curator, which uses code to select images from early modern British Library holdings at random and posts them online with metadata to contextualise exactly what it is the ‘curator’ has decided to spit out at us. There is no obvious scholarly agenda behind this website. It’s a Tumblr site, hardly something to be referenced in an academic context, but it reaches out and brings libraries forward into the 21st century by connecting with a new generation of potential library users: the all-tweeting, Tumblr-and-Pinterest, post-MySpace generation, who might learn something interesting from following something like the curator online. It is really great to see an institution such as the British Library working on this, because it shows the value that libraries still hold in society, and it’s also very impressive in terms of innovation. The Mechanical Curator may seem gimmicky to some, but it improves access to collections that people would otherwise never be able to see: a share here, a retweet there and, hypothetically, before you know it more people have seen a digitised image from an 18th-century book in 24 hours than have seen it in real life in 10 years. An interesting side effect of this sort of crowdsourcing is the curation that users undertake themselves when they assign tags to group relevant images together. In this way, digital collections spring up and increase the opportunities for discovery and exploration.

An image from the Mechanical Curator – Not an acceptable way to treat a book!
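The Mechanical Curator's actual code isn't described in the talk, but the basic idea is easy to sketch. In this hypothetical Python version, the record fields, book titles and example.org URLs are all invented for illustration. One function picks an image at random and builds a caption from its metadata; another groups images under the tags users assign, which is roughly how those user-made collections emerge.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    # Hypothetical fields: the real service's metadata schema is not described here.
    image_url: str
    book_title: str
    year: int
    page: int
    tags: set[str] = field(default_factory=set)


def pick_and_caption(records: list[ImageRecord], rng: random.Random) -> tuple[str, str]:
    """Choose one image at random and build a short contextual caption for it."""
    rec = rng.choice(records)
    caption = f"From '{rec.book_title}' ({rec.year}), page {rec.page}"
    return rec.image_url, caption


def group_by_tag(records: list[ImageRecord]) -> dict[str, list[str]]:
    """Gather image URLs under each user-assigned tag: an ad hoc 'collection' per tag."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        for tag in sorted(rec.tags):
            groups[tag].append(rec.image_url)
    return dict(groups)


records = [
    ImageRecord("https://example.org/img/1.jpg", "A Hypothetical Herbal", 1780, 12, {"plants"}),
    ImageRecord("https://example.org/img/2.jpg", "An Imaginary Voyage", 1795, 48, {"ships", "maps"}),
    ImageRecord("https://example.org/img/3.jpg", "A Made-up Atlas", 1760, 3, {"maps"}),
]
print(pick_and_caption(records, random.Random(42)))
print(group_by_tag(records))
```

Passing in a seeded `random.Random` keeps the choice reproducible for testing; a live service would simply use an unseeded one.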

Back to the Bod: the innovative use of crowdsourcing to widen access to collections

The Bodleian is also using crowdsourcing as a tool to widen access to its collections, which is very much in sync with what the British Library projects are trying to achieve. Using the ‘citizen science web portal’ known as Zooniverse, the Bodleian has been finding volunteers to help transcribe its vast collection of music scores to make them more publicly accessible. So far, 38,127 sheets of music have been transcribed, which amounts to 44% of the collection.

Where do we go from here? Issues with digital-born content and the future…

However, when working with digital content, certain issues arise that are not so evident with physical items. One of the topics Aquiles raised during his talk was the challenge information services face in handling so-called ‘digital-born’ content in an archival context. Historically, a writer’s papers might contain multiple drafts of a piece, offering opportunities to a scholar interested in analysing the creative process: word choices and alterations could be seen in scored-out lines, with arrows denoting the rearrangement of sentences or paragraphs. When a piece is written on a computer, however, a copy of the digital document would show only the polished product and not the processes of (sometimes collaborative) revision (although perhaps Google has all of that filed away somewhere?). Some of that data can be extracted from computers using forensic software, but there are privacy implications in analysing it.
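Where successive drafts do survive as separate files, some of that revision history can be reconstructed mechanically. A minimal sketch using Python's standard `difflib` module, with two invented drafts, lists the words removed and added between versions, a rough digital counterpart to scored-out lines. It assumes the drafts are plain text; real word-processor files would need converting first.

```python
import difflib


def word_changes(old_draft: str, new_draft: str) -> list[str]:
    """List words removed ('- ') and added ('+ ') between two drafts."""
    diff = difflib.ndiff(old_draft.split(), new_draft.split())
    # ndiff also yields unchanged ('  ') and hint ('? ') lines; keep only the edits.
    return [line for line in diff if line.startswith(("- ", "+ "))]


draft_1 = "The quick brown fox jumped over the dog"
draft_2 = "The quick red fox leapt over the lazy dog"
for change in word_changes(draft_1, draft_2):
    print(change)
```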

Furthermore, it is easy to imagine that websites and digital data have a sort of timeless permanence on the internet, but this is not the case. Hosting all of this data requires physical computer servers, and the reality is that something like 75% of web pages are deleted within a year of creation. Many others will change hugely in style and content over that period. The Internet Archive is a project that takes snapshots of a vast number of web pages at intervals and offers, amongst other things, the opportunity to see how media coverage of a significant event changes from day to day, or simply to access material on a site that is no longer actively hosted. The Bodleian has a more focused project, the Bodleian Libraries Web Archive, which archives the University’s pages and anything else that could be considered useful to research. These sorts of projects and technologies are still in their infancy, and we have little idea how useful they will prove to later research. It would be interesting to know whether future generations will consider that we succeeded in preserving records of our digital lives.
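Following a story day by day in the Internet Archive comes down to a simple URL pattern for its Wayback Machine: `https://web.archive.org/web/<timestamp>/<url>`. The sketch below builds one such URL per day for a placeholder page (`www.example.com/news` is invented). It assumes that a date-only timestamp is acceptable, in which case the Wayback Machine generally redirects to the closest capture it holds; there is no guarantee that a capture exists on any given day.

```python
from datetime import date, timedelta

# Public Wayback Machine URL pattern: https://web.archive.org/web/<timestamp>/<url>
WAYBACK_PREFIX = "https://web.archive.org/web"


def daily_snapshot_urls(page_url: str, start: date, days: int) -> list[str]:
    """Build one Wayback Machine URL per day, starting at `start`.

    A date-only timestamp (YYYYMMDD) asks for the capture nearest that day;
    it does not guarantee that a capture was made on that exact day.
    """
    return [
        f"{WAYBACK_PREFIX}/{(start + timedelta(days=offset)):%Y%m%d}/{page_url}"
        for offset in range(days)
    ]


for url in daily_snapshot_urls("http://www.example.com/news", date(2014, 12, 30), 3):
    print(url)
```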

It’s really great to be given the opportunity to learn about these projects and to see that the work academic libraries do is not always aimed at keeping things locked away strictly for those privileged enough to go to university. There appears to be one goal underpinning all these projects, and that’s access: making information freely available to as many people as possible, enlisting as many people as possible along the way, connecting people, collections and information, creating a huge research community, and kind of just making the world a better place as a result. From a career perspective it is also very interesting, as few of this year’s trainees can really remember a time when they didn’t have access to the internet. It is therefore easy to feel that we live in a society where digital developments can’t go much further, but this is far from the case. It is an exciting time to be coming into librarianship as a career, as there’s no real certainty about where we can go in the future; the possibilities may well be endless.

More information…

Listen to Aquiles Alencar-Brayner’s talk for yourself here

More about digitisation at the British Library

BDLSS website – general enquiries to digitalsupport@bodleian.ox.ac.uk

BIALL, CLSIG, SLA Europe Open Day 2013 part 1

Kat Steiner here again, one of the graduate trainees at the Bodleian Law Library. On Wednesday, Frankie Marsden and I headed down to London for the BIALL, CLSIG, SLA Europe Open Day, a day of presentations and tours based at the CILIP headquarters near Russell Square. We thought we’d give you a few of our thoughts on the day, especially on what we individually will take away from it.

A few acronym explanations before we start. BIALL is the British and Irish Association of Law Librarians, CILIP is the Chartered Institute of Library and Information Professionals, CLSIG is a special interest group within CILIP standing for Commercial, Legal and Scientific Information Group, and SLA Europe is the European and UK division of the Special Libraries Association. Still with me? Just the names alone were a lot to take in!

Copyright Wellcome Library
The Wellcome Library

Over the day, we heard nine speakers, whose places of work included London law firms, the Law Library of City University, the Wellcome Library, the British Medical Association, the Inner Temple Library, Linex (a company offering current awareness tools and aggregation for subscribers), and the British Library. It was fascinating to hear how they had reached their current jobs (often by a combination of luck, enthusiasm and perseverance), and about their varied positions. It particularly stood out to me how many people mentioned TFPL, a recruitment agency, as invaluable in helping them find jobs. I hadn’t heard of them, but I will definitely be looking into them now!

There was also the opportunity to tour one of three libraries: the Wiener Library, a collection for the study of the Holocaust and genocide; the library of the London School of Hygiene & Tropical Medicine; or the library of the Institute of Advanced Legal Studies (IALS). As Law Bod trainees, Frankie and I both chose the IALS, and enjoyed a detailed tour and talk by David Gee, the Deputy Librarian. As the library takes three graduate trainees every year, he had a lot of insight and suggestions about what to do afterwards if you are thinking of going into law librarianship.

Several speakers worked in law firm libraries or as law librarians in other institutions, and it was very interesting to hear about their jobs in detail. I hadn’t personally thought much about specialising, or moving away from academic librarianship (I’m hoping to stay at the Bodleian while I do my library school master’s), but there definitely seemed to be a lot to recommend ‘special libraries’. The chance to do real legal research was very attractive to me as an academic challenge (at the Law Bod, students are expected to do their own research, although there are lots of classes to help them learn how). However, I’m not sure I could cope with the increased pressure, longer hours and tight deadlines that come with it. The rather better pay might sweeten the pill, though.

Copyright Inner Temple Library
The Inner Temple Library

The talk that really stood out for me was from Simon Barron, a Project Analyst at the British Library. He focused on the concept of ‘digital librarians’, and the way that technology is transforming the information profession and will continue to do so. In the days of ‘big data’ (a current buzzword that I’m still not hugely clear on; in my understanding, it can mean data sets so large that statistical programs can crunch through them and draw remarkably accurate conclusions without any attempt to explain the causal link between the data and the conclusions), librarians who can code, use technology, and are willing to learn new technological skills will be more and more in demand. He described his current project with the British Library and the Qatar Foundation to create a digital National Library of Qatar. This is an ambitious project, involving huge numbers of documents to be digitised, including 14th- and 15th-century Arabic manuscripts. Simon’s job seemed to involve a lot of technological problem-solving, for example ‘how do we get this data out of this piece of software and into this other piece of software without losing it, or having to do it by hand?’. He explained that his coding knowledge was entirely self-taught through Codecademy and that, although he didn’t consider it his crowning achievement, his colleagues were still very impressed when he made a spreadsheet where the cells change colour depending on the data you enter.

Simon’s talk made a big impression on me, and really confirmed my feeling that the MSc in Information Science is for me. I have some basic coding experience (a 10-week internship at a software company, writing code in Perl), and the main thing I took away from it is that coding is really not that hard or scary: it just requires logic, perseverance (read: stubbornness, even when it doesn’t work), and the willingness to have a go even if you’re not sure what you’re doing. I believe anyone who really wants to can learn to use technology, but they may not see the point. Simon emphasised the use of technology to automate what would otherwise be fairly simple human processes. This is a great point: if you can automate a simple action on a computer (for example, removing formatting from a text file, or averaging each row in a spreadsheet), you not only save time, you also make the process scalable to much larger sets of data, which would take humans far too long to deal with, and you reduce the possibility of human error, as long as your code actually works!
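To show how small that kind of automation can be, here is a minimal Python sketch of one of the examples above, averaging each row of a spreadsheet. It assumes the spreadsheet has been saved as CSV with no header row and only numeric cells; the sample data is invented.

```python
import csv
import io
from statistics import mean


def average_each_row(csv_text: str) -> list[float]:
    """Return the mean of each row of a CSV made up entirely of numbers.

    Assumes no header row and no empty cells; blank lines are skipped.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [mean(float(cell) for cell in row) for row in reader if row]


spreadsheet = "1,2,3\n10,20,30\n4,5\n"
print(average_each_row(spreadsheet))
```

The same few lines work whether the file has ten rows or ten million, which is exactly the scalability point. To read a real file, you would pass an open file object to `csv.reader` instead of the `io.StringIO` wrapper used here.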

Anyway, you can see that this made quite an impression. Another thing I will take away is how many things are worth joining to get more involved in the information profession. You can join CILIP for £38 a year if you’re a student or graduate trainee, definitely worth doing! You can join SLA (of which SLA Europe is a chapter) for $40 a year if you’re a student (even part-time, but I’m not sure about graduate trainees). You can join BIALL for £17 a year if you are a full-time student. You might want to consider registering with TFPL. SLA Europe offers an Early Career Conference Award, which three of the speakers had won, allowing them to go to amazing conferences in San Diego, Chicago and Philadelphia. BIALL also offers an award for the best library school dissertation on a legal topic. And, finally, Information Architect is a job title it might be worth looking out for.

That’s pretty much all I have to say for this post (I’ve waffled for more than long enough). Frankie will be talking about the aspects of the day that she really liked, and I’m sure they will be very different! I just want to thank everyone who helped organise the conference – it gave me loads to think about, allowed me to meet plenty of other graduate trainees, and generally have a great time. For anyone who wants a more general idea of the day – the slides from the presentations that everyone gave can be found on the CLSIG website.

Exploring the Libraries of London

On July 11th, the graduate trainees visited some of the interesting and eclectic libraries in London. We each attended a tour and talk at two of the following: the London Library, the Guardian Newspaper Library, the British Library and the Natural History Museum Library and Archives. I am pleased to report that everyone enjoyed the visits, none of us got lost, and no one embarrassed Oxford too much…

The entrance to the London Library, 14 St James's Square

The London Library is a members-only lending library, founded in 1841 by Thomas Carlyle. Membership is open to anyone (£445 per year), and the library is funded almost entirely by these fees. As books are never withdrawn, the constant battle for space is more acute than at most libraries. As you wander around, lost or otherwise, you notice that each extension has a distinct style, ranging from utilitarian steel shelves to wood panelling, indicative of the era in which it was built. The unique classification system groups books alphabetically by subject, including, under ‘Science & Miscellany’, topics such as Love, Genius and Duelling.

Inside the Guardian newspaper offices

The Guardian Library serves the research and fact-checking needs of the newspaper’s journalists, compiling timelines, background information and previous press coverage to support big stories. The library staff also maintain a section of the Guardian website, showcasing articles from this day in past years and putting up interesting features, such as a timeline of Guardian articles about Harry Potter to celebrate the books’ 15th anniversary. The library’s physical collection amounts to one wall in the archive office and consists mainly of dictionaries and reference works such as Who’s Who. However, the library also provides and promotes many online reference tools for reporters.

The British Library is impressively large with over 150 million items, on-site space for 1200 readers and a glass tower containing the library of King George III in the centre of the building.  The talk at the British Library was about their Greek Manuscripts Digitisation Project – its highlights, challenges and workflows.  Both the tour and the talk highlighted the collaborative, outreach and digitisation work of the library. We concluded our day with a visit to the highly recommended Writing Britain: Wastelands to Wonderlands exhibition.

Inside the Natural History Museum Library
From left to right: Rebecca (futureArch), Vicky (All Souls), Jayne (MLFL), Charlotte (Nuffield), Janine (SSL), Lizzie (RSL), Matt (Bodleian), Michelle (University Archives), Rebecca (EFL).

The Natural History Museum Library is open to the public as a reference library, but is primarily a lending library for scientists working at the NHM, who can borrow an unlimited number of books. Loans are still recorded on paper, which means the library is closed for two weeks each year while the borrowing slips are checked against the sometimes hundreds of books in scientists’ offices… The archive holds records relating to all activities of the museum, from the late 18th century to the present day, and includes such documents as accession registers of specimens received by the museum, and letters from a collector about having his arm bitten off by a cheetah.

London Library and Guardian Newspaper Library by Evelyn (Union), British Library and Natural History Museum Library by Lizzie (RSL).

British Library Study Day: Engaging through the Online World

On Thursday 17th February, we attended a study day at the British Library, along with other Bodleian staff in the social science division and representatives from LSE and the British Library. The objective was to discuss prospects and problems for social science librarians and researchers regarding engaging through the online world.

After a nice cup of tea, but sadly no biscuits, Jude England, Head of Social Sciences, introduced the day and recent developments at the BL. These included the fittingly named ‘2020 Vision’, whose aims include, among many others, mass participation, crowdsourcing, and supporting research for economic and social benefit.

This introduction was followed by a talk from guest speaker Karen Phillips, Editorial Director at SAGE Publications. She began with a brief outline of the state of research around the world today, highlighting growth in investment, a changing geographical and disciplinary spread, and a move towards new models of publishing as technology changes, including open access and multimedia websites. To illustrate these, she offered the examples of SAGE Open and SAGE Research Methods Online. Overall, her talk provided a unique insight into the relationship between publishers and libraries, at times sparking almost heated debate. It was certainly interesting to see a publisher fend off a room full of librarians!

A little later than scheduled, presentations showcasing Online Developments at the British Library began with Linda Arnold-Stratford, Lead for Management and Business Studies, introducing the new Management and Business Studies Portal, which offers remote access to the BL’s business collections.

Following this, Gill Ridgley, Lead for Sport, Sociology and Cultural Studies, gave an overview of the new site Sport and Society: examining the Summer Olympics and Paralympics through the lens of Social Science. This site, divided by subject, showcases the BL’s collections on sport and the Olympics. It includes contributions from academics, publishers, archivists, politicians and BL colleagues. The site also offers links and gateways to a range of related resources.

Next, Luke McKernan, Lead for Moving Image, presented Video Server, part of the current Growing Knowledge exhibition. The moving image collection works with outside providers to create a comprehensive collection of news programmes, recorded selectively across 18 channels. These video clips are searchable by subtitle text, making television as searchable as a newspaper. Video Server will become a reading room service from September 2011, though unfortunately it will not be available for remote access because of copyright restrictions.

To round off the BL’s presentations of their online developments, Jonnie Robinson, Lead for Sociolinguistics and Education, gave an introduction to the popular BL exhibition Evolving English: one language, many voices. Its web page, which includes archival sound recordings (for example of accents and dialects, or of children’s playground games), is the most visited on the BL website, with 15,000 visits or plays per month, and figures suggest Evolving English will be the most visited BL exhibition ever. The exhibition demonstrates the social, cultural and historical influences on the English language through a collection of material, such as slang dictionaries and medieval manuscripts, as well as copious sound recordings. You can even leave your own recording at one of the ‘voice banks’, by reading either a chosen passage from Mr Tickle or a list of six words. The website also offers the chance to ‘map your voice’ and listen to those who have already contributed. For both of us, this was definitely the highlight of the day.

After lunch, Matthew Shaw, Lead Curator of North American History, reflected on the Growing Knowledge exhibition, which showcases a variety of innovative research tools. The public are invited to try out these 25 different tools, the most popular including Mendeley, the UK Web Archive, and JoVE.

The day rounded off with a workshop session, discussing prospects and problems for researchers and librarians in an online world, as well as collaborative ideas for the coalition in the future. Although this was a rather terrifying experience, it was really interesting to hear discussion on such important topics in the information profession from people who have experienced these issues first hand.

As an optional extra, we were then given a tour of the Evolving English exhibition, a lovely way to round off the day despite having to make a dash for the train. We definitely recommend a visit!

Ruth and Lauren