The Bodleian First Folio on tour

Happy birthday to Shakespeare, 451 today!

While our friends at the Folger Shakespeare Library are planning their nationwide First Folio tour, the Bodleian First Folio of Shakespeare’s plays (Arch. G c.7) has been the subject of a series of international presentations and discussions. Thanks to the generosity of the public, the Bodleian First Folio was conserved, digitized, and published online, open to anyone in the world with an internet connection.

The geography of the digital resource’s readers is widening, as word of it spreads. Public lecture venues have included Perth and Sydney (during a Short Stay Visiting Fellowship at the University of Western Australia’s Institute of Advanced Studies), and Oxford. The long history of this copy of the First Folio was the subject of research seminars at the University of Edinburgh, and at the Shakespeare Association of America conference in Vancouver, where it was considered alongside other digital resources including the excellent Meisei Shakespeare Folios Electronic Library.

The intersection of the digital and material in the First Folio has been the subject of experimentation in a collaboration between Pip Willcox and the Oxford e-Research Centre’s Professor David De Roure. The results are discussed in a forthcoming article, and in two conference papers: at the University of Southampton’s Physical Archives in the Digital Age conference, Chawton House; and at the National University of Ireland Galway’s upcoming Digital Material conference.

David De Roure will be presenting on the long history of social machines of the First Folio at a Scholarly Communications Workshop Focusing on the Humanities in Boston next month.

You can see the Bodleian First Folio in the Weston Library’s Marks of Genius exhibition, which is free and open to the public daily.

—Pip Willcox

ORA-Data: managing research data

We’re pleased to announce that depositing in ORA-Data will now allow researchers and the University to comply with the EPSRC’s (Engineering and Physical Sciences Research Council) policy on research data management, which comes into effect from 1 May 2015.

The Bodleian Libraries recently launched a new service for the University, the Oxford Research Archive for Data (ORA-Data). A digital repository and catalogue for research data, ORA-Data offers a service to archive and enable the discovery, citation and sharing of data produced by researchers at Oxford.

ORA-Data is aimed especially at researchers who wish:

  • to deposit data that underpins publications
  • to deposit data that their funding body requires be preserved and made accessible
  • to add a record to the University’s catalogue of data

Any type of digital research data, from across all academic disciplines, may be deposited in ORA-Data, and we accept any file format. A permanent catalogue record is created for all data deposited in ORA-Data and a persistent unique identifier generated (a DOI, or Digital Object Identifier), which allows the dataset to be cited. Data and records will be discoverable through Google and other search engines, maximising the visibility and impact of the research. Researchers can also choose to set an embargo period for their files if they wish.

ORA-Data is running as a free pilot until July 2015. We’re keen for users to try it out, and would welcome any feedback to help us improve the service. ORA-Data can be accessed via the main ORA website: just select ‘Contribute’ followed by the ‘Data’ link. Our ‘How to deposit’ guide is available via our LibGuide and the ORA-Data team can be contacted at: – we would love to hear from anyone interested to find out more.

—Amanda Flynn, Digital Scholarship Support Officer

VIPR: Virtual and physical Infrastructure Portfolio Redesign

The Virtual and physical Infrastructure Portfolio Redesign (VIPR) is a highly resilient, cost-effective, large-scale, general-purpose IT infrastructure that serves the whole University. The need for it has been publicly acknowledged since the Vice-Chancellor’s 2013 Oration, and the current service, built incrementally over many years, is no longer fit for purpose:

Inevitably, virtual infrastructure has entirely tangible costs, and these too must be funded. So overseeing the University’s digital investment has necessarily become a significant feature of how we now approach planning and resource allocation. A new IT Committee reports directly to Council, an indication of the priority being accorded to this area and a recognition of the importance of the digital challenge – especially in an institutional culture as decentralised and varied as ours.

The University needs larger, preservation-level storage. We need dual stack. We need big data. We need reasonable speed.

It occurred to me that the limiting factor in all the infrastructures we run was the network. All of them had a network that wasn’t really quite fast enough. Wasn’t really scalable enough. Wasn’t really big enough. Didn’t have enough ports. Didn’t have enough VLAN capability. Didn’t have a fast enough connection to the outside world. Didn’t have enough reliability. The list went on.

Why don’t I build one? I spent 3 years hanging around with a bunch of networking supremos in my last job. How hard can it be? Turns out, fairly tricky actually. But undeniably a lot of fun.

Now, when designing an infrastructure that is going to take on and “absorb” other infrastructures, the only safe way to proceed is to list all of the requirements from all of the smaller infrastructures and create a master list (in mathematics they call it a “superset”) of all the requirements. I did this.
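As a minimal sketch of that master list, assuming each smaller infrastructure's requirements can be written down as a set of labels, the superset is simply the union of those sets. The infrastructure names and requirement labels below are hypothetical placeholders, not the actual VIPR requirements.

```python
# Hypothetical requirement lists: names and labels are illustrative only,
# not taken from the real VIPR planning work.
requirements_by_infrastructure = {
    "infrastructure_a": {"dual-site failover", "VLAN capacity"},
    "infrastructure_b": {"preservation-level storage", "VLAN capacity"},
    "infrastructure_c": {"dual stack", "big data"},
}


def master_requirements(by_infrastructure):
    """Return the union (superset) of every infrastructure's requirements."""
    master = set()
    for requirements in by_infrastructure.values():
        master |= requirements  # set union: duplicates collapse automatically
    return master


master = master_requirements(requirements_by_infrastructure)
for requirement in sorted(master):
    print(requirement)
```

Because the result is a union, every smaller infrastructure's list is a subset of the master list, which is the property the design relies on.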

The big one that jumped out and bit me was the need for dual-site failover. It’s a tricky thing to get right and I’d argue very few people have done it well. I’m not even sure we’ve got it entirely right with VIPR. But I think we are clear on how we will proceed and improve it. And we’ve done the best we can thus far with the technologies available to us.

The other major thing that jumped out at me was the “big data” requirement. Just how much cross-talk bandwidth between datacentres does a big data platform (or worse, two big data platforms, talking to each other) need? The answer to that is “it depends”. It depends on how much storage, how many disks, how localized your storage nodes, or dispersed, what storage technology you need to use, whether resilience and mirroring are factors, and so on. It’s a really non-trivial exercise to factor in.

I crunched the numbers and came to the conclusion that we needed 40GE interconnections between whatever our two data centres were. The next, most obvious question was “how far can I stretch 40GE?”

Once again: “it depends”. There are two modes of operation for fibre, long haul and short haul. Long haul 40GE is really, eye-wateringly expensive. Short-haul 40GE is affordable. Short-haul goes out to 2 km. Any more than that and you need long haul.
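The rule of thumb above can be sketched as a small check, assuming the 2 km short-haul reach quoted in the text is the only deciding factor. The function name and the example distances are hypothetical; real optic selection would also depend on fibre type and vendor specifications.

```python
SHORT_HAUL_LIMIT_KM = 2.0  # short-haul 40GE reach as quoted in the text


def optic_class(fibre_run_km):
    """Classify a 40GE link by fibre run length (not straight-line distance)."""
    if fibre_run_km <= 0:
        raise ValueError("fibre run must be a positive distance in km")
    if fibre_run_km <= SHORT_HAUL_LIMIT_KM:
        return "short haul"
    return "long haul"


# Hypothetical fibre runs, for illustration only.
for distance_km in (1.5, 2.0, 2.1):
    print(distance_km, "km:", optic_class(distance_km))
```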

So, then I started looking for data centres that fitted the other VIPR requirements. They were, quite simply:

  • USDC is one of them
  • The other needs to be within 2 km, as the fibre runs, of the USDC
  • There has to be adequate power

Last May we started talking collaboration with Adrian from NSMS. NSMS have two big hitters on their immediate horizon: the end of the ICT rig (support ends imminently and a new home needs to be found for all those VMs) and the Integrated Communications Project, which needs its own dedicated, and rather large cluster.

But could we build an infrastructure that could run it all?

Eyes automatically fell on IT Services in Banbury Road. Despite its projected closure and relocation of services, it is still a viable DC and the only game in town. Fibre run distance from USDC is greater than 2 km though, so we had to look into long haul 40GE optics.

By the end of September 2014, we had a network design, not one, but two homes, power, a collaborative team, budgets and a way forward. What else does a boy need?

—Ashley Woltering

ORA: more than you might think

ORA doesn’t just take work published in academic journals!  We also include working papers, reports, Oxford doctoral theses and metadata records.

Our recent work with the Transport Studies Unit (TSU) is online.

We’ve been working with Computer Science to add another 5,000 records, including 1,300 full-text papers.

The working papers from the Oxford Institute for Energy Studies (OIES) as well as their publication the Oxford Energy Forum are in the final stages of being added.

Our Polonsky Foundation-funded theses digitization project is winding up and ORA already holds these records.

—Sarah Barkla

Early English Books Hackfest


On 9 March 2015, the Bodleian Libraries welcomed some 40 eager ‘hackers’ to the Early English Books Hackfest, organized to celebrate the release of more than 25,000 texts from the Early English Books Online-Text Creation Partnership (EEBO-TCP) project into the public domain. Attendees were invited to demonstrate innovative and creative approaches either to the full dataset or a number of subsets (relating to alchemy, drama and seventeenth-century newsbooks) provided by project staff, and to apply imaginative methodologies to text or subject matter which might include an element of ‘surprise’.  The main event took place in the Weston Library’s Blackwell Hall which, only days away from the public opening of the new Library, was a hive of activity with last-minute preparations, which lent proceedings a welcome air of informality.

The day began in the Lecture Theatre with a welcome from Kathryn Eccles, the University’s Digital Humanities Champion, and introductory talks from Michael Popham and Judith Siefring of the Bodleian Digital Library. Once everyone had filed back into Blackwell Hall, coffee was swiftly followed by a ‘speed dating’ exercise, orchestrated with charm and calm authority by the event’s emcee, Liz McCarthy of Bodleian Communications. This provided an opportunity for people to pitch ideas and find collaborators, firm up projects and groups, and request (or indeed recruit) technical help as necessary. Groups settled at a number of tables and proceeded to work furiously on their respective ideas for the next four hours or so, pausing only to sample the delicious spread laid on by Paul Burrows and his team from Tailor-Made Top Nosh. Meanwhile, EEBO-TCP staff past and present ‘hovered’, providing project expertise and technical know-how as required. At 4pm, everybody decamped to the Lecture Theatre where representatives from each group presented their work.


Projects included: a visualization of the relative frequency of rainbow colours in the full dataset; an analysis of the structural features of a newly-created subset of fictional works; comparison of page layouts and text types in the alchemy subset; an examination of the ratio of Latinate and Germanic words used in the full dataset; a tool which facilitates word-searching according to location on a given page; mapping pre-1666 bookstalls in Paul’s Cross Churchyard in London to determine which EEBO-TCP titles were for sale by different booksellers in the area over time; the application of a semantic alignment and linking tool which asserts whether or not different terms are related; an evaluation exercise to identify an ideal public (as opposed to academic) interface for EEBO-TCP; and the creation of a narrative interactive game in which the user’s responses to the questions posed and the evidence given in the transcript of a real-life witch’s trial determine whether the accused is acquitted or burnt at the stake.

Back in Blackwell Hall, a drinks reception saw Bodley’s Librarian Richard Ovenden bravely defy the ‘flu and distribute cash prizes to winners and copies of the Marks of Genius catalogue to runners-up.  Further cash prizes have since been awarded to entrants in the EEBO-TCP Ideas Hack* which ran until Easter, a competition open to all, whether or not they could attend the Hackfest itself.

Feedback from the Hackfest has been extremely positive, and might best be summed up by Sarah Cole of TIME/IMAGE, who attended the event and created the marvellous witches game described above: “All in all, a great day with some really interesting outputs. I hope to see more HackFests from the Bodleian in the future.”

—David Tomkins

* Postscript: The Ideas Hack competition has closed. The standard of entries was extremely high, and prizes were awarded as follows:

1st place (£250): ‘The Posthumous HAK – Purchas his Pilgrimes (1625) and the East India Company’, submitted by Robert Bachelor, with Murray Ruffner, Brandon Sharpe, Colin Hancock, Keimora Ellison and Raven Williams.

2nd place (£150): ‘If Music be the food of Loue – Sonifying Drama’, submitted by Iain Emsley.

3rd place (£50): ‘EEBO-TCP Phase I – From Open Access to Accessible and Open’, submitted by Sjoerd Levelt.


Resource Discovery Mini-Conference

On 27 January Resource Discovery Architect Simon McLeish organized a Resource Discovery Mini-Conference at the Weston Library to bring together interested parties from across the University to share their issues with resource discovery and to discuss possible ways forward. 

Representatives from the Ashmolean and the Museum of Natural History discussed what resource discovery means in the context of their institutions, the importance of representing collections online and the difficulties of consolidating finding aids and catalogues created over long periods of time according to many different standards. 

Laura Cracknell from Pembroke College emphasized the needs of undergraduates, who want immediate access and strongly prefer physical items. E-book provision has proved valuable in extending access when all the physical copies of an item are unavailable. She also stressed the importance of encouraging user engagement with discovery systems so as to develop users’ confidence. Cathy Scutt from Education gave a vivid impression of how fragmented and confusing the resource discovery experience can be for users.

Mike Webb and Sarah Wheale from Bodleian Special Collections provided an excellent survey of the many and varied finding aids for their material, encompassing card catalogues in both their original and digitized Flipbook forms, printed catalogues, EAD files presented as HTML pages and so on.  They emphasized several cases in which the primary finding aid for particular collections was created decades if not centuries ago and noted the importance of staff and their knowledge of the collections as a vital resource discovery asset. 

Glenn Swafford and Sebastian Rahtz discussed the Blue Pages project and its collection and re-use of information about people, departments, projects, funders, publications and their sources. The politics of the re-use of publicly available data in different contexts was a particular difficulty.

David Howell, Head of Conservation Research at the Bodleian Libraries, then described his resource discovery experiences and needs as a researcher, and outlined how his ideal resource discovery system might work.

Simon McLeish summed up the day, identifying a number of common themes and challenges: the many and varied demands of users; the multiplicity of finding aids; how to manage and present large amounts of digitized material; varying metadata standards; the sheer volume of the University’s collections and data; and finding funding to do anything about these issues.

At the time of writing, funding is being sought for a short scoping study on resource discovery which, it is hoped, will lead on to a much wider project seeking to address the issue in depth.

—Chris Hargreaves

We welcome James Mooney

We are delighted to introduce James Mooney, who joined the BDLSS team in January 2015 as a Service Engineer for the various library IT services. James was previously an IT consultant working with Internet-related businesses, primarily using Linux in virtual environments, although he has a wide range of IT experience. James will be working alongside the Aleph and SOLO experts, and he has been busy learning all about library systems so his expertise can be put to best use.

Cultures of Knowledge

In January the University of Oxford’s Cultures of Knowledge project re-launched its union catalogue of early modern correspondence, Early Modern Letters Online (EMLO), with a sleek mobile-ready interface. The re-launched site also incorporates three new catalogues contributed by the project’s partners at the Circulation of Knowledge and Learned Practices in the seventeenth-century Dutch Republic (CKCC). EMLO now contains more than 81,000 records of letters from the 16th, 17th and 18th centuries.

—Emma Stanford

Polonsky Foundation Digitization Project


The Bodleian continues to digitize incunabula and Hebrew manuscripts as part of its collaboration with the Vatican Library, which will put a total of 1.5 million pages online. On 24 February 2015, the libraries celebrated having captured 1 million images for the project. To mark this milestone, the libraries chose two particularly special items from their collections to put online. The Bodleian Library chose Auct. L 3.33, a landmark in printing history produced by Sweynheym and Pannartz, and the Vatican Library chose Stamp. Barb. II. 41, a hugely popular fifteenth-century cookbook.

—Emma Stanford