Category Archives: Digitisation

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day, and the unifying theme was a seemingly contradictory message of dispersal and amalgamation (and butterflies).

The first thing to have been dispersed, or discarded, is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. Over the past few decades, in response to this, archives have collected more literary papers from outside the traditional canon. But, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.

Swallowtail caterpillar (Schwalbenschwanz, Raupe)

Caterpillars and collection development [Photo taken by Eric Steinert at Paussac, France, CC BY 2.5]

As Catherine noted, it used to be easy to find the avant-garde: it was pretty much whoever was hanging out on the Left Bank. Now it’s up to archivists not only to collect this material but to track it down in the first place, rather than defaulting to the temptingly easy path of collecting only the papers of the tiny sliver of authors that mainstream publishers consider publishable.


What I learned in London…at the DPTP Digital Preservation Workshop!

A few months ago I applied for a scholarship through the DPC Leadership Programme to attend The Practice of Digital Preservation, a DPTP course for those working in digital preservation, held on 14-16 March.

It was a three-day intermediate course for practitioners who wished to broaden their working knowledge, and it covered a wide range of digital preservation tools and information, along with how to apply them in day-to-day work.

The course was hosted in one of the meeting rooms in the Senate House Library of the University of London, a massive Art Deco building in Bloomsbury (I know because I managed to get a bit lost between breaks!).

Senate House, University of London

The course was three full days of workshops that mixed lectures with group exercises and the occasional break. Amazingly, this is the last year it will run as a three-day course: next time they’re going to compress it all into a single day (everything they covered was useful, so I don’t know what you’d cut to shorten it. Lunch, maybe?).

Each day had a different theme.

The first was on approaches to digital preservation: an overview of various policy frameworks and standards, the best known and most widely accepted being OAIS (the Open Archival Information System reference model).

No, Google, not OASIS!


Oasis, Oman. Taken by Hendrik Dacquin aka loufi and licensed under CC BY 2.0.

After a brief wrestle with Google’s ‘suggestions’, let’s look at the OAIS model and admire its weirdly green-toned but elegant workflow. If you click through to Wikimedia Commons, the diagram even has annotations for the acronyms.


After introducing us to various frameworks, the day mostly focused on the ingest and storage aspects of digital preservation. It covered the three main approaches (bit-level preservation, emulation and migration) in depth and discussed the pros and cons of each.

There are many factors to consider when choosing a method, and depending on your main constraint (money, time or expertise), different approaches will suit different organisations and collections. Bit-level preservation is the most basic thing you can do: you ingest the material exactly as it comes and mostly hope that some future archivist (perhaps with pots of money!) will come along and emulate or migrate it in a way that is far beyond what your poor cash-strapped institution can handle.
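Bit-level preservation is usually paired with fixity checking: you record a checksum for each file when it is ingested, then recompute it later to confirm that not a single bit has changed. The Python sketch below is illustrative only; the helper names (`sha256_of`, `record_fixity`, `verify_fixity`) are hypothetical, and the choice of SHA-256 is an assumption rather than something the course prescribed.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(paths):
    """At ingest: map each file path to its checksum, without altering the files."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def verify_fixity(manifest):
    """Later: list the files whose current checksum no longer matches the manifest."""
    return [name for name, expected in manifest.items()
            if sha256_of(Path(name)) != expected]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        f = Path(tmp) / "bobsfile1.rtf"
        f.write_bytes(b"{\\rtf1 Hello}")
        manifest = record_fixity([f])
        print(verify_fixity(manifest))  # prints [] because nothing has changed
        f.write_bytes(b"{\\rtf1 Hello!}")
        print(verify_fixity(manifest))  # the altered file is now reported
```

In practice the manifest would be stored alongside the files (or in a repository database) and the check run on a schedule; this sketch only shows the core idea.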

Emulation means creating or acquiring an environment (not the original one in which your digital object was created or housed) that runs your digital object and attempts to recreate its original look and feel.

Migration, which probably works best with contemporary or near-contemporary objects, transfers the object into a format that is more future-proof than its current one. It needs to be considered in the context of the technical constraints and options available. But perhaps you’re not sure what technical constraints you need to consider? Fear not!
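As a deliberately small illustration of migration, the sketch below assumes a plain-text file in a legacy Windows encoding (cp1252, an assumption made for the example) and writes a UTF-8 copy, leaving the original untouched and checking that the conversion loses nothing. The function name `migrate_text_to_utf8` is hypothetical; real migrations of complex formats need dedicated tools and far more careful validation.

```python
import tempfile
from pathlib import Path


def migrate_text_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of a legacy-encoded text file; the original is left untouched."""
    original = src.read_bytes()
    # Raises UnicodeDecodeError rather than silently guessing at bytes it can't map.
    text = original.decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
    # Lossless check: the migrated copy must convert back to the exact original bytes.
    if dst.read_bytes().decode("utf-8").encode(source_encoding) != original:
        raise ValueError(f"migration of {src} did not round-trip")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "letter_1998.txt"
        new = Path(tmp) / "letter_1998.utf8.txt"
        old.write_bytes("Café – naïve".encode("cp1252"))
        migrate_text_to_utf8(old, new)
        print(new.read_text(encoding="utf-8"))  # Café – naïve
```

Keeping the original alongside the migrated copy means a later, better migration can always start again from the untouched source.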

These technical constraints were covered on the second day! This day was on ingest, and it covered file formats, useful tools and several metadata schemas. I’ve probably exhausted you with my very thorough explanation of the first day’s content (also I’d like to leave a bit of mystery for you), so I will just say that there are a lot of file formats, and what makes them appealing to the end user can often be the same thing that makes a digital preservationist (ME) tear her hair out.

Thus those interested in preserving digital content have had to develop (or beg and borrow!) a variety of tools to read, copy, preserve, capture metadata and what have you. They have also spent a lot of time thinking about (and disagreeing over) what to do with these materials and information. From these discussions have emerged various schemata to make these digital objects more…tractable and orderly (haha). They have various fun acronyms (METS, PREMIS, need I go on?) and each has its own proponents, but I think everyone agrees that metadata is a good thing, and XML is even better because it makes that metadata readable by your average human as well as your average computer! That is very important when you’re wondering what the hell you ingested two months ago that was helpfully named bobsfile1.rtf or something equally descriptive.
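To show what XML metadata that is readable by humans and computers might look like, here is a minimal Python sketch that describes a single file. The element names borrow from PREMIS (`objectCharacteristics`, `fixity`, `formatName`, `originalName`), but this is an illustration under stated assumptions rather than a schema-valid PREMIS record: it omits required parts such as an object identifier, and `object_record` is a hypothetical helper.

```python
import hashlib
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the PREMIS namespace."""
    return f"{{{PREMIS_NS}}}{tag}"


def object_record(original_name: str, content: bytes, format_name: str) -> str:
    """Build a small, PREMIS-flavoured XML description of one file."""
    obj = ET.Element(q("object"))
    chars = ET.SubElement(obj, q("objectCharacteristics"))
    fixity = ET.SubElement(chars, q("fixity"))
    ET.SubElement(fixity, q("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, q("messageDigest")).text = hashlib.sha256(content).hexdigest()
    ET.SubElement(chars, q("size")).text = str(len(content))
    fmt = ET.SubElement(chars, q("format"))
    designation = ET.SubElement(fmt, q("formatDesignation"))
    ET.SubElement(designation, q("formatName")).text = format_name
    ET.SubElement(obj, q("originalName")).text = original_name
    ET.indent(obj)  # pretty-print for human readers (Python 3.9+)
    return ET.tostring(obj, encoding="unicode")


if __name__ == "__main__":
    print(object_record("bobsfile1.rtf", b"{\\rtf1 Hello}", "Rich Text Format"))
```

The printed result is indented XML that a person can read at a glance and that any XML parser can process, so even a file called bobsfile1.rtf arrives with a record of what it is.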

The final day was on different strategies for tackling the preservation of more complex born-digital objects such as emails and databases (protip: it’s hard!) and providing access to said objects. This led to a roundup of different and interesting ways institutions are using digital content to engage readers.

There’s a lot of exciting work in this field, such as Stanford University’s ePADD Discovery, which allows you to explore the email archives of a collection in a user-friendly (albeit slow) interface. It also links to the more traditional finding aids and catalogue records that you’d expect of an archive.

Or the Wellcome Library’s digital player, developed by Digirati, which lets you view digital and digitised content in a single integrated system. This includes cover-to-cover books, archives, artwork, videos, audio files and more!

Everyone should check it out: it’s pretty cool and freely available for others to use. There were many other projects that I haven’t covered, but these really stood out.

It was an intense but interesting three days, and I enjoyed sharing my experiences with the other archivists and research data managers who attended the workshop. I think it was a good mix of theory and practical knowledge, and it will certainly help me in the future. Also, I have to say Ed Pinsent and Steph Taylor did a great job!

Newly Digitized Arabic Astronomy Manuscript Now Online

The Bodleian Libraries’ important 12th-century copy of ʿAbd al-Raḥmān al-Ṣūfī’s Book of Fixed Stars, an illustrated Arabic treatise on the Constellations, is now available online via Digital Bodleian and Fihrist.


MS. Huntington 212, folio 1r, detail

Bodleian Libraries MS. Huntington 212, an early copy of ʿAbd al-Raḥmān al-Ṣūfī’s Kitāb Ṣuwar al-kawākib al-thābitah (Book of the Constellations of the Fixed Stars), was made in 566 AH/1170 CE for the treasury of Sayf al-Dīn Ghāzī II, Zangid Emir of Mosul, the largest city in northern Iraq. This is attested by a gilded dedication panel on folio 1r. The panel is now virtually illegible to the naked eye, as it was apparently defaced by a subsequent owner, possibly to efface the memory of a rival (see the detail above).



The manuscript, which is part of a large collection bought by the Library in 1693 from the Orientalist Robert Huntington, is believed to be the fourth oldest surviving copy of the treatise and has recently been the object of a large-scale conservation project by Robert Minte of the Conservation team at the Bodleian Libraries.

This copy’s importance has increased since doubts were raised about the authenticity of the date of Bodleian Libraries MS. Marsh 144, whose colophon states that it was made in 400 AH/1009 CE. MS. Marsh 144 is likely to have been made more than 150 years later than this.

Al-Ṣūfī’s treatise was originally composed in about 964 CE and contains images of most of the 48 Classical Constellations as they appear both on the celestial sphere and on the celestial globe – each being a mirror image of the other – together with tables giving the position (latitude and longitude) and magnitude of each star that makes up the constellation. Al-Ṣūfī’s observations represent an advance on those made by Ptolemy in the 2nd century CE.

The Huntington Collection copy also contains two rare images of so-called Bedouin Constellations superimposed over the Ptolemaic ones. These appear on folios 40r-40v and on folio 74v, where a constellation in the form of a camel is drawn in red ink alongside the Classical Constellation of Andromeda (see below).


A Bedouin Constellation in the form of a camel alongside the Classical Constellation of Andromeda.

Thanks to the conservation work done on the manuscript it is now available for scholarly study once again, and will also travel to an exhibition in New York later in 2016.

Digital.Bodleian + Wikipedia

For anyone looking to define Taijitu, Putso or Sangha, or to learn about Elizabeth Fry, the Junior wives of Krishna, or the Royal Ploughing Ceremony, one of the top internet search hits will be Wikipedia, the free encyclopedia. Articles about these, and hundreds of other topics, are now being improved using the Bodleian Libraries’ historic collections.

Images from the Digital.Bodleian collection are being uploaded to Wikimedia Commons, the database of freely reusable digital files. From there they can be embedded in articles not just in English Wikipedia but in other language editions and in other educational projects. So far, more than six hundred articles, across many different languages, are illustrated with images from the Bodleian Libraries, reaching a total of nearly 1.5 million readers per month.

Military Insignia of the Late Roman Army (Insignia of the magister militum praesentalis. Folio 96v of the manuscript Notitia dignitatum. Bodleian Library, MS. Canon. Misc. 378.) Licensed under CC BY 4.0 via Wikimedia Commons

The Bodleian images come from many different countries and eras. The themes range from the serene watercolours of 19th-century Burma (present-day Myanmar), via geometrical diagrams in an 11th-century Arabic book, to the nightmarish demonic visions of the 14th-century Book of Wonders.

A taste is given in an image gallery on Commons. Clicking on any of the images – here or in Wikipedia – and then on ‘More details’ will bring up a larger version, along with links and shelfmarks so that interested readers can track down the physical object.

Anyone is allowed to edit the entries for the images, for example to translate descriptions into other languages. However, these edits are monitored to make sure they respect the educational goals of the site.

This is just the start of an ongoing project: more files and more themes will be added over the next nine months. The Bodleian Libraries’ Wikimedian In Residence, Martin Poulter, welcomes enquiries – you can get in touch via the form below.


Transcribe at the arcHIVE

I do worry from time to time that textual analogue records will come to suffer from their lack of searchability compared with their born-digital peers. For records that have been digitised, crowd-sourced transcription could be an answer. A rather neat example of just that is the arcHIVE platform from the National Archives of Australia (NAA). arcHIVE is a pilot from NAA’s labs which allows anyone to contribute to the transcription of records. To get started, they have chosen a selection of records from their Brisbane office which are ‘known to be popular’. There aren’t too many of them just yet, but at this stage I guess they’re just trying to prove that the concept works. All the items have been OCR-ed, and users can choose to improve or overwrite the OCR output. There are lots of nice features here, including the ability to choose documents by difficulty rating (easy, medium or hard) or by type (which looks like a description of the series). The competitive may be inspired by the presence of a leader board, while the more collaborative may appreciate being able to do as much as they can and leave the transcription for someone else to finish later. You can register for access to some features, but you don’t have to. Very nice.

-Susan Thomas