Category Archives: Digitisation

Making sense of uncertainty

On Monday 4th April 2016 I attended the International Conference on Literary Archives, held at the British Library under the heading ‘Archival Uncertainties’. The talks were insightful and varied, and generally had a theoretical rather than practical angle. This complemented the theme, as it suggested that we are as yet unsure of the forms literary archives will take now and in the future, and of how archivists can effectively preserve and provide access to them.

The panels I attended addressed the opportunities and issues afforded to archives in the digital world. The key ideas that came across were as follows:

  • Traditional archival descriptive fields and standards do not adequately express or represent the complexity of literary archives
    • Literary works can now take multimedia and multimodal forms. Catherine Hobbs, Literary Archivist at Library and Archives Canada, suggested that archivists need to be open to literary aesthetics in order to preserve the ‘multiple-canonical perspective’ literature is created in now. The archivist needs to be aware of the techniques used to create literary works and the technology needed to sustain them. Hobbs asserted that the exposure and publicity of a work, as well as the audiences it reaches, have changed in the digital world. Traditional archival descriptive fields no longer adequately express the content, its iterations and context.
    • Alexandra Kardoski Carter, Special Collections Librarian at the Thomas Fisher Rare Book Library, spoke on the difficulty of making legacy finding aids and descriptions available using open-source archival description and access software. Such software relies on archival standards which do not effectively represent the scope of the literary archives in the library’s collections. They also found that an intellectual structure was imposed on the material.
  • Technology enables new ways of presenting archives
    • Jeremy Boggs and Purdom Lindblad from the University of Virginia asserted that content management systems don’t always fit the material they should contain. They introduced the use of rich-prospect browsing as a way of presenting digital literary archives. This approach presents a whole collection through a representation of every item which can then be organised by the researcher.
  • Digital archives enable enhanced disaster planning
    • Having multiple digital copies of a work in different locations safeguards the work from being lost. Emmanuela Carbé from the University of Pavia said the university kept two encrypted copies of selected digital works in Pavia, and another copy over ninety kilometres away. This doesn’t help if the formats the works are kept in become unreadable, though – only migrating the digital object into a stable format will do that.
  • The original experience of a work can be lost in migration and emulation
    • Dene Grigar and John Barber from Washington State University both argued that digital archives do not always provide an authentic rendering of works. For me personally, this conjured up suppressed memories of studying Roland Barthes’ ‘The Death of the Author’. I would argue that as soon as a work is presented in the public sphere, its intended meaning is lost and everything is open to interpretation and re-interpretation. Kate Pullinger from Bath Spa University suggested that not all works are produced for posterity, and in fact there may be something ‘beguiling’ about a lost work. While this may be so, and indeed there will always be a number of works that are lost because of their volume and our limitations, I do believe that if we preserve a work that at one stage was envisaged as ephemeral, this simply adds to the enduring lifecycle and meaning of the work rather than taking anything away.
  • Online projects bring together archives and expertise that could not be brought together physically
    • Members of the Victorian Lives and Letters Consortium, an online project to create interactive digital archives of Victorian life writing, affirmed that their digital collection of Victorian archives could never have been consulted together in one space. The collaborative nature of the project meant that resources could be shared, so they could afford to do more and had the skillset to do it. They also asserted that each project has required new ways of thinking about and presenting archives, and that digital initiatives have enabled them to continually adapt and progress in how they display the content.
  • Digitisation boosts awareness and visibility of the collection
    • Digitisation projects can provide remote access to collections, meaning they can be viewed by a wider audience than could ever visit. They can also provide a surrogate for fragile or particularly valuable items. Further, even if a collection cannot be made available online in full, significant parts of it can be, which will give an idea of what the collection consists of. Anna St.Onge from York University, Toronto spoke about a project to digitise selected parts of the archives of Lady Victoria Welby. St.Onge consciously chose what she believed to be interesting parts of the archive, thereby moving away from the traditional view of archivists being impartial and instead attempting to actively shape and inspire research and interest in the archive. This was interesting as it showed how archiving is evolving and responding to new technology.

As is evident from the above, the conference gave me a lot to think about and broadened my knowledge of current digital initiatives as well as uncertainties surrounding how to keep digital archives. It is certainly an exciting time to be involved in archival practice as it attempts to move forward with technological advances.

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day and the unifying theme was a contrary message of dispersal and amalgamation (and butterflies).

The first thing that has been dispersed or discarded is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. Over the past few decades, in response to this, archives have collected more non-traditionally canonical literary papers but, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.

Caterpillar: Schwalbenschwanz (Raupe)

Caterpillars and collection development [By Eric Steinert – photo taken by Eric Steinert at Paussac, France, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=338409]

As Catherine noted, it used to be easy to find the avant-garde – pretty much whoever was hanging out on the Left Bank – but now it’s up to archivists to not only collect this material, but to track it down in the first place, and not to default to the temptingly easy path of collecting only the papers of that tiny sliver of authors considered publishable by mainstream publishers.


What I learned in London…at the DPTP Digital Preservation Workshop!

A few months ago I applied for a scholarship through the DPC Leadership Programme to attend the DPTP 14-16 March course for those working in digital preservation: The Practice of Digital Preservation.

It was a three-day intermediate course for practitioners who wished to broaden their working knowledge and it covered a wide range of tools and information relating to digital preservation and how to apply them practically to their day-to-day work.

The course was hosted in one of the meeting rooms in the Senate House Library of the University of London, a massive Art Deco building in Bloomsbury (I know because I managed to get a bit lost between breaks!).

Senate House, University of London

The course was three full days of workshops that mixed lectures with group exercises and the occasional break. Amazingly, this is the last year they’re doing it as a three-day course and they’re going to compress it all into a single day next time (though everything they covered was useful, so I don’t know what you’d cut to shorten it. Lunch, maybe?).

Each day had a different theme.

The first was on approaches to digital preservation. This was an overview of various policy frameworks and standards, the most well-known and widely accepted being OAIS.

No, Google, not OASIS!

Oman-Oasis

Oasis, Oman. Taken by Hendrik Dacquin aka loufi and licensed under CC BY 2.0.

After a brief wrestle with Google’s ‘suggestions’, let’s look at this OAIS Model and admire its weirdly green-toned but elegant workflow. If you click through to Wikimedia Commons it even has annotations for the acronyms.

OAIS-

After introducing us to various frameworks, the day mostly focused on the ingest and storage aspect of digital preservation. It covered the 3 main approaches (bit-level preservation, emulation and migration) in-depth and discussed the pros and cons of each.

There are many factors to consider when choosing a method. Depending on whether your main constraint is money, time or expertise, different approaches will be more suitable for different organisations and collections. Bit-level preservation is the most basic thing you can do. You are mostly hoping that if you ingest the material exactly as it comes, some future archivist (perhaps with pots of money!) will come along and emulate or migrate it in a way that is far beyond what your poor cash-strapped institution can handle.
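Bit-level preservation only works if you can later show that the bits really are unchanged. One common way to do that is to record a checksum for every file at ingest and compare it again later. The snippet below is a minimal sketch of that idea using Python's standard library, not something shown on the course; the function names and the choice of SHA-256 are my own assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files from the manifest that are now missing or altered."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

You would call `build_manifest` once at ingest, store the result somewhere safe, and run `changed_files` against it on a schedule. An empty list means every file recorded at ingest is still present and bit-for-bit identical.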

Emulation is when you create or acquire an environment (not the original one that your digital object was created or housed in) to run your digital object in that attempts to recreate its original look and feel.

Migration, which probably works best with contemporary or semi-contemporary objects, is used to transfer the object into a format that is more future-proof than its current one. This is an option that needs to be considered in the context of the technical constraints and options available. But perhaps you’re not sure what technical constraints you need to consider? Fear not!

These technical constraints were covered on the second day! This day was on ingestion and it covered file formats, useful tools and several metadata schemas. I’ve probably exhausted you with my very thorough explanation of the first day’s content (also I’d like to leave a bit of mystery for you) so I will just say that there are a lot of file formats and what makes them appealing to the end user can often be the same thing that makes a digital preservationist (ME) tear her hair out.

Thus those interested in preserving digital content have had to develop (or beg and borrow!) a variety of tools to read, copy, preserve, capture metadata and what have you. They have also spent a lot of time thinking about (and disagreeing over) what to do with these materials and information. From these discussions have emerged various schemata to make these digital objects more…tractable and orderly (haha). They have various fun acronyms (METS, PREMIS, need I go on?) and each has its own proponents, but I think everyone is in agreement that metadata is a good thing and XML is even better because it makes that metadata readable by your average human as well as your average computer! A very important thing when you’re wondering what the hell you ingested two months ago that was helpfully named bobsfile1.rtf or something equally descriptive.
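To make the "readable by human and computer" point concrete, here is a small sketch that writes a descriptive record as XML and reads it back, using only Python's standard library. The element names (`record`, `filename`, `title`, `ingested`, `checksum`) are made up for illustration and are not taken from METS, PREMIS or any other real schema.

```python
import xml.etree.ElementTree as ET


def describe_object(filename: str, title: str, ingested: str, checksum: str) -> str:
    """Build a tiny XML record (hypothetical element names) as a string."""
    record = ET.Element("record")
    ET.SubElement(record, "filename").text = filename
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "ingested").text = ingested
    ET.SubElement(record, "checksum").text = checksum
    ET.indent(record)  # pretty-print for human readers (Python 3.9+)
    return ET.tostring(record, encoding="unicode")


def read_title(xml_text: str) -> str:
    """Parse the record and return its title, as a program would."""
    return ET.fromstring(xml_text).findtext("title", default="")
```

Printing the string returned by `describe_object` gives an indented record a person can skim, while `read_title` shows that a program can pull a field out of the same text without any guesswork about what bobsfile1.rtf was.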

The final day was on different strategies for tackling the preservation of more complex born-digital objects such as emails and databases (protip: it’s hard!) and providing access to said objects. This led to a roundup of different and interesting ways institutions are using digital content to engage readers.

There’s a lot of exciting work in this field, such as Stanford University’s ePADD Discovery:

ePADD

Which allows you to explore the email archives of a collection in a user-friendly (albeit slow) interface. It also has links to the more traditional finding aids and catalogue records that you’d expect of an archive.

Or the Wellcome Library’s digital player, developed by Digirati:

Mendel

Which lets you view digital and digitised content in a single integrated system. This includes cover-to-cover books, as pictured above, archives, artwork, videos, audio files and more!

Everyone should check it out: it’s pretty cool and freely available for others to use. There were many others that I haven’t covered but these really stood out.

It was an intense but interesting three days and I enjoyed sharing my experiences with the other archivists and research data managers who came to attend this workshop. I think it was a good mix of theory and practical knowledge and will certainly help me in the future. Also I have to say Ed Pinsent and Steph Taylor did a great job!

Newly Digitized Arabic Astronomy Manuscript Now Online

The Bodleian Libraries’ important 12th-century copy of ʿAbd al-Raḥmān al-Ṣūfī’s Book of Fixed Stars, an illustrated Arabic treatise on the Constellations, is now available online via Digital Bodleian and Fihrist.

sufif1rdetail

MS. Huntington 212, folio 1r, detail

Bodleian Libraries MS. Huntington 212, an early copy of ʿAbd al-Raḥmān al-Ṣūfī’s book Kitāb Ṣuwar al-kawākib al-thābitah or Book of the Constellations of the Fixed Stars, was made in 566 AH/1170 CE for the treasury of Sayf al-Dīn Ghāzī II, Zangid Emir of Mosul, the largest city in northern Iraq. This is attested to by a gilded dedication panel on folio 1r. The panel is virtually illegible now to the naked eye as it was apparently defaced by a subsequent owner, possibly to efface the memory of a rival (see left).

 

 

The manuscript, which is part of a large collection bought by the Library in 1693 from the Orientalist Robert Huntington, is believed to be the fourth oldest surviving copy of the treatise and has recently been the object of a large scale conservation project by Robert Minte of the Conservation team at the Bodleian Libraries.

This copy’s significance has increased since doubts were raised about the authenticity of the date of Bodleian Libraries MS. Marsh 144, the colophon of which states that it was made in 400 AH/1009 CE. That manuscript is likely to have been made more than 150 years later than its stated date.
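For readers wondering how the paired AH/CE dates relate, the following is a rough sketch of the arithmetic, not a proper calendar conversion. It assumes a purely lunar year of twelve average synodic months and an epoch in mid-July 622 CE, and it ignores the Julian/Gregorian difference; the constant and function names are illustrative.

```python
import math

# Average length of twelve lunar (synodic) months, in days.
LUNAR_YEAR_DAYS = 12 * 29.530588
# Average solar year length, in days.
SOLAR_YEAR_DAYS = 365.2425
# Start of 1 AH as a fractional CE year (assumed: mid-July 622 CE).
EPOCH_CE = 622 + 196 / 365.25


def hijri_year_start_ce(ah_year: int) -> float:
    """Approximate the CE date, as a fractional year, on which an AH year begins."""
    elapsed_lunar_years = ah_year - 1
    return EPOCH_CE + elapsed_lunar_years * LUNAR_YEAR_DAYS / SOLAR_YEAR_DAYS


def ce_year_in_which_ah_begins(ah_year: int) -> int:
    """Return the CE year containing the start of the given AH year."""
    return math.floor(hijri_year_start_ce(ah_year))
```

With these assumptions, `ce_year_in_which_ah_begins(566)` gives 1170 and `ce_year_in_which_ah_begins(400)` gives 1009, matching the date pairs quoted for the two manuscripts. Because a lunar year is about eleven days shorter than a solar one, each AH year actually straddles two CE years.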

Al-Ṣūfī’s treatise was originally composed in about 964 CE and contains images of most of the 48 Classical Constellations both as they appear on the celestial sphere and on the celestial globe – each being a mirror image of the other –  together with tables of data on the position (latitude and longitude) and magnitude of each star which makes up the constellation. Al-Ṣūfī’s observations represent an advance on those made by Ptolemy in the 2nd century CE.

The Huntington Collection copy also contains two rare images of so-called Bedouin Constellations superimposed over the Ptolemaic ones, and these appear on folios 40r-40v, and also on folio 74v, where a constellation in the form of a camel appears drawn in red ink alongside the classical constellation of Andromeda  (see below).

sufi74v

A Bedouin Constellation in the form of a camel alongside the Classical Constellation of Andromeda.

Thanks to the conservation work done on the manuscript it is now available for scholarly study once again, and will also travel to an exhibition in New York later in 2016.

Digital.Bodleian + Wikipedia

For anyone looking to define Taijitu, Putso or Sangha, or to learn about Elizabeth Fry, the Junior wives of Krishna, or the Royal Ploughing Ceremony, one of the top internet search hits will be Wikipedia, the free encyclopedia. Articles about these, and hundreds of other topics, are now being improved using the Bodleian Libraries’ historic collections.

Images from the Digital.Bodleian collection are being uploaded to Commons, the database of freely reusable digital files. From here they can be embedded in articles not just in English Wikipedia, but in other languages and in other educational projects. So far, more than six hundred articles, across many different languages, are illustrated with images from the Bodleian Libraries, reaching a total of nearly 1.5 million readers per month.

Military Insignia of the Late Roman Army (Insignia of the magister militum praesentalis. Folio 96 v of the manuscript Notitia dignitatum. Bodleian Library, MS. Canon. Misc. 378.) Licensed under CC BY 4.0 via Wikimedia Commons


The Bodleian images come from many different countries and eras. The themes range from the serene watercolours of 19th century Burma (present-day Myanmar), via geometrical diagrams in an 11th century Arabic book, to the nightmarish demonic visions of the 14th century Book of Wonders.

A taste is given in an image gallery on Commons. Clicking on any of the images – here or in Wikipedia – and then on ‘More details’ will bring up a larger version, along with links and shelfmarks so that interested readers can track down the physical object.

Anyone is allowed to edit the entries for the images, for example to translate descriptions into other languages. However, these edits are monitored to make sure they respect the educational goals of the site.

This is just the start of an ongoing project: more files and more themes will be added over the next nine months. The Bodleian Libraries’ Wikimedian In Residence, Martin Poulter, welcomes enquiries – you can get in touch via the form below.


Transcribe at the arcHIVE

I do worry from time to time that textual analogue records will come to suffer from their lack of searchability when compared with their born-digital peers. For those records that have been digitised, crowd-sourcing transcription could be an answer. A rather neat example of just that is the arcHIVE platform from the National Archives of Australia. arcHIVE is a pilot from NAA’s labs which allows anyone to contribute to the transcription of records. To get started they have chosen a selection of records from their Brisbane office which are ‘known to be popular’. Not too many of them just yet, but at this stage I guess they’re just trying to prove the concept works. All the items have been OCR-ed, and users can choose to improve or overwrite the results from the OCR process. There are lots of nice features here, including the ability to choose documents by a difficulty rating (easy, medium or hard) or by type (a description of the series by the looks of it). The competitive may be inspired by the presence of a leader board, while the more collaborative may appreciate the ability to do as much as you can, and leave the transcription for someone else to finish up later. You can register for access to some features, but you don’t have to. Very nice.

-Susan Thomas