Tag Archives: dpc

What I learned in London…at the DPTP Digital Preservation Workshop!

A few months ago I applied for a scholarship through the DPC Leadership Programme to attend the DPTP course for those working in digital preservation, held 14-16 March: The Practice of Digital Preservation.

It was a three-day intermediate course for practitioners who wished to broaden their working knowledge and it covered a wide range of tools and information relating to digital preservation and how to apply them practically to their day-to-day work.

The course was hosted in one of the meeting rooms in the Senate House Library of the University of London, a massive Art Deco building in Bloomsbury (I know because I managed to get a bit lost between breaks!).

Senate House, University of London

The course was three full days of workshops that mixed lectures with group exercises and the occasional break. Amazingly, this is the last year they’re running it as a three-day course; next time they’re going to compress it all into a single day (though everything they covered was useful, so I don’t know what you’d cut to shorten it; lunch, maybe?).

Each day had a different theme.

The first was on approaches to digital preservation: an overview of various policy frameworks and standards, the most well-known and widely accepted being OAIS.

No Google, not OASIS!


Oasis, Oman. Taken by Hendrik Dacquin aka loufi and licensed under CC BY 2.0.

After a brief wrestle with Google’s ‘suggestions’, let’s look at the OAIS model and admire its weirdly green-toned but elegant workflow. If you click through to Wikimedia Commons, it even has annotations for the acronyms.


After introducing us to various frameworks, the day mostly focused on the ingest and storage aspects of digital preservation. It covered the three main approaches (bit-level preservation, emulation and migration) in depth and discussed the pros and cons of each.

There are many factors to consider when choosing a method, and depending on your main constraint (money, time or expertise), different approaches will suit different organisations and collections. Bit-level preservation is the most basic thing you can do. You are mostly hoping that if you ingest the material exactly as it comes, some future archivist (perhaps with pots of money!) will come along and emulate or migrate it in a way that is far beyond what your poor cash-strapped institution can handle.
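In practice, bit-level preservation boils down to keeping the bits intact and provable: you record a checksum at ingest and periodically re-check it (fixity checking). Here’s a minimal sketch in Python of what that might look like; the function names are mine, not any particular tool’s:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading in chunks
    so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, recorded: str) -> bool:
    """Re-hash the file and compare against the checksum recorded at ingest."""
    return sha256_of(path) == recorded
```

If `verify` ever returns False, a bit has flipped somewhere and it’s time to restore from another copy.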

Emulation is when you create or acquire an environment (not the original one your digital object was created or housed in) in which to run your digital object, attempting to recreate its original look and feel.

Migration, which probably works best with contemporary or semi-contemporary objects, is used to transfer the object into a format that is more future-proof than its current one. It is an option that needs to be considered in the context of the technical constraints and options available. But perhaps you’re not sure what technical constraints you need to consider? Fear not!

These technical constraints were covered on the second day! This day was on ingestion, and it covered file formats, useful tools and several metadata schemas. I’ve probably exhausted you with my very thorough explanation of the first day’s content (also I’d like to leave a bit of mystery for you), so I will just say that there are a lot of file formats, and what makes them appealing to the end user can often be the same thing that makes a digital preservationist (ME) tear her hair out.

Thus those interested in preserving digital content have had to develop (or beg and borrow!) a variety of tools to read, copy, preserve, capture metadata and what have you. They have also spent a lot of time thinking about (and disagreeing over) what to do with these materials and information. From these discussions have emerged various schemata to make these digital objects more…tractable and orderly (haha). They have various fun acronyms (METS, PREMIS, need I go on?) and each has its own proponents, but I think everyone is in agreement that metadata is a good thing and XML is even better, because it makes that metadata readable by your average human as well as your average computer! A very important thing when you’re wondering what the hell you ingested two months ago that was helpfully named bobsfile1.rtf or something equally descriptive.
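To see why XML gets that praise, here’s a little illustrative sketch (not valid PREMIS or METS, just the flavour of the idea) of wrapping our mysterious bobsfile1.rtf in descriptive metadata using Python’s standard library:

```python
import xml.etree.ElementTree as ET

# Illustrative only: real PREMIS/METS records use namespaced elements
# and controlled schemas; this just shows the human-readable idea.
obj = ET.Element("object")
ET.SubElement(obj, "originalName").text = "bobsfile1.rtf"
ET.SubElement(obj, "formatName").text = "Rich Text Format"
fixity = ET.SubElement(obj, "fixity")
ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
ET.SubElement(fixity, "messageDigest").text = "2cf24d..."  # truncated digest, for display

print(ET.tostring(obj, encoding="unicode"))
```

Even a record this small answers the ‘what on earth is this?’ question for both humans and software.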

The final day was on different strategies for tackling the preservation of more complex born-digital objects such as emails and databases (protip: it’s hard!) and providing access to said objects. This led to a roundup of different and interesting ways institutions are using digital content to engage readers.

There’s a lot of exciting work in this field, such as Stanford University’s ePADD Discovery:


Which allows you to explore the email archives of a collection in a user-friendly (albeit slow) interface. It also has links to the more traditional finding aids and catalogue records that you’d expect of an archive.

Or the Wellcome Library’s digital player, developed by Digirati

Which lets you view digital and digitised content in a single integrated system. This includes cover-to-cover books, as pictured above, plus archives, artwork, videos, audio files and more!

Everyone should check it out; it’s pretty cool and freely available for others to use. There were many others that I haven’t covered, but these really stood out.

It was an intense but interesting three days and I enjoyed sharing my experiences with the other archivists and research data managers who came to attend this workshop. I think it was a good mix of theory and practical knowledge and will certainly help me in the future. Also I have to say Ed Pinsent and Steph Taylor did a great job!

DPC Student Conference: What I Wish I Knew Before I Started

The world of digital preservation can appear a bit daunting: a world full of checksums and programming and OAIS models, AIPs and DIPs, combined with the urgency of acting before it all becomes too late and technological obsolescence creates a black hole, swallowing up our digital heritage. The Digital Preservation Coalition’s What I Wish I Knew Before I Started Student Conference provided an opportunity to meet others beginning to work in digital preservation, and hear advice and reassurance from a range of interesting expert speakers.

Fancy words and acronym bingo

The day began with an Introduction to Digital Preservation by the DPC’s Sharon McMeekin who introduced us to current models, methodologies and frameworks, which she warned could also be known as fancy words and acronym bingo. Her presentation was very practical and informed us about resources which will be invaluable when putting digital preservation into practice. Sharon emphasised the importance of active preservation: it isn’t only the digital materials which are vulnerable to obsolescence, but the digital preservation systems that they are stored in. Crucially, digital preservation needs to be embedded into day-to-day work to make it sustainable.

The need for active preservation was echoed by Steph Taylor from the University of London Computer Centre, who urged us all to keep up to date and engage with the digital preservation community through Twitter, blogs and forums. She counselled us to be prepared to explain again and again that digital preservation is really not the same thing as backing up files.

Matthew Addis from Arkivum then gave a technologist’s perspective, introducing us to a range of software and tools including the DROID file format identification tool; the POWRR Grid that maps preservation tools against types of content and stages of their lifecycle; the PRONOM registry of file formats; the Exactly checksum tool, among many others, carrying on the game of acronym bingo. The amount of choice of tools and standards can lead to what Matthew called preservation paranoia and then to preservation paralysis where the task seems so big and complex that it seems better to do nothing at all.
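For a feel of how format identification tools like DROID work under the hood: they match leading byte signatures against a registry such as PRONOM, rather than trusting file extensions. Here’s a toy sketch of the idea, with just a handful of hard-coded signatures (PRONOM holds thousands, with far more nuance):

```python
# A few well-known magic numbers; a real signature registry (e.g. PRONOM)
# covers thousands of formats and versions.
SIGNATURES = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP container (also DOCX, EPUB, ...)",
    b"\xff\xd8\xff": "JPEG image",
}

def identify(data: bytes) -> str:
    """Return a format guess based on the file's leading bytes,
    ignoring whatever extension the filename claims."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"
```

Real tools go much further (container inspection, version detection, priority rules), but the principle is the same.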

It’s people that are the biggest risk to digital content surviving into the future. People thinking that preservation is too hard, too expensive, or tomorrow’s problem and not today’s. (Addis, 2016)

Being a digital archivist = being an archivist with extra super powers

The afternoon sessions were launched by Adrian Brown from the Parliamentary Archives. The Parliamentary Archives hold a wide range of digital material, from the expected email and audio-visual records to the more surprising virtual reality tours and reconstructions of sinking ships. He emphasised that digital archiving was still essentially archiving, involving selection, appraisal, preservation, cataloguing and supporting users. Being a digital archivist, he said, is the same thing as being an archivist, only with extra super powers.

Next, Glenn Cumiskey, Digital Preservation Manager at the British Museum, spoke about the importance of engaging with technology, decision makers and user communities. Glenn illustrated this through the roles associated with digital preservation: in the current environment you may need to be an archivist, records manager, librarian, information technologist, digital humanities specialist and software programmer all at once.

We then heard from Helen Hockx-Yu from the Internet Archive. Here at the Bodleian, the digital archive trainees are actively involved with the Bodleian Libraries Web Archive which uses the Internet Archive’s ‘Archive-It’ and ‘wayback machine’ services. It was interesting to hear from Helen about the redevelopment work she is involved in and how her own career developed in web archiving. Her final advice to us was to keep learning and not worry about being a perfectionist.

Ann MacDonald from the University of Kent inspired us with a talk about how her own career began and developed over the last few years, and emphasised that technical innovation is not all about big machines and that small actions can go a long way in implementing digital preservation.

Only point of digital preservation is reuse of data. Nothing else.

Finally, Dave Thompson, Digital Curator at the Wellcome Collection, gave an entertaining presentation which made the point that digital preservation is not an exercise in technology for its own sake. He argued that the only point of digital preservation is the reuse of data; therefore data needs to be reusable, consumable and shareable. Digital preservation should be seized as a social opportunity to do this.

Overall, the DPC’s Student Conference: What I Wish I Knew Before I Started was an engaging mixture of reassurance, ideas and advice to prepare us to begin working practically with digital preservation. Key themes which emerged across the presentations were the importance of people in the process, the importance we must give to what users actually want from digital collections, and the importance of selling the benefits and opportunities that digital preservation can bring. It introduced us to technology, tools and processes, but at the same time stressed that you do not need to be a qualified programmer to work in digital preservation.

Preserving Social Media – a briefing day

This post is a bit late as the DPC briefing day on Preserving Social Media was almost a month ago, but our excuse is that there was a lot of food for thought!

As digital archives trainees Rachael and I have spent a lot of time thinking about preserving social media (a bit sad maybe, but true!). Everyone loves web 2.0: It’s dynamic and complex; it gives us the ability to communicate and interact across continents; and it’s a giant headache if you’re trying to archive it!

So as you can see we were quite excited about this briefing day, and it did not disappoint!

Throughout the day the talks were pretty evenly split between various means of capturing and curating social media and how researchers looked to access and use it, as well as the quality of datasets they were able to pull from it. They also touched on the legal ramifications of preserving it and there were a few case studies that discussed lessons learnt from institutions that are actively collecting social media.

Nathan Cunningham introduced us to the concept of the Big Data Network and the UK Data Archive. He talked about how much data and metadata the web was currently generating and the funding that the government was putting into it.

Sara Thomson’s keynote focused on different strategies for capturing and curating social media, such as the pros and cons of platform APIs, data resellers, third-party services and platform self-archiving services. She also argued for better integration of social media with web archives in order to contextualise the social media, including preserving archived pages of the content that URLs link to. And she called for more collaboration between institutions, in terms of resources, access and methods/knowledge, and within institutions with their own researchers and end users.

Stephen Daisley from STV talked about Social Media & Journalism, about how it provided diverse and up-to-date coverage through non-traditional channels and its use as a tool for those underrepresented in mainstream media.

After lunch we had Katrin Weller from GESIS discuss how social scientists were using social media (For research! Not lolcats!) and the challenges of collecting, sharing and documentation. Going back to the methods that Sara Thomson listed in her keynote, most involve a third party and come with restrictions on how the data can be shared, what tools can be used on it, and how much data they give you. She highlighted the difficulties this can cause when researchers want to replicate or expand upon another researcher’s work, as well as other issues that come from using data that the researcher has not collected.

Tom Storrar from the National Archives rounded off the presentations with a talk on how the UK Government’s social media presence was being captured for posterity. His project was to capture the UK Government’s official Twitter presence. This involved deciding what would be in scope including content and metadata, how they would collect this data and finally how they would present it.
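As a purely hypothetical sketch of that kind of scoping decision (what counts as content, what counts as metadata about the capture itself), here is what a captured-post record might look like in Python; the field names and the example handle are invented for illustration, not any standard schema or real account:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class CapturedPost:
    """One archived social media post wrapped with capture metadata.
    Field names are illustrative, not drawn from any standard."""
    platform: str
    author: str
    text: str
    source_url: str
    captured_at: str   # ISO 8601 timestamp of when *we* captured it
    checksum: str      # fixity value for the captured text

def capture(platform: str, author: str, text: str, source_url: str) -> CapturedPost:
    """Wrap raw post content with the metadata we decided is in scope."""
    return CapturedPost(
        platform=platform,
        author=author,
        text=text,
        source_url=source_url,
        captured_at=datetime.now(timezone.utc).isoformat(),
        checksum=hashlib.sha256(text.encode("utf-8")).hexdigest(),
    )

post = capture("Twitter", "@dept_example", "Policy announcement...",
               "https://example.org/status/1")
print(json.dumps(asdict(post), indent=2))
```

The important decisions live in that record structure rather than in the harvesting code: deciding the fields is deciding the scope.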


While I found Sara’s keynote interesting and quite informative (especially in terms of what is available out there, with a balanced view of what each option has to offer), it wasn’t as relevant as I had hoped: it focused more on someone else providing the data to you than on the tools you can use to collect what you are interested in. There are many benefits to having authorised data resellers, or the platform itself, give you archiving abilities (especially being able to harvest all the associated metadata). Still, I like the flexibility and power that we get with Archive-It, and the fact that we aren’t restricted to the data that the providers think we want, though of course in some ways ours will be a much shallower collection, as we only collect what the end user sees.

I’m glad that she talked about the need for collaboration so that we don’t all try to reinvent the wheel. At the Bodleian we’re quite lucky because we work closely with other legal deposit libraries to capture web content (including social media), so we regularly have the opportunity to discuss and learn from each other’s experiences. We also have our own Bodleian Libraries Web Archive, which we encourage our researchers to use as a repository and a resource that they can help us grow.

One thing that I found problematic was Stephen Daisley’s talk. Well, not problematic, but perhaps a bit naïve? While I agreed with some of his points, I think he romanticises the notion of social media as the great equaliser. Off the top of my head I can think of at least one quite large group of underrepresented voices not getting their say on social media: the elderly. And I’m sure you can come up with many more examples if you stop to think about it. Just because the barrier to access is much lower than for traditional news stations does not mean there is no barrier. The vast amount of data and metadata generated makes it tempting to believe that it is the whole of the story, but I think we need to remember who isn’t part of the conversation.

I also really enjoyed Tom Storrar’s presentation because it highlighted the need to have a clear collection policy, to realise you can’t and shouldn’t capture everything, and to make your decisions transparent so that researchers know exactly what they do and do not have to work with.


Although the talks on Big Data and social science research were less relevant to our work on the Bodleian Libraries Web Archive, it was an eye-opening introduction to the sheer amount of digital data which is collected. This might be commercial research, profiting from the amount of information we can give to social media sites such as our name, nationality, photos, mobile number, address, and interests; or for forecasting purposes such as predicting results of political elections; or for academic study in areas such as activism, audiences, networks and crisis communication and response. I think Katrin Weller certainly succeeded in dismissing the claim that ‘99% of tweets are worthless babble’ – Weller, Social Media as Research Data, 27/10/2015.

Like Emily, I also enjoyed Tom Storrar’s presentation on the capture of government bodies’ Twitter and YouTube feeds. For me it really highlighted how complex the web of legislation is, requiring them to adapt to changing circumstances. If an organisation ceases to be a government body, the National Archives no longer has the right to capture its social media content. Because of these legal restrictions, no retweets or YouTube comments are captured, which means it is a one-way conversation. I think this is a shame, as we are losing that interaction which is so essential to social media. If YouTube comments are modern day equivalents to the letters sent to the government to comment on its policies, should we be preserving them?

Overall the day was full of fascinating talks and discussions on how to move forward in preserving social media. But the best part of the briefing day was knowing we weren’t alone! We got to talk to people approaching the preservation of social media from very different angles: the BBC, the National Archives, etc. And even though we all had different mandates and different foci, we still found a lot of common ground.

Day Of Digital Archives 2012

Yesterday was Day of Digital Archives 2012! (And yes, I’m a little late posting…)

This ‘Day’ was initiated last year to encourage those working with digital archives to use social media to raise awareness of digital archives: “By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?”. So in that spirit, here is a whizz through my week.

Coincidentally not only does this week include the Day of Digital Archives but it’s also the week that the Digital Preservation Coalition (or DPC) celebrated its 10th birthday. On Monday afternoon I went to the reception at the House of Lords to celebrate that landmark anniversary. A lovely event, during which the shortlist for the three digital preservation awards was announced. It’s great to see three award categories this time around, including one that takes a longer view: ‘the most outstanding contribution to digital preservation in the last decade’. That’s quite an accolade.

On the train journey home from the awards I found some quiet time to review a guidance document on the subject of acquiring born-digital materials. There is something about being on a train that puts my brain in the right mode for this kind of work. Nearing its final form, this guidance is the result of a collaboration between colleagues from a handful of archive repositories. The document will be out for further review before too long, and if we’ve been successful in our work it should prove helpful to creators, donors, dealers and repositories.

Part of Tuesday I spent reviewing oral history guidance drafted by a colleague to support the efforts of Oxford Medical Alumni in recording interviews with significant figures in the world of Oxford medicine. Oral histories come to us in both analogue and digital formats these days, and we try to digitise the former as and when we can. The development of the guidance is in the context of our Saving Oxford Medicine initiative to capture important sources for the recent history of medicine in Oxford. One of the core activities of this initiative is survey work, and it is notable that many archives surveyed include plenty of digital material. Web archiving is another element of the ‘capturing’ work that the Saving Oxford Medicine team has been doing, and you can see what has been archived to-date via Archive-It, our web archiving service provider.

Much of Wednesday morning was given over to a meeting of our building committee, which had very little to do with digital archives! In the afternoon, however, we were pleased to welcome visitors from MIT – Nancy McGovern and Kari Smith. I find visits like these are one of the most important ways of sharing information, experiences and know-how, and as always I got a lot out of it. I hope Nancy and Kari did too! That same afternoon, colleagues returned from a trip to London to collect another tranche of a personal archive. I’m not sure if this instalment contains much in the way of digital material, but previous ones have included hundreds of floppies and optical media, some zip discs and two hard disks. Also arriving on Wednesday, some digital Library records courtesy of our newly retired Executive Secretary; these supplement materials uploaded to BEAM (our digital archives repository) last week.

On Thursday, I found some time to work with developer Carl Wilson on our SPRUCE-funded project. Becky Nielsen (our recent trainee, now studying at Glasgow) kicked off this short project with Carl, following on from her collaboration with Peter May at a SPRUCE mashup in Glasgow. I’m picking up some of the latter stages of testing and feedback work now Becky’s started her studies. The development process has been an agile one with lots of chat and testing. I’ve found this very productive: it’s motivating to see things evolving, and to be able to provide feedback early and often. For now you can see what’s going on at GitHub here, but this link will likely change once we settle on a name that’s more useful than ‘spruce-beam’ (doesn’t tell you much, does it?! Something to do with trees…). One of the primary aims of this tool is to facilitate collection analysis, so we know better what our holdings are in terms of format and content. We expect that it will be useful to others, and there will be more info on it available soon.

Friday was more SPRUCE work with Carl, among other things. Also a few meetings today: one around funding and service models for digital archiving, and a meeting of the Bodleian’s eLegal Deposit Group (where my special interest is web archiving). The curious can read more about e-legal deposit at the DCMS website. One fun thing that came out of the day was that the Saving Oxford Medicine team decided to participate in a Women in Science wikipedia editathon. This will be hosted by the Radcliffe Science Library on 26 October as part of a series of ‘Engage’ events on social media organised by the Bodleian and the University’s Computing Services. It’s fascinating to contemplate how the range and content of Wikipedia articles change over time, something a web archive would facilitate perhaps.

For more on working with digital archives, go take a look at the great posts at the Day of Digital Archives blog!

-Susan Thomas