Tag Archives: digital preservation

iPRES 2016

Last month, I attended the 13th International Conference on Digital Preservation, this year hosted in Bern, Switzerland. The four days of papers, panels, posters and workshops were an intensive and exciting opportunity to meet with colleagues working in digital preservation around the world, share ideas, and hear about innovative projects and approaches. The topics ranged widely from technical systems and practices, to quality and risk assessment, and stewardship and sustainability. What follows are just a couple of highlights from a really fascinating week.


The post-it note networking wall: What do you know? What do you want to know?

Net-based and digital art

As email, digital documents and social media replace traditional forms of communication, it is crucial to be able to preserve born-digital material and make it accessible. An area which I hadn’t previously considered was the realm of net-based art. Here, the internet is used as an artistic medium, which of course has implications (and complications) for digital preservation.

In her keynote speech, Sabine Himmelsbach from the House of Electronic Arts in Basel introduced us to this exciting field, showing artworks such as Olia Lialina’s ‘Summer’ (2013), shown below.


Screenshot of Summer, Olia Lialina, 2013. Available at https://www.youtube.com/watch?v=SxvHoXdC4Uk

The artwork features an animated loop of Lialina swinging from the browser bar. Each frame is hosted by a different website, and the playback therefore depends on your connection speed. This creative use of technology creates enormous challenges for preservation. Here, rather than preserving artefacts, it is the preservation of behaviours which is crucial, and these behaviours are extremely vulnerable to obsolescence.

Marc Lee’s ‘TV Bot’ is another net-based artwork, automated to broadcast current news stories with live TV streams, radio streams and webcam images from around the world. Because the work relies on technical infrastructure in this way, the shift from Real Player to Adobe Flash Player was one development which prevented ‘TV Bot’ from functioning. The artist then not only worked on technical migration, but re-interpreted the artwork, modernising the look and feel, resulting in ‘TV Bot 2.0’ in 2010. This process soon happened again, this time incorporating a Twitter stream, in ‘TV Bot 3.0’ (2016). In this way, the artist is working against cultural, as well as technical, obsolescence.


Marc Lee, ‘TV Bot 2.0’, 2010. Image from http://ceaac.org/en/artistes/marc-lee

The heavy involvement from the artist in this case has helped preserve the artwork, but this process cannot be sustained indefinitely. Himmelsbach ended her speech by stressing the need for collaboration and dialogue, which emerged as a central theme of the conference.

A new approach to web archiving

Another highlight was the workshop on Webrecorder led by Dragan Espenschied from Rhizome. He introduced their new tool, which departs from the usual crawling method and instead captures web content ‘symmetrically’, resulting in incredibly high-fidelity captures. The demonstration of how the tool can capture dynamic and interactive content sparked gasps of amazement from the group!

Webrecorder not only captures social media, embedded video and complex javascript (often tricky with current tools), but can actually capture the essence of an individual’s interaction with the web-content.

How it works: Webrecorder records all the content you interact with during the recording session. Users are then able to interact with the content themselves, but anything that was not viewed during the recording session will not be available to them.
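As a toy model of that idea (in Python; the class and names here are entirely made up and have nothing to do with Webrecorder’s actual implementation), ‘recording what you interact with’ might look like this:

```python
# Toy sketch: everything fetched during the recording session is stored;
# replay can only ever serve what was captured.
class ToyWebRecorder:
    def __init__(self, live_web):
        self.live_web = live_web   # url -> content (stands in for the internet)
        self.archive = {}          # what we captured while browsing

    def browse(self, url):
        """During recording: fetch from the live web and keep a copy."""
        content = self.live_web[url]
        self.archive[url] = content
        return content

    def replay(self, url):
        """After recording: only URLs visited during the session exist."""
        if url not in self.archive:
            raise KeyError("not captured during the recording session")
        return self.archive[url]
```

Replaying a page works only because it was visited while recording; a page that was never opened simply isn’t in the archive, which mirrors the limitation described above.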

Current web archiving strategies aren’t able to capture the personalised nature of web use. How to use this functionality is still a big question: a web recording made in this way would be personal to the web archivist, showing what they decided to explore, unless an institution designed a systematic approach. That would itself be very resource-intensive, and is arguably not where the potential of Webrecorder lies: its real strength is the ability to capture dynamic content, such as net-based artworks. However, the possibility of preserving not only web content, but our interaction with it, is a very exciting development.

iPRES 2016 was a fantastic opportunity to gain insight into projects happening around the world to further digital preservation. It showed me that often there are no clear answers to ‘which file format is best for that?’ or ‘how do I preserve this?’ and that seeking advice from others, and experimenting, is often the way forward. What was really clear from attending was that the strength and support of the community is the most valuable digital preservation tool available.

 

Getting Started with Digital Preservation

On 28th April 2016 we attended the Digital Preservation Coalition’s workshop entitled ‘Getting Started with Digital Preservation’. It was a brilliant introduction to the processes involved in undertaking digital preservation, the tools currently available and the standards such work should adhere to.

The day began with an overview of what digital preservation is. Sharon McMeekin explained that digital preservation is an active process that needs to be integrated into daily business rather than being tackled as an aside or as a temporary initiative. It requires early intervention and work on an ongoing basis. She gave some specific advice on the kind of systems that should be used for digital preservation:

  • They need to be resilient, based on standards and be able to be tested
  • They must include error checking and refreshment, be multi-media compatible, self-reporting and backed-up
  • They must provide authenticity checks on data to show alterations by generating a checksum for each digital item. This is unique to the item and would only change if the item has changed (by, for example, degradation or alteration)
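As a rough illustration of that last point, here is a minimal Python sketch (not any particular preservation system) of how a checksum flags change:

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string, hex-encoded."""
    return hashlib.sha256(data).hexdigest()

original = b"A digital item in the repository."
fixity_at_ingest = checksum(original)

# Later: recompute and compare. A match means the bits are unchanged;
# a mismatch signals degradation or alteration.
assert checksum(original) == fixity_at_ingest

# Even a one-character change produces a completely different digest.
damaged = b"A digital item in the repositorz."
assert checksum(damaged) != fixity_at_ingest
```

The digest is effectively unique to the item’s exact bytes, which is why comparing stored and freshly computed checksums is the standard authenticity check.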

We then moved on to Assessing Institutional Readiness. This was described as a benchmarking process to determine the actions required for digital preservation, and revolves around asking the following questions:

  • How much of the organisation is in scope?
  • How much can you rely on others?
  • What will you gain from the exercise?

Sharon suggested using the NDSA Levels of Digital Preservation in this endeavour to create a document which identifies risks, sets objectives, prioritises developments and can be used as an advocacy tool.

We were introduced to the risks affecting digital material:

  • Media/technology obsolescence
  • Media/technology failure
  • Human error
  • Natural disaster
  • Malicious damage
  • Viruses
  • Network failure
  • Disassociation

One way to combat these issues is to migrate the data into a more technologically-stable, secure format and control access to the data through passwords and systems. Metadata which follows the PREMIS standard should also be recorded and preserved alongside the data so its context is not lost. The DPC recommends undertaking a risk assessment whereby the risk, its consequences, likelihood and impact are analysed and priorities are drawn out.
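To sketch what such a risk assessment might look like in code (the likelihood-times-impact scoring below is a common convention and my own assumption, not the DPC’s prescribed method; the scores are made up):

```python
# Score each risk by likelihood and impact, then rank by their product
# to draw out priorities.
risks = [
    {"risk": "media obsolescence", "likelihood": 4, "impact": 5},
    {"risk": "human error",        "likelihood": 3, "impact": 3},
    {"risk": "natural disaster",   "likelihood": 1, "impact": 5},
]
for r in risks:
    r["priority"] = r["likelihood"] * r["impact"]

# Highest-priority risks first.
for r in sorted(risks, key=lambda r: r["priority"], reverse=True):
    print(f'{r["risk"]}: priority {r["priority"]}')
```

Even a table this simple makes the trade-offs visible: a low-likelihood, high-impact risk like natural disaster may rank below an ever-present one like media obsolescence.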

We were shown DROID, a characterisation tool developed by The National Archives which analyses the files on a system, recording how many files there are, how big they are and what formats they are in. It also generates a checksum for each file.
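As a very rough stand-in for what a characterisation tool does (DROID itself identifies formats using PRONOM signatures, not file extensions as this sketch does), a survey script might look like:

```python
import hashlib
import os
from collections import Counter

def characterise(root):
    """Simplified file survey: count files, total their sizes, tally
    formats (here just by extension), and checksum each file."""
    formats = Counter()
    manifest = []
    total_bytes = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total_bytes += size
            formats[os.path.splitext(name)[1].lower() or "(none)"] += 1
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest.append((path, size, digest))
    return formats, total_bytes, manifest
```

Running something like this over a collection gives you the headline numbers (how many files, how big, what formats) plus a checksum manifest to monitor fixity over time.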

Digital Asset Registers were recommended as a coordination tool for actions. They gather all necessary information (asset name, location, owner, format, etc.) into one place, assigning responsibilities and producing a document which can be shared, expanded and updated.
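A register like this can start life as something as simple as a spreadsheet; here is a toy CSV version in Python (the column names echo those mentioned at the workshop, but the entries and extra columns are made up for illustration):

```python
import csv
import io

# Illustrative Digital Asset Register: one row per asset, with an
# assigned action and a named responsible person.
fields = ["asset", "location", "owner", "format", "action", "responsible"]
rows = [
    ["Oral history interviews", "\\\\server\\audio", "Sound Archive", "WAV",
     "generate checksums", "J. Smith"],
    ["Committee minutes", "SharePoint", "Records Mgmt", "DOCX",
     "migrate to PDF/A", "A. Jones"],
]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(fields)
writer.writerows(rows)
print(buffer.getvalue())
```

The point is less the technology than the discipline: one shared, updatable document that says what you hold, where it is, and who is doing what about it.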

Some final advice we were given was:

  • Think of digital preservation as a shared activity, engaging stakeholders and assigning responsibilities to colleagues
  • Continually update your own skills and knowledge by attending training and identifying useful resources

Having attended this engaging and worthwhile workshop, we now have some foundation knowledge on what is involved in digital preservation and feel ready and excited to begin work on digital preservation at the Bodleian.

What I learned in London…at the DPTP Digital Preservation Workshop!

A few months ago I applied for a scholarship through the DPC Leadership Programme to attend the DPTP course for those working in digital preservation, ‘The Practice of Digital Preservation’, held 14-16 March.

It was a three-day intermediate course for practitioners who wished to broaden their working knowledge and it covered a wide range of tools and information relating to digital preservation and how to apply them practically to their day-to-day work.

The course was hosted in one of the meeting rooms in the Senate House Library of the University of London, a massive Art Deco building in Bloomsbury (I know because I managed to get a bit lost between breaks!).

Senate House, University of London

The course was three full days of workshops that mixed lectures with group exercises and the occasional break. Amazingly this is the last year they’re doing it as a three day course and they’re going to compress it all into a single day next time (though everything they covered was useful, I don’t know what you’d cut to shorten it—lunch maybe?).

Each day had a different theme.

The first was on approaches to digital preservation: an overview of various policy frameworks and standards, the most well-known and accepted being OAIS.

No Google, not OASIS!


Oasis, Oman. Taken by Hendrik Dacquin aka loufi and licensed under CC BY 2.0.

After a brief wrestle with Google’s ‘suggestions’, let’s look at this OAIS Model and admire its weirdly green-toned but elegant workflow. If you click through to Wikimedia Commons it even has annotations for the acronyms.

The OAIS Reference Model

After introducing us to various frameworks, the day mostly focused on the ingest and storage aspects of digital preservation. It covered the three main approaches (bit-level preservation, emulation and migration) in depth and discussed the pros and cons of each.

There are many factors to consider when choosing a method, and depending on what your main constraint is (money, time or expertise), different approaches will be more suitable for different organisations and collections. Bit-level preservation is the most basic thing you can do. You are mostly hoping that if you ingest the material exactly as it comes, some future archivist (perhaps with pots of money!) will come along and emulate or migrate it in a way that is far beyond what your poor cash-strapped institution can handle.

Emulation is when you create or acquire an environment (not the original one that your digital object was created or housed in) to run your digital object in that attempts to recreate its original look and feel.

Migration, which probably works best with contemporary or semi-contemporary objects, is used to transfer the object into a format that is more future-proof than its current one. This is an option that needs to be considered in the context of the technical constraints and options available. But perhaps you’re not sure what technical constraints you need to consider? Fear not!

These technical constraints were covered on the second day! This day was on ingestion, and it covered file formats, useful tools and several metadata schemas. I’ve probably exhausted you with my very thorough explanation of the first day’s content (also I’d like to leave a bit of mystery for you) so I will just say that there are a lot of file formats, and what makes them appealing to the end user can often be the same thing that makes a digital preservationist (ME) tear her hair out.

Thus those interested in preserving digital content have had to develop (or beg and borrow!) a variety of tools to read, copy, preserve, capture metadata and what have you. They have also spent a lot of time thinking about (and disagreeing over) what to do with these materials and information. From these discussions have emerged various schemata to make these digital objects more…tractable and orderly (haha). They have various fun acronyms (METS, PREMIS, need I go on?) and each has its own proponents, but I think everyone is in agreement that metadata is a good thing, and XML is even better because it makes that metadata readable by your average human as well as your average computer! A very important thing when you’re wondering what the hell you ingested two months ago that was helpfully named bobsfile1.rtf or something equally descriptive.
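To show why XML earns that praise, here is a toy, PREMIS-flavoured record built in Python (the element names are illustrative only and not schema-valid PREMIS; the digest value is a truncated placeholder):

```python
import xml.etree.ElementTree as ET

# Both a human and a parser can answer "what is bobsfile1.rtf,
# and how do I know it hasn't changed?" from a record like this.
obj = ET.Element("object")
ET.SubElement(obj, "originalName").text = "bobsfile1.rtf"
ET.SubElement(obj, "formatName").text = "Rich Text Format"
ET.SubElement(obj, "dateIngested").text = "2016-01-15"
fixity = ET.SubElement(obj, "fixity")
ET.SubElement(fixity, "messageDigestAlgorithm").text = "SHA-256"
ET.SubElement(fixity, "messageDigest").text = "9f86d081..."

print(ET.tostring(obj, encoding="unicode"))
```

The same file with no metadata is a mystery; with a record like this alongside it, its name, format, ingest date and fixity information all travel with it.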

The final day was on different strategies for tackling the preservation of more complex born-digital objects such as emails and databases (protip: it’s hard!) and providing access to said objects. This led to a roundup of different and interesting ways institutions are using digital content to engage readers.

There’s a lot of exciting work in this field, such as Stanford University’s ePADD Discovery:

ePADD

Which allows you to explore the email archives of a collection in a user-friendly (albeit slow) interface. It also has links to the more traditional finding aids and catalogue records that you’d expect of an archive.

Or the Wellcome Library’s digital player, developed by Digirati:

Which lets you view digital and digitised content in a single integrated system. This includes cover-to-cover books (as pictured above), archives, artwork, videos, audio files and more!

Everyone should check it out, it’s pretty cool and freely available for others to use. There were many others that I haven’t covered but these really stood out.

It was an intense but interesting three days and I enjoyed sharing my experiences with the other archivists and research data managers who came to attend this workshop. I think it was a good mix of theory and practical knowledge and will certainly help me in the future. Also I have to say Ed Pinsent and Steph Taylor did a great job!

DPC Student Conference: What I Wish I Knew Before I Started

The world of digital preservation can appear a bit daunting: a world full of checksums and programming and OAIS models, AIPs and DIPs, combined with the urgency of acting before it all becomes too late and technological obsolescence creates a black hole, swallowing up our digital heritage. The Digital Preservation Coalition’s ‘What I Wish I Knew Before I Started’ Student Conference provided an opportunity to meet others beginning to work in digital preservation, and to hear advice and reassurance from a range of interesting expert speakers.

Fancy words and acronym bingo

The day began with an Introduction to Digital Preservation by the DPC’s Sharon McMeekin who introduced us to current models, methodologies and frameworks, which she warned could also be known as fancy words and acronym bingo. Her presentation was very practical and informed us about resources which will be invaluable when putting digital preservation into practice. Sharon emphasised the importance of active preservation: it isn’t only the digital materials which are vulnerable to obsolescence, but the digital preservation systems that they are stored in. Crucially, digital preservation needs to be embedded into day-to-day work to make it sustainable.

The need for active preservation was echoed by Steph Taylor from the University of London Computer Centre, who urged us all to keep up to date and engage with the digital preservation community through Twitter, blogs and forums. She counselled us to be prepared to explain again and again that digital preservation is really not the same thing as backing up files.

Matthew Addis from Arkivum then gave a technologist’s perspective, introducing us to a range of software and tools including the DROID file format identification tool; the POWRR Grid that maps preservation tools against types of content and stages of their lifecycle; the PRONOM registry of file formats; the Exactly checksum tool, among many others, carrying on the game of acronym bingo. The amount of choice of tools and standards can lead to what Matthew called preservation paranoia and then to preservation paralysis where the task seems so big and complex that it seems better to do nothing at all.

It’s people that are the biggest risk to digital content surviving into the future. People thinking that preservation is too hard, too expensive, or tomorrow’s problem and not today’s. (Addis, 2016)

Being a digital archivist = being an archivist with extra super powers

The afternoon sessions were launched by Adrian Brown from the Parliamentary Archives. The Parliamentary Archives hold a wide range of digital material, from the expected email and audio-visual records to the more surprising virtual reality tours and reconstructions of sinking ships. He emphasised that digital archiving was still essentially archiving, involving selection, appraisal, preservation, cataloguing and supporting users. Being a digital archivist, he said, is the same thing as being an archivist, only with extra super powers.

Next, Glenn Cumiskey, Digital Preservation Manager at the British Museum, spoke about the importance of engaging with technology, decision makers and user communities. He illustrated this through the range of roles associated with digital preservation (archivist, records manager, librarian, information technologist, digital humanities specialist, software programmer), suggesting that in the current environment you may need to be all of these things at once.

We then heard from Helen Hockx-Yu from the Internet Archive. Here at the Bodleian, the digital archive trainees are actively involved with the Bodleian Libraries Web Archive, which uses the Internet Archive’s ‘Archive-It’ and ‘Wayback Machine’ services. It was interesting to hear from Helen about the redevelopment work she is involved in and how her own career developed in web archiving. Her final advice to us was to keep learning and not to worry about being a perfectionist.

Ann MacDonald from the University of Kent inspired us with a talk about how her own career began and developed over the last few years, and emphasised that technical innovations are not all about big machines and that small actions can go a long way in implementing digital preservation.

Only point of digital preservation is reuse of data. Nothing else.

Finally, Dave Thompson, Digital Curator at the Wellcome Collection, gave an entertaining presentation which made the point that digital preservation is not an exercise in technology for its own sake. He argued that the only point of digital preservation is the reuse of data, therefore data needs to be reusable, consumable and shareable. Digital preservation should be seized as a social opportunity to do this.

Overall, the DPC’s Student Conference: What I Wish I Knew Before I Started was an engaging mixture of reassurance, ideas and advice to prepare us to begin working practically with digital preservation. Key themes which emerged across the presentations were the importance of people in the process, the importance we must give to what users actually want from digital collections, and the importance of selling the benefits and opportunities that digital preservation can bring. It introduced us to technology, tools and processes, but at the same time stressed that you do not need to be a qualified programmer to work in digital preservation.

Preserving Digital Sound and Vision: A Briefing 8th April 2011

Last Friday I went along to the DPC briefing Preserving Digital Sound and Vision. I was particularly interested in the event because of digital video files currently held on DVD media at the Bodleian.

After arriving at the British Library and collecting my very funky Save the Bits DPC badge I sat down to listen to a packed programme of speakers. The morning talks gave an overview of issues associated with preserving audio-visual resources. We began with Nicky Whitsed from the Open University, who spoke about the nature of the problem of preserving audio-visual content; a particularly pertinent issue for the OU, who have 40 years of audio-visual teaching resources to deal with. Richard Ranft then gave a fascinating insight into the history and management of the British Library Sound Archive. He played a speech from Nelson Mandela’s 1964 trial to emphasise the value of audio preservation. Next, Stephen Gray from JISC Digital Media spoke about how students are using audio-visual content in their research. He mentioned the difficulties researchers find when citing videos, especially those on YouTube that may disappear at any time! To round off the morning, John Zubrycki from BBC R&D spoke about Challenges and Solutions in Broadcast Archives. One of the many interesting facts that he mentioned was that subtitle files originally produced by the BBC for broadcast have been used as a tool for search and retrieval of video content.

After enjoying lunch and the beautiful sunny weather on the British Library terrace we moved on to the afternoon programme, based on specific projects and tools. Richard Wright of the BBC spoke about the Presto Centre and the tools it has developed to help with audio-visual preservation. He also spoke about the useful digital preservation tools available online via Presto Space. Sue Allcock and James Alexander then discussed the outcomes and lessons learnt from the Access to Video Assets Project at the Open University, which makes past video content from the Open University’s courses available to OU staff through a Fedora repository. Like the BBC, discovering subtitle files has allowed the OU to index their audio-visual collections. Finally, Simon Dixon from the Centre for Digital Music at Queen Mary University of London spoke about emerging tools for digital sound.

A final wide-ranging discussion about collaboration and next steps followed, which included discussion of storage as well as ideas for a future event addressing the contexts of audio-visual resources. I left the event with my mind full of new information and lots of pointers for places to look to help me consider the next steps for our digital video collections… watch this space.

-Emma Hancox

The Case for Digital Preservation

Now, I’m pretty sure there is no need for me to make the case to the good readers of this blog, but if you’re ever stuck for something to say about why your work is important – for example at parties – then the demise of the print edition of the OED seems a good candidate!

OK, so no one is about to ditch the Pocket version, or even the Shorter (I got one of those for a graduation present from my Grandma!), but even so…

The last print OED was published in 1989. I imagine, given the regular updates to the OED online, that there has been a substantial influx of words since 1989 and I guess (given how Chaucer looks now) English will undergo some significant changes in the future. Unless we (the DP community) decide to preserve the digital OED, we will condemn readers of 2489 to struggle on with an antique 1989 print copy and much will they wonder when they don’t find things like “Internet”…

(Mind you, the electricity might have all run out by then so it won’t really matter…)

On the flip side, and no doubt something someone at the party will point out, this is also a case for continuing to print the OED – at least a few copies, kept in safe places… 😉

-Peter Cliff

Par2ty

This is probably an old and battered hat for you good folks (seeing as the Web site’s last “announcement” was in 2004!), but most days I still feel pretty new to this whole digital archiving business – not just the “archive” bit, but also the “digital preservation”, um, bit – so it was news to me… 😉

Perusing the latest Linux Format at the weekend, I chanced on an article by Ben Martin (I couldn’t find a Web site for him…) about parchive and specifically par2cmdline.

Par-what? I hear you ask? (Or perhaps “oh yeah, that old thing” ;-))

Par2 files are what the article calls “error correcting files”. A bit like checksums, only once created they can be used to repair the original file in the event of bit/byte level damage.

Curious.
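How can a small extra file repair damage? The principle can be shown with plain XOR parity; here is a Python sketch (par2 itself uses Reed-Solomon codes, which can repair multiple damaged blocks, but the flavour is the same):

```python
# XOR parity: the simplest form of the "error-correcting file" idea.
# One parity block lets you rebuild any single lost or damaged block.

def make_parity(blocks):
    """XOR equal-length blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def repair(blocks, missing_index, parity):
    """Rebuild one missing block from the survivors plus parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return make_parity(survivors + [parity])

blocks = [b"spam", b"eggs", b"bree"]
parity = make_parity(blocks)        # small extra file, stored alongside
assert repair(blocks, 1, parity) == b"eggs"   # lost block reconstructed
```

The parity data is much smaller than the original blocks it protects, which is exactly why par2 files are so cheap to keep (or download) compared with a full second copy.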

So I duly installed par2 – did I mention how wonderful Linux (Ubuntu in this case) is? – the install was simple:

sudo apt-get install par2

Then tried it out on a 300MB Mac disk image – the new Doctor Who game from the BBC – and guess what? It works! Do some damage to the file with dd, run the verify again and it says “the file is damaged, but I can fix it” in a reassuring HAL-like way (that could be my imagination, it didn’t really talk – and if it did, probably best not to trust it to fix the file right…)

The par2 files totalled around 9MB at “5% redundancy” – not quite sure what that means – which isn’t much of an overhead for some extra data security… I think, though I’ve not tried, that it is integrated into KDE4 too for a little bit of personal file protection.

The interesting thing about par2 is that it comes from an age when bandwidth was limited. If you downloaded a large file and it was corrupt, rather than have to download it again, you simply downloaded the (much smaller) par2 file that had the power to fix your download.

This got me thinking. Is there then any scope for archives to share par2 files with each other? (Do they already?) We cannot exchange confidential data but perhaps we could share the par2 files, a little like a pseudo-mini-LOCKSS?

All that said, I’m not quite sure we will use parchive here, though it’d be pretty easy to create the par2 files on ingest. In theory our use of ZFS, RAID, etc. should be covering this level of data security for us, but I guess it remains an interesting question – would anything be gained by keeping par2 data alongside our disk images? And, after Dundee, would smaller archives be able to get some of the protection offered by things like ZFS, but in a smaller, lighter way?

Oh, and Happy Summer Solstice!

-Peter Cliff

Carved in Silicon

Just found an article on the BBC site that is of interest – ‘Rosetta stone’ offers digital lifeline. Some nice pointers to research on the life of CDs and DVDs with numbers to use at presentations and it is comforting to read that what we’re trying to do Just Aint Easy(tm).

I rather like this too:

“…the digital data archivists’ arch enemy: magnetic polarity”

(I added the bold!)

Does that make digital archivists like the X-Men? 😉

-Peter Cliff

Standing on the shoulders of Giants?

Having just attended the Repositories and Preservation Programme meeting in Aston, Birmingham, I would really recommend the talk ‘The Institutional Perspective – How can institutions most effectively exploit work in the Repositories and Preservation field?’ given by Jeff Haywood, University of Edinburgh.

I would like to think this may kickstart a process to find methods by which current projects could more easily use and build on the outputs of previous projects and create a framework to more easily exchange code and ideas.

Jeff’s talk was given even more currency as, in the afternoon session, Rachel Heery gave a presentation (it’s on SlideShare) in the Repositories Roadmap Session, launching her just-published Digital Repositories Roadmap Review: towards a vision for research and learning in 2013.

The question however remains: by then will Standing on the shoulders of Giants still be a distant concept?

-Renhart Gittens