Tag Archives: case studies

The Case for Digital Preservation

Now, I’m pretty sure there is no need for me to make the case to the good readers of this blog, but if you’re ever stuck for something to say about why your work is important – for example at parties – then the demise of the print edition of the OED seems a good candidate!

OK, so no one is about to ditch the Pocket version, or even the Shorter (I got one of those for a graduation present from my Grandma!), but even so…

The last print OED was published in 1989. I imagine, given the regular updates to the OED online, that there has been a substantial influx of words since 1989 and I guess (given how Chaucer looks now) English will undergo some significant changes in the future. Unless we (the DP community) decide to preserve the digital OED, we will condemn readers of 2489 to struggle on with an antique 1989 print copy and much will they wonder when they don’t find things like “Internet”…

(Mind you, the electricity might have all run out by then so it won’t really matter…)

On the flip side, and no doubt something someone at the party will point out, this is also a case for continuing to print the OED – at least a few copies, kept in safe places… 😉

-Peter Cliff

Any old tapes – a true story – part 1

My neighbour is a self-employed architect. He has worked digitally for at least ten years and now most of his work is done on either his old (but still perfectly serviceable) ThinkPad or a shiny new desktop PC. He works with a couple of different CAD packages along with some tax software and MS Office, all on Windows XP.

Recently, knowing what I do for a living, he asked if I could help with a problem he was having retrieving files from an external hard drive and, being easily persuaded by the promise of food and wine, I agreed to try to help (with all the usual caveats about probably not knowing anything about it all!).

We got the disk drive working quickly (this is often the way when solving other people’s computer issues: sit with them and they’ll solve it themselves!) and so he asked me about his backups too – which should have been happening regularly to another external drive, but were not. I checked out the drive and found an old directory with a very uninformative name that contained some data files and a few manifests that didn’t make much sense. I’ve forgotten the name already, but he told me this was the name of the backup software. A search showed that this software was not on the PC. The new PC had recently been built on the basis of the old one by outsourced IT support. They’d done a good job restoring the software, etc., but this one backup program (a commercial one) was missing.

The consequences were two-fold:

1) No backup was running.
2) The data files (about 1.4GB worth) and manifests were, without the software, entirely unreadable.

My neighbour thought the backup software was perhaps still about, so he’d ask the IT support to install and configure it. I fired up MS Windows Backup (the first time I’ve ever used it – it seems OK) and ran a one-off backup of his work, just to be on the safe side. I suggested he ask his support about that too (one thing you must never do is undo or override the work of the real support person!) – it required a password to add it to the Windows scheduler.

After it completed, he astutely asked where the files had gone, so I showed him on the external drive, and was dismayed to find that Windows Backup had also dumped all the files into a 1.4GB (proprietary?) container. I wondered if we’d ever have to extract files from Windows Backup files and made a mental note to keep a copy of the software (bundled with XP) in the cupboard just in case! Worse, it was then impossible to reassure him that the files were there without a crash course in Windows Restore. Still, I remember MS Backup and Restore being a pain as far back as MS-DOS! 🙂

As we finished our wine and talked about these things, he seemed to suddenly remember my job, jumped up and rummaged in a cupboard. He pulled out an old tape cartridge.

It was once his main backup medium but, like the files on the external drive, was no longer usable. This time both the hardware and the software were long gone. He didn’t seem worried – the files had probably been migrated off his old machine to the new one at some point – but still he wondered what was on it and said “I don’t suppose it is readable now, is it?” He hadn’t meant it as a challenge, but I couldn’t resist! I convinced him to let me take the tape with me and try to recover his data – all in the name of digital archaeology, of course!

My next post will be my first adventures in the land of the Travans…

-Peter Cliff

Odds and ends from day one of the digital lives conference

The digital lives conference provided a space to digest some of the findings of the AHRC-funded digital lives project, and also to bring together other perspectives on the topic of personal digital archives. At the proposal stage, the conference was scheduled to last just a day; in the event one day came to be three, which demonstrates how much there is to say on the subject.

Day one was titled ‘Digital Lifelines: Practicalities, Professionalities and Potentialities’. This day was intended mostly for institutions that might archive digital lives for research purposes. Cathy Marshall of Microsoft Research gave the opening talk, which explored some personal digital archiving myths on the basis of her experiences interviewing real-life users about their management of personal digital information.

Next came a series of four short talks on ‘aspects of digital curation’.

  • Cal Lee, of UNC Chapel Hill, emphasised the need for combining professional skills in order to undertake digital curation successfully. Archives and libraries need to have the right combination of skills to be trusted to do this work.
  • Naomi Nelson of MARBL, Emory University, told a tale of two donors. The first donor is the entity that gives or sells an archive to a library; the second is the academic researcher. Libraries need to have a dialogue with donors of the first type about what a digital archive might contain; this goes beyond the ‘files’ that they readily conceive as components of the archive, and includes several kinds of ‘hidden’ data that may be unknown to them. The second donor, ‘the researcher’, becomes a donor by virtue of the information that the research library can collect about their use of an archive. Naomi raised interesting questions about how we might be able to collect this kind of data and make it available to other researchers, perhaps at a time of the original researcher’s choosing.
  • Michael Olson of Stanford University Libraries spoke of their digital collections and programmes of work. He mentioned work on the fundamentals – the digital library architecture (equivalent to our developing Digital Asset Management System – DAMS – which will provide us with resilient storage, object management and tools and services that can be shared with other library applications). Their digital collections include a software collection of some 5000 titles, containing games and other software. I think that sparked some interest from many in the audience!
  • Ludmilla Pollock, Cold Spring Harbor Laboratory, told us about an extensive oral history programme giving rise to much digital data requiring preservation. The collection contains videos of the scientists talking about their memories and has a dedicated interface.

Afterwards, we heard from a panel of dealers in archival materials: Gabriel Heaton of Sotheby’s, Julian Rota of Bertram Rota and Joan Winterkorn of Bernard Quaritch. I was curious to hear if the dealers had needed to appraise archives containing obsolete digital media. Digital material is still only a tiny proportion of collections being appraised by dealers, and it seems that what little digital material they do encounter may not be appraised as such (disk labels are viewed rather than their contents). While paper archives are plentiful, perhaps there’s not much incentive to develop what’s needed to cater for the digital (many archivists may well feel this way too!). What’s certain is that the dealer has to be quite sure that any investment in facilitating the appraisal of digital materials pays dividends come sale time.

Inevitably, questions of value were a feature of the session. The dealers suggest that archives and libraries are not willing to pay for born-digital archives yet; perhaps this stems from concerns about uniqueness and authenticity, and the lack of facilities to preserve, curate and provide access. It’s not like there’s actually much on the market at the moment, so perhaps it’s a matter of supply as much as demand? Comparisons with ‘traditional’ materials were also made using Larkin’s magic/meaningful values:

“All literary manuscripts have two kinds of value: what might be called the magical value and the meaningful value. The magical value is the older and more universal: this is the paper [the writer] wrote on, these are the words as he wrote them, emerging for the first time in this particular magical combination. We may feel inclined to be patronising about this Shelley-plain, Thomas-coloured factor, but it is a potent element in all collecting, and I doubt if any librarian can be a successful manuscript collector unless he responds to it to some extent. The meaningful value is of much more recent origin, and is the degree to which a manuscript helps to enlarge our knowledge and understanding of a writer’s life and work. A manuscript can show the cancellations, the substitutions, the shifting towards the ultimate form and the final meaning. A notebook, simply by being a fixed sequence of pages, can supply evidence of chronology. Unpublished work, unfinished work, even notes towards unwritten work all contribute to our knowledge of a writer’s intentions; his letters and diaries add to what we know of his life and the circumstances in which he wrote.”

Philip Larkin, ‘A Neglected Responsibility: Contemporary Literary Manuscripts’, Encounter, July 1979, pp. 33-41.

The ‘meaningful’ aspects of digital archives are apparent enough, but what of the ‘magical’? Most, if not all, contributors to the discussion saw ‘artifactual’ value in digital media that had an obvious personal connection, whether Barack Obama’s BlackBerry or J.K. Rowling’s laptop. What wasn’t discussed so much was the potential magical value of seeing a digital manuscript being rendered in its original environment. I find that quite magical, myself. I think more people will come to see it this way in time.

Delegates were then able to visit the digital scriptorium and audiovisual studio at the British Library.

After lunch, we resumed with a view of the ‘Digital Economy and Philosophy’ from Annamaria Carusi of the Oxford e-Research Centre. She offered some interesting thoughts about trust and technology, referring back to Plato’s Phaedrus and the misgivings that an oral culture had about writing. New technologies can be disruptive and it takes time for them to be generally accepted and trusted.

Next came four talks under the theme of digital preservation.

  • First, an overview of the history of personal films from Luke McKernan, a curator at the British Library. This included changes in use and physical format, up to the current rise of online video populating YouTube, and its even more prolific Chinese equivalents. Luke also talked about ‘lifecasting’, pointing to JenniCam (now a thing of the past, apparently), and also to folk who go so far as to install movement sensors and video cameras throughout their homes. Yikes!
  • We also heard from the British Library’s digital preservation team, about their work on risk assessment for the Library’s digital collections (if memory serves, about 3% of the CDs they sampled in a recent survey had problems). Their current focus is getting material off vulnerable media and into the Library’s preservation system; this is also a key aim in our first phase of futureArch. They also mentioned the Planets and LIFE projects. Between project and permanent posts, the BL have some 14 people working on digital preservation. If you count those working in related areas (web archiving, audiovisual collections, digitisation, born-digital manuscripts, digital legal deposit, etc.), who also have a knowledge of this area, it’s probably rather more.
  • William Prentice offered an enjoyable presentation on audio archiving, which had some similar features to Luke’s talk on film. It always strikes me that audiovisual archiving is very similar to digital archiving in many respects, especially when there’s a need to do digital archaeology that involves older hardware and software that itself requires management.
  • Juan-José Boté of the University of Barcelona spoke to us about a number of projects he had been working on. These were very definitely hybrid archives and interesting for that reason.

Next, I chaired a panel of ‘Practical Experiences’. As I am naturally oriented toward the practical, there was lots for me here.

  • John Blythe, University of North Carolina, spoke about the Southern Historical Collection at the Wilson Library, including the processes they are using for digital collections. Interestingly, they have use of a digital accessioning tool created by their neighbours at Duke University.
  • Erika Farr, Emory University, talked about the digital element of Salman Rushdie’s papers. Interesting to note that there was overlap of data between PCs, where the creator has migrated material from one device to another; this is something we’ve found in digital materials we’ve processed too. I also found Rushdie’s filenaming and foldering conventions curious. When working with personal archives, you come to know the ways people have of doing things. This applies equally to the digital domain – you come to learn the creator’s style of working with the technology.
  • Gabby Redwine of the Harry Ransom Center, University of Texas at Austin gave a good talk about the HRC’s experiences so far. HRC have made some of their collections accessible in the reading room and in exhibition spaces, and are doing some creative things to learn what they can from the process. Like us, they are opting for the locked down laptop approach as an interim means of researcher access to born-digital material.
  • William Snow of Stanford University Libraries spoke to us about SALT, or the Self Archiving Legacy Toolkit. This does some very cool things using semantic technologies, though we would need to look at technologies that can be implemented locally (much of SALT’s functionality is currently achieved using third-party web services). Stanford are looking to harness creators’ knowledge of their own lives, relationships, and stuff, to add value to their personal archives using SALT. I think we might use it slightly differently, with curators (perhaps mediating creator use, or just processing?) and researchers being the most likely users. I really like the richness in the faceted browser (they are currently using Flamenco) – some possibilities for interfaces here. Their use of Freebase for authority control was also interesting; at the Bod, we use the National Register of Archives (NRA) for this and would be reluctant to change all our legacy finding aids and place our trust in such a new service! If the NRA could add some Freebase-like functionality, that would be nice. There was some other clever stuff too, like term extraction and relationship graphs.

The day concluded with a little discussion, mainly about where digital forensics and legal discovery tools fit into digital archiving. My feeling is that they are useful for capture and exploration, but less so for the work needed around long-term preservation and access.

-Susan Thomas