The Bodleian Libraries are involved in web archiving both through their own Bodleian Libraries Web Archive, since 2011, and, as one of the six UK Legal Deposit Libraries, through the Legal Deposit UK Web Archive, since 2013.
What’s cooking in the web archives? (Detail from a painting by Jean-François Millet [Public domain], via Wikimedia Commons)
A considerable amount of archivists’, curators’ and subject librarians’ time goes into this web archiving work, be it selecting websites for archiving, capturing and preserving web content, describing web archive resources or participating in web archiving strategy, collections management and outreach activities.
Current web archiving projects at the Bodleian include further development of the Bodleian Libraries Web Archive (for example, to capture audio files hosted on web servers) and curatorial work for the UK Web Archive, such as the Easter Rising 1916 Web Archive and the EU Referendum website collection.
But why archive the web?
What’s on the internet will be there forever, won’t it? Haven’t we all been warned to be careful what we put on the internet, because all the information out there will still reveal awkward details of our first year at university when we are about to retire?
Unfortunately for archivists, this is far from what really happens. In fact, websites are extremely ephemeral: they change and disappear at a fast rate.
In September 2015 our web archiving colleagues at the British Library looked at a random sample of websites they had archived in previous years, and compared them with the live web.
The result was striking: in less than two years, almost half of the websites archived in 2013 had disappeared or become inaccessible. Looking back 11 years to 2004, more than 90% were either ‘GONE’ (not only was the URL missing, but the host that originally served it had disappeared from the web), returned an ‘ERROR’ (the server still responded to the request, but a once-valid URL now caused it to fail), or were ‘MISSING’ (the server was still there and responded, but no longer recognised the URL: ‘404 Not Found’).
And even if a website still existed and was accessible (‘MOVED’: the URL was reachable via a redirect; or ‘OK’: the URL was directly accessible), its content and appearance were likely to have changed drastically, as a comparison of the digital ‘fingerprints’ (…ssdeep fuzzy hashes, for the geeks) of the archived websites with those of their live-web equivalents shows.
These figures clearly demonstrate how, without active web archiving and preservation, an alarmingly large proportion of information, and indeed of early 21st-century online culture and knowledge, is lost forever.
However, numbers and graphs are quite abstract (…or very abstract, in the case of those fuzzy hashes…). A much more tangible example of website change (and also a trip down memory lane for those of us who vaguely remember CD-ROM databases and Telnet, sooooo 20th century!) comes from the depths of the archives, in this case the Internet Archive’s Wayback Machine: the Bodleian Library website from December 1997.
…Prince William was a teenager then, and the Weston Library the New Bodleian…
And who, in 1997, would have imagined things like Google, Twitter and Facebook, and, indeed, blogs like the very one I am writing a post for right now?
Undeniably, the World Wide Web has become deeply ingrained in modern British life: it is where we search for information, communicate, research, share opinions and ideas, and increasingly live our social lives. Being online has become so normal and so taken for granted that we only realise how much we rely on online services and web content when they disappear.
Earlier this week, the BBC announced that ‘a number of websites, including BBC Food and Newsbeat, are to close as part of plans to save £15m’. The reaction was immediate: within hours #bbcrecipes was trending on Twitter, and an online petition, ‘Save the BBC’s recipe archive!’, is approaching 200,000 supporters as I type (it is, how very apt, lunchtime on Thursday 19 May). The Great British Public clearly likes its cooking, and scholars such as the food historian Polly Russell have pointed out the importance of the BBC’s online recipe collection as a source for research.
But fear not, gourmets, the knight in shining armour is on his way, and he’s called…UK Web Archive!
Why and how? Our British Library colleagues explain in a post on the UK Web Archive Blog, which also includes Polly Russell’s statement.
So next time someone asks the ‘What has web archiving ever done for us?’ question at a dinner party… Well. You know the answer.