Tag Archives: UK Legal Deposit Web Archive

#WAWeek2017 – Researchers, practitioners and their use of the archived web

This year, the world of web archiving saw a premiere: not only were the biennial RESAW conference and the IIPC conference, established in 2016, held jointly for the first time, but they also formed part of a whole week of workshops, talks and public events around web archives – Web Archiving Week 2017 (or #WAWeek2017 for the social medially inclined).

After previous conferences in Reykjavik (IIPC 2016) and Aarhus (RESAW 2015), the big 2017 event was held in London, 14-16 June 2017, organised jointly by the School of Advanced Study of the University of London, the IIPC and the British Library.
The programme was packed with an eclectic variety of presentations and discussions, with topics ranging from the theory and practice of curating web archive collections or capturing whole national web domains, via technical topics such as preservation strategies, software architecture and data management, to the development of methodologies and tools for web archive-based research and case studies of their application.

Even in digital times, who doesn’t like a conference pack? Of course, the full programme is also available online. (…but which version will be easier to archive?)


What has web archiving ever done for us? – Saving our dinner plans, for example.

The Bodleian Libraries is involved in web archiving both through the Bodleian Libraries’ own web archive, since 2011, and – as one of the six UK Legal Deposit Libraries – through the Legal Deposit UK Web Archive, since 2013.

What’s cooking in the web archives? (Detail from a painting by Jean-François Millet [Public domain], via Wikimedia Commons)

A considerable amount of archivists’, curators’ and subject librarians’ time goes into this web archiving work, be it selecting websites for archiving, capturing and preserving web content, describing web archive resources or participating in web archiving strategy, collections management and outreach activities.

Current web archiving projects at the Bodleian include the further development of the Bodleian Libraries Web Archive, for example to capture audio files hosted on web servers, and curatorial work in the UK Web Archive context, such as the Easter Rising 1916 Web Archive and the EU Referendum website collection.

But why archive the web?

What’s on the internet will be there forever, won’t it? Haven’t we all been warned to be careful what we put on the internet, because all the information out there will still reveal awkward details of our first-year-at-university life when we are about to retire?

Unfortunately, for archivists, this is far from what really happens. In fact, websites are extremely ephemeral. They change and disappear at a fast rate.


WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time, the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and the latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age


The first day, after a warm welcome by Ingibjörg Sverrisdottir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) and how to avoid a Digital Dark Age? (Vint Cerf). What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery)

Brewster Kahle, What Do We Do Now?


On the second day, the conference continued with two separate tracks, discussing either policies, practices and strategies for capture and preservation of web material, or looking more at the user side of web archives, and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.
The third day was the hands-on day, with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive, DIY web archiving tools such as webrecorder.io, and the open-source platform Warcbase for analysing web archive data, as well as a discussion of the future of the WARC archive format.

There was plenty of time for Q&A and discussions between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations with web archiving colleagues and networking during coffee and lunch breaks, and on visits like the tour of the National and University Library of Iceland.

The National and University Library of Iceland


Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaborations.
For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing (FCCN), owing to the scope, size and structure of the collections, and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialised archives curated and captured by university libraries like that of North Carolina State University. Researchers approach the UK Government Web Archive with research questions different from those they would bring to archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, including perspectives from archivists, curators, IT engineers and researchers, seems to be the best way forward, and the practice of sharing knowledge and experience and of openly discussing problems is certainly embraced by the web archiving community. A particular project might have ‘failed’ in terms of achieving the intended outcome, but it can still provide valuable lessons for the next project elsewhere, and in the long run, for developing best practice, policies and standards for web archiving as a discipline.


Mistakes are only wrong if you – and others – don’t learn from them!

Curators might be slightly overwhelmed by technical details discussed by web crawl engineers (I certainly was!) and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from the approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data differently from historians and social scientists.
International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, for fostering discussion and knowledge sharing, and for providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.


Harvesting social media: Overview…

 


…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a late Friday afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?


Some very clean UK web archivists after the conference debrief