This year’s International Internet Preservation Consortium Web Archiving Conference was held online on 15 and 16 June 2021, bringing together professionals from around the world to share their experiences of preserving the Web as a research tool for future generations. In this blog post, Simon Mackley reports back on some of the highlights from the conference.
How can we best preserve the World Wide Web for future researchers, and how can we best provide access to our collections? These were the questions at the forefront of this year’s International Internet Preservation Consortium Web Archiving Conference, which was hosted virtually by the National Library of Luxembourg. Web archiving is a subject of particular interest to me: as one of the Bodleian Libraries’ Graduate Trainee Digital Archivists, I spend a lot of my time working on our own Web collections as part of the Bodleian Libraries Web Archive. It was therefore great to have the chance to attend part of this virtual conference and hear for myself about new developments in the sector.
One thing that really struck me at the conference was the huge diversity of approaches to preserving the Web. On the one hand, many of the papers concerned large-scale efforts by national legal deposit institutions. For instance, Ivo Branco, Ricardo Basílio, and Daniel Gomes gave a very interesting presentation on the creation of the 2019 European Parliamentary Elections collection at the Portuguese Web Archive. This was a highly ambitious project, which aimed not only to crawl the Portuguese Web domain but also to capture a snapshot of election coverage across 24 different European languages, using an automated search engine and a range of web crawler technologies (see their blog for more details). The World Wide Web is perhaps the ultimate example of an international information resource, so it is brilliant to see web archiving initiatives take a similarly international approach.
At the other end of the scale, Hélène Brousseau gave a fascinating paper on community-based web archiving at Artexte library and research centre, Canada. Within the arts community, websites often function as digital publications analogous to traditional exhibition catalogues. Brousseau emphasised the need for manual web archiving, rather than automated crawling, as a means of capturing the full content and functionality of these digital publications, and at Artexte this has been achieved by training website creators to self-archive their own websites using Conifer. Given that web archivists often have minimal or even no contact with website creators, it was fascinating to hear of an approach that places creators at the very heart of the process.
It was also really interesting to hear about the innovative ways in which web archives are engaging with the researchers who use their collections, particularly through new ‘Labs’-style approaches. Marie Carlin and Dorothée Benhamou-Suesser, for instance, reported on the new services being planned for researchers at the Bibliothèque nationale de France Data Lab, including a crawl-on-demand service and the provision of web archive datasets. New methodologies are constantly being developed within the Digital Humanities, so it is vitally important that web archives are able to meet the evolving needs of researchers.
As at all good conferences, the papers and discussions did not focus solely on the successes of the past year, but also explored the continuing challenges of web archiving and how they can be addressed. Web archiving is often a resource-intensive activity, which can prove a significant challenge for collecting institutions. This was a major point of discussion in the panel session on web archiving the coronavirus pandemic. Institutions had to balance the urgency of quickly capturing web content during a fast-evolving crisis against the need to manage resources for the longer term, as it became apparent that the pandemic would last months rather than weeks. It was clear from the speakers that no two institutions had approached documenting the pandemic in quite the same way, but nonetheless some very useful general lessons were drawn from their experiences, particularly about the need to define collection scope and goals clearly at the start of any collecting project dealing with rapidly changing events.
The question of access presents an even greater challenge. We ultimately work to preserve the Web so that researchers can make use of it, but as a sector we face significant barriers in delivering this goal. The larger legal deposit collections, for instance, can often only be consulted in the physical reading rooms of their collecting libraries. In his opening address to the conference, Claude D. Conter of the National Library of Luxembourg addressed this problem head-on, calling for copyright reform in order to meet reader expectations of access.
Yet although these challenges are significant, the range of new and innovative approaches showcased at this conference leaves me in no doubt that the web archiving sector will be able to overcome them. I am delighted to have had the chance to attend the conference, and I cannot wait to see how some of the projects presented continue to develop in the years to come.
Simon Mackley