A couple of months ago, thanks to the generous support of the IIPC student bursary, I had the pleasure of attending the International Internet Preservation Consortium (IIPC) web archiving conference in Hilversum, the Netherlands. The conference took place at the Netherlands Institute for Sound & Vision, a venue that added gravitas and rainbow colour to each of the talks and panels.
What struck me most throughout the conference was how up to date the ideas and topics of the panels were. While typical archiving usually deals with history that happened decades or centuries ago, web archiving requires fast-paced decisions and actions to preserve contemporary material as it is being produced. The web is a dynamic, flexible, constantly changing entity. Content is often deleted or buried under the constant barrage of new content. Therefore, web archivists must stay in the know and up to date in order to keep up with the arms race between web technology and archiving resources.
For instance, right from the beginning, the opening keynote speech discussed the ongoing Russian war in Ukraine. Eliot Higgins, founder of Bellingcat, the independent investigative collective focused on producing open-source research, discussed the role of digital metadata and digital preservation techniques in the fight against disinformation. Using the example of Russian propaganda about the war in Ukraine, Higgins demonstrated that archived versions of sites and videos, and their associated metadata, can help to debunk deliberately spread falsehoods depicting the Ukrainian army in a bad light. Geolocation metadata, for example, has been used to prove that multiple videos supposedly showing the Ukrainian army threatening and killing innocent civilians were actually staged and filmed behind the Russian frontlines. The notion that web archives are not just preserving modern culture and history, but also aiding in the fight against harmful disinformation, is quite heartening.
A similarly current topic of conversation was the potential use of artificial intelligence (AI) in web archives. Given how hot a topic AI is, its prevalence at the web archiving conference was well received. The quality assurance process for web archiving, which can be arduous and time-consuming, was mentioned as a potential use case for AI. Checking every subpage of an archived site against the live site is impossible given time and resource constraints. However, if AI could be used to compare screenshots of the live site to the captured version, then even without actually going in and patching the issues, just knowing where the issues are would save considerable time. Additionally, AI could be used to fill gaps in collections. It is hard to know what you do not know. For example, the Bodleian has a collection aimed at preserving the events and experiences of people affected by the war in Ukraine. Given our web archiving team’s lack of Ukrainian and Russian language skills, it can be hard to know which sites to include in the collection and which to leave out. Thus, having AI generate a list of sites deemed culturally relevant to the conflict could help fill gaps in this collection that we were not even aware of.
Social media archiving was also a significant subject discussed at the conference. Despite the large part that social media plays in our lives and culture, it can be very challenging to capture. For example, the Heritrix crawler, the most commonly used web crawler in web archiving, is blocked by Facebook and Instagram. Additionally, while Twitter technically remains capturable, much of the dynamic content contained in posts (e.g. videos, GIFs, links to outside content) can’t be replayed in archived versions. Collaboration between social media companies and archivists was described as a necessity, and one that needs to happen soon. In the meantime, the tools discussed as best suited to social media captures included Webrecorder and other tools that mimic how a user would navigate a website in order to create a high-fidelity capture that includes dynamic content.
Between discussions of the role of web archives in halting the spread of disinformation, the use of barely understood tools like generative AI, and potential techniques to get around stumbling blocks within the field of social media archiving, the conference left attendees excited to explore web preservation further. The internet is the main resource through which we communicate, disseminate knowledge, and create modern history. Preserving that history is therefore necessary and integral to the field of archiving.