Category Archives: Web Archives

The Why and How of Digital Archiving

Guest post by Matthew Bell, Summer intern in the Modern Archives & Manuscripts Department

If you have ever wondered how future historians will reconstruct and analyse our present society, you may well have envisioned scholars wading through stacks of printed Tweets, Facebook messages and online quizzes, discussing the relevance of, for instance, GIFs posted in the comment section of a particular politician’s announcement of their candidacy, or what different e-mail auto-replies reveal about communication in the 2010s. The source material for the researcher of this period must, after all, consist overwhelmingly of internet material: the platform for our communication, the source of our news, the medium on which we work. To take but one example, Ofcom’s 2022 report on UK news consumption finds that “The differences between platforms used across age groups are striking; younger age groups continue to be more likely to use the internet and social media for news, whereas their older counterparts favour print, radio and TV”. As this generation grows up to take positions of power in our country, it is clear that in seeking to understand the cultural background from which they emerged, relying solely on stored physical newspapers will be insufficient. An accurate picture of Britain today will only be possible through careful digital archaeology, sifting through sediments of hyperlinks and screenshots.

This month, through the Oxford University Summer Internship Programme, I was incredibly fortunate to work as an intern in the Bodleian Libraries Web Archive (BLWA) for four weeks, at the cutting edge of digital archiving. One of the first things that became clear from speaking to those working in the BLWA is that the world wide web as a source of research material, as described above, is by no means a foregone conclusion. The perception of the internet as a stable collection that will remain as it is without care and upkeep is a fallacy: websites are taken down, hyperlinks stop working or redirect somewhere else, social media accounts get removed, and companies go bankrupt and stop maintaining their online presence. Digital archiving can feel like a race against time, a push to capture the websites that people use today whilst we still can; without the constant work of web archivists, there is nothing to ensure that the online resources we use will still be available decades down the line for researchers to consult.

Fortunately, the BLWA is far from alone in this endeavour. Perhaps the most ambitious contemporary web archive is the Internet Archive; since 1996 it has built a collection of billions of websites, and states as its task the humble aim of providing “Universal Access to all Knowledge”, seeking to capture the entire internet. Other archives have a more defined scope, such as the UK Web Archive, although even here the task is an enormous one: collecting “all UK websites at least once per year.” Because of the scale of online material published every day, whether a site has already been archived by the Internet Archive or the UK Web Archive is relevant to whether the Bodleian chooses to archive it; to this extent the world of digital archiving represents cooperation on an international scale.

One aspect of these web archives that struck me during my time here is the conscious effort made by many to place the power of web archiving in the hands of anyone with access to a computer. The Internet Archive, for instance, allows any user with a free account to add content to the archive. Furthermore, one of my responsibilities as intern was a research project into the viability of a programme named Webrecorder for capturing more complex sites such as social media platforms, and the democratisation of web archiving seems to be the key purpose of the programme. On its website, which offers free browser-based web archiving tools, the company’s name stands above the powerful rallying call “Web archiving for all!” Whilst the programme currently remains difficult to navigate without a certain level of coding knowledge, and never quite worked as expected during my research, its potential for expanding the responsibility of archiving is certainly exciting. As historians increasingly seek to understand the lives of those whose records have not generally made it into archive collections, one can see as particularly noble the desire to put secure archiving into the hands of people as well as institutions.

The “why” of Digital Archiving, then, seems clear, but what about the “how”? Before going into my main responsibilities this month, some clarification of terminology is necessary.

Capture – This refers to the Bodleian’s copy of a website, a snapshot of it at a particular moment in time which can be navigated exactly like the original.

Live Site – The website as it is available to users on the internet, as opposed to the capture.

Crawl – The process by which a website is captured, as the computer program “crawls” through the live site, clicking on all the links, copying all of the text and photographs, and gathering all of this together into a capture.

Crawl Frequency – The frequency with which a particular website is captured by the Bodleian, determined by a series of criteria including the regularity of the website’s updates.

Archive-It – The website used by the Bodleian to run these crawls, and which stores the captured websites.

Brozzler – A particularly detailed crawl, taking more time but better suited to dynamic or complicated sites such as social media platforms. Brozzlers are used for Twitter accounts, for instance. Crawls which are not brozzlers are known as standard crawls and use Heritrix software.

Data Budget – The allocated quantity of data the Bodleian Libraries purchase to use on captures, which necessitates selectivity as to what is and is not captured.

Quality Assurance (QA) – A huge part of the work of digital archiving, the process by which a capture is compared with the live site and scrutinized for any potential problems in the way it has copied the website, which are then “patched” (fixed). These generally include missing images, stylesheets, or subpages.

Seed – The term for a website which is being captured.

Permission E-Mails – Due to the copyright regulations around web archiving, the BLWA requires permission from the owners of websites before archiving; this can be a particularly complicated task due to the difficulty of finding contact information for many websites, as well as language barriers.
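The crawl process defined above can be sketched in miniature. The snippet below is a toy, offline illustration only (real crawlers such as Heritrix and Brozzler are far more sophisticated): it “crawls” an in-memory stand-in for a live site, following every link it finds and copying each page into a capture. All of the page content and URLs here are invented for the example.

```python
from html.parser import HTMLParser
from collections import deque

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag encountered on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A stand-in for the live site: page URL -> HTML served at that URL.
LIVE_SITE = {
    "/": '<h1>Home</h1><a href="/about">About</a><a href="/news">News</a>',
    "/about": '<p>About us.</p><a href="/">Home</a>',
    "/news": '<p>Latest news.</p><a href="/about">About</a>',
}

def crawl(site, seed):
    """Breadth-first 'crawl' from the seed, copying each reachable page into a capture."""
    capture = {}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in capture or url not in site:
            continue  # already captured, or outside the site
        capture[url] = site[url]   # copy the page content into the capture
        parser = LinkExtractor()
        parser.feed(site[url])     # "click" every link on the page
        queue.extend(parser.links)
    return capture

capture = crawl(LIVE_SITE, seed="/")
print(sorted(capture))  # ['/', '/about', '/news']
```

The capture can then be navigated exactly like the original, which is what makes quality assurance (comparing capture against live site) both possible and necessary.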

My responsibilities during my internship were diverse, and my day-to-day work was generally split between quality assurance, setting off crawls, and sending or drafting permission e-mails. Alongside this I was not only carrying out research into Webrecorder, but also contributing to a report re-assessing the crawl frequency of several of our seeds. The work I have done this month has been not only incredibly satisfying (when the computer programme works and you are able to patch a PDF during QA of a website, it makes one disproportionately happy), but also deeply rewarding. One missing image or hyperlink at a time, digital archivists are carefully maintaining a particularly fragile medium, but one which is vital for the analysis of everything we are living through today.

The International Internet Preservation Consortium Web Archiving Conference: Thoughts and Takeaways

A couple of months ago, thanks to the generous support of the IIPC student bursary, I had the pleasure of attending the International Internet Preservation Consortium (IIPC) web archiving conference in Hilversum, The Netherlands. The conference took place at The Netherlands Institute for Sound & Vision, which added gravitas and rainbow colour to each of the talks and panels.

The Netherlands Institute for Sound & Vision. Photo taken by Olga Holownia.

What struck me most throughout the conference was how up to date the ideas and topics of the panels were. While conventional archiving usually deals with history that happened decades or centuries ago, web archiving requires fast-paced decisions and actions to preserve contemporary material as it is being produced. The web is a dynamic, flexible, constantly changing entity, and content is often deleted or buried under the constant barrage of new content creation. Web archivists must therefore stay in the know and up to date in order to keep up with the arms race between web technology and archiving resources.

For instance, right from the beginning, the opening keynote speech discussed the ongoing Russian war in Ukraine. Eliot Higgins, founder of Bellingcat, the independent investigative collective focused on producing open-source research, discussed the role of digital metadata and digital preservation techniques in the fight against disinformation. Using the example of Russian-spread propaganda about the war in Ukraine, Higgins demonstrated that archived versions of sites and videos, and their associated metadata, can help to debunk intentionally spread misinformation depicting the Ukrainian army in a bad light. For instance, geolocation metadata has been used to prove that multiple videos supposedly showing the Ukrainian army threatening and killing innocent civilians were actually staged and filmed behind the Russian frontlines. The notion that web archives are not just preserving modern culture and history, but also aiding in the fight against harmful disinformation, is quite heartening.

A similarly current topic of conversation was the potential use of artificial intelligence (AI) in web archives. Given what a hot topic AI is, its prevalence at the web archiving conference was well received. The quality assurance process for web archiving, which can be arduous and time-consuming, was mentioned as a potential use case for AI. Checking every subpage of an archived site against the live site is impossible given time and resource constraints. However, if AI could be used to compare screenshots of the live site to the captured version, then even without actually going in and patching the issues, just knowing where they are would save considerable time. Additionally, AI could be used to fill gaps in collections; it is hard to know what you do not know. In particular, the Bodleian has a collection aimed at preserving the events and experiences of people affected by the war in Ukraine. Given our web archiving team’s lack of Ukrainian and Russian language skills, it can be hard to know which sites to include in the collection. Having AI generate a list of sites deemed culturally relevant to the conflict could thus help fill gaps in this collection that we were not even aware of.
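The simplest version of this kind of automated QA check does not even need AI. The toy sketch below (my own illustration, not any tool presented at the conference) compares the images referenced on a live page against those in a capture and flags whatever failed to archive; the HTML and file names are invented for the example.

```python
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag in a page."""
    def __init__(self):
        super().__init__()
        self.images = set()
    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.images.add(value)

def missing_images(live_html, captured_html):
    """Return image URLs present on the live page but absent from the capture."""
    live, captured = ImageCollector(), ImageCollector()
    live.feed(live_html)
    captured.feed(captured_html)
    return sorted(live.images - captured.images)

live = '<img src="logo.png"><img src="banner.jpg"><p>Hello</p>'
capture = '<img src="logo.png"><p>Hello</p>'  # banner.jpg failed to archive
print(missing_images(live, capture))  # ['banner.jpg']
```

A human archivist would still do the patching; the value of automation here is simply pointing at where the problems are, which is exactly the time-saving the conference discussion envisaged.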

Social media archiving was also a significant subject discussed at the conference. Despite the large part that social media plays in our lives and culture, it can be very challenging to capture. For example, the Heritrix crawler, the most commonly used web crawler in web archiving, is blocked by Facebook and Instagram. Additionally, while Twitter technically remains capturable, much of the dynamic content contained in posts (e.g. videos, GIFs, links to outside content) cannot be replayed in archived versions. Collaboration between social media companies and archivists was heralded as an urgent necessity. In the meantime, the web archiving tools considered best suited to social media captures included Webrecorder and other tools that mimic how a user would navigate a website in order to create a high-fidelity capture that includes dynamic content.

Between discussions of the role of web archives in halting the spread of disinformation, the use of barely understood tools like generative AI, and potential techniques to get around stumbling blocks within the field of social media archiving, the conference discussions got all attendees excited to begin further exploration of web preservation. The internet is the main resource through which we communicate, disseminate knowledge, and create modern history. Therefore, the pursuit of preserving such history is necessary and integral to the field of archiving.

Can web archives tell stories?

Archives tell stories. A series of induction sessions with archivists have brought me, a web archivist, to a new understanding of what archives are and what archivists do.

Archivists enable stories to be told: stories about people, organisations, society and much more. Archival materials bring them back to life. The very making of a collection (how its contents have been selected, preserved and made available to the public, and how some have not) constitutes a story in itself.

But can web archives tell stories? Web archives differ from conventional archives, where archival material comes into custody as a collection with a relatively clear boundary, within which archivists carry out appraisal, selection and cataloguing work. The boundaries for web archives, by comparison, have been both blurred and expanded.


Invasion of Ukraine: web archiving volunteers needed

The Bodleian Libraries Web Archive (BLWA) needs your help to document what is happening in Ukraine and the surrounding region. Much of the information about Ukraine being added to the web right now will be ephemeral, especially information from individuals about their experiences and those of the people around them. Action is needed to ensure we preserve some of these contemporary insights for future reflection. We hope to archive a range of different content, including social media, and to start forming a resource which can join with other collections being developed elsewhere to:

  • capture the experiences of people affected by the invasion, both within and outside of Ukraine
  • reflect the different ways the crisis is being described and discussed, including misinformation and propaganda
  • record the response to the crisis

To play our part, we need help from individuals with relevant cultural knowledge and language skills who can select websites for archiving. We are particularly interested in Ukrainian and Russian websites, and those from other countries in the region, though any suggestions are welcome.

Please nominate websites via: https://www2.bodleian.ox.ac.uk/beam/webarchive/nominate

Call for contributions: Afghanistan regime change (2021) and the international response web archive collection

On 4 October 2021, the International Internet Preservation Consortium (IIPC) initiated a web archiving collection in response to recent events in Afghanistan. Colleagues at the University of Oxford, and beyond, are invited to contribute nominations for websites to be archived in the collection.

The collection theme is Afghanistan regime change (2021) and the international response. The focus is on the international aspects of events in Afghanistan, documenting transnational involvement and worldwide interest in the process of regime change, and how the situation evolves over time.

A post on the IIPC’s blog, by the collection’s lead curator Nicola Bingham (British Library), provides further details of the background and scope of the collection.

How to contribute to the collection:

  1. Please read the Collection Scoping Document and accompanying IIPC blog post for more details on the collection and a full overview of the collecting scope.
  2. Enter nominations for websites, and a small amount of basic metadata, via the collection’s Google Form. The Google form accepts website nominations in non-English scripts.

This post is based on Nicola Bingham’s blog IIPC Collaborative collection: “Afghanistan regime change (2021) and the international response”.

Conference Report: IIPC Web Archiving Conference 2021

This year’s International Internet Preservation Consortium Web Archiving Conference was held online from 15-16th June 2021, bringing together professionals from around the world to share their experiences of preserving the Web as a research tool for future generations. In this blog post, Simon Mackley reports back on some of the highlights from the conference.  

How can we best preserve the World Wide Web for future researchers, and how can we best provide access to our collections? These were the questions that were at the forefront of this year’s International Internet Preservation Consortium Web Archiving Conference, which was hosted virtually by the National Library of Luxembourg. Web archiving is a subject of particular interest to me: as one of the Bodleian Library’s Graduate Trainee Digital Archivists, I spend a lot of my time working on our own Web collections as part of the Bodleian Libraries Web Archive. It was great therefore to have the chance to attend part of this virtual conference and hear for myself about new developments in the sector.

One thing that really struck me from the conference was the huge diversity in approaches to preserving the Web. On the one hand, many of the papers concerned large-scale efforts by national legal deposit institutions. For instance, Ivo Branco, Ricardo Basílio, and Daniel Gomes gave a very interesting presentation on the creation of the 2019 European Parliamentary Elections collection at the Portuguese Web Archive. This was a highly ambitious project, with the aim of crawling not just the Portuguese Web domain but also capturing a snapshot of elections coverage across 24 different European languages through the use of an automated search engine and a range of web crawler technologies (see their blog for more details). The World Wide Web is perhaps the ultimate example of an international information resource, so it is brilliant to see web archiving initiatives take a similarly international approach.

At the other end of the scale, Hélène Brousseau gave a fascinating paper on community-based web archiving at Artexte library and research centre, Canada. Within the arts community, websites often function as digital publications analogous to traditional exhibition catalogues. Brousseau emphasised the need for manual web archiving rather than automated crawling as a means of capturing the full content and functionality of these digital publications, and at Artexte this has been achieved by training website creators to self-archive their own websites using Conifer. Given that in many cases web archivists often have minimal or even no contact with website creators, it was fascinating to hear of an approach that places creators at the very heart of the process.

It was also really interesting to hear about the innovative new ways that web archives were engaging with researchers using their collections, particularly in the use of new ‘Labs’-style approaches. Marie Carlin and Dorothée Benhamou-Suesser for instance reported on the new services being planned for researchers at the Bibliothèque nationale de France Data Lab, including a crawl-on-demand service and the provision of web archive datasets. New methodologies are always being developed within the Digital Humanities, and so it is vitally important that web archives are able to meet the evolving needs of researchers.

Like all good conferences, the papers and discussions did not solely focus on the successes of the past year, but also explored the continued challenges of web archiving and how they can be addressed. Web archiving is often a resource-intensive activity, which can prove a significant challenge for collecting institutions. This was a major point of discussion in the panel session on web archiving the coronavirus pandemic, as institutions had to balance the urgency of quickly capturing web content during a fast-evolving crisis against the need to manage resources for the longer-term, as it became apparent that the pandemic would last months rather than weeks. It was clear from the speakers that no two institutions had approached documenting the pandemic in quite the same way, but nonetheless some very useful general lessons were drawn from the experiences, particularly about the need to clearly define collection scope and goals at the start of any collecting project dealing with rapidly changing events.

The question of access presents an even greater challenge. We ultimately work to preserve the Web so that researchers can make use of it, but as a sector we face significant barriers in delivering this goal. The larger legal deposit collections, for instance, can often only be consulted in the physical reading rooms of their collecting libraries. In his opening address to the conference, Claude D. Conter of the National Library of Luxembourg addressed this problem head-on, calling for copyright reform in order to meet reader expectations of access.

Yet although these challenges may be significant, I have no doubt from the range of new and innovative approaches showcased at this conference that the web archiving sector will be able to overcome them. I am delighted to have had the chance to attend the conference, and I cannot wait to see how some of the projects presented continue to develop in the years to come.

Simon Mackley

UK Web Archive mini-conference 2020

On Wednesday 19th November I attended the UK Web Archive (UKWA) mini-conference 2020, my first conference as a Graduate Trainee Digital Archivist. It was hosted by Jason Webber, Engagement Manager at the UKWA, and, as has become normal in these COVID times, it was held on Zoom (my first ever Zoom experience!).

The conference started with an introduction to and demonstration of the UKWA by Jason Webber. Founded in 2005, the UKWA has the mission of collecting the entire UK webspace, at least once per year, and preserving the websites for future generations. As part of my traineeship I have used the UKWA, but it was interesting to hear about the other functions and collections it provides. Along with letting you browse different versions of UK websites, it also includes over 100 curated collections on themes ranging from Food to Brexit to Online Enthusiast Communities in the UK. It also features the SHINE tool, which was developed as part of the ‘Big UK Domain Data for the Arts and Humanities’ project and contains over 3.5 billion items, full-text indexed so that every word is searchable. It allows users to perform searches and trend analysis across a huge range of websites; all you need to use this tool is a bit of Python knowledge. My Python knowledge is fairly basic, but Caio Mello, during his researcher talk, provided a useful link to online Python tutorials aimed at historians to aid in their research.

In his talk, Caio Mello (School of Advanced Study, University of London) discussed how he used the SHINE tool as part of his work for the CLEOPATRA Project. He was specifically looking at the Olympic legacy of the 2012 Olympics, how it was defined and how the view of the legacy changed over time. He explained the process he used to extract the information and the ways the information can be used for analysis, visualisation and context. My background is in mathematics and the concept of ‘Big Data’ came up frequently during my studies so it was fascinating to see how it can be used in a research project and how the UKWA is enabling research to be conducted over such a wide range of subjects.

The next researcher talk, by Liam Markey (University of Liverpool and the British Library), showed a different approach to using the UKWA, in his research project into how remembrance in twentieth-century Britain has changed. He explained how he conducted an analysis of archived newspaper articles, using specific search terms, to identify articles focused on commemoration, which he could then use to examine how attitudes changed over time. The UKWA enabled him to find websites that focused on the war and compare these with mainstream newspapers to see how they differ.

The keynote, given by Paul Gooding (University of Glasgow), was about the use and users of non-print legal deposit libraries. His research as part of the Digital Library Futures Project, with the Bodleian Libraries and Cambridge University Library as case-study partners, looked at how academic deposit libraries were impacted by e-legal deposit. It was an interesting discussion of some of the issues with the system, such as balancing commercial rights against access for users, and how highly restrictive access conditions are at odds with more recent legislation, such as the provision for disabled users and the 2014 copyright exception for text and data mining for non-commercial uses.

Being new to the digital archiving world, I found my first conference a great introduction to web archiving, and it provided context for the work I am doing. Thank you to the organisers and speakers for giving me insight into a few of the different ways the web archive is used; I have come away with a greater understanding of the scope and importance of digital archiving (as well as a list of blog posts and tutorials to delve into!).

Some Useful Links:

https://www.webarchive.org.uk/

https://programminghistorian.org/

https://blogs.bl.uk/webarchive/2020/11/how-remembrance-day-has-changed.html

http://cleopatra-project.eu/


#WeMissiPRES: Preserving social media and boiling 1.04 x 10^16 kettles

This year the annual iPRES digital preservation conference was understandably postponed and in its place the community hosted a 3-day Zoom conference called #WeMissiPRES. As two of the Bodleian Libraries’ Graduate Trainee Digital Archivists, Simon and I were in attendance and blogged about our experiences. This post contains some of my highlights.

The conference kicked off with a keynote by Geert Lovink. Geert is the founding director of the Institute of Network Cultures and the author of several books on critical Internet studies. His talk was wide-ranging and covered topics from the rise of so-called ‘Zoom fatigue’ (I guarantee you know this feeling by now) to how social media platforms affect all aspects of contemporary life, often in negative ways. Geert highlighted the importance of preserving social media in order to allow future generations to be able to understand the present historical moment. However, this is a complicated area of digital preservation because archiving social media presents a host of ethical and technical challenges. For instance, how do we accurately capture the experience of using social media when the content displayed to you is largely dictated by an algorithm that is not made public for us to replicate?

After the keynote I attended a series of talks about the ARCHIVER project. João Fernandes from CERN explained that the goal of this project is to improve archiving and digital preservation services for scientific and research data. Preservation solutions for this type of data need to be cost-effective, scalable, and capable of ingesting data volumes in the petabyte range. There were several further talks from companies submitting to the design phase of this project, including Matthew Addis from Arkivum. Matthew’s talk focused on the ways that digital preservation can be conducted on the industrial scale required to meet the brief, and explained that Arkivum is collaborating with Google to achieve this, because Google’s cloud infrastructure can be leveraged for petabyte-scale storage. He also noted that while the marriage of preserved content with robust metadata is important in any digital preservation context, it is essential for repositories dealing with very complex scientific data.

In the afternoon I attended a range of talks that addressed new standards and technologies in digital preservation. Linas Cepinskas (Data Archiving and Networked Services (DANS)) spoke about a self-assessment tool for the FAIR principles, which is designed to assess whether data is Findable, Accessible, Interoperable and Reusable. Later, Barbara Sierman (DigitalPreservation.nl) and Ingrid Dillo (DANS) spoke about TRUST, a new set of guiding principles that are designed to map well with FAIR and assess the reliability of data repositories. Antonio Guillermo Martinez (LIBNOVA) gave a talk about his research into Artificial Intelligence and machine learning applied to digital preservation. Through case studies, he identified that AI is especially good at tasks such as anomaly detection and automatic metadata generation. However, he found that regardless of how well the AI performs, it needs to generate better explanations for its decisions, because it’s hard for human beings to build trust in automated decisions that we find opaque.

Paul Stokes from Jisc3C gave a talk on calculating the carbon costs of digital curation and unfortunately concluded that not much research has been done in this area. The need to improve the environmental sustainability of all human activity could not be more pressing and digital preservation is no exception, as approximately 3% of the world’s electricity is used by data centres. Paul also offered the statistic that enough power is consumed by data centres worldwide to boil 10,400,000,000,000,000 kettles – which is the most important digital preservation metric I can think of.
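The kettle conversion is easy to reproduce on the back of an envelope. The figures below are my own illustrative assumptions, not those used in the talk: a 1.5-litre kettle heated from 20 °C to 100 °C, and an annual worldwide data-centre electricity consumption of roughly 300 TWh.

```python
# Energy to boil one kettle: mass of water * specific heat * temperature rise.
KETTLE_LITRES = 1.5        # assumed kettle volume (~kg of water)
SPECIFIC_HEAT = 4186       # J per kg per kelvin, for water
TEMP_RISE = 100 - 20       # heating tap water to boiling, in kelvin

joules_per_kettle = KETTLE_LITRES * SPECIFIC_HEAT * TEMP_RISE
kwh_per_kettle = joules_per_kettle / 3.6e6   # 3.6 MJ per kWh, ~0.14 kWh

# Assumed annual electricity use of the world's data centres, in kWh.
DATA_CENTRE_KWH = 300e9    # ~300 TWh/year, an illustrative estimate

kettles_per_year = DATA_CENTRE_KWH / kwh_per_kettle
print(f"{kwh_per_kettle:.3f} kWh per kettle, ~{kettles_per_year:.1e} kettles per year")
```

The result scales directly with whatever kettle size, timescale and consumption estimate you plug in, which is a good reminder that such figures are order-of-magnitude illustrations rather than measurements.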

This conference was challenging and eye-opening because it gave me an insight into (complicated!) areas of digital preservation that I was not familiar with, particularly surrounding the challenges of preserving large quantities of scientific and research data. I’m very grateful to the speakers for sharing their research and to the organisers, who did a fantastic job of bringing the community together to bridge the gap between 2019 and 2021!

#WeMissiPRES: A Bridge from 2019 to 2021

Every year, the international digital preservation community meets for the iPRES conference, an opportunity for practitioners to exchange knowledge and showcase the latest developments in the field. With the 2020 conference unable to take place due to the global pandemic, digital preservation professionals instead gathered online for #WeMissiPRES to ensure that the global community remained connected. Our graduate trainee digital archivist Simon Mackley attended the first day of the event; in this blog post he reflects on some of the highlights of the talks and what they tell us about the state of the field.

How do you keep the global digital preservation community connected when international conferences are not possible? This was the challenge faced by the organisers of #WeMissiPRES, a three-day online event hosted by the Digital Preservation Coalition. Conceived as a festival of digital preservation, the aim was not to try to replicate the regular iPRES conference in an online format, but instead to serve as a bridge for the digital preservation community, connecting the efforts of 2019 with the plans for 2021.

As might be expected, the impact of the pandemic loomed large in many of the talks. Caylin Smith (Cambridge University Library) and Sara Day Thomson (University of Edinburgh) for instance gave a fascinating paper on the challenge of rapidly collecting institutional responses to coronavirus, focusing on the development of new workflows and streamlined processes. The difficulties of working from home, the requirements of remote access to resources, and the need to move training online likewise proved to be recurrent themes throughout the day. As someone whose own experience of digital preservation has been heavily shaped by the pandemic (I began my traineeship at the start of lockdown!) it was really useful to hear how colleagues in other institutions have risen to these challenges.

I was also struck by the different ways in which responses to the crisis have strengthened digital preservation efforts. Lynn Bruce and Eve Wright (National Records of Scotland) noted for instance that the experience of the pandemic has led to increased appreciation of the value of web-archiving from stakeholders, as the need to capture rapidly-changing content has become more apparent. Similarly, Natalie Harrower (Digital Repository of Ireland) made the excellent point that the crisis had not only highlighted the urgent need for the sharing of medical research data, but also the need to preserve it: Coronavirus data may one day prove essential to fighting a future pandemic, and so there is therefore a moral imperative for us to ensure that it is preserved.

As our keynote speaker Geert Lovink (Institute of Network Cultures) reminded us, the events of the past year have been momentous quite apart from the pandemic, with issues such as the distorting impacts of social media on society, the climate emergency, and global demands for racial justice all rising to the forefront of public attention. It was therefore great to see the role of digital preservation in these challenges being addressed in many of the panel sessions. A personal highlight for me was the presentation by Daniel Steinmeier (KB National Library of the Netherlands) on diversity and digital preservation. Steinmeier stressed that for diversity efforts to be successful, institutions need to commit to continuing programmes of inclusion rather than one-off actions, with the communities concerned actively included in the archiving process.

So what challenges can we expect from the year ahead? Perhaps more than ever, this year that has been a difficult question to answer. Nonetheless, a key theme that struck me from many of the discussions was that the growing challenge of archiving social media platforms is matched only by the increasing need to preserve the content hosted on them. As Zefi Kavvadia (International Institute of Social History) noted, many social media platforms actively resist archiving; even when preservation is possible, curators face a dilemma between capturing user experiences and capturing platform data. Navigating this challenge will surely be a major priority for the profession going forward.

While perhaps no substitute for meeting in person, #WeMissiPRES nonetheless succeeded in bringing the international digital preservation community together in a shared celebration of the progress being made in the field, successfully bridging the gap between 2019 and 2021, and laying the foundations for next year’s conference.


#WeMissiPRES was held online from 22nd-24th September 2020. For more information, and for recordings of the talks and panel sessions, see the event page on the DPC website.

Archiving web content related to the University of Oxford and the coronavirus pandemic

Since March 2020, the scope of collection development at the Bodleian Libraries' Web Archive has expanded to focus also on the coronavirus pandemic: how the University of Oxford and the wider university community have reacted and responded to the rapidly changing global situation and government guidance. The Bodleian Libraries' Web Archive team has endeavoured, and will continue, to capture, quality-assess and make publicly available records from the web relating to Oxford and the coronavirus pandemic. Preserving these ephemeral records is important. Just a few months into what is sure to be a long road, what do these records show?

Firstly, records from the Bodleian Libraries' Web Archive can demonstrate how university divisions and departments are continually adjusting in order to facilitate the core activities of learning and research, whether by moving planned events online or by organising and hosting new events relevant to the current climate:

Capture of http://pcmlp.socleg.ox.ac.uk/ 24 May 2020 available through the Bodleian Libraries’ Web Archive. Wayback URL https://wayback.archive-it.org/2502/20200524133907/https://pcmlp.socleg.ox.ac.uk/global-media-policy-seminar-series-victor-pickard-on-media-policy-in-a-time-of-crisis/

Captures of websites also provide an insight into the numerous collaborations between Oxford University, the UK government and other institutions at this unprecedented time; that is, the role Oxford is playing and how that role is changing and adapting. Much of this can be seen in the ever-evolving news pages of departmental websites, especially those within the Medical Sciences Division, such as the Nuffield Department of Population Health's collaboration with UK Biobank for the Department of Health and Social Care, announced on 17 May 2020.

The web archive preserves records of how certain groups are contributing to COVID-19 research, front-line work and evidence review at an extremely fast pace, which the curators at the Bodleian Libraries' Web Archive can attempt to capture by crawling more frequently. One example of this is the Centre for Evidence-Based Medicine's Oxford COVID-19 Evidence Service, a platform for rapid data analysis and reviews which is currently updated with several articles daily. Comparing two screenshots of different captures of the site, seven weeks apart, shows us the different themes of data being reviewed, and particularly how the 'Most Viewed' questions change (or indeed, don't change) over time.

Capture of https://www.cebm.net/covid-19/ 14 April 2020 available through the Bodleian Libraries’ Web Archive. Wayback URL https://wayback.archive-it.org/org-467/20200414111731/https://www.cebm.net/covid-19/

Interestingly, the page location has changed slightly: the eagle-eyed among you may have spotted that the article reviews now sit under /oxford-covid-19-evidence-service/, which is still within the web crawler's scope.

Capture of https://www.cebm.net/covid-19/ 05 June 2020 available through the Bodleian Libraries' Web Archive. Wayback URL https://wayback.archive-it.org/org-467/20200605100737/https://www.cebm.net/oxford-covid-19-evidence-service/
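For readers curious how such comparisons can be made systematically: the wayback URLs above follow a predictable layout of collection identifier, 14-digit capture timestamp (YYYYMMDDhhmmss) and original URL, so the interval between any two captures can be computed from the URLs alone. The snippet below is a minimal sketch under that assumption; the function name `parse_wayback_url` is our own illustration, not part of any Archive-It tooling.

```python
from datetime import datetime
from urllib.parse import urlparse

def parse_wayback_url(url):
    """Split an Archive-It-style wayback URL into its capture
    datetime and the original URL that was archived.

    Assumed path layout: /<collection>/<YYYYMMDDhhmmss>/<original-url>
    """
    path = urlparse(url).path.lstrip("/")
    collection, timestamp, original = path.split("/", 2)
    captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return captured, original

# The two CEBM captures discussed above
first, _ = parse_wayback_url(
    "https://wayback.archive-it.org/org-467/20200414111731/"
    "https://www.cebm.net/covid-19/"
)
second, _ = parse_wayback_url(
    "https://wayback.archive-it.org/org-467/20200605100737/"
    "https://www.cebm.net/oxford-covid-19-evidence-service/"
)
# Roughly the 'seven weeks apart' mentioned in the text
interval_days = (second - first).days
```

A sketch like this also makes it easy to notice when the original URL component has shifted between captures, as happened here.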

We welcome recommendations for sites to archive; if you would like to nominate a website for inclusion in the Bodleian Libraries’ Web Archive you can do so here. Meanwhile, the work to capture institutional, departmental and individual responses at this time continues.