Can web archives tell stories?

Archives tell stories. A series of induction sessions with archivists has brought me, a web archivist, to a new understanding of what archives are and what archivists do.

Archivists enable stories to be told: stories about people, organisations, society and much more. Archival materials bring them back to life. The very making of a collection (how its contents have been selected, preserved and made available to the public, and how some have not) constitutes a story in itself.

But can web archives tell stories? Web archives differ from conventional archives, where archival material comes into custody as a collection with a relatively clear boundary, within which archivists carry out appraisal, selection and cataloguing work. The boundaries for web archives, by comparison, have been both blurred and expanded.

For example, the boundary of a national domain web archive can be as broad as ‘.uk’ or the geographic location of the content creator in the UK. The boundary of a thematic web archive is often defined loosely as a topic or subject within the collection framework.

Developing a domain web archive often involves an automated approach. A selective collection is different: the pitch on which we develop it is empty until we add the URLs of websites or pages, which we often call “seeds”. The crawler then works its way through these URLs, archives them and makes them part of our collection. Our role is thus proactive: we purposely select what should be in our collections.

Web archives collect publicly published material from the web. In this respect, they are aligned with the kind of material that libraries collect. This is especially the case when web archives are compiled under the UK's non-print legal deposit legislation of 2013, which gives the British Library and the UK legal deposit libraries the legal framework to collect, store and preserve material published digitally and online. This makes web archives distinct from their conventional counterparts (Brügger, 2018).

At the same time, the fact that some of the material collected in web archives is ‘unofficial’, highly ephemeral, unstructured, and collected as an exercise in the preservation of culture and heritage means that web archives bear similarities to conventional archives. Web archives are thus hybrid and unique.

I currently work on the Archive of Tomorrow project (AoT), and the Bodleian Libraries is one of the project partners. AoT is funded by the Wellcome Trust to explore and preserve online information and misinformation relating to health, both official and unofficial. Our aim is to form a ‘Public Health Discourse’ collection within the UK Web Archive.

But how do we select websites for a topic that is so wide and multidisciplinary in nature? I started with a specific topic, breast cancer, as part of a sub collection, so that I could establish an approach that can then be applied to other similar collections.

To scope the coverage, I use three factors to build a matrix: the content creator, the targeted audience, and the content. This gives me a grid to select a sample of what is available online related to a specific topic. Although this approach can help us build the baseline of a topic specific collection, it is rather mechanical in nature.
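For readers who like to see such a grid spelled out, the matrix can be sketched in a few lines of Python. This is only an illustration: the category values, the function names and the example URL are hypothetical, and the actual categories used for the collection are not listed in this post.

```python
from itertools import product

# Hypothetical example values; the real categories are not given in the post.
CREATORS = ["public body", "charity", "individual"]
AUDIENCES = ["patients", "health professionals"]
CONTENT_TYPES = ["guidance", "personal account"]


def build_grid(creators, audiences, content_types):
    """Return one empty cell (a list of seed URLs) per combination of factors."""
    return {cell: [] for cell in product(creators, audiences, content_types)}


def add_seed(grid, creator, audience, content_type, url):
    """File a seed URL under its cell; raises KeyError for an unknown cell."""
    grid[(creator, audience, content_type)].append(url)


def empty_cells(grid):
    """Cells with no seeds yet, i.e. possible gaps in coverage."""
    return [cell for cell, seeds in grid.items() if not seeds]


grid = build_grid(CREATORS, AUDIENCES, CONTENT_TYPES)
# Placeholder URL, not a real nomination.
add_seed(grid, "charity", "patients", "guidance", "https://example.org/breast-cancer")
print(len(grid), "cells,", len(empty_cells(grid)), "still empty")
```

Listing the empty cells is one simple way to see which combinations of creator, audience and content are not yet represented in a sample.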

Yet the pathways are not always clear, and they sometimes overlap and cross over with other sub collections. In common with other information-seeking activities, I often discover websites/pages spontaneously, which may not be related to breast cancer, but are of broad interest to the collection. I will also collect them for different sub collections. The journey of discovery is therefore often interactive and a learning process.

A conversation with colleagues about our early internet experiences also made me reflect further on this approach. What online information did we access and how did we access it in the pre-Google era? These aspects of our individual digital memories can easily fade away. Our personal journeys of navigating information online and interacting with the Web and content are difficult to preserve, even in web archives.

Yet these journeys matter. How a piece of online information affects someone's choice of medical treatment; how an online community becomes an important source of support, or fails to; how people's day-to-day health choices are shaped by the online content they interact with and by the organisations or people they follow on social media platforms: together, these constitute a collective story of our society. Web archives play a significant role in preserving and telling this story.

So the question that occurs to me is this: if web archives can tell stories in the way that conventional archives do, how can this be achieved? I hope our collection can help tell these stories with respect to the ways in which our society interacts with online health information. To preserve these collective memories, we would like to know more about the online health information that individuals access and interact with. That is, which websites do people use, and for what purpose?

During the next 10 months of this project, I will be interested to hear of any websites related to health information that you would like to nominate, and why you feel they need to be archived. They should be within the UK domain or created by UK-based organisations or individuals. We also archive Twitter accounts (again, the account location needs to be in the UK).

Please nominate websites for inclusion in this collection by filling in the online nomination form. Please include “Archive of Tomorrow” in the form. If you would like to express an interest in developing a sub collection for your research projects, please contact me directly.

I will write blog posts regularly to keep you updated on the development of our collection and to discuss web archives in general. Please watch this space if you would like to know more about web archives.

Brügger, N. (2018). The Archived Web: Doing History in the Digital Age. The MIT Press.
