Web Archives as Scholarly Sources: Issues, Practices and Perspectives

RESAW conference in Aarhus, 8-10 June 2015

Web archiving has been part of Special Collections work at the Bodleian Library for quite a while now, both in cooperation with other UK Legal Deposit Libraries within the electronic Legal Deposit framework (since 2013), and through the Bodleian Libraries’ own Web Archive.
But whereas the amount of archived web material – at the Bodleian and elsewhere – is constantly growing, the usage of these new resources has so far been quite low. Scholars, it seems, are largely unaware of the potential web archives have as sources for research, or lack the knowledge and skills to work with such material, while web archiving institutions lack the resources to promote their web archive collections and support their use.

The Research Infrastructure for the Study of Archived Web Materials (RESAW) network aims to promote the establishment of a collaborative European research infrastructure for the study of archived web materials. This means collaborating internationally as well as across disciplines to meet the challenges – and the opportunities – that archived web materials bring, and to develop new methods and approaches in research and teaching.

One of the topics: How to archive social media content? And how to use archived social media content as scholarly sources?

Tweets from the conference have been collected via Storify. Thanks to Jane Winters from the Institute of Historical Research, University of London, for setting this up.

The 2015 RESAW conference, hosted by the University of Aarhus in Denmark, was the third in a series of conferences: the first conference in 2001 focused on how to preserve web content, the second in 2008 on web archives theory, and this year’s third conference on the actual use of web archives in research.
Participants included over 80 web archivists, curators, researchers from various disciplines, and IT experts from Canada, Denmark, Finland, France, Germany, Italy, Israel, the Netherlands, Russia, the UK, the United Arab Emirates, and the USA, representing public and private archives, state and university libraries, research institutions, IT service providers and web archiving consultants.

For an intense three days, keynote speeches and short and long papers alternated with panel discussions, with speakers and presenters reporting on their approaches to and practical experiences in archiving websites and in using archived web material for research.
Whereas the individual case studies came from very different backgrounds – focusing on YouTube or social media, exploring possible new tools and methodologies for web archiving and web archives analysis, dealing with the use of Big Data or small datasets in research disciplines from anthropology and linguistics to international relations and migration studies, looking at academic websites, popular culture, internet governance, citizen involvement and even troll communities – it soon became clear that the individual results would lead to common conclusions:

Archived web materials are ‘different’ from both traditional paper-based resources and from the live web. Therefore, existing research theories and traditional approaches to collecting and curating are often not useful when dealing with web materials; new methodologies need to be developed, new questions to be asked. On a practical level, there is a great need for new tools to deal with the sheer amount of data available for research, for example to filter and analyze web archive collections, and to visualize results.
Archiving web materials, curating collections, and using them as scholarly sources require a great deal of resources: staff time, knowledge and expertise, technical infrastructure and tools. To use the existing resources as efficiently as possible, archivists, curators, researchers of different disciplines, IT experts and service providers need to collaborate. Pooling resources across institutions and creating (international) networks to share knowledge and experience seems to be the way forward.

Anna Perricci, Columbia University, on the importance of building web archiving collaborations

Communication and openness are key! Archivists and curators should make web archiving processes transparent and explain to scholars what type of material and information they can realistically expect to find in web archives (and what is likely not to be included!). Researchers should clearly express their needs and expectations, but at the same time be willing to engage with a new type of resource that requires new approaches and at least basic IT skills. IT experts should develop easy-to-use and transparent tools, and share technical knowledge that helps to interpret archived web materials. Users should feed their experience back to curators and developers to help improve web archive selection, metadata/description and discovery tools.

Web archiving is still a young discipline – and research based on archived web material is an even younger one. There are no golden ‘how to’ rules, standards or ‘ultimate authorities’ yet; everyone is still learning. Individual projects that encounter problems, or even ‘fail’ to achieve the desired outcome, can still provide valuable lessons for others. Successes, e.g. in developing and using methodologies and tools for web archiving and for working with web archives, can be the starting point for developing best practice guidelines in the medium to long term. Again, this requires communication and collaboration within and across institutions, professions, disciplines and countries.

Gareth Millward sharing his experience from the BUDDAH project

A case study of using web archives as scholarly sources: Gareth Millward sharing his experience exploring the evolution of disability organisation websites through the UK Domain Data Archive.

The conference gave web archiving professionals and web archive users the opportunity to present their recent and ongoing projects and, in many cases, to ask the other participants for input and advice. Its big strength, however, was certainly that it brought together people concerned with web archives from a great variety of backgrounds, thus enabling exchange of ideas, debate and networking. There were many eye-opening moments of discovering that someone else, in a different institution in a different country, has been working on similar topics or has encountered similar problems.

Knowing how, and with what results, archived web materials have been used in other institutions will be very valuable if and when the Bodleian Libraries decide to promote their own web archive collections. At the same time, getting in touch with web archiving colleagues in the UK and internationally offers much potential for collaboration in future projects.
For example, Tomsk State University in Russia is currently trying to establish a web archive similar to the Bodleian Libraries’ Web Archive, whilst research projects run at the Institute of Historical Research of the University of London, as part of the Big UK Domain Data for the Arts and Humanities Project in cooperation with the British Library, could be used as examples to promote the scholarly use of the UK (Legal Deposit) Web Archive in Oxford.

Special Collections in the Danish Netarkivet

Special Collections in the Danish Web Archive, which is run by the State and University Library in Aarhus and the Royal Library in Copenhagen. Since 2005, the collection and preservation of the .dk internet has been included in the Danish Legal Deposit Law.

At the end of the conference, everyone was buzzing with enthusiasm and new ideas, and agreed that the event was a great success, not least thanks to the flawless organisation and wonderful Danish hospitality, which included a reception celebrating the anniversary of the Danish Web Archive, netarkivet.dk, lots of Smørrebrød (delicious Danish open sandwiches) and a memorable conference dinner, all adding to the friendly and sociable character of the event.

A similar conference is now envisaged to be held in 2016 or 2017 in London, an opportunity not to be missed to catch up with the latest in Web Archiving and strengthen old and new – forgive the pun – links!
