Tag Archives: UK Legal Deposit Web Archive

The UK Web Archive: Online Enthusiast Communities in the UK

The beginnings of the Online Enthusiast collection of the UK Web Archive can be traced back to November 2016 and a task to assess the viability of, and write a proposal for, two potential special collections with a focus on current web use: Mental Health, and Online Enthusiasts.

The Online Enthusiasts special collection was intended to show how people within the UK are using the internet to aid them in practising their hobbies, for example discussing their collections of objects or coordinating their bus spotting. If it was something a person could enthuse about and it was on the internet within the UK then it was in scope. Where many UK Web Archive Special Collections are centred on a specific event and online reactions, this was more an attempt to represent the way in which people are using the internet on an everyday basis.

The first step toward a proposal was to assess the viability of the collection, and this meant searching out any potential online enthusiast sites to judge whether this collection would have enough content hosted within the UK to validate its existence. As it turns out, UK hobbyists are very active in their online communities and finding enough content was, if anything, the opposite of an issue. Difficulty came with trying to accurately represent the sheer scope of content available – it’s difficult to google something that you weren’t aware existed 5 minutes ago. After an afternoon among the forums and blogs of ferry spotters, stamp collectors, homebrewers, yarn-bombers, coffee enthusiasts and postbox seekers, there was enough proof of content to complete the initial proposal stating that a collection displaying the myriad uses hobbyists in the UK have for the internet is not only viable but also worthwhile. Eventually that proposal was accepted and the Online Enthusiast collection was born.

The UKWA Online Enthusiast Communities in the UK collection provides a unique cultural insight into how communities interact in digital spheres. It shows that with the power of the internet people with similar unique hobbies and interests can connect and share and enthuse about their favourite hobbies. Many of these communities grow and shrink at rapid paces and therefore many years of content can be lost if a website is no longer hosted.

With the amount of content on the internet, finding websites had a domino effect, where one site would link to another site for a similar enthusiast community, or we would find lists including hobbies we’d never even considered before. This meant that before long we had a wealth of content that we realised would need categorising. Our main approach to categorising the content was along thematic lines. After identifying what we were dealing with, we created a number of sub-collections, examples of which include: animal-related hobbies, collecting-focused hobbies, observation hobbies, and sports.

The approach to selecting content for the collection was mainly focused on identifying UK-centric hobbies and using various search terms to identify active communities. The majority of these communities were forums. These forums provided enthusiasts with a platform to discuss various topics related to their hobbies whilst also providing the opportunity for them to share other forms of media such as video, audio and photographic content. Other platforms such as blogs and other websites were also collected. The blogs often relied on readers submitting content to the blog owner, who would then filter and post related content to the community.

As of May 2018 the collection has over 300 archived websites. We found that the most populated categories for hobbies were sports, collecting and animal-related hobbies.

A few examples of websites related to hobbies that were new to us include:

  • UK Pigeon Racing Forum: An online enthusiast forum concerned with pigeon racing.
  • Fighting Robots Association Forum: An online enthusiast forum for those involved with the creation of fighting robots.
  • Wetherspoon’s Carpets (Tumblr): A Tumblr blog concerned with taking photographs of the unique carpets inside the Wetherspoon’s chain of pubs across the UK.
  • Mine Exploration and History Forum: An online enthusiast community concerned with mine exploration in the UK.
  • Chinese Scooter Club Forum: An online enthusiast community concerned with all things related to Chinese scooters.
  • Knit The City (now Whodunnknit): A website belonging to a graffiti-knitter/yarnbomber from the UK.

The Online Enthusiast Communities in the UK collection is accessible via the UK Web Archive’s new beta interface.

#WAWeek2017 – Researchers, practitioners and their use of the archived web

This year, the world of web archiving saw a premiere: not only were the biennial RESAW conference and the IIPC conference, established in 2016, held jointly for the first time, but they also formed part of a whole week of workshops, talks and public events around web archives – Web Archiving Week 2017 (or #WAWeek2017 for the social medially inclined).

After previous conferences in Reykjavik (IIPC 2016) and Aarhus (RESAW 2015), the big 2017 event was held in London on 14-16 June 2017, organised jointly by the School of Advanced Study of the University of London, the IIPC and the British Library.
The programme was packed full of an eclectic variety of presentations and discussions, with topics ranging from the theory and practice of curating web archive collections or capturing whole national web domains, via technical topics such as preservation strategies, software architecture and data management, to the development of methodologies and tools for web archive-based research and case studies of their application.

Even in digital times, who doesn’t like a conference pack? Of course, the full programme is also available online. (…but which version will be easier to archive?)

What has web archiving ever done for us? – Saving our dinner plans, for example.

The Bodleian Libraries are involved in web archiving both through the Bodleian Libraries’ own web archive since 2011, and – as one of the six UK Legal Deposit Libraries – through the Legal Deposit UK Web Archive since 2013.

What’s cooking in the web archives? (Detail from painting by Jean-François Millet [Public domain], via Wikimedia Commons)

A considerable amount of archivists’, curators’ and subject librarians’ time goes into this web archiving work, be it selecting websites for archiving, capturing and preserving web content, describing web archive resources or participating in web archiving strategy, collections management and outreach activities.

Current web archiving projects at the Bodleian include the further development of the Bodleian Libraries Web Archive, for example to capture audio files hosted on web servers, and curatorial work in the UK Web Archive context, such as the Easter Rising 1916 Web Archive and the EU Referendum website collection.

But why archive the web?

What’s on the internet will be there forever, won’t it? Haven’t we all been warned to be careful what we put on the internet, because all the information out there will still reveal awkward details of our first-year-at-university life when we are about to retire?

Unfortunately, for archivists, this is far from what really happens. In fact, websites are extremely ephemeral. They change and disappear at a fast rate.

WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time, the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and the latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age

The first day, after a warm welcome by Ingibjörg Sverrisdóttir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) and how to avoid a Digital Dark Age? (Vint Cerf). What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery).

Brewster Kahle, What Do We Do Now?

On the second day, the conference continued with two separate tracks, one discussing policies, practices and strategies for the capture and preservation of web material, the other looking more at the user side of web archives, and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.
The third day was the hands-on day, with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive, DIY web archiving tools such as webrecorder.io, and the open-source platform Warcbase for analysing web archive data, as well as a discussion of the future of the WARC archive format.

There was plenty of time for Q&A and discussions between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations with web archiving colleagues and networking during coffee and lunch breaks, and on visits like the tour of the National and University Library of Iceland.

The National and University Library of Iceland

Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaborations:
For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing FCCN, owing to the scope, size and structure of the collections, and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialized archives curated and captured by university libraries like that of North Carolina State University. Researchers approach the UK Government Web Archive with different research questions from those they would bring to archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, including perspectives from archivists, curators, IT engineers and researchers, seems to be the best way forward, and the practice of sharing knowledge and experience and openly discussing problems is certainly embraced by the web archiving community. A particular project might have ‘failed’ in terms of achieving the intended outcome, but it can still provide valuable lessons for the next project elsewhere, and in the long run, for developing best practice, policies and standards for web archiving as a discipline.

Mistakes are only wrong if you – and others – don’t learn from them!

Curators might be slightly overwhelmed by technical details discussed by web crawl engineers (I certainly was!) and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from the approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data in different ways than historians and social scientists.
International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, for fostering discussion and knowledge sharing, and for providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.

Harvesting social media: Overview…

…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a Friday late afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?

Some very clean UK web archivists after the conference debrief