Category Archives: Web Archives

Capturing and Preserving the EU Referendum Debate (Brexit) – UK Web Archive blog

Following the announcement in May 2015 that there would be a referendum on the UK’s EU membership, the Legal Deposit UK Web Archive, led by curators at the Bodleian Libraries, started a collection of websites.

The team of curators includes contributors from the Bodleian Libraries, The British Library, the National Libraries of Scotland and Wales and also Queen’s University Belfast (for the Northern Ireland perspective) and the London School of Economics (for capturing and preserving individual documents, such as the PDF versions of campaigning leaflets).

The collection scope is to capture the ‘Brexit’ debate and the debate around the EU Referendum as well as the wider context of UK/EU relations, including:

  • media coverage
  • websites of political parties and other political institutions and groups
  • campaigning and lobbying
  • trade unions, professional organisations, businesses
  • academic debate
  • culture and arts
  • public opinion through blogs, comments, and if possible social media.

We primarily archive UK websites under the Non-Print Legal Deposit mandate, but also decided to include some sites outside the UK, if relevant – e.g. websites of UK expats in Europe, or political parties, interest groups and think tanks in the EU and in EU member states – on a permission basis.

The collection (at the time of writing) has 2590 target websites. Some of these are whole websites; others will be a single news story or blog post.

Access and availability
The majority of the collection will be available in the reading rooms of UK Legal Deposit libraries, including both British Library sites, the Bodleian Libraries in Oxford, the National Library of Scotland, the National Library of Wales, Cambridge University Library and Trinity College Dublin. As is usual for web archive collections, there is a delay between collection and availability of up to a year, allowing for cataloguing and for ingest into digital library systems.

by Svenja Kunze, Project Archivist, Bodleian Libraries (Oxford University)

Source: Capturing and Preserving the EU Referendum Debate (Brexit) – UK Web Archive blog

EU Referendum Web Archiving Mini-internship – Part 1

On 20 and 21 June eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

Web archiving micro-interns on the roof of the Weston Library, June 2016.


Using library archives for their research is not a novelty for any student or scholar. However, web archives represent a completely new dimension of swiftly evolving research methods: they set out to document what is posted online, a relatively recent form of data collection made possible by scientific advancements.

For researchers used to traditional archives, the need to store and analyse this data might not be obvious. However, web archiving, despite being relatively new, is very significant. Firstly, it allows us to store information for generations of future historians and sociologists – contrary to common perception, much of the data held on the World Wide Web disappears or changes frequently and rapidly. Secondly, it might be an asset for those pursuing topical research projects in the present – recent technologies (such as the prototype SHINE database for historical research) allow us to trace data trends and come to important and fascinating conclusions. Therefore, even if some might underrate web archives, this surely does not diminish their utility to academia.

On the eve of the Brexit referendum, which sparked many debates and discussions in British web space, timely creation of a web collection has proven to be very important – after all, the decision is likely to have long-term consequences for our society, economy, and legal system. Traditionally, individual narratives and civic engagement are set aside when documenting major political decisions. However, a web collection can significantly improve this situation by collecting diverse standpoints expressed in the web sphere. This, in my opinion, perfectly mirrors the ethos of direct democracy where every vote and view counts.

However, important as it is, web archiving comes with a range of practical and ethical obstacles: with huge masses of information being stored online it is very hard to choose what is worthy of being preserved for future generations. Legal restrictions, such as the recent legal deposit legislation, also significantly limit the scope of archivists’ work. During my micro-internship I, along with other interns, tried to overcome these obstacles as much as possible, minimising bias and efficiently using our time resources and server memory. Even in the era of technology, it is the human resources and individual judgment that shape the scope and direction of the collection.

Working on a web collection, especially as campaigning intensified just before the referendum, was very challenging. However, as interns, we tackled the masses of information by focusing on individual areas of knowledge. Our work on the project was also aided by the guidance provided by our supervisors and discussions on ethical and scientific implications of our research. This was a very rewarding insight into a new area of knowledge, and I am convinced that the skills and knowledge I acquired and applied during the internship will aid me in my future research career.

Anna Lukina

What has web archiving ever done for us? – Saving our dinner plans, for example.

The Bodleian Libraries is involved in web archiving both through the Bodleian Libraries’ own web archive since 2011, and – as one of the six UK Legal Deposit Libraries – through the Legal Deposit UK Web Archive since 2013.

What’s cooking in the web archives?   —  (Detail from painting by Jean-François Millet [Public domain], via Wikimedia Commons)

A considerable amount of archivists’, curators’ and subject librarians’ time goes into this web archiving work, be it selecting websites for archiving, capturing and preserving web content, describing web archive resources or participating in web archiving strategy, collections management and outreach activities.

Current web archiving projects at the Bodleian include the further development of the Bodleian Libraries Web Archive, for example to capture audio files hosted on web servers, and curatorial work in the UK Web Archive context, such as the Easter Rising 1916 Web Archive and the EU Referendum website collection.

But why archive the web?

What’s on the internet will be there forever, won’t it? Haven’t we all been warned to be careful what we put on the internet, because all the information out there will still reveal awkward details of our first-year-at-university life when we are about to retire?

Unfortunately, for archivists, this is far from what really happens. In fact, websites are extremely ephemeral. They change and disappear at a fast rate.


WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time, the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and the latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age


The first day, after a warm welcome by Ingibjörg Sverrisdóttir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) and how to avoid a Digital Dark Age? (Vint Cerf). What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery)

Brewster Kahle, What Do We Do Now?


On the second day, the conference continued with two separate tracks, one discussing policies, practices and strategies for capture and preservation of web material, the other looking more at the user side of web archives, and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.
The third day was the hands-on day, with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive, DIY web archiving tools such as webrecorder.io, and the open-source platform Warcbase for analysing web archive data, and discussing the future of the WARC archive format.

There was plenty of time for Q&A and discussions between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations with web archiving colleagues and networking during coffee and lunch breaks, and on visits like the tour of the National and University Library of Iceland.

The National and University Library of Iceland


Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaborations:
For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing FCCN, owing to the scope, size and structure of the collections, and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialized archives curated and captured by university libraries such as that of North Carolina State University. Researchers approach the UK Government Web Archive with different research questions than those they would use to look at archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, including perspectives from archivists, curators, IT engineers and researchers, seems to be the best way forward, and the practice of sharing knowledge and experience and openly discussing problems is certainly embraced by the web archiving community. A particular project might have ‘failed’ in terms of achieving the intended outcome, but it can still provide valuable lessons for the next project elsewhere, and in the long run, for developing best practice, policies and standards for web archiving as a discipline.

Mistakes are only wrong if you – and others – don’t learn from them!

Curators might be slightly overwhelmed by technical details discussed by web crawl engineers (I certainly was!) and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from the approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data in different ways than historians and social scientists.
International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, for fostering discussion and knowledge sharing and for providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.

Harvesting social media: Overview…

…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a Friday late afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?


Some very clean UK web archivists after the conference debrief

 

 

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day and the unifying theme was a contrary message of dispersal and amalgamation (and butterflies).

The first thing that has been dispersed or discarded is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. Over the past few decades, in response to this, archives have collected more non-traditionally canonical literary papers but, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.


Caterpillars and collection development [By Eric Steinert – photo taken by Eric Steinert at Paussac, France, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=338409]

As Catherine noted, it used to be easy to find the avant-garde – pretty much whoever was hanging out on the Left Bank – but now it’s up to archivists to not only collect this material, but to track it down in the first place, and not to default to the temptingly easy path of collecting only the papers of that tiny sliver of authors considered publishable by mainstream publishers.


Web Archiving Micro-internship – Part 2

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the second of two guest blog posts on the micro-internship.

The most central aspect of modern life is now the proliferation of digital technology. Since the 1990s, it has become a central mode of communication which is often taken for granted. At the start of this micro-internship, we were introduced to the concept of the digital ‘black hole’, a term used to describe the irrevocable loss of this information. Unlike physical correspondence and materials – the letters, writs, and manuscripts of earlier centuries – so much of what we write is fragile and evanescent. To stem the loss of this digital history, we were shown how the Bodleian Libraries and other legal deposit libraries use domain crawls to capture online content at pre-determined intervals using the W3ACT tool. This then preserves a screen grab of the website on the Internet Archive, namely the Wayback Machine, before the website is updated.

Web archiving micro-interns working in the Centre for Digital Scholarship, Weston Library, March 2016.

The right of legal deposit libraries to receive a copy of electronic and other non-print publications, such as e-journals and CD-ROMs, only came into existence on 6th April 2013. This meant that libraries were able to create an archive of all websites with domains based in the United Kingdom. The recent ‘right to be forgotten’ law adopted by the EU signals, however, that the legal status of digital archives is becoming increasingly complicated, particularly when compiling archives of events receiving international commentary, like the upcoming EU referendum.

Each of us focused on a different aspect of the EU referendum, reflecting our individual interests, ranging from national newspapers and student newspapers to the blogs of Scottish MSPs, Welsh AMs, and MEPs, and the blogs of solicitors and legal firms’ websites offering advice to businesses and refugees in the event of a ‘Brexit’. One of the trickier views to archive was that of British expats living abroad. In this situation, unless the site can be proven to be based in the UK, we would have to write to the owner of the domain to request permission to archive the website. If permission was given but the person expressing those views subsequently wished to erase this history under the ‘right to be forgotten’ law, should the UK have voted to leave the EU, the archived material would be left in a tricky legal position. We learned during the internship that this would most likely result in the relevant archived material being deleted. However, this is exactly what the archive was set up to prevent, and so the tension between the right to privacy and freedom of information on a public platform presents considerable problems for the aim of web archives to be fully comprehensive, aggravated further by the omission of websites with paywalls.
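The triage described above can be sketched in a few lines of Python. This is only an illustration of the decision we were making by hand, not how W3ACT or the legal deposit libraries actually determine scope: the function name `archiving_route`, the `UK_SUFFIXES` tuple and the `known_uk_hosts` parameter are all hypothetical, and treating a `.uk` suffix as the only automatic evidence of a UK-based site is a simplifying assumption.

```python
from urllib.parse import urlsplit

# Assumption for illustration only: a ".uk" suffix is the sole automatic
# signal that a site is UK-based. Real scoping rules are broader.
UK_SUFFIXES = (".uk",)


def archiving_route(url: str, known_uk_hosts: frozenset = frozenset()) -> str:
    """Return 'legal deposit' or 'permission needed' for a candidate URL.

    known_uk_hosts is a hypothetical set of host names that a curator has
    separately confirmed to be based in the UK.
    """
    # hostname is None when the URL has no scheme; treat that as unknown.
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(UK_SUFFIXES) or host in known_uk_hosts:
        return "legal deposit"
    return "permission needed"
```

Under these assumptions, a `.co.uk` news site would be routed to legal deposit, while an expat blog hosted on a Spanish domain would need the owner's permission unless a curator had added its host name to `known_uk_hosts`.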

After finding this material and ensuring it was covered by the legal deposit law, it was necessary to classify the site accurately, identifying the main language, and providing titles and descriptions. For newspaper articles, this was relatively straightforward. For Welsh- and Irish-language publications produced by political parties (both languages which I am studying at Jesus College), it was more complicated, as the only languages available to select from were German or English – a testament to the nascent stage of the web archive’s development. In addition, classifying material was very much up to our own individual discretion and the descriptions to our own style. To complicate things further, the order in which searched-for material should be presented raises further issues, which we discussed at the end of the micro-internship: namely, whether results should be arranged by ‘most popular’, by date of publication, or by any other criterion. The discussions and practical experience offered by this internship gave us an opportunity to help address the legal and administrative challenges facing web archivists.

Daniel Taylor

Web Archiving Micro-internship – Part 1

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

During a micro-internship at the Bodleian’s legal deposit web archives, focusing on the EU referendum collection, we have had an occasion to reflect on the meaning of such an archive, and particularly on its potential for creating meaning.

Web archiving micro-interns on the roof of the Weston Library, March 2016.


A web archive’s potential document base is clearly much wider than a paper collection’s. No material criteria, such as donations and physical availability, play a defining role in the content archived. The main restriction placed on this particular archive is that of legal permission, which allows only UK domains to be easily archived. Even so, the scope remains incredibly wide.

Therefore, archiving the web implies a deliberate narrowing of choices on the archivist’s side. Much is left to their discretion.

A lot of what we know of history is defined by the material that is preserved. It is difficult to learn about the working class or women in the past from original sources, as material by and about such people is conspicuously absent from our collections. A contemporary web archivist has the chance to select material that can most broadly represent society. This will make it impossible for future historians to ignore the history of many groups, and will enable research into a variety of thoughts and experiences.

This was reflected and magnified in the approaches that the group of interns took, which evidences the importance of having a range of different people cooperate on the gathering of knowledge. One woman, for example, concentrated on the representation of the Brexit referendum in media specific to certain ethnic and religious groups, such as Judaism. Another made sure to include the views of Scottish, Gaelic and Irish media and organisations, in order to avoid an England-only approach. One of the interns chose to gather information about the way the referendum is seen in small communities, enriching the archive with small local publications. On the first day, I concentrated on the views and representation of immigrants, whose lives will be strongly affected by the referendum. On the second day, I preserved information about women’s roles and views.

Such a wide range of approaches contributes to the broadening and deepening of historical studies. It also positively contributes to contemporary social science. This can happen in two main ways. Firstly, it places virtual documents in a setting that makes their analysis easier. It thus enables social scientists to observe internet trends throughout the years, and compare them to each other. For this purpose, a wide range of archived material is essential, and again the archivist has a role in creating the foundational understanding of British society.

Secondly, and perhaps more interestingly (as the first function can be fulfilled by tools on the live web), web archives allow social scientists to track trends in academia. A web archive describes what subjects and focuses contemporary academia considers to be salient. It points out what we, as researchers, think is worth being saved from the internet black hole.

The defining potential of this is striking, and this internship allowed us to understand the social, political and historical role of archiving.

Zad El Bacha

DPC Student Conference: What I Wish I Knew Before I Started

The world of digital preservation can appear a bit daunting: a world full of checksums and programming and OAIS models, AIPs and DIPs, combined with the urgency of acting before it all becomes too late and technological obsolescence creates a black hole, swallowing up our digital heritage. The Digital Preservation Coalition’s What I Wish I Knew Before I Started Student Conference provided an opportunity to meet others beginning to work in digital preservation, and hear advice and reassurance from a range of interesting expert speakers.

Fancy words and acronym bingo

The day began with an Introduction to Digital Preservation by the DPC’s Sharon McMeekin who introduced us to current models, methodologies and frameworks, which she warned could also be known as fancy words and acronym bingo. Her presentation was very practical and informed us about resources which will be invaluable when putting digital preservation into practice. Sharon emphasised the importance of active preservation: it isn’t only the digital materials which are vulnerable to obsolescence, but the digital preservation systems that they are stored in. Crucially, digital preservation needs to be embedded into day-to-day work to make it sustainable.

The need for active preservation was echoed by Steph Taylor from the University of London Computer Centre, who urged us all to keep learning, stay up to date and engage with the digital preservation community through Twitter, blogs and forums. She counselled us to be prepared to explain again and again that digital preservation is really not the same thing as backing up files.

Matthew Addis from Arkivum then gave a technologist’s perspective, introducing us to a range of software and tools including the DROID file format identification tool; the POWRR Grid that maps preservation tools against types of content and stages of their lifecycle; the PRONOM registry of file formats; the Exactly checksum tool, among many others, carrying on the game of acronym bingo. The amount of choice of tools and standards can lead to what Matthew called preservation paranoia and then to preservation paralysis where the task seems so big and complex that it seems better to do nothing at all.

It’s people that are the biggest risk to digital content surviving into the future. People thinking that preservation is too hard, too expensive, or tomorrow’s problem and not today’s. (Addis, 2016)
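For anyone feeling the preservation paralysis, the idea behind the checksum tools mentioned above is simple: record a digest of a file when it enters the archive, then recompute it later to confirm that nothing has changed. The following is a minimal sketch using Python's standard `hashlib` module. It is not how Exactly or any other named tool is implemented; the function names `sha256_of` and `fixity_ok` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest.lower()
```

In practice the recorded digest would be stored alongside the file's other metadata at ingest, and `fixity_ok` would be run on a schedule, which is one small example of preservation being embedded in day-to-day work.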

Being a digital archivist = being an archivist with extra super powers

The afternoon sessions were launched by Adrian Brown from the Parliamentary Archives. The Parliamentary Archives hold a wide range of digital material, from the expected email and audio-visual records to the more surprising virtual reality tours and reconstructions of sinking ships. He emphasised that digital archiving was still essentially archiving, involving selection, appraisal, preservation, cataloguing and supporting users. Being a digital archivist, he said, is the same thing as being an archivist, only with extra super powers.

Next, Glenn Cumiskey, Digital Preservation Manager at the British Museum, spoke about the importance of engaging with technology, decision makers and user communities. Glenn illustrated this through the roles associated with digital preservation (Archivist, Records Manager, Librarian, Information Technologist, Digital Humanities and Software Programmer): in the current environment, you may need to be all of these things at once.

We then heard from Helen Hockx-Yu from the Internet Archive. Here at the Bodleian, the digital archive trainees are actively involved with the Bodleian Libraries Web Archive, which uses the Internet Archive’s ‘Archive-It’ and ‘Wayback Machine’ services. It was interesting to hear from Helen about the redevelopment work she is involved in and how her own career developed in web archiving. Her final advice to us was to keep learning and not worry about being a perfectionist.

Ann MacDonald from the University of Kent inspired us with a talk about how her own career began and developed over the last few years, and emphasised that technical innovations are not all about big machines and that small actions can go a long way in implementing digital preservation.

Only point of digital preservation is reuse of data. Nothing else.

Finally, Dave Thompson, Digital Curator at the Wellcome Collection, gave an entertaining presentation which made the point that digital preservation is not an exercise in technology for its own sake. He argued that the only point of digital preservation is the reuse of data, therefore data needs to be reusable, consumable and shareable. Digital preservation should be seized as a social opportunity to do this.

Overall, the DPC’s Student Conference: What I Wish I Knew Before I Started was an engaging mixture of reassurance, ideas and advice to prepare us to begin working practically with digital preservation. Key themes which emerged across the presentations were the importance of people in the process, the importance we must give to what users actually want from digital collections, and the importance of selling the benefits and opportunities that digital preservation can bring. It introduced us to technology, tools and processes, but at the same time stressed that you do not need to be a qualified programmer to work in digital preservation.

Preserving Social Media – a briefing day

This post is a bit late as the DPC briefing day on Preserving Social Media was almost a month ago, but our excuse is that there was a lot of food for thought!

As digital archives trainees Rachael and I have spent a lot of time thinking about preserving social media (a bit sad maybe, but true!). Everyone loves web 2.0: It’s dynamic and complex; it gives us the ability to communicate and interact across continents; and it’s a giant headache if you’re trying to archive it!

So as you can see we were quite excited about this briefing day, and it did not disappoint!

Throughout the day the talks were pretty evenly split between various means of capturing and curating social media and how researchers looked to access and use it, as well as the quality of datasets they were able to pull from it. They also touched on the legal ramifications of preserving it and there were a few case studies that discussed lessons learnt from institutions that are actively collecting social media.

Nathan Cunningham introduced us to the concept of the Big Data Network and the UK Data Archive. He talked about how much data and metadata the web was currently generating and the funding that the government was putting into it.

Sara Thomson’s keynote focused on different strategies for capturing and curating social media, covering the pros and cons of Platform APIs, Data Resellers, Third-party Services and Platform Self-Archiving Services. She also argued the need for better integration of Social Media with Web Archives in order to contextualize the social media, including preserving archived pages of content that URLs link to. She also called for more collaboration between institutions in terms of resources, access and methods/knowledge, and within institutions with their own researchers and end users.

Stephen Daisley from STV talked about Social Media & Journalism, about how it provided diverse and up-to-date coverage through non-traditional channels and its use as a tool for those underrepresented in mainstream media.

After lunch we had Katrin Weller from GESIS discuss how social scientists were using social media (For research! Not lolcats!) and the challenges of collecting, sharing and documenting it. Going back to the methods that Sara Thomson listed in her keynote, most involve a third party and have restrictions on how the data can be shared, what tools can be used on it, and how much data they give you. She highlighted the difficulties this can cause when researchers want to replicate or expand upon another researcher’s work, as well as other issues that come from using data that the researcher has not collected.

Tom Storrar from the National Archives rounded off the presentations with a talk on how the UK Government’s social media presence was being captured for posterity. His project was to capture the UK Government’s official Twitter presence. This involved deciding what would be in scope including content and metadata, how they would collect this data and finally how they would present it.

Emily:

While I found Sara’s keynote interesting and quite informative, especially in terms of what is available out there and its balanced view of what the different options have to offer, it wasn’t as relevant as I had hoped, as it focused more on someone else providing the data to you than on the tools you can use to collect what you are interested in. While there are many benefits to having authorised data resellers or the platform itself giving you archiving abilities (especially being able to harvest all the metadata associated with it), I like the flexibility and power that we get with Archive-It (though of course in some ways it will be a much shallower collection as we only collect what the end-user sees) and the fact that we aren’t restricted to the data that the providers think we want.

I’m glad that she talked about the need for collaboration so that we don’t all try to reinvent the wheel. At the Bodleian we’re quite lucky because we work closely with other legal deposit libraries to capture web content (including social media), so we regularly have the opportunity to discuss and learn from each other’s experiences. We also have our own Bodleian Libraries Web Archive, which we encourage our own researchers to use as a repository and as a resource that they can help us grow.

One thing that I found problematic was Stephen Daisley’s talk. Well, not problematic, but perhaps a bit naïve? While I agreed with some of his points, I think he romanticises the notion of social media as the great equaliser. Off the top of my head I can think of at least one quite large group of underrepresented voices that are not getting their say in social media: the elderly. And I’m sure that there are many more examples that you can come up with if you stop to think about it too. Just because the barrier to access is much lower than for traditional news stations does not mean there is no barrier. The vast amount of data and metadata generated makes it tempting to believe that it is the whole of the story, but I think we need to remember who isn’t part of the conversation.

I also really enjoyed Tom Storrar’s presentation because it highlights the need to have a clear collection policy, to realise you can’t and shouldn’t capture everything, and to make your decisions transparent so that researchers will know exactly what they do and do not have to work with.

Rachael:

Although the talks on Big Data and social science research were less relevant to our work on the Bodleian Libraries Web Archive, they were an eye-opening introduction to the sheer amount of digital data that is collected. This might be for commercial research, profiting from the amount of information we give to social media sites, such as our name, nationality, photos, mobile number, address and interests; or for forecasting purposes, such as predicting the results of political elections; or for academic study in areas such as activism, audiences, networks and crisis communication and response. I think Katrin Weller certainly succeeded in dismissing the claim that ‘99% of tweets are worthless babble’ – Weller, Social Media as Research Data, 27/10/2015.

Like Emily, I also enjoyed Tom Storrar’s presentation on the capture of government bodies’ Twitter and YouTube feeds. For me it really highlighted how complex the web of legislation is, requiring them to adapt to changing circumstances. If an organisation ceases to be a government body, the National Archives no longer has the right to capture its social media content. Because of these legal restrictions, no retweets or YouTube comments are captured, which means it is a one-way conversation. I think this is a shame, as we are losing that interaction which is so essential to social media. If YouTube comments are modern day equivalents to the letters sent to the government to comment on its policies, should we be preserving them?

Overall the day was full of fascinating talks and discussions on how to move forward in preserving social media. But the best part of the briefing day was knowing we weren’t alone! We got to talk to people approaching the preservation of social media from very different angles: the BBC, the National Archives, etc. And even though we all had different mandates and different foci, we still found a lot of common ground.

Event: Exploring the UK Web, 11 December 2015

 

Exploring the UK Web:
An introduction to web archives as scholarly resources

11 December 2015
2.00pm – 4.00pm

Venue: Lecture Theatre, Weston Library

Speakers: Jason Webber, Prof Jane Winters, Dr Gareth Millward, Prof Ralph Schroeder

‘The Web’, in the 25 years of its existence, has become deeply ingrained in modern life: it is where we find information, communicate, research, share ideas, shop, get entertained, set and follow trends and, increasingly, live our social lives.
As much as we rely on traditional paper archives today to find out about the past, for anyone trying to understand life in the late 20th and early 21st century, archived websites will be an invaluable resource.

Join us and our expert panel for an afternoon of exploring the archives of the UK web space, focusing on their potential use for research and teaching. Short presentations will introduce the resources and tools available for web archives research in the UK, and the opportunities (and challenges) they come with in theory and practice: from web archives curation, preservation and research tool development at the British Library, to current research in the Big UK Domain Data for the Arts and Humanities (BUDDAH) Project and at the Oxford Internet Institute.
Afterwards there will be plenty of time for questions and discussion – your chance to ask everything you ever wanted to know about web archives and to contribute your thoughts and ideas to an emerging discipline.

Admission free. All welcome.
To secure a place, please complete our booking form via What’s on

Jason Webber is the Web Archiving Engagement and Liaison Manager at the British Library, working with the UK Web Archive and the Legal Deposit Web Archive.
Jane Winters is Professor of Digital History at the Institute of Historical Research, and Principal Investigator in the BUDDAH Project.
Gareth Millward is a Research Fellow at the London School of Hygiene and Tropical Medicine and one of the BUDDAH Project bursary holders.
Ralph Schroeder is a Senior Research Fellow at the Oxford Internet Institute.