Tag Archives: Web Archives

The UK Web Archive: Mental Health, Social Media and the Internet Collection

The UK Web Archive hosts several Special Collections, curating material related to a particular theme or subject. One such collection is on Mental Health, Social Media and the Internet.

Since the advent of Web 2.0, people have been using the Internet as a platform to engage and connect, amongst other things, resulting in new forms of communication, and consequently new environments to adapt to – such as social media networks. This collection aims to illustrate how this has affected the UK, in terms of the impact on mental health. This collection will reflect the current attitudes displayed online within the UK towards mental health, and how the Internet and social media are being used in contemporary society.

We began curating material in June 2017, archiving various types of web content, including: research, news pieces, UK based social media initiatives and campaigns, charities and organisations’ websites, blogs and forums.

Material is being collected around several themes, including:

Body Image
Over the past few years, there has been a move towards using social media to discuss body image and mental health. This part of the collection curates material relating to how the Internet and social media affect mental health issues relating to body image. This includes research about developing theory in this area, news articles on various individuals’ experiences, as well as various material posted on social media accounts discussing this theme.

Cyber-bullying
This theme curates material, such as charities and organisations’ websites and social media accounts, which discuss, raise awareness and tackle this issue. Furthermore, material which examines the impact of social media and Internet use on bullying such as news articles, social media campaigns and blog posts, as well as online resources created to aid with this issue, such as guides and advice, are also collected.

Addiction

This theme collects material around gaming and other Internet-based activities that may become addictive, such as social media, pornography and gambling. It includes recent UK-based research, studies and online polls, social media campaigns, online resources, blogs and news articles from individuals and organisations. Discourse, discussions, opinion and actions regarding different aspects of Internet addiction are all captured and collected under this overarching catchment term of addiction, including social media addiction.

The Mental Health, Social Media and the Internet Special Collection is available via the new UK Web Archive Beta Interface!

Co-authored with Carl Cooper

The UK Web Archive Ebola Outbreak collection

By CDC Global (Ebola virus) [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons

Next month marks the four-year anniversary of the WHO’s public announcement of “a rapidly evolving outbreak of Ebola virus disease (EVD)” that went on to become the deadliest outbreak of EVD in history.

With more than 28,000 cases and 11,000 deaths, it moved with such speed and virulence that–though concentrated in Guinea, Liberia and Sierra Leone–it was feared at the time that the Ebola virus disease outbreak of 2014-2016 would soon spread to become a global pandemic.

No cure or vaccine has yet been discovered and cases continue to flare up in West Africa. The most recent was declared over on 2 July 2017. Yet today most people in the UK, unless directly affected, don’t give it a second thought.

Searching online now, you can find fact sheets detailing everything you might want to know about patient zero and the subsequent rapid spread of infection. You can find discussions detailing the international response (or failure to do so) and lessons learned. You might even find the reminiscences of aid workers and survivors. But these sites all examine the outbreak in retrospect and their pages and stories have been updated so often that posts from then can no longer be found.

Posts that reflected the fear and uncertainty that permeated the UK during the epidemic. The urgent status updates and travel warnings. The misinformation that people were telling each other. The speculation that ran riot. The groundswell of giving. The mobilisation of aid.

Understandably when we talk about epidemics the focus is on the scale of physical suffering: numbers stricken and dead; money spent and supplies sent; the speed and extent of its spread.

Whilst UKWA regularly collects the websites of major news channels and governmental agencies, what we wanted to capture was the public dialogue on, and interpretation of, events as they unfolded. To see how local interests and communities saw the crisis through the lenses of their own experience.

To this end, the special collection Ebola Outbreak, West Africa 2014 features a broad selection of websites concerning the UK response to the Ebola virus crisis. Here you can find:

  • The Anglican community’s view on the role of faith during the crisis;
  • Alternative medicine touting the virtues of liposomal vitamin C as a cure for Ebola;
  • Local football clubs fundraising to send aid;
  • Parents in the UK withdrawing children from school because of fear of the virus’ spread;
  • Think tanks’ and academics’ views on the national and international response;
  • Universities issuing guidance and reports on dealing with international students; and more.

Active collection for Ebola began in November 2014 at the height of the outbreak whilst related websites dating back to the infection of patient zero in December 2013 have been retrospectively added to the collection. Collection continued through to January 2016, a few months before the outbreak began tailing off in April 2016.

The Ebola collection is available via the UK Web Archive’s new beta interface.

#WAWeek2017 – Researchers, practitioners and their use of the archived web

This year, the world of web archiving saw a premiere: not only were the biennial RESAW conference and the IIPC conference, established in 2016, held jointly for the first time, but they also formed part of a whole week of workshops, talks and public events around web archives – Web Archiving Week 2017 (or #WAWeek2017 for the social medially inclined).

After previous conferences in Reykjavik (2016) and Aarhus (RESAW 2015), the big 2017 event was held in London, 14-16 June 2017, organised jointly by the School of Advanced Study of the University of London, the IIPC and the British Library.
The programme was packed full of an eclectic variety of presentations and discussions, with topics ranging from the theory and practice of curating web archive collections or capturing whole national web domains, via technical topics such as preservation strategies, software architecture and data management, to the development of methodologies and tools for research based on web archives and case studies of their application.

Even in digital times, who doesn’t like a conference pack? Of course, the full programme is also available online. (…but which version will be easier to archive?)

Researchers, practitioners and their use of the archived web. IIPC Web Archiving Conference 15th June 2017

From the 14th – 16th of June, researchers and practitioners from a global community came together for a series of talks, presentations and workshops on the subject of web archiving at the IIPC Web Archiving Conference. This event coincided with Web Archiving Week 2017, a week-long event running from 12th – 16th June, hosted by the British Library and the School of Advanced Study.

I was lucky enough to attend the conference on the 15th June with a fellow trainee digital archivist and listen to some thoughtful, engaging and challenging talks.

The day started with a plenary in which John Sheridan, Digital Director of the National Archives, spoke about the work of the National Archives and the challenges of, and approaches to, web archiving that they have taken. The National Archives is principally the archive of the government; it allows us to see what the state saw through the state’s eyes. Archiving government websites is a crucial part of this record keeping as we move further into the digital age, where records are increasingly born-digital. A number of points were made which highlighted the motivations behind web archiving at the National Archives.

  • They care about the records that government are publishing and their primary function is to preserve the records
  • Accountability for government services online or information they publish
  • Capturing both the context and content

By preserving what the government publishes online, it can be held accountable; accountability is one aspect that demonstrates the inherent value of archiving the web. You can find a great blog post on accountability and digital services by Richard Pope at this link: http://blog.memespring.co.uk/2016/11/23/oscon-2016/

The published records and content on the internet provide valuable and crucial context for the records that are unpublished, linking the backstory and the published records. This allows for a greater understanding and analysis of the information and will be vital for researchers and historians now and into the future.

Quality assurance is a high priority at the National Archives. Having a narrow crawling focus has both allowed and prompted a lot of effort to be directed into the quality of the archived material, so that it has high fidelity in playback. To keep these high standards, it can take weeks to complete a really good in-depth crawl. Having a small curated collection is an incentive to work harder on capture.

The users and their needs were also discussed as this often shapes the way the data is collected, packaged and delivered.

  • Users want to substantiate a point. They use the archived sites for citation on Facebook or Twitter, for example
  • The need to cite for a writer or researcher
  • Legal – What was the government stance or law at the time of my client’s case?
  • Researchers’ needs – This was highlighted as an area where improvements can be made
  • Government itself is using the archives for information purposes
  • Government websites requesting crawls before their website closes – An example of this is the NHS website transferring to a GOV.UK site

The last part of the talk focused on the future of web archiving and how this might take shape at the National Archives. Web archiving is complex and at times chaotic. Traditional archiving standards have been placed upon it in an attempt to order the records. It was a natural evolution for information managers and archivists to use their existing knowledge, skills and standards to bring this information under control. This has resulted in difficulties in searching across web archives, describing the content and structuring the information. The nature of the internet and the way in which the information is created means that uncertainty inevitably has to be embraced.

Digital archiving could take the turn into 2.0, the second generation, moving away from the traditional standards and embracing new standards and concepts. One proposed method is the ICA Records in Context conceptual model. It proposes a multidimensional description, with each ‘thing’ having a unique description, as opposed to the traditional unit of description (one size fits all). Instead of a single hierarchical, fonds-down approach, the Records in Context model uses a description that can be formed as a network or graph. The context of the fonds is broader, linking between other collections and records to give different perspectives and views. The records can be enriched this way and provide a fuller picture of the record/archive. The web produces content that is in a constant state of flux, and a system of description that can grow and morph over time, creating new links and context, would be a fruitful addition.
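To make the contrast between a fonds-down hierarchy and a graph-shaped description a little more concrete, here is a minimal Python sketch. It is not the Records in Context model itself, only an illustration of the general idea: the class names, relation labels and example entities are all hypothetical and chosen for this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FondsNode:
    """Traditional description: every unit sits under exactly one parent."""
    title: str
    children: list[FondsNode] = field(default_factory=list)


@dataclass
class DescriptionGraph:
    """Graph-style description: any entity may be linked to any other."""
    # Each edge is a (subject, relation, object) statement.
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.edges.add((subject, relation, obj))

    def context_of(self, entity: str) -> set[tuple[str, str, str]]:
        """Return every statement the entity takes part in, in either role."""
        return {edge for edge in self.edges if entity in (edge[0], edge[2])}


# Hierarchy: the snapshot can only live in one place.
fonds = FondsNode("Department X", [FondsNode("Website snapshot")])

# Graph: the same snapshot can be reached from several contexts
# (all names below are invented for illustration).
graph = DescriptionGraph()
graph.relate("Website snapshot", "created by", "Department X")
graph.relate("Website snapshot", "part of", "Collection A")
graph.relate("Website snapshot", "documents", "Policy Y")
graph.relate("Unpublished file", "documents", "Policy Y")
```

Asking the graph for `graph.context_of("Policy Y")` returns both the published snapshot and the unpublished file, which is the kind of cross-linking between published and unpublished records that the talk described. New links can be added later without restructuring anything that already exists.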

Visual Diagram of How the Records in Context Conceptual Model works

“This example shows some information about P.G.F. Leveau a French public notary in the 19th century including:
• data from the Archives nationales de France (ANF) (in blue); and
• data from a local archival institution, the Archives départementales du Cher (in yellow).” International Council on Archives, Records in Contexts: A Conceptual Model for Archival Description, p. 93.

Traditional Fonds Level Description

I really enjoyed the conference as a whole and the talk by John Sheridan. I learnt a lot about the National Archives’ approach to web archiving, the challenges and where the future of web archiving might go. I’m looking forward to taking this new knowledge and applying it to the web archiving work I do here at the Bodleian.

Changes are currently being made to the National Archives Web Archiving site and it will relaunch on the 1st July this year. Why don’t you go and check it out?

EU Referendum Web Archiving Micro-internship – Part 1

On 20 and 21 June eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

Web archiving micro-interns on the roof of the Weston Library, June 2016.

Using library archives for research is no novelty for any student or scholar. However, web archives represent a completely new dimension of swiftly evolving research methods: they aim to document what is posted online, a relatively recent form of data collection due to scientific advancements.

For researchers used to traditional archives, the need to store and analyse this data might not be immediately obvious. However, web archiving, despite being relatively new, is very significant. Firstly, it allows us to store information for generations of future historians and sociologists – contrary to common perception, much of the data held on the World Wide Web disappears or changes very frequently and rapidly. Secondly, it might be an asset for those pursuing topical research projects in the present – recent technologies (such as the prototype SHINE database for historical research) allow us to trace data trends and come to important and fascinating conclusions. Therefore, even if some might underrate web archives, it surely does not diminish their utility to academia.

On the eve of the Brexit referendum, which sparked many debates and discussions in British web space, the timely creation of a web collection has proven to be very important – after all, the decision is likely to have long-term consequences for our society, economy, and legal system. Traditionally, individual narratives and civic engagement are set aside when documenting major political decisions. However, a web collection can significantly improve this situation by collecting diverse standpoints expressed in the web sphere. This, in my opinion, perfectly mirrors the ethos of direct democracy where every vote and view counts.

However, important as it is, web archiving comes with a range of practical and ethical obstacles: with huge masses of information being stored online it is very hard to choose what is worthy of being preserved for future generations. Legal restrictions, such as the recent legal deposit legislation, also significantly limit the scope of archivists’ work. During my micro-internship I, along with other interns, tried to overcome these obstacles as much as possible, minimising bias and efficiently using our time resources and server memory. Even in the era of technology, it is the human resources and individual judgment that shape the scope and direction of the collection.

Working on a web collection, especially as campaigning intensified just before the referendum, was very challenging. However, as interns, we tackled the masses of information by focusing on individual areas of knowledge. Our work on the project was also aided by the guidance provided by our supervisors and by discussions on the ethical and scientific implications of our research. This was a very rewarding insight into a new area of knowledge, and I am convinced that the skills and knowledge I acquired and applied during the internship will aid me in my future research career.

Anna Lukina

Web Archiving Micro-internship – Part 2

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the second of two guest blog posts on the micro-internship.

The most central aspect of modern life is now the proliferation of digital technology. Since the 1990s, it has become a central mode of communication which is often taken for granted. At the start of this micro-internship, we were introduced to the concept of the digital ‘black hole’, a term used to describe the irrevocable loss of this information. Unlike physical correspondence and materials–the letters, writs, and manuscripts of earlier centuries–so much of what we write is fragile and evanescent. To stem the loss of this digital history, we were shown how the Bodleian Libraries and other legal deposit libraries use domain crawls to capture online content at pre-determined intervals using the W3ACT tool. This then preserves a screen grab of the website on the Internet Archive, namely the Wayback Machine, before the website is updated.

Web archiving micro-interns working in the Centre for Digital Scholarship, Weston Library, March 2016.

The right of legal deposit libraries to a copy of electronic and other non-print publications, such as e-journals and CD-ROMs, only came into existence on 6th April 2013. This meant that libraries were able to create an archive of all websites with domains based in the United Kingdom. The recent ‘right to be forgotten’ law adopted by the EU is a signal of the fact that the legal status of digital archives is nevertheless becoming increasingly complicated, particularly when compiling archives of events receiving international commentary, like the upcoming EU referendum.

Each of us focused on a different aspect of the EU referendum, reflecting our individual interests, ranging from national newspapers and student newspapers to the blogs of Scottish MSPs, Welsh AMs, and MEPs, and the blogs of solicitors and legal firms’ websites offering advice to businesses and refugees in the event of a ‘Brexit’. One of the trickier views to archive was that of British expats living abroad. In this situation, unless the site can be proven to be based in the UK, we would have to write to the owner of the domain to request permission to archive the website. In a situation where permission was given but the person expressing those views subsequently wished to erase this history under the ‘right to be forgotten’ law adopted by the EU, should the UK have voted to leave the EU, this would leave the archived material in a tricky legal position. We learned during the internship that this would most likely result in the relevant archived material being deleted. However, this is exactly what the archive was set up to prevent, and so the tension between the right to privacy and freedom of information on a public platform presents considerable problems to the aim of web archives to be fully comprehensive, aggravated further by the omission of websites with paywalls.
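The scoping decision described above (archive directly if the site is UK-based, otherwise ask the owner first) can be sketched as a small check. This is only an illustration and not how W3ACT works: the function name `needs_permission` is hypothetical, and treating a `.uk` hostname as a stand-in for "based in the UK" is a simplifying assumption, since in practice other evidence of UK publication may also count.

```python
from urllib.parse import urlparse


def needs_permission(url: str, confirmed_uk_hosts: frozenset[str] = frozenset()) -> bool:
    """Return True if the site owner should be contacted before archiving.

    Assumption: a hostname ending in ".uk" is treated as UK-based.
    `confirmed_uk_hosts` holds hostnames that a curator has separately
    established to be UK-based (a hypothetical, manually kept list).
    """
    # urlparse only finds a hostname when the URL includes a scheme,
    # so a bare "example.co.uk" falls through to "ask for permission".
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".uk"):
        return False
    if host in confirmed_uk_hosts:
        return False
    return True
```

For example, `needs_permission("https://www.example.co.uk/")` is `False`, while an expat blog at `https://expat-blog.example.com/` returns `True` unless its hostname has been added to the confirmed list. The check defaults to asking for permission, which seemed the safer failure mode given the legal position described above.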

After finding this material and ensuring it was covered by the legal deposit law, it was necessary to classify the site accurately, identifying the main language, and providing titles and descriptions. For newspaper articles, this was relatively straightforward, but for Welsh and Irish-language publications produced by political parties, languages which I am studying at Jesus College, this was more complicated as the only languages available to select from were German or English–a testament to the nascent stage of the web archive’s development. In addition, classifying material was very much up to our own individual discretion and the descriptions to our own style. To complicate things further, the order in which searched-for material should be presented raises further issues, which we discussed at the end of the micro-internship: namely, whether results should be arranged by ‘most popular’, by date of publication, or by any other criterion. The discussions and practical experience offered by this internship gave us an opportunity to help address the legal and administrative challenges facing web archivists.

Daniel Taylor

Preserving Social Media – a briefing day

This post is a bit late as the DPC briefing day on Preserving Social Media was almost a month ago, but our excuse is that there was a lot of food for thought!

As digital archives trainees, Rachael and I have spent a lot of time thinking about preserving social media (a bit sad maybe, but true!). Everyone loves web 2.0: It’s dynamic and complex; it gives us the ability to communicate and interact across continents; and it’s a giant headache if you’re trying to archive it!

So as you can see we were quite excited about this briefing day, and it did not disappoint!

Throughout the day the talks were pretty evenly split between various means of capturing and curating social media and how researchers looked to access and use it, as well as the quality of datasets they were able to pull from it. They also touched on the legal ramifications of preserving it and there were a few case studies that discussed lessons learnt from institutions that are actively collecting social media.

Nathan Cunningham introduced us to the concept of the Big Data Network and the UK Data Archive. He talked about how much data and metadata the web was currently generating and the funding that the government was putting into it.

Sara Thomson’s keynote focused on different strategies for capturing and curating social media, weighing the pros and cons of Platform APIs, Data Resellers, Third-party Services and Platform Self-Archiving Services. She also argued the need for better integration of social media with web archives in order to contextualise the social media, including preserving archived pages of the content that URLs link to. She also called for more collaboration between institutions in terms of resources, access and methods/knowledge, and within institutions with their own researchers and end users.

Stephen Daisley from STV talked about Social Media & Journalism, about how it provided diverse and up-to-date coverage through non-traditional channels and its use as a tool for those underrepresented in mainstream media.

After lunch we had Katrin Weller from GESIS discuss how social scientists were using social media (For research! Not lolcats!) and the challenges of collecting, sharing and documenting it. Going back to the methods that Sara Thomson listed in her keynote, most involve a third party and have restrictions on how the data can be shared, what tools can be used on it, and how much data they give you. She highlighted the difficulties this can cause when researchers want to replicate or expand upon another researcher’s work, as well as other issues that come from using data that the researcher has not collected.

Tom Storrar from the National Archives rounded off the presentations with a talk on how the UK Government’s social media presence was being captured for posterity. His project was to capture the UK Government’s official Twitter presence. This involved deciding what would be in scope including content and metadata, how they would collect this data and finally how they would present it.

Emily:

While I found Sara’s keynote interesting and quite informative (especially in terms of what is available out there and a balanced view of what each option has to offer), it wasn’t as relevant as I had hoped, as it was focused more on someone else providing the data to you rather than the tools you can use to collect what you are interested in. While there are many benefits to having authorised data resellers or the platform itself giving you archiving abilities (especially being able to harvest all the metadata associated with it), I like the flexibility and power that we get with Archive-It (though of course in some ways it will be a much shallower collection as we only collect what the end-user sees) and the fact that we aren’t restricted to the data that the providers think we want.

I’m glad that she talked about the need for collaboration so that we don’t all try to reinvent the wheel. At the Bodleian we’re quite lucky because we work closely with other legal deposit libraries to capture web content (including social media) so we regularly have the opportunity to discuss and learn from each other’s experiences. We also have our own Bodleian Library Web Archive where we encourage our own researchers to use it as a repository and a resource that they can help us grow.

One thing that I found problematic was Stephen Daisley’s talk. Well not problematic, but perhaps a bit naïve? While I agreed with some of his points, I think he romanticises the notion of social media as the great equaliser. I can think off the top of my head of at least one quite large group of underrepresented voices that are not getting their say in social media: the elderly. And I’m sure that there are many examples that you can come up with if you stop to think of it too. Just because the barrier to access is much lower than for traditional news stations does not mean there is no barrier. The vast amount of data and metadata generated makes it tempting to believe that that is the whole of the story, but I think we need to remember who isn’t part of the conversation.

I also really enjoyed Tom Storrar’s presentation because it highlights the need to have a clear collection policy, to realise you can’t and shouldn’t capture everything, and to make your decisions transparent so that researchers will know exactly what they do and do not have to work with.

Rachael:

Although the talks on Big Data and social science research were less relevant to our work on the Bodleian Libraries Web Archive, it was an eye-opening introduction to the sheer amount of digital data which is collected. This might be commercial research, profiting from the amount of information we can give to social media sites such as our name, nationality, photos, mobile number, address, and interests; or for forecasting purposes such as predicting results of political elections; or for academic study in areas such as activism, audiences, networks and crisis communication and response. I think Katrin Weller certainly succeeded in dismissing the claim that ‘99% of tweets are worthless babble’ – Weller, Social Media as Research Data, 27/10/2015.

Like Emily, I also enjoyed Tom Storrar’s presentation on the capture of government bodies’ Twitter and YouTube feeds. For me it really highlighted how complex the web of legislation is, requiring them to adapt to changing circumstances. If an organisation ceases to be a government body, the National Archives no longer has the right to capture its social media content. Because of these legal restrictions, no retweets or YouTube comments are captured, which means it is a one-way conversation. I think this is a shame, as we are losing that interaction which is so essential to social media. If YouTube comments are modern day equivalents to the letters sent to the government to comment on its policies, should we be preserving them?

Overall the day was full of fascinating talks and discussions on how to move forward in preserving social media. But the best part of the briefing day was knowing we weren’t alone! We got to talk to people approaching preserving social media from very different angles: the BBC, the National Archives, etc. And even though we all had different mandates and different foci, we still found a lot of common ground.

Event: Exploring the UK Web, 11 December 2015

Exploring the UK Web:
An introduction to web archives as scholarly resources

11 December 2015
2.00pm – 4.00pm

Venue: Lecture Theatre, Weston Library

Speakers: Jason Webber, Prof Jane Winters, Dr Gareth Millward, Prof Ralph Schroeder

‘The Web’, in the 25 years of its existence, has become deeply ingrained in modern life: it is where we find information, communicate, research, share ideas, shop, get entertained, set and follow trends and, increasingly, live our social lives.
As much as we rely on traditional paper archives today to find out about the past, for anyone trying to understand life in the late 20th and early 21st century, archived websites will be an invaluable resource.

Join us and our expert panel for an afternoon of exploring the archives of the UK web space, focusing on their potential use for research and teaching. Short presentations will introduce the resources and tools available for web archives research in the UK, and the opportunities (and challenges) they come with in theory and practice: from web archives curation, preservation and research tool development at the British Library, to current research in the Big UK Domain Data for the Arts and Humanities (BUDDAH) Project and at the Oxford Internet Institute.
Afterwards there will be plenty of time for questions and discussion – your chance to ask everything you ever wanted to know about web archives and to contribute your thoughts and ideas to an emerging discipline.

Admission free. All welcome.
To secure a place, please complete our booking form via What’s on

Jason Webber is the Web Archiving Engagement and Liaison Manager at the British Library, working with the UK Web Archive and the Legal Deposit Web Archive.
Jane Winters is Professor of Digital History at the Institute of Historical Research, and Principal Investigator in the BUDDAH Project.
Gareth Millward is a Research Fellow at the London School of Hygiene and Tropical Medicine and one of the BUDDAH Project bursary holders.
Ralph Schroeder is a Senior Research Fellow at the Oxford Internet Institute.

Web Archiving at the Bodleian

Web archiving is a relatively new initiative which is becoming more and more of a priority as we realise how rapidly the World Wide Web is expanding and how transient web pages can be. The Bodleian Libraries is working to ensure meaningful online content is captured for posterity and future research.

The British Library’s UK Web Archive blog published a worrying chart of how many URLs are now irrecoverable because the content is simply no longer available online:

(‘What is still on the web after 10 years of archiving?’, UK Web Archive Blog, 2014)

To combat this in the future, the Bodleian has been contributing to the British Library’s UK Web Archive, alongside the five other legal deposit libraries for the UK (the British Library, the National Library of Scotland, the National Library of Wales, Cambridge University Library, and the library of Trinity College Dublin). We do this by selecting sites to be archived and deciding how often snapshots of their content should be taken, which ranges from weekly to annually to just a one-off interactive picture of the site. The Bodleian has ensured the World Wide Web’s recording of significant global happenings has been captured by curating collections on the Ebola epidemic and Typhoon Haiyan. As well as this, the Bodleian contributes to collections managed by all the legal deposit libraries, such as the UK General Election and the Scottish Independence Referendum, and offers input into what sites should be considered key sites and crawled regularly. These cover a broad range of subjects, from news sites to governmental sites to sports sites, to ensure the strongest representation of society today is preserved.

As well as this initiative, the Bodleian has been developing its own web archive, which seeks to archive sites which relate to the University of Oxford, and to the Bodleian’s archival holdings. We are working hard to capture the websites of the various colleges, departments and sub-divisions which make up the university, as well as building web archive collections around the subjects of Arts and Humanities; International; Science, Medicine and Technology; and Social Sciences, to complement and strengthen our physical holdings. Sites include those relating to J.R.R. Tolkien, the Conservative Party and research sites on colonialism and the British Empire. We welcome public nominations for sites you deem worthy of perpetual preservation, and also invite the public to consult our current web archives. You can find links to both here.

Websites crawled in the UK Web Archive are produced in the United Kingdom and so can be crawled under the E-Legal deposit act. The Bodleian’s Web Archive, on the other hand, relies on gaining permission from the website owner to capture the website. If permission is granted, we add the site to our collections and set it to a One-Time, Monthly, Bi-monthly, Quarterly, Semiannual or Annual crawl, and each capture is made available online once it has been produced. The work does not stop there though, as websites are constantly updated, which means we need to check collection-crawls at determined intervals to make sure we are still preserving accessible content.

Since beginning the web archive in March 2011, we have captured a broad range of websites, and have accessible archives of content that is no longer available, such as the webpages for the Conservative Women’s Organisation for Yorkshire and the South West.

As well as preserving valuable transitory content, the web archive charts the development of websites. A screenshot of the Bodleian Libraries’ homepage captured in October 2011 in contrast to that taken in October 2015 demonstrates how much websites transform visually and aesthetically, as well as documenting their content changing.

(capture of www.bodleian.ox.ac.uk, October 2011)

(capture of www.bodleian.ox.ac.uk, October 2015)

If you would like to learn more about using web archives as scholarly resources, there will be a free public lecture on the subject on the 11th December 2015. You can reserve tickets here.

Web Archives as Scholarly Sources: Issues, Practices and Perspectives

RESAW conference in Aarhus, 8-10 June 2015

Web archiving has been part of Special Collections work at the Bodleian Library for quite a while now, both in cooperation with other UK Legal Deposit Libraries within the electronic Legal Deposit framework (since 2013), and through the Bodleian Libraries’ own Web Archive.
But whereas the amount of archived web material – at the Bodleian and elsewhere – is constantly growing, the usage of these new resources has so far been quite low. Scholars, it seems, are largely unaware of the potential web archives have as sources for research, or lack the knowledge and skills to work with such material, while web archiving institutions lack the resources to promote their web archive collections and support their use.

The Research Infrastructure for the Study of Archived Web Materials (RESAW) network aims to promote the establishment of a collaborative European research infrastructure for the study of archived web materials. This means collaborating internationally as well as across disciplines to meet the challenges – and the opportunities – archived web materials bring, and to develop new methods and approaches in research and teaching.

One of the topics: How to archive Social Media content? And how to use archived Social Media content as scholarly sources?

Tweets from the conference have been collected via Storify. Thanks to Jane Winters from the Institute of Historical Research, University of London, for having set this up.

The 2015 RESAW conference, hosted by the University of Aarhus in Denmark, was the third in a series of conferences: the first conference in 2001 focused on how to preserve web content, the second in 2008 on web archives theory, and this year’s third conference on the actual use of web archives in research.
Participants included over 80 web archivists, curators, researchers, and IT experts from various disciplines, from Canada, Denmark, Finland, France, Germany, Italy, Israel, the Netherlands, Russia, the UK, the United Arab Emirates, and the USA, representing public and private archives, state and university libraries, research institutions, IT service providers and web archiving consultants.

For an intense three days, keynote speeches and short and long papers alternated with panel discussions, with speakers and presenters reporting on their approaches to and practical experiences in archiving websites and in using archived web material for research.
Whereas the individual case studies came from very different backgrounds – focusing on YouTube or social media, exploring possible new tools and methodologies for web archiving and web archives analysis, dealing with the use of Big Data or small datasets in research disciplines from anthropology and linguistics to international relations and migration studies, looking at academic websites, popular culture, internet governance, citizen involvement and even troll communities – it soon became clear that the individual results would lead to common conclusions:

Archived web materials are ‘different’ from both traditional paper-based resources and from the live web. Therefore, existing research theories and traditional approaches to collecting and curating are often not useful when dealing with web materials; new methodologies need to be developed, new questions to be asked. On a practical basis, there is a big need for new tools to deal with the sheer amount of data available for research, for example to filter and analyze web archive collections, and to visualize results.
Archiving web materials, curating collections, and using them as scholarly sources require a great amount of resources – staff/time, knowledge and expertise, technical infrastructure and tools. To use the existing resources as efficiently as possible, archivists, curators, researchers of different disciplines, IT experts and service providers need to collaborate. Pooling resources across institutions and creating (international) networks to share knowledge and experience seems to be the way forward.

Anna Perricci, Columbia University, on the importance of building web archiving collaborations

Communication and openness are key! Archivists and curators should make web archiving processes transparent and explain to scholars what type of material and information they can realistically expect to find in web archives (and what is likely not to be included!). Researchers should clearly express their needs and expectations, but at the same time, be willing to engage with a new type of resource, requiring new approaches, and at least basic IT skills. IT experts should develop easy to use and transparent tools, and share technical knowledge that helps to interpret archived web materials. Users should feed their experience back to curators and developers to help improve web archives selection, metadata/description and discovery tools.

Web archiving is still a young discipline – and research based on archived web material is an even younger one. There are no golden ‘how to’ rules, standards or ‘ultimate authorities’ yet; everyone is still learning. Individual projects encountering problems, or even ‘failing’ to achieve the desired outcome, can still provide valuable lessons for others to learn from. Successes, e.g. in developing and using methodologies and tools for web archiving and using web archives, can be the starting point for developing best practice guidelines in the medium to long term. Again, this requires communication and collaboration within and across institutions, professions, disciplines and countries.

Gareth Millward sharing his experience from the BUDDAH project

A case study of using Web archives as scholarly sources: Gareth Millward sharing his experience exploring the evolution of disability organisation websites through the UK Domain Data Archive.

The conference’s big strength, apart from giving web archiving professionals and web archives users the opportunity to present their recent and ongoing projects and – in many cases – to ask the other conference participants for input and advice, was certainly to bring together people concerned with web archives from a great variety of backgrounds, thus enabling exchange of ideas, debate and networking. There were many eye-opening moments in terms of discovering that someone else, in a different institution in a different country, has been working on similar topics or encountered similar problems.

Knowing how archived web materials were used in other institutions, and with what results, will be very valuable if and when the Bodleian Libraries decide to promote their own web archive collections. At the same time, getting in touch with web archiving colleagues in the UK and internationally offers much potential for collaborations in future projects.
For example, the Tomsk State University in Russia is currently trying to establish a web archive similar to the Bodleian Libraries’ Web Archives, whilst research projects run at the Institute of Historical Research of the University of London as part of the Big UK Domain Data for the Arts and Humanities Project in cooperation with the British Library could be used as examples to promote the scholarly use of the UK (Legal Deposit) Web Archive in Oxford.

Special Collections in the Danish Netarkivet

Special Collections in the Danish Web Archive, which is run by the State and University Library in Aarhus and The Royal Library in Copenhagen. Since 2005 the collection and preservation of the .dk internet has been included in the Danish Legal Deposit Law.

At the end of the conference, everyone was buzzing with enthusiasm and new ideas, and agreed that the event was a great success – not least thanks to the flawless organisation and wonderful Danish hospitality, which included a reception celebrating the anniversary of the Danish Web Archive, netarkivet.dk, lots of Smørrebrød (delicious Danish open sandwiches) and a memorable conference dinner, all adding to the friendly and sociable character of the event.

A similar conference is now envisaged to be held in 2016 or 2017 in London, an opportunity not to be missed to catch up with the latest in Web Archiving and strengthen old and new – forgive the pun – links!