Category Archives: Web Archives

Initiating conversation: let’s talk about web content (part 1)

To initiate conversation about preserving web content and to encourage people to think about why archiving the web is so important, I asked staff at the Bodleian Libraries to imagine the following: If you could choose just one website to have guaranteed access to in 10 years’ time what would it be – and why? Keep reading to discover staff answers and perspectives…

Richard Ovenden, Bodley’s Librarian, Bodleian Libraries. Chosen site: bodleian.ox.ac.uk

‘Obviously as somebody who is leading this institution, seeing its history reflected in the institutional website is so significant. If you go back to the archived captures of bodleian.ox.ac.uk that are accessible now through the Internet Archive it’s incredible not only to see the evolution of the HTML site itself and the look and feel of it but just to see how it reflects the changes in the organisation since the 1990s when the first Bodleian website was set up…which was actually the first library in the UK to have a website.

We can see the changes to the way the Bodleian Libraries reflect their public persona through the web but also the website is a useful proxy for how the organisation itself has changed: the organisational structure, the administrative arrangements, the policies and strategies, how the web is a reflection of those changes over the past 20 years is really interesting. And in 10 years’ time it would be over 30 years and there will be another decade of evolution, growth, change…the web is a very convenient place to see that at a glance. We obviously archive a large number of institutional and administrative records in paper and digital form but it’s a huge amount to wade through, whereas the web provides a very convenient lens to view our organisational past through. I can’t think of another way, so conveniently, to chart our history, our progress, our challenges and even some of the mistakes that we’ve made as an organisation over that time.

Our organisation as a whole changed dramatically in the year 2000 when we stopped being just the historic Bodleian Library and we were integrated with the departmental faculty libraries. We then changed our name to University of Oxford Library services, then back to the Bodleian. Through the website you can actually see that extraordinary change. It’s such a convenient way of getting a grip on our history’.


Lukasz Kowalski, Bodleian Library Reader Services, Weston Library. Chosen site: stackexchange.com

‘I was thinking “what’s the website with the most information in it?”. My initial thought was Wikipedia.org. But I could easily live without it if I had to, as probably most knowledge contained in it is available in print. My next thought was stackexchange.com. It facilitates an exchange of knowledge and collective problem-solving on a large scale, otherwise unattainable via printed media. It’s supported by a large community of users, including experts in their fields. Together with its sister sites, it covers virtually any discipline and questions that can be asked and answered. Stackexchange is a web of knowledge, but different from Wikipedia. Rather than being organised knowledge it is more organised thinking.

My background is in Physics and I have used this site to further my understanding of concepts which did not have clear explanations in textbooks, or when I wanted to check that my thinking about a solution to a given problem was on the same page as others.

I think it goes back to what, I guess, the internet was about in the first place: the exchange of knowledge and ideas, and such is the character of this site. It’s great to rely on good teachers if one has access to them – but it is wonderful that people from across the world can gain a deeper understanding of concepts and exchange ideas by connecting more readily with those who have the expertise.’


 

Sophie Quantrell, Library Assistant, Philosophy and Theology Faculty Library. Chosen site: youtube.com

‘I was thinking about youtube.com as a resource mainly because it’s so versatile. It can be used to display images, sound…I’ve seen some people use it for musical scores – putting musical scores alongside the sound and that sort of thing. I think it is a site that can be used almost for any purpose – so you’ve got the social aspect of it with the comments and the interaction as well as the instructional aspect. I learn sign language when I am not busy with other things [gestures around her at the library] so to be able to see and learn it through videos it is great…it’s much more difficult to tell what the signs are if all you’ve got are drawings on a piece of paper!

It can link to videos on so many different topics, like instructional TED talks. There are so many good quality resources online that get overlooked with all the cat videos. It also crosses cultural boundaries…you can upload and view videos in whatever language you want. You could post a video from Australia and someone could be watching it in Kazakhstan!’


Iram Safdar, Graduate Trainee Digital Archivist, Weston Library. Chosen site: wikipedia.org

‘Wikipedia has been the main source for my knowledge since I was a kid. It’s also provided me with countless hours of entertainment by following the breadcrumb trail of links and seeing where you end up! All sorts of hilarity ensues when you find a rogue edit by someone…I like that it is an open source resource.

Similarly, it shows you what society thinks about things and reveals how we view stuff…which I think in a broader sense is quite interesting.’


Keep an eye out for part 2 and more staff insights coming up on the Archives and Modern Manuscripts blog imminently…

 

iPRES 2016

Last month, I attended the 13th International Conference on Digital Preservation, this year hosted in Bern, Switzerland. The four days of papers, panels, posters and workshops were an intensive and exciting opportunity to meet with colleagues working in digital preservation around the world, share ideas, and hear about innovative projects and approaches. The topics ranged widely from technical systems and practices, to quality and risk assessment, and stewardship and sustainability. What follows are just a couple of highlights from a really fascinating week.

The post-it note networking wall: What do you know? What do you want to know?

Net-based and digital art

As email, digital documents and social media replace traditional forms of communication, it is crucial to be able to preserve born-digital material and make it accessible. An area which I hadn’t previously considered was the realm of net-based art. Here, the internet is used as an artistic medium, which of course has implications (and complications) for digital preservation.

In her keynote speech, Sabine Himmelsbach from the House of Electronic Arts in Basel introduced us to this exciting field, showing artwork such as Olia Lialina’s ‘Summer’, 2013, shown below.

Screenshot of Summer, Olia Lialina, 2013. Available at https://www.youtube.com/watch?v=SxvHoXdC4Uk

The artwork features an animated loop of Lialina swinging from the browser bar. Each frame is hosted by a different website, and the playback therefore depends on your connection speed. This creative use of technology creates enormous challenges for preservation. Here, rather than preserving artefacts, it is the preservation of behaviours which is crucial, and these behaviours are extremely vulnerable to obsolescence.

Marc Lee’s ‘TV Bot’ is another net-based artwork, which is automated to broadcast current news stories with live TV streams, radio streams and webcam images from around the world. The artwork is reliant on technical infrastructure, and the shift from Real Player to Adobe Flash Player was one development which prevented ‘TV Bot’ from functioning. The artist then not only worked on technical migration, but re-interpreted the artwork, modernising the look and feel, resulting in ‘TV Bot 2.0’ in 2010. This process soon happened again, this time including a Twitter stream, in ‘TV Bot 3.0’, 2016. In this way, the artist is working against cultural, as well as technical, obsolescence.

Marc Lee, ‘TV Bot 2.0’, 2010. Image from http://ceaac.org/en/artistes/marc-lee

The heavy involvement from the artist in this case has helped preserve the artwork, but this process cannot be sustained indefinitely. Himmelsbach ended her speech by stressing the need for collaboration and dialogue, which emerged as a central theme of the conference.

A new approach to web archiving

Another highlight was the workshop on Webrecorder led by Dragan Espenschied from Rhizome. He introduced their new tool which departs from the usual crawling method to capture web content ‘symmetrically’, which results in incredibly high-fidelity captures. The demonstration of how the tool can capture dynamic and interactive content sparked gasps of amazement from the group!

Webrecorder not only captures social media, embedded video and complex JavaScript (often tricky with current tools), but can actually capture the essence of an individual’s interaction with the web content.

How it works: Webrecorder records all the content you interact with during the recording session. Users are then able to interact with the content themselves, but anything that was not viewed during the recording session will not be available to them.
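To make the record-then-replay idea concrete, here is a minimal sketch in Python. It is a toy model only and does not use Webrecorder’s actual code or API: the `RecordingSession` class, its `record` and `replay` methods and the example.org URLs are all hypothetical names chosen for illustration. It shows just the one behaviour described above: replay can serve only what was visited during the recording session.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RecordingSession:
    """Hypothetical toy model of a recording session (not Webrecorder's API)."""

    # Maps each URL visited during recording to the content that was captured.
    captured: Dict[str, str] = field(default_factory=dict)

    def record(self, url: str, body: str) -> None:
        # Store exactly what the person recording actually loaded.
        self.captured[url] = body

    def replay(self, url: str) -> Optional[str]:
        # Replay serves only recorded content; unvisited URLs return None.
        return self.captured.get(url)


if __name__ == "__main__":
    session = RecordingSession()
    session.record("https://example.org/", "<h1>Home</h1>")
    session.record("https://example.org/gallery", "<h1>Gallery</h1>")

    # A page that was visited during recording is available on replay.
    print(session.replay("https://example.org/gallery"))
    # A page that was never visited is simply missing from the recording.
    print(session.replay("https://example.org/contact"))
```

In this simplified picture, the recording is personal to whoever made it: a different path through the same site would produce a different set of captured pages.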

Current web archiving strategies aren’t able to capture the personalised nature of web use. How to use this functionality is still a big question, as a web recording in this way would be personal to the web archivist: showing what they decided to explore, unless a systematic approach was designed by an institution. This itself would be very resource-intensive, and is arguably not where the potential of Webrecorder lies: the ability to capture dynamic content, such as net-based artworks. However, the possibility of preserving not only web content, but our interaction with it, is a very exciting development.

iPRES 2016 was a fantastic opportunity to gain insight into projects happening around the world to further digital preservation. It showed me that often there are no clear answers to ‘which file format is best for that?’ or ‘how do I preserve this?’ and that seeking advice from others, and experimenting, is often the way forward. What was really clear from attending was that the strength and support of the community is the most valuable digital preservation tool available.

 

Capturing and Preserving the EU Referendum Debate (Brexit) – UK Web Archive blog

Following the announcement in May 2015 that there would be a referendum on the UK’s EU membership, the Legal Deposit UK Web Archive, led by curators at the Bodleian Libraries, started a collection of websites.

The team of curators includes contributors from the Bodleian Libraries, The British Library, the National Libraries of Scotland and Wales and also Queen’s University Belfast (for the Northern Ireland perspective) and the London School of Economics (for capturing and preserving individual documents, such as the PDF versions of campaigning leaflets).

The collection scope is to capture the ‘Brexit’ debate and the debate around the EU Referendum as well as the wider context of UK/EU relations, including:

  • media coverage
  • websites of political parties and other political institutions and groups
  • campaigning and lobbying
  • trade unions, professional organisations, businesses
  • academic debate
  • culture and arts
  • public opinion through blogs, comments, and if possible social media.

We primarily archive UK websites under the Non-Print Legal Deposit mandate, but also decided to include some sites outside the UK, if relevant – e.g. websites of UK expats in Europe, or political parties, interest groups and think tanks in the EU and in EU member states – on a permission basis.

The collection (at the time of writing) has 2590 target websites. Some of these are whole websites; others will be a single news story or blog post.

Access and availability
The majority of the collection will be available in the reading rooms of UK Legal Deposit libraries, including both British Library sites, the Bodleian Libraries in Oxford, the National Library of Scotland, the National Library of Wales, Cambridge University Library and Trinity College Dublin. As is usual for web archive collections, there is a delay between collection and availability of up to a year, allowing for cataloguing and for ingest into digital library systems.

by Svenja Kunze, Project Archivist, Bodleian Libraries (Oxford University)

Source: Capturing and Preserving the EU Referendum Debate (Brexit) – UK Web Archive blog

EU Referendum Web Archiving Mini-internship – Part 1

On 20 and 21 June eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

Web archiving micro-interns on the roof of the Weston Library, June 2016.

Using library archives for their research is not a novelty for any student or scholar. However, web archives represent a completely new dimension of swiftly evolving research methods – they aim to document what is posted online – a relatively recent form of data collection due to scientific advancements.

For researchers used to traditional archives, the need to store and analyse this data might not be obvious. However, web archiving, despite being relatively new, is very significant. Firstly, it allows us to store information for generations of future historians and sociologists – contrary to the common perception, much of the data held on the World Wide Web disappears or changes very frequently and rapidly. Secondly, it might be an asset for those pursuing topical research projects in the present – recent technologies (such as the prototype SHINE database for historical research) allow us to trace data trends and come to important and fascinating conclusions. Therefore, even if some might underrate web archives, it surely does not diminish their utility to academia.

On the eve of the Brexit referendum, which sparked many debates and discussions in British web space, timely creation of a web collection has proven to be very important – after all, the decision is likely to have long-term consequences for our society, economy, and legal system. Traditionally, individual narratives and civic engagement are set aside when documenting major political decisions. However, a web collection can significantly improve this situation by collecting diverse standpoints expressed in the web sphere. This, in my opinion, perfectly mirrors the ethos of direct democracy where every vote and view counts.

However, important as it is, web archiving comes with a range of practical and ethical obstacles: with huge masses of information being stored online it is very hard to choose what is worthy of being preserved for future generations. Legal restrictions, such as the recent legal deposit legislation, also significantly limit the scope of archivists’ work. During my micro-internship I, along with other interns, tried to overcome these obstacles as much as possible, minimising bias and efficiently using our time resources and server memory. Even in the era of technology, it is the human resources and individual judgment that shape the scope and direction of the collection.

Working on a web collection, especially as campaigning increased just before the referendum, was very challenging. However, as interns, we tackled the masses of information by focusing on individual areas of knowledge. Our work on the project was also aided by the guidance provided by our supervisors and by discussions of the ethical and scientific implications of our research. This was a very rewarding insight into a new area of knowledge, and I am convinced that the skills and knowledge I acquired and applied during the internship will aid me in my future research career.

Anna Lukina

What has web archiving ever done for us? – Saving our dinner plans, for example.

The Bodleian Libraries is involved in web archiving both through the Bodleian Libraries’ own web archive since 2011, and – as one of the six UK Legal Deposit Libraries – through the Legal Deposit UK Web Archive since 2013.

What’s cooking in the web archives? (Detail from painting by Jean-François Millet [Public domain], via Wikimedia Commons)

A considerable amount of archivists’, curators’ and subject librarians’ time goes into this web archiving work, be it selecting websites for archiving, capturing and preserving web content, describing web archive resources or participating in web archiving strategy, collections management and outreach activities.

Current web archiving projects at the Bodleian include the further development of the Bodleian Libraries Web Archive, for example to capture audio files hosted on web servers, and curatorial work in the UK Web Archive context, such as the Easter Rising 1916 Web Archive and the EU Referendum website collection.

But why archive the web?

What’s on the internet will be there forever, won’t it? Haven’t we all been warned to be careful what we put on the internet, because all the information out there will still reveal awkward details of our first-year-at-university life when we are about to retire?

Unfortunately, for archivists, this is far from what really happens. In fact, websites are extremely ephemeral. They change and disappear at a fast rate.

WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and the latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age

The first day, after a warm welcome by Ingibjörg Sverrisdóttir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) and how to avoid a Digital Dark Age? (Vint Cerf). What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery)

Brewster Kahle, What Do We Do Now?

On the second day, the conference continued with two separate tracks, discussing either policies, practices and strategies for capture and preservation of web material, or looking more at the user side of web archives, and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.
The third day was the hands-on day with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive,  DIY web archiving tools such as webrecorder.io, the open-source platform Warcbase for analysing web archive data, and discussing the future of the WARC archive format.

There was plenty of time for Q&A and discussions between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations with web archiving colleagues and networking during coffee and lunch breaks, and on visits like the tour of the National and University Library of Iceland.

The National and University Library of Iceland

Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaborations:
For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing FCCN, owing to the scope, size and structure of the collections, and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialized archives curated and captured by university libraries like the North Carolina State University. Researchers approach the UK Government Web Archive with different research questions than those they would use to look at archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, including perspectives from archivists, curators, IT engineers and researchers, seems to be the best way forward, and the practice of sharing knowledge and experience and openly discussing problems is certainly embraced by the web archiving community. A particular project might have ‘failed’ in terms of achieving the intended outcome, but it can still provide valuable lessons for the next project elsewhere, and in the long run, for developing best practice, policies and standards for web archiving as a discipline.

Mistakes are only wrong if you – and others – don’t learn from them!

Curators might be slightly overwhelmed by technical details discussed by web crawl engineers (I certainly was!) and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from the approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data in different ways than historians and social scientists.
International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, for fostering discussion and knowledge sharing and for providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.

Harvesting social media: Overview…

 

…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a Friday late afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?

Some very clean UK web archivists after the conference debrief

 

 

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day and the unifying theme was a contrary message of dispersal and amalgamation (and butterflies).

The first thing that has been dispersed or discarded is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. Over the past few decades, in response to this, archives have collected more non-traditionally canonical literary papers but, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.

Caterpillars and collection development [By Eric Steinert – photo taken by Eric Steinert at Paussac, France, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=338409]

As Catherine noted, it used to be easy to find the avant-garde – pretty much whoever was hanging out on the Left Bank – but now it’s up to archivists to not only collect this material, but to track it down in the first place, and not to default to the temptingly easy path of collecting only the papers of that tiny sliver of authors considered publishable by mainstream publishers.

Web Archiving Micro-internship – Part 2

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the second of two guest blog posts on the micro-internship.

The most central aspect of modern life is now the proliferation of digital technology. Since the 1990s, it has become a central mode of communication which is often taken for granted. At the start of this micro-internship, we were introduced to the concept of the digital ‘black hole’, a term used to describe the irrevocable loss of this information. Unlike physical correspondence and materials – the letters, writs, and manuscripts of earlier centuries – so much of what we write is fragile and evanescent. To stem the loss of this digital history, we were shown how the Bodleian Libraries and other legal deposit libraries use domain crawls to capture online content at pre-determined intervals using the W3ACT tool. This then preserves a screen grab of the website on the Internet Archive, namely the Wayback Machine, before the website is updated.

Web archiving micro-interns working in the Centre for Digital Scholarship, Weston Library, March 2016.

The right of legal deposit libraries to a copy of electronic and other non-print publications, such as e-journals and CD-ROMs, only came into existence on 6th April 2013. This meant that libraries were able to create an archive of all websites with domains based in the United Kingdom. The recent ‘right to be forgotten’ law adopted by the EU is a signal of the fact that the legal status of digital archives is nevertheless becoming increasingly complicated, particularly when compiling archives of events receiving international commentary, like the upcoming EU referendum.

Each of us focused on a different aspect of the EU referendum, reflecting our individual interests, ranging from national newspapers and student newspapers to the blogs of Scottish MSPs, Welsh AMs, and MEPs, and the blogs of solicitors and legal firms’ websites offering advice to businesses and refugees in the event of a ‘Brexit’.

One of the trickier views to archive was that of British expats living abroad. In this situation, unless the site can be proven to be based in the UK, we would have to write to the owner of the domain to request permission to archive the website. In a situation where permission was given but the person expressing those views subsequently wished to erase this history under the ‘right to be forgotten’ law adopted by the EU, should the UK have voted to leave the EU, this would leave the archived material in a tricky legal position. We learned during the internship that this would most likely result in the relevant archived material being deleted. However, this is exactly what the archive was set up to prevent, and so the tension between the right to privacy and freedom of information on a public platform presents considerable problems to the aim of web archives to be fully comprehensive, aggravated further by the omission of websites with paywalls.

After finding this material and ensuring it was covered by the legal deposit law, it was necessary to classify the site accurately, identifying the main language, and providing titles and descriptions. For newspaper articles, this was relatively straightforward, but for Welsh and Irish-language publications produced by political parties (Welsh and Irish being languages which I am studying at Jesus College), this was more complicated, as the only languages available to select from were German or English – a testament to the nascent stage of the web archive’s development. In addition, classifying material was very much up to our own individual discretion and the descriptions to our own style. To complicate things further, the order in which searched-for material should be presented raises further issues, which we discussed at the end of the micro-internship: namely, whether results should be arranged by ‘most popular’, by date of publication, or by any other criterion. The discussions and practical experience offered by this internship gave us an opportunity to help address the legal and administrative challenges facing web archivists.

Daniel Taylor

Web Archiving Micro-internship – Part 1

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

During a micro-internship at the Bodleian’s legal deposit web archives, focusing on the EU referendum collection, we had occasion to reflect on the meaning of such an archive, and particularly on its potential for creating meaning.

Web archiving micro-interns on the roof of the Weston Library, March 2016.

A web archive’s potential document base is clearly much wider than a paper collection’s. No material criteria, such as donations and physical availability, play a defining factor in the content archived. The main restriction placed on this particular archive is that of legal permission, which allows only UK domains to be easily archived. Even so, the scope remains incredibly wide.

Therefore, archiving the web implies a deliberate narrowing of choices on the archivist’s side. Much is left to their discretion.

A lot of what we know of history is defined by the material that is preserved. It is difficult to learn about the working class or women in the past from original sources, as material by and about such people is conspicuously absent from our collections. A contemporary web archivist has the chance to select material that can most broadly represent society. This will make it impossible for future historians to ignore the history of many groups, and will enable research into a variety of thoughts and experiences.

This was reflected and magnified in the approaches that the group of interns took, which evidences the importance of having a range of different people cooperate on the gathering of knowledge. One woman, for example, concentrated on the representation of the Brexit referendum in media specific to certain ethnic and religious groups, such as the Jewish community. Another made sure to include the views of Scottish, Gaelic and Irish media and organisations, in order to avoid an England-only approach. One of the interns chose to gather information about the way the referendum is seen in small communities, enriching the archive with small local publications. On the first day, I concentrated on the views and representation of immigrants, whose lives will be strongly affected by the referendum. On the second day, I preserved information about women’s roles and views.

Such a wide range of approaches contributes to the broadening and deepening of historical studies. It also positively contributes to contemporary social science. This can happen in two main ways. Firstly, it places virtual documents in a setting that makes their analysis easier. It thus enables social scientists to observe internet trends throughout the years, and compare them to each other. For this purpose, a wide range of archived material is essential, and again the archivist has a role in creating the foundational understanding of British society.

Secondly, and perhaps more interestingly (as the first function can be fulfilled by tools on the live web), web archives allow social scientists to track trends in academia. A web archive describes what subjects and focuses contemporary academia considers to be salient. It points out what we, as researchers, think is worth being saved from the internet black hole.

The defining potential of this is striking, and this internship allowed us to understand the social, political and historical role of archiving.

Zad El Bacha

DPC Student Conference: What I Wish I Knew Before I Started

The world of digital preservation can appear a bit daunting: a world full of checksums and programming and OAIS models, AIPs and DIPs, combined with the urgency of acting before it all becomes too late and technological obsolescence creates a black hole, swallowing up our digital heritage. The Digital Preservation Coalition’s What I Wish I Knew Before I Started Student Conference provided an opportunity to meet others beginning to work in digital preservation, and hear advice and reassurance from a range of interesting expert speakers.

Fancy words and acronym bingo

The day began with an Introduction to Digital Preservation by the DPC’s Sharon McMeekin, who introduced us to current models, methodologies and frameworks, which she warned could also be known as fancy words and acronym bingo. Her presentation was very practical and informed us about resources which will be invaluable when putting digital preservation into practice. Sharon emphasised the importance of active preservation: it isn’t only the digital materials which are vulnerable to obsolescence, but also the digital preservation systems that they are stored in. Crucially, digital preservation needs to be embedded into day-to-day work to make it sustainable.

The need for active preservation was echoed by Steph Taylor from the University of London Computer Centre, who urged us all to keep learning, stay up to date and engage with the digital preservation community through Twitter, blogs and forums. She counselled us to be prepared to explain again and again that digital preservation is really not the same thing as backing up files.

Matthew Addis from Arkivum then gave a technologist’s perspective, introducing us to a range of software and tools, among them the DROID file format identification tool; the POWRR Grid, which maps preservation tools against types of content and stages of their lifecycle; the PRONOM registry of file formats; and the Exactly checksum tool, carrying on the game of acronym bingo. The sheer choice of tools and standards can lead to what Matthew called preservation paranoia, and then to preservation paralysis, where the task seems so big and complex that it seems better to do nothing at all.

It’s people that are the biggest risk to digital content surviving into the future. People thinking that preservation is too hard, too expensive, or tomorrow’s problem and not today’s. (Addis, 2016)

Being a digital archivist = being an archivist with extra super powers

The afternoon sessions were launched by Adrian Brown from the Parliamentary Archives. The Parliamentary Archives hold a wide range of digital material, from the expected email and audio-visual records to the more surprising virtual reality tours and reconstructions of sinking ships. He emphasised that digital archiving was still essentially archiving, involving selection, appraisal, preservation, cataloguing and supporting users. Being a digital archivist, he said, is the same thing as being an archivist, only with extra super powers.

Next, Glenn Cumiskey, Digital Preservation Manager at the British Museum, spoke about the importance of engaging with technology, decision makers and user communities. Glenn illustrated this through the roles associated with digital preservation: Archivist, Records Manager, Librarian, Information Technologist, Digital Humanities, and Software Programmer. In the current environment, you may need to be all of these things at once.

We then heard from Helen Hockx-Yu from the Internet Archive. Here at the Bodleian, the digital archive trainees are actively involved with the Bodleian Libraries Web Archive, which uses the Internet Archive’s ‘Archive-It’ and ‘Wayback Machine’ services. It was interesting to hear from Helen about the redevelopment work she is involved in and how her own career developed in web archiving. Her final advice to us was to keep learning and not worry about being a perfectionist.

Ann MacDonald from the University of Kent inspired us with a talk about how her own career began and developed over the last few years, and emphasised that technical innovations are not all about big machines and that small actions can go a long way in implementing digital preservation.

Only point of digital preservation is reuse of data. Nothing else.

Finally, Dave Thompson, Digital Curator at the Wellcome Collection, gave an entertaining presentation which made the point that digital preservation is not an exercise in technology for its own sake. He argued that the only point of digital preservation is the reuse of data, so data needs to be reusable, consumable and shareable. Digital preservation should be seized as a social opportunity to do this.

Overall, the DPC’s Student Conference: What I Wish I Knew Before I Started was an engaging mixture of reassurance, ideas and advice to prepare us to begin working practically with digital preservation. Key themes which emerged across the presentations were the importance of people in the process, the importance we must give to what users actually want from digital collections, and the importance of selling the benefits and opportunities that digital preservation can bring. It introduced us to technology, tools and processes, but at the same time stressed that you do not need to be a qualified programmer to work in digital preservation.