Developing collections on Gender Equality at the UK Web Archive

The Gender Equality collection

The UK Web Archive Gender Equality collection and its themed subsections provide a rich insight into attitudes and approaches towards gender equality in contemporary UK society and culture. I discussed this in my last blog post about the collection, which you can read here.

Curating the collection

A great deal of the discussion and activity relating to gender equality occurs predominantly in an online space. This means that as a curator for the Gender Equality collection, the harvest is plenty! The type of content being collected by the UK Web Archive includes:

Of course there is some crossover, not only regarding the type of content but also within subsections of the gender equality collection.

This image is made available and reproduced under CC-BY-NC-SA 2.0. [https://creativecommons.org/licenses/by-nc-sa/2.0/legalcode]

Specifically, I find the event sites in the collection really interesting. As well as documenting that the events existed and happened in the first place, they can give us a snapshot of who organised each event and who the intended audience was. The collection also shows how websites related to gender equality evolve over time (which can be very speedy indeed when it comes to sites like Twitter accounts!), and the changing priorities, trends and initiatives can tell us about attitudes towards gender equality in the UK. These kinds of websites are being created and engaged with by humans right now.

Nominate a website!

The endeavour of the UK Web Archive never stops – if you would like to help grow the Gender Equality collection (or indeed, any other collections) click here to nominate a website to save. Go on…whilst you’re at it, you can explore the UK Web Archive’s funky new interface!

 

Image reference: Workers Solidarity Movement (2012) March for Choice

 

Festivals in the UK Web Archive

Live events are funny things; can their spirit be captured or do you have to “be there to get it”? Personally I don’t think it can, so why are we archiving festival websites?

Running throughout the year, though most tend to be clustered around the short UK summer, festivals form a huge part of the UK’s contemporary cultural scene. While it’s often the big music festivals that come to mind, such as Glastonbury and Reading, or perhaps the more local CAMRA-sponsored beer and cider festivals, these days there is a festival for pretty much everything under the sun.

UK Web Archive topics and themes

In part this explosion of festivals from the very local and niche to the mainstream and brand sponsored has been helped by the internet. You can now find festivals dedicated to anything from bird watching to meat grilling to vintage motors.

With the number of tools and platforms available for website creation, event management and bookings, and with the rise of social media, it seems anyone with an idea can put on a festival. More importantly, with the increasing connectedness that the web gives us, the reach of these home-grown festivals has become potentially global.

Of course most will remain small local events that go on until the organisers lose interest or money, such as Blissfields in Winchester, which had to cancel its 2018 event due to poor ticket sales. But some will make it big, like Neverworld, which started in 2006 in Lee Denny’s back garden while his parents were away for the week but now, 10+ years on, has sold out the 5,000-capacity festival venue it has relocated to.

The UK Web Archive’s Festivals collection attempts to capture the huge variety of UK festivals taking place each year. It currently has around 1200 events being archived, loosely categorised around 15 common themes, though of course there is a great deal of crossover as they can be found combining themes such as:

In this collection of UK festival sites, while we cannot capture the spirit of a live event we can still try to capture their transient nature. Here you can see their rise and fall, the photographs and comments left in their wake, and their impact on local communities over time. Hopefully these sites and their contents can still give future researchers a sometimes surprising and often candid snapshot of contemporary British culture.

Emily Chen

Wilfred Owen Archive: New catalogue

The Wilfred Owen archive has just been fully rehoused and catalogued, with a detailed list of items available online. The collection has had a lively existence thus far, with the bulk of it donated by Harold Owen in 1975 to the English Faculty Library. Wilfred’s cousin Leslie Gunston donated the Gunston collection in 1978. Small additions have been made since then, and the collection now includes the working papers and correspondence of two prominent Owen scholars, Dominic Hibberd and Jon Stallworthy. The entire collection was transferred to the Weston Library on 13 January 2016.

Following a month of work, the collection has been reordered and renumbered, although the former, widely-cited OEF (Oxford English Faculty) references are included in the catalogue, as are references to Jon Stallworthy’s transcripts in Wilfred Owen: The Complete Poems and Fragments (CPF).

Wilfred Owen’s literary papers make up the first six boxes (MSS. 12282/1-6) and include Wilfred’s original manuscripts (digital versions of which are available on the First World War Poetry Digital Archive), allowing the reader to see the maturation of Owen’s poetry from the early ‘To Poesy’ to his masterpieces ‘Dulce et Decorum Est’, ‘Mental Cases’ and ‘Anthem for Dead Youth’. Drafts of poems that Wilfred sent to his cousin, Leslie Gunston, are also found in this part of the collection.

The archive also contains other primary source material relating to Wilfred. At MSS. 12282/34-5 there are original editions of The Hydra, a magazine published by the patients at Craiglockhart Hospital for Neurasthenic Officers where Wilfred was a patient in 1917. He edited several issues of the magazine and some of the copies have annotations by him, such as ‘With the Editor’s Compliments!’. School exercise books and correspondence are similarly preserved, and there is an extensive collection of objects and family possessions relating to Wilfred and his family. Many of the objects are extremely fragile and kept in a Reserved part of the collection, but they provide a tangible closeness to Wilfred. Found here are some of Tom Owen’s souvenirs from India, Susan Owen’s jewellery box with locks of Wilfred’s baby hair, an old family clock, a boat handmade by Tom for Wilfred, and some binoculars belonging to Wilfred himself.

The photographs in the archive span from the late 19th century to the late 20th century, and include many generations of Wilfred’s mother’s family. The photos are arranged by size and subject and include photographs of Wilfred.

The remainder of the archive mostly consists of Harold Owen’s correspondence, press cuttings and working papers. These offer a fascinating insight into the life of Wilfred’s brother, Harold, and highlight the way in which he controlled Wilfred’s reputation and that of the Owen family. His correspondence with admirers, scholars, publishers, libraries and museums uncovers the human face of archival acquisitions and posthumous literary fame. Harold’s biography, Journey from Obscurity, is found in this part of the collection, with a first draft of almost 1000 pages written by hand in Harold’s characteristic small capitals.

There are three later additions to the archive. The 1978 Gunston donation includes manuscripts dating back to the 19th century, letters, photographs and cartoons. Particularly charming are Leslie’s letters to his wife Norah, and the sketches contained in them.

The Owen scholar Dominic Hibberd gave his working files, which contain correspondence, press cuttings, photocopies and photographs, generated in the course of his research. Some of these items are dated as recently as 2002, and include new resources, such as photocopies of the birth, death and marriage certificates of Wilfred’s extended family.

Also present are Jon Stallworthy’s working files, which consist mostly of photocopies of the Owen manuscripts which he used to create his Complete Poems and Fragments.

Several items in particular caught my attention throughout the archiving process:

Items 83 and 102 in MS. 12282/7, folder 2 are two letters from Annie G Phillips to Harold Owen, dated November 1969. Annie is studying for her A levels, and writes to Harold of her admiration for Journey from Obscurity, his memoirs. She says that learning about the family life of the Owens has helped her understand Wilfred’s poetry on a deeper level, but she also makes some very personal connections. Like Wilfred, she cannot afford to go to university. Harold’s reply must have been kind because her follow-up letter is brimming with even more excitement. These exchanges really position Harold as a living connection to Wilfred, a way for readers to access the poet, a way of keeping Wilfred alive. But this is of course exactly what Harold’s archival work did and does. His own papers are testimony to that process of preservation, and exist as items worthy of study in their own right. But these letters also left me wondering what happened to Annie Phillips, who must now be nearing 80. Did she ever go to university? Is she still reading Wilfred Owen?

Item 151 in MS. 12282 photogr. 3 is a postcard of Scarborough during the war, collected as part of a group of postcards of places connected to Wilfred Owen. It follows postcards of Bordeaux, Ripon, Ors, and many other places. The photographed place is the focus of these postcards, and very few have any writing on them. But item 151 dates from the First World War and has a message written to a ‘Miss Lucy Sunderland’ from ‘Daddy’. Archival work is never neutral, and the decision made to use this postcard in the collection represents a value judgement: the photographic record of a place is of greater importance than the message contained on the verso of the card. In the catalogue, I decided to include the information about the scribbled message in an attempt to balance out the conflicting demands placed upon this item. We’ll never know if Lucy’s Daddy made it back home again.

Item 16 in MS. 12282 objects 2 is a tiny cardboard box inside Susan Owen’s jewellery box. This tiny box contains two envelopes with the hair of Wilfred Owen inside. One of the locks of hair even had the shed skin of a carpet beetle lodged within it! The hair itself was one of the most moving discoveries within the collection, with a tangibility that is both enticing and repulsive. But the manner of preservation was fascinating, too. The hair had originally been labelled in the envelopes and box by someone with a cursive hand, most likely Susan Owen herself, who would have been the one to cut Wilfred’s hair. The pencil marks had somewhat faded away, but one of the envelopes read ‘The hair of Sir Wilfred Edward Salter-Owen at the age of 11 ½ months in the year 1894’. For Susan, then, this was the act of a proud mother, keeping a memory of her son’s early years, to look back upon when he was older. But the cursive pencil writing is overshadowed by the characteristic small capitals in ink of Harold Owen. Harold labels the box as ‘The poet Wilfred Owen’s hair’. He displays an entirely different motive: to preserve the remains of a well-known literary figure. The object’s purpose and identity have been altered by the motives of its various owners. How the Bodleian labels this item will necessarily be another act of alteration. A strand of hair is never just a strand of hair!

Laura Hackett

Oxford LibGuides: Web Archives

Web archives are becoming more and more prevalent and are being increasingly used for research purposes. They are fundamental to the preservation of our cultural heritage in the interconnected digital age. With the continuing collection development on the Bodleian Libraries Web Archive and the recent launch of the new UK Web Archive site, the web archiving team at the Bodleian have produced a new guide to web archives. The new Web Archives LibGuide includes useful information for anyone wanting to learn more about web archives.

It focuses on the following areas:

  • The Internet Archive, The UK Web Archive and the Bodleian Libraries Web Archive.
  • Other web archives.
  • Web archive use cases.
  • Web archive citation information.

Check out the new look for the Web Archives LibGuide.

 

 

New Catalogue: Papers of Louis MacNeice

The catalogue of the papers of the Northern Irish poet and playwright Louis MacNeice (1907-1963) is now available online.

MacNeice studied Classics at Oxford from 1926, and together with Stephen Spender and Cecil Day-Lewis, he became part of the circle of poets and writers that had formed around W.H. Auden. His professional life began in 1930 as a lecturer in Classics, but in 1941 he joined the BBC and for the next twenty years produced radio plays and other programmes for the Features Department.

Whilst he also wrote articles and reviews, theatre plays, a novel and even a children’s book, MacNeice is best known for his poetry. Between 1929 and 1963, he published more than a dozen poetry volumes, such as Autumn Journal (1939), regarded by many as his masterpiece, as well as Springboard (1944), Holes in the Sky (1948), Ten Burnt Offerings (1952), and Visitations (1957). His last poetry volume, The Burning Perch, came out just a few days after MacNeice’s untimely death in autumn 1963.

Amongst other works published posthumously were a book entitled Astrology (1964), Selected Poems (1964) edited by W.H. Auden, the autobiography The Strings are False (1965) edited by E.R. Dodds, and Varieties of Parable (1965), as well as the radio/theatre plays The Mad Islands and The Administrator (1964), One for the Grave (1968) and Persons from Porlock (1969), and the song cycle The Revenant (1975).

(Frederick) Louis MacNeice by Howard Coster,
nitrate negative, 1942. NPG x1624.
© National Portrait Gallery, London.
(CC BY-NC-ND 3.0)

The archive at the Bodleian Libraries comprises more than 70 boxes of literary papers and other material relating to Louis MacNeice’s career as a writer, as well as extensive personal and professional correspondence, and some personal papers.

Introducing the new UK Web Archive website

Until recently, if you wanted to search the vast UK Legal Deposit Web Archive (containing the whole UK Web space), then you would need to travel to the reading room of a UK Legal Deposit Library to see if what you needed was there. For the first time, the new UK Web Archive website offers:

  • The ability to search the Legal Deposit web archive from anywhere.
  • The ability to search the Legal Deposit web archive alongside the ‘Open’ UK Web Archive (15,000 or so publicly available websites collected since 2005).
  • The opportunity to browse over 100 curated collections on a wide range of topics.

Who is the UK Web Archive?
UKWA is a partnership of all the UK Legal Deposit Libraries – The British Library, National Library of Scotland, National Library of Wales, the Bodleian Libraries, Cambridge University Libraries, and Trinity College, Dublin. The Legal Deposit Web Archive is available in the reading rooms of all the Libraries.

How much is available now?
At the time of writing, everything that a human (curators and collaborators) has selected since 2005 is searchable. This constitutes many thousands of websites and millions of individual web pages. The huge yearly Legal Deposit domain crawls will be added over the coming year.

This includes over 100 curated collections of websites on a wide range of topics and themes. Recent collections curated by the Bodleian Libraries include:

Do the websites look and work as they did originally?
Yes and no. Every effort is made so that websites look how they did originally and internal links should work. However, for a variety of technical reasons many websites will look different or some elements may be missing. As a minimum, all of the text in the collection is searchable and most images should be there. Whilst we collect a considerable amount of video, much of this will not play back.

Is every UK website available?
We aim to collect every website made or owned by a UK resident; in reality, however, it is extremely difficult to be comprehensive! Our annual Legal Deposit collections include every .uk (and .london, .scot, .wales and .cymru) website, plus any website on a server located in the UK. Of course, many UK websites are .com, .info etc. and on servers in other countries.
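The domain part of that scope rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the crawler's actual logic: the function name `in_scope_by_domain` is hypothetical, and the second criterion (a server located in the UK) would need a separate geolocation check that is not shown here.

```python
from urllib.parse import urlparse

# Top-level domains named in the text as part of the annual Legal Deposit crawl.
IN_SCOPE_TLDS = (".uk", ".london", ".scot", ".wales", ".cymru")


def in_scope_by_domain(url):
    """Return True if the URL's host ends in one of the listed UK domains.

    Hypothetical helper for illustration only. It does not cover sites
    that are in scope because their server is located in the UK.
    """
    host = (urlparse(url).hostname or "").lower()
    return host.endswith(IN_SCOPE_TLDS)


if __name__ == "__main__":
    for url in ("https://www.bl.uk/", "http://example.scot/", "http://example.com/"):
        print(url, in_scope_by_domain(url))
```

A .com site run by a UK resident fails this check, which is exactly why nominations matter.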

If you have or know of a UK website that should be in the archive, we encourage you to nominate it via the website.

Another version of this post was first published on the UK Web Archive blog.

Sixth British Library Labs Symposium

On Monday November 12, 2018 I was fortunate enough to attend the annual British Library Labs Symposium. During the symposium the British Library showcases the projects that they have been working on for their digital collections and issues awards to those who either contributed to those projects or used the digital collections to create their own projects.

According to Adam Farquhar, Head of Digital Scholarship at the British Library, this year’s symposium was their biggest and best attended yet: a testimony to the growing importance of digitization, as well as digital preservation and curation, within both archives and libraries.

This year’s theme of 3D models and scanning was wonderfully introduced by Daniel Prett, Head of Digital and IT at the Fitzwilliam Museum in Cambridge, in his keynote lecture on ‘The Value, Impact and Importance of experimenting with Cultural Heritage Digital Collections’. He explained how, during his time with the British Museum, they began to experiment with the creation of digital 3D models. This eventually led to the purchase of a rig with multiple cameras, allowing them to take better quality photos in less time. At the Fitzwilliam Museum, Prett has continued to advocate the development of 3D imaging. The museum now even offers free 3D imaging workshops open to anyone who is in possession of a laptop and any device that has a camera (including a smartphone).

Although Prett shared many of his other successful projects with us, he also emphasized that much of digitization is about trial and error, and stressed the importance of recording those errors. Unfortunately, libraries and archives alike are prone to celebrate their successes but cover up their errors, even though we may learn just as much from them. Prett called upon all attendees to share their errors more frequently, so we may learn from each other.

During the break I wandered into a separate room where individuals and companies showcased the projects that they developed in relation to the digital library’s special collections. A lucky few managed to lay their hands on a VR headset in order to experience Project Lume (a virtual data simulation program) and part of the exhibition by Nomad. The British Library itself showcased its own digitization services, including 360° spin photography and 3D imaging. The latter led to some interesting discussions about the de- and re-contextualization of artworks when using 3D imaging technology.

In the midst of all this there was one stand that did not lure its spectators with fancy technology or gadgets. Instead, Jonah Coman, winner of the BL Teaching & Learning Award, showcased the small zines that he created. The format of these Pocket Miscellany zines, as they are called, is inspired by small medieval manuscripts, and they are intended to inform their readers about marginalized bodies, disability and queerness in medieval literature. Due to copyright issues these zines are not available for purchase, but can be found on Coman’s Patreon website.

The BL Labs Symposium also showed how the digital collections of the British Library can inspire both art and fashion. Fashion designer Nabil Nayal, for example, who unfortunately could not accept his BL Labs Commercial Award in person, had used the Elizabethan digital collections as inspiration for the collection he presented at the British Library during London Fashion Week.

Artist Richard Wright, on the other hand, looked to the library’s infrastructure for inspiration. This resulted in The Elastic System, a virtual mosaic of hundreds of the British Library’s books that together make up a sketch of Thomas Watts. When you zoom in on the mosaic you can browse the books in detail and can even order them through a link to the BL’s catalogue that is integrated in the picture. Once a book is checked out, it reveals the pictures of BL employees working in the stacks to collect the books. It thereby slowly reveals a part of the library that is usually hidden from view.

Another fascinating talk was given by artist Michael Takeo Magruder about his exhibition on Imaginary Cities which will be staged at the British Library’s entrance hall from 5 April to 14 July 2019. Magruder is using the library’s 19th and early 20th century maps collection to create new and ever changing maps and simulations of virtual, fantastical cities. Try as I might, I fear I cannot do justice to Magruder’s unique and intriguing artwork with words alone and can therefore only urge you to go visit the exhibition this coming year.

These are only a few of the wonderful talks that were given during the Labs Symposium. The British Library Labs Symposium was a real eye-opener for me. I did not realize just how quickly the field of 3D imaging had developed within the museum and library world. Nor did I realize how digital collections could be used, not simply to inspire, but to create actual artworks.

Yet one of the things that struck me most is how much the development of, and advocacy for, the use of digital collections within archives and libraries is spurred on by passionate individuals, be they artists who use digital collections to inspire their work, digital and IT specialists willing to sacrifice a lunch break or two for the sake of progress, or individual scholars who create little zines to spread awareness about a topic they feel passionate about. Imagine what they can do if initiatives like BL Labs continue to bring such people together. I, for one, cannot wait to see what the future for digital collections and scholarship holds. On to next year’s symposium.

 

New Conservative Party Archive releases for 2019

Speaking notes prepared for Margaret Thatcher, annotated drafts of William Hague’s election leaflets, and briefing papers written by David Cameron as a young researcher are all among files newly released by the Conservative Party Archive for 2019. This year, our releases are drawn primarily from the records of the Conservative Research Department (CRD): these comprise the department’s subject files and working papers, its briefings prepared for Members of Parliament, and the papers and correspondence of CRD desk officers. In addition to our regular scheduled de-restrictions, the Conservative Party Archive is pleased to announce that the papers of Robin Harris, the Director of the Conservative Research Department from 1985 to 1989, will also be made available for consultation for the first time. This blog post will briefly look at some of the items to be found in each of these main series, demonstrating the value of these collections to researchers of the Conservative Party and historians of modern British history.

Conservative Research Department Files, 1988

Among the newly-released records are a number of files on the ever-thorny question of Europe, including the minutes and papers of the European Steering Committee, the Party’s coordinating group for the 1989 elections to the European Parliament. These files provide a fascinating insight into the challenges the Party faced in trying to balance the record of its MEPs with the increasing Euroscepticism of British Conservatism: a September 1988 report on the Party’s private polling on Europe, for instance, warned that nearly a third of Conservative general election voters were opposed to EEC membership and would not turn out to support the Party in the European Elections [CPA CRD 4/30/3/1]. The Conservative Party Archive has, separately, also recently acquired the records of the Conservative delegation to the European Parliament in this period, and will be seeking to make these available for consultation later in 2019.

Minutes and papers of the European Steering Committee – CPA CRD 4/30/3/1.

Conservative Research Department Briefings, 1988

This year’s releases under the thirty-year rule include a wide range of policy briefings prepared by the Research Department. These briefings, typically prepared for Conservative MPs and Peers ahead of parliamentary debates, provide an excellent snapshot of the Party’s thinking, tactics, and rhetorical strategy on the key issues of the day. Subjects covered by the briefings include some of the most prominent policies of the Thatcher government, including the introduction of the Community Charge (Poll Tax) and the privatisation of state-owned utilities.

A selection of CRD briefings from the Environment and Local Government file, covering the Community Charge, Section 28, and Acid Rain – CPA CRD/B/11/7.

This series notably includes briefing papers prepared by David Cameron during his time in CRD, covering topics on environmental, energy and industrial policy. In 1989 Cameron became the Head of the Political Section, a post he held in the department until 1992, and we expect to be able to de-restrict more of his papers from this period in the years ahead.

Two CRD briefings on Energy Privatisation written by David Cameron – CPA CRD/B/10/8.

Conservative Research Department Letter Books, 1988

The papers and letter books of the Research Department desk officers are a unique resource for those studying the history of Conservatism. Among those files newly de-restricted for 2019 are the letter books of CRD Desk Officer Richard Marsh. Specialising in environmental policy and local government, Marsh’s papers include extensive material on the Poll Tax, and are likely to be of high value to researchers of the subject. Marsh’s papers also include a draft copy of William Hague’s election leaflet from the 1989 by-election, complete with revealing annotations – a pledge to bring in harsher sentences for criminals, for instance, is struck out and replaced with a vaguer commitment to take ‘vigorous action in the fight against crime’ [CPA CRD/L/4/40/2].

Annotated drafts of an election leaflet for William Hague, the Party’s candidate in the 1989 Richmond By-election – CPA CRD/L/4/40/2.

Papers of Robin Harris, Research Department Director, 1985-1988

Finally, the records of CRD Director Robin Harris provide a rich insight into the Conservative Party during the 1980s. For instance, Harris’ letter book for August and September 1987 shows how the Research Department went about preparing material for Thatcher’s speech to the Conservative Party Conference, with draft sections of the speech and working memoranda included in the file [CRD/D/10/2/25].

Robin Harris file on Margaret Thatcher’s 1987 Party Conference speech, including draft speech sections – CPA CRD/D/10/2/25.

Harris’ papers also show how the Party responded at times of political crisis. During the Westland Affair, when Thatcher’s premiership was briefly seen to be threatened, the Party received numerous letters from the public calling on the Prime Minister to resign. Harris’ memo books from the time show how Conservative Central Office managed the situation, drafting template responses defending the government’s conduct [CRD/D/10/1/11]. The papers should prove to be a valuable resource for historians of the period, and we expect to be able to make further de-restrictions in this series under the thirty-year rule in January 2020.

Robin Harris memoranda on the Party’s response to the Westland Affair – CPA CRD/D/10/1/11.

All the material featured in this blog post will be made available from 1 Jan 2019. The full list of de-restricted items will be published shortly on the CPA website, where de-restriction lists from previous years are also available.

DCDC 2018: Memory and Transformation

Entitled ‘Memory and Transformation’, this year’s DCDC conference (Discovering Collections Discovering Communities) sought to bring together a variety of practitioners from different cultural sectors including museums, libraries and archives to discuss the importance of memory management across the cultural heritage sector and the duty of archives as memory institutions in ensuring our rich past is not forgotten but routinely remembered, commemorated and celebrated.

Celebrating Anniversaries: The Memory Milestones of History

In an opening keynote Jane Ellison (Head of Creative Partnership, BBC) stressed the need for archivists, as custodians of memory, to mark important anniversaries through a regular programme of outreach events and activities, to ensure past histories are not overlooked or misinterpreted but suitably commemorated. Ellison discussed the pivotal role of the BBC Archives in facilitating such events, including the recent centenary of the First World War, through the provision of archival memories including photographs, first-hand accounts from the front line and oral histories. In employing archival evidence we help to honour our history in the truest form possible, adding colour to past events and bringing them to life in a way that simply retelling stories from the front line would struggle to achieve. Ellison brought the keynote to a close with a thought-provoking quote taken from the Armistice Day Sunday service, which really brought home the duty of archives to help us remember well: “we are not responsible for what happened in history but we are responsible for remembering it well”.

Recalling Past Memories: The Role of Archives in Dementia Care

An example of a memory box

Having had the opportunity to work with individuals suffering from dementia in the past, I have experienced first-hand the life-altering impact the condition can have both on the individual and on their friends and family. It was therefore extremely refreshing to have the opportunity to hear from Sophie Clapp (Boots UK Archive) about the therapeutic role archives can play in helping to revive forgotten memories and transform the lives of people living with dementia.

Reminiscence Therapy: The Value of Memory Boxes

Through their work with Professor Victoria Tischler (Head of Dementia Care at the University of West London) Boots UK Archive have been able to develop multi-sensory memory boxes for care home residents living with dementia. Boxes include specially selected items from the Boots Archive which houses over ten thousand items including recipes, formulations and health-care products thought to trigger memories of nostalgia. From carbolic soap to Devonshire bath salt, the smell of these items was reported to have a powerful impact on dementia sufferers enabling them to recall past memories and strike up conversations, sparking new hope for researchers and families of those living with dementia.

The Power of Archival Memories

It was wonderful to attend the DCDC conference this year and learn more about the power of archives beyond their traditional research, evidential and community value as memory institutions with a duty and ability to commemorate historic milestones, acquire archival memories of different cultures and even provide reminiscence therapies.

Archives Unleashed – Vancouver Datathon

On the 1st-2nd of November 2018 I was lucky enough to attend the  Archives Unleashed Datathon Vancouver co-hosted by the Archives Unleashed Team and Simon Fraser University Library along with KEY (SFU Big Data Initiative). I was very thankful and appreciative of the generous travel grant from the Andrew W. Mellon Foundation that made this possible.

The SFU campus at the Harbour Centre was an amazing venue for the Datathon and it was nice to be able to take in some views of the surrounding mountains.

About the Archives Unleashed Project

The Archives Unleashed Project is a three-year project with a focus on making historical internet content easily accessible to scholars and researchers whose interests lie in exploring and researching both the recent past and contemporary history.

After a series of datathons held at a number of international institutions such as the British Library, University of Toronto, Library of Congress and the Internet Archive, the Archives Unleashed Team identified some key areas of development that would help to deliver their aim of making petabytes of valuable web content accessible.

Key Areas of Development
  • Better analytics tools
  • Community infrastructure
  • Accessible web archival interfaces

By engaging and building a community, alongside developing web archive search and data analysis tools, the project is successfully enabling a wide range of people including scholars, programmers, archivists and librarians to “access, share and investigate recent history since the early days of the World Wide Web.”

The project has a three-pronged approach:
  1. Build a software toolkit (Archives Unleashed Toolkit)
  2. Deploy the toolkit in a cloud-based environment (Archives Unleashed Cloud)
  3. Build a cohesive user community that is sustainable and inclusive by bringing together the project team members with archivists, librarians and researchers (Datathons)

Archives Unleashed Toolkit

The Archives Unleashed Toolkit (AUT) is an open-source platform for analysing web archives with Apache Spark. I was really impressed by AUT's scalability, relative ease of use and the huge number of analytical options it provides. It can run on a laptop (macOS, Linux or Windows), a single-node server or a powerful cluster, and if you wanted to, you could even run it on a Raspberry Pi. The Toolkit supports a number of search functions across the entirety of a web archive collection. You can filter collections by domain, URL pattern, date, language and more; create lists of URLs to return the top ten in a collection; and extract plain text from the HTML in ARC or WARC files, cleaning the data by removing ‘boilerplate’ content such as advertisements. It’s also possible to use the Stanford Named Entity Recognizer (NER) to extract the names of entities such as locations, organisations and persons. I’m looking forward to seeing how this functionality is adapted to localised instances and controlled vocabularies. Would it be possible to run a similar programme for automated tagging of web archive collections in the future? Maybe you could ingest a collection into AUT, run NER and automatically tag the data, providing richer metadata for web archives and subsequent research.
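
To give a feel for the "top ten" style of analysis, here is a minimal sketch in plain Python. It is not the AUT API (the Toolkit does this kind of work at scale on Spark); the function name top_domains and the sample URLs are made up for illustration, and I have assumed that "www." prefixes should be folded into the bare host name.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=10):
    """Return the n most frequent hostnames in an iterable of URLs."""
    hosts = []
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Assumption: treat www.example.org and example.org as the same site.
        if host.startswith("www."):
            host = host[4:]
        if host:
            hosts.append(host)
    return Counter(hosts).most_common(n)


# Hypothetical sample of captured URLs, for illustration only.
sample_urls = [
    "http://www.example.org/a",
    "http://example.org/b",
    "https://news.example.com/story",
    "http://example.org/c",
]
print(top_domains(sample_urls, n=2))
```

On a real collection the list of URLs would come from the archive itself rather than a hand-written sample.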

Archives Unleashed Cloud

The Archives Unleashed Cloud (AUK) is a GUI-based front end for working with AUT. It essentially provides an accessible interface for generating research derivatives from web archive files (WARCs). With a few clicks users can ingest and sync Archive-It collections, analyse the collections, create network graphs and visualise connections and nodes. It is currently free to use and runs on AUK central servers.

My experience at the Vancouver Datathon

The datathons bring together a small group of 15-20 people of varied professional backgrounds and experience to work and experiment with the Archives Unleashed Toolkit and the Archives Unleashed Cloud. I really like that the team have chosen to keep the numbers small, because it created a close-knit working group that was full of collaboration and the exchange of knowledge and ideas. It was a relaxed, fun and friendly environment to work in.

Day One

After a quick coffee and light breakfast, the Datathon opened with introductory talks from project team members Ian Milligan (Principal Investigator), Nick Ruest (Co-Principal Investigator) and Samantha Fritz (Project Manager), relating to the project – its goals and outcomes, the toolkit, available datasets and event logistics.

Another quick coffee break and it was back to work – participants were asked to think about the datasets that interested them, techniques they might want to use and questions or themes they would like to explore and write these on sticky notes.

Once placed on the whiteboard, teams naturally formed around datasets, themes and questions. The team I was in consisted of Kathleen Reed and Ben O’Brien and formed around a common interest in exploring the First Nations and Indigenous communities dataset.

Virtual machines were kindly provided by Compute Canada and were available throughout the Datathon to run AUT. Datasets were preloaded onto these VMs and a number of derivative files had already been created. We spent some time brainstorming, sharing ideas and exploring datasets using a number of different tools. The day finished with some informative lightning talks about the work participants had been doing with web archives at their home institutions.

Day Two

On day two we continued to explore the datasets, using the full-text derivatives to run NER and performing keyword searches with the command-line tool grep. We also ran sentiment analysis on the text with the Natural Language Toolkit. To help visualise the data, we took the new text files produced by the keyword searches and uploaded them into Voyant Tools. This helped by visualising links between words, creating a list of top terms and providing quantitative data such as how many times each word appears. It was here we found that the word ‘letter’ appeared quite frequently, and we finalised the dataset we would be using: University of British Columbia – bc-hydro-site-c.
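
The keyword search and term counting can be approximated in a few lines of Python. This is only a rough sketch of the idea, not what we ran on the day (we used grep and Voyant Tools); the function names and the sample lines are invented for illustration.

```python
import re
from collections import Counter


def keyword_lines(lines, keyword):
    """Keep lines containing keyword, ignoring case (similar to `grep -i`)."""
    needle = keyword.lower()
    return [line for line in lines if needle in line.lower()]


def term_counts(lines):
    """Count lowercase word occurrences across the given lines."""
    words = []
    for line in lines:
        words.extend(re.findall(r"[a-z']+", line.lower()))
    return Counter(words)


# Hypothetical lines standing in for a full-text derivative file.
sample = [
    "A letter from a resident about the dam",
    "Project overview and timeline",
    "Another letter, and a reply to that Letter",
]
matches = keyword_lines(sample, "letter")
counts = term_counts(matches)
print(len(matches), counts["letter"])
```

A tally like this is how a frequent term such as ‘letter’ would stand out, although Voyant Tools adds the visual links between words that a plain count cannot show.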

We hunted down the site and found it contained a number of letters from people about the BC Hydro Dam Project. The problem was that the letters were in a table and, when extracted, the data was not clean enough. Ben O’Brien came up with a clever extraction solution utilising the raw HTML files and some script magic. The data was then prepped for geocoding by Kathleen Reed to show the geographical spread of the letter writers, hot-spots and timeline, which is a useful way of looking at the issue from the perspective of engagement and the community.
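
For anyone curious what pulling rows out of a raw HTML table can look like, here is a minimal sketch using only Python's standard library. It is not Ben's actual script, which I have not reproduced here; the class name TableRows and the sample markup are hypothetical, and the real page may well be structured differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse line breaks and runs of spaces inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup; the structure of the real page is an assumption.
html = """
<table>
  <tr><th>Date</th><th>Location</th></tr>
  <tr><td>2016-01-05</td><td>Exampleville,
      BC</td></tr>
</table>
"""
parser = TableRows()
parser.feed(html)
print(parser.rows)
```

Each row comes back as a list of cleaned cell strings, which is the kind of tidy input a geocoding step needs.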

Map of letter writers.

Time Lapse of locations of letter writers. 

At the end of day 2 each team had a chance to present their project to the other teams. You can view the presentation (Exploring Letters of protest for the BC Hydro Dam Site C) we prepared here, as well as the other team projects.

Why Web Archives Matter

How we preserve, collect, share and exchange cultural information has changed dramatically. The act of remembering at national institutions and libraries has altered greatly in terms of scope, speed and scale due to the web. The way in which we provide access to, use and engage with archival material has been disrupted. All current and future historians who want to study the periods after the 1990s will have to use web archives as a resource. At present the accessibility and usability of web archives lag behind, and many students and historians are not ready to work with them. Projects like Archives Unleashed will help to equip researchers, historians, students and the community with the necessary tools to combat these problems. I look forward to seeing the next steps the project takes.

Archives Unleashed are currently accepting submissions for the next Datathon in March 2019. I highly recommend it.