All posts by kellyburchmore

New catalogue – Oxford Women in Computing: An Oral History project

The catalogue of the Oxford Women in Computing oral history project is now available online.

This oral history project captures the experiences of 10 pioneering women who were active in computing research, teaching and service provision between the 1950s and 1990s, not only in Oxford, but at national and international levels. The rationale for the project, funded by the Engineering and Physical Sciences Research Council through grants held by Professor Ursula Martin, was that women had participated in the very early stages of computing, yet, aside from a few exceptions, their stories had not been captured – or indeed told. Among the interviewees are Eleanor Dodson, methods developer for protein crystallography and former research technician for Dorothy Hodgkin, and Linda Hayes, former Head of User Services at the Oxford University Computing Service – now University of Oxford IT Services. Leonor Barroca left Portugal in 1982 as a qualified electrical engineer to follow a boyfriend to Oxford – later that year she was one of three women on the university’s MSc in Computing course. Leonor also worked briefly as a COBOL (Common Business-Oriented Language) programmer for the Bodleian Libraries.

Themes throughout the interviews, which were conducted in 2018 by author and broadcaster Georgina Ferry, include:

  • career opportunities and early interests in computing
  • gender splits in computing
  • the origins and development of computing teaching and research in Oxford
  • development of the University of Oxford’s Computing Service and the commercial software house the Numerical Algorithms Group (NAG).

The Oxford Women in Computing oral histories serve as a source of insight into nearly half a century of women’s involvement in computing at Oxford and beyond. The collection will be of particular use to those interested in gender studies and the history of computing.

The interviews can be listened to online through University of Oxford podcasts here.

Communications programmer Esther White in the early days of the University of Oxford’s Computing Service. © University of Oxford

 

 

What’s The Catch? Before Daniel Meadows’ Free Photographic Omnibus there was his free photographic studio, Moss Side

The first version of the catalogue for the Archive of Daniel Meadows, photographer and social documentarist, is available here. Meadows is distinguished for his tour of England in the Free Photographic Omnibus, 1973-1974, amongst many other works. This was a project he returned to in the mid-1990s to rephotograph those he had met and taken pictures of around England in the 1970s, culminating in his series of National Portraits: Now & Then, which has been exhibited both at home and abroad.

The portraits and related material from Meadows’ archive, such as national and international press coverage, are currently on display for a special exhibition in Blackwell Hall, where admission is free and everybody is encouraged to come and see. But for now, let’s take a look at what drove (pun unintentional) Meadows to tour England in a double decker bus for fourteen months.

dirt, smoke, rain and people

Meadows was born and raised in Great Washbourne, Gloucestershire, and although his time spent as a photography student at Manchester Polytechnic did not negate his appreciation of where he came from, it is clear that Meadows revelled in exercising his independence and resourcefulness in new environments. In January 1972 he rented a dilapidated barber’s shop in Greame Street, Moss Side and converted it into a photographic studio in which any local people who wandered in could have their picture taken, free of charge.

In a typescript, arranged with prints interspersed from his studio at Greame Street, titled ‘What’s The Catch?’, Meadows writes

‘Before coming to Manchester I had always lived in the most isolated and luscious countryside that this country had to offer. Moss Side Manchester is the extreme opposite, and yet, far from yearning for the sight of a cow or the smell of freshly-mown hay, I have come to love it for what it is; dirt, smoke, rain and people.’

On the next page, referencing the coming and going of the people, he writes

‘This is what I particularly like about the shop. As an [sic] habitual photographer of street life I am used to a constantly changing environment. A shop environment, then, seems to be contrary to the candid picture-making of the street. The opposite is true; the shop is merely an extension of the street and the people come in and go out in the same way as they walk the paving stones.’

(MS. Meadows 46, folder 1, ‘What’s The Catch?’)

Daniel Meadows outside his free photographic shop on Greame Street, with Moss Side residents, 1972.
MS. Meadows 46, folder 2. [photographer unknown]

‘I feel that, as a photographer who lives in the area, it is my job to make a record of a way of life which is to be destroyed’

Rather than Meadows actively seeking out photographic subjects for the Greame Street studio, he would take photographs of anybody and everybody who asked.  This is a significant characteristic Meadows would retain throughout the tour of the Free Photographic Omnibus. Through the nights of the tour, Meadows would develop the film and produce two copies of the portrait: one of these copies was always given to the person photographed.

Even as a student, Meadows showed a sincere interest in people and their everyday lives, and his integrity is there in black and white. In July 1972 he writes that

‘The reason for making photographic portraits of the inhabitants of Moss Side is that, with the demolition of the terraced houses, the population will be dispersed since many of the tenants will not be able to afford the increase in rent […] More than just the Victorian Terraces will go: a close knit community will be split up. I feel that, as a photographer who lives in the area, it is my job to make a record of a way of life which is to be destroyed.’

He goes on to write that Moss Side

‘[…] is, however, not alone in it’s plight among places where the quality of life is threatened by necessity for social change […] Over-population and environmental pollution are the poisons of the age and never before has man been forced into the situation of having to decide what kind of a future he wants for himself and his children. […] The free photographic studio was a pilot scheme for a much larger undertaking, namely to purchase a reconditioned second hand double decker bus for around £250 and travel up and down the length of the country making a record of the quality of life in England in 1973-1974.’

(MS. Meadows 50, folder 1, a circular entitled ‘Details of proposal’ distributed for help with sponsorship for the bus, July 1972)

Daniel Meadows standing in front of his newly purchased (second hand!) bus on 24 July 1973
MS. Meadows 54. [photographer unknown]

A year later, on 24 July 1973, Meadows purchased the second hand double decker bus from Nottingham, and the journey of the Free Photographic Omnibus would begin.

 

 

Developing collections on Gender Equality at the UK Web Archive

The Gender Equality collection

The UK Web Archive Gender Equality collection and its themed subsections provide a rich insight into attitudes and approaches towards gender equality in contemporary UK society and culture. I discussed this in my previous blog post about the collection, which you can read here.

Curating the collection

A great deal of the discussion and activity relating to gender equality occurs predominantly in an online space. This means that as a curator for the Gender Equality collection, the harvest is plenty! The type of content being collected by the UK Web Archive includes:

Of course there is some crossover, not only regarding the type of content but also within subsections of the gender equality collection.

This image is made available and reproduced by CC-BY-NC-SA 2.0. [https://creativecommons.org/licenses/by-nc-sa/2.0/legalcode]

Specifically, I find the event sites in the collection really interesting. As well as documenting that the event(s) even existed and happened in the first place, they can give us a snapshot of who organised the event, as well as who the intended audience were. Also, the collection exhibits the evolution of websites related to gender equality over time (which can be very speedy indeed when it comes to sites like Twitter accounts!), and the changing priorities, trends, initiatives and more that can tell us about attitudes towards gender equality in the UK. These kinds of websites are being created and engaged with by people right now.

Nominate a website!

The endeavour of the UK Web Archive never stops – if you would like to help grow the Gender Equality collection (or indeed, any other collections) click here to nominate a website to save. Go on…whilst you’re at it, you can explore the UK Web Archive’s funky new interface!

 

Image reference: Workers Solidarity Movement (2012) March for Choice

 

Building collections on Gender Equality at the UK Web Archive

The Bodleian is one of the six legal deposit libraries for the UK and Ireland. One of my projects this year as a graduate trainee digital archivist on the Bodleian Libraries’ Developing the Next Generation Archivist programme is to help curate special collections in the UK Web Archive. Since May I’ve been working on the Gender Equality collection. Please note, this post also appears on the British Library UK Web Archive blog.

Why are we collecting?

2018 is the centenary of the Representation of the People Act 1918. UK-wide memorials and celebrations of this journey, and of the victory of women’s suffrage, are all evident online, from events and exhibitions to commemorations and campaigns. Popular topics being discussed at the moment include the hashtags #timesup and #metoo, gender pay disparity and the recent referendum on the 8th Amendment in the Republic of Ireland. These discussions produce a lot of ephemeral material, and without web archiving this material is at risk of moving or even disappearing. Although gender equality is being discussed a lot in the media at the moment, these discussions have been developing over many years.

Through the UK Web Archive SHINE interface we can see that matching text for the phrase ‘gender equality’ increased from a result of 0.002% (24 out of 843,204) of crawled resources in 1996, to 0.044% (23,289 out of 53,146,359) in 2013.

SHINE user interface
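The two SHINE percentages quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name `share_percent` is an assumption, and the counts are simply the ones given in this post. Worked through, the 1996 figure comes out at roughly 0.0028%, which suggests the 0.002% shown above was truncated rather than rounded.

```python
def share_percent(matches: int, total: int) -> float:
    """Return matches as a percentage of total crawled resources."""
    if total <= 0:
        raise ValueError("total must be a positive number of resources")
    return 100.0 * matches / total


# Counts as quoted in this post for the phrase 'gender equality'.
share_1996 = share_percent(24, 843_204)
share_2013 = share_percent(23_289, 53_146_359)

print(f"1996 share: {share_1996:.4f}%")
print(f"2013 share: {share_2013:.4f}%")
print(f"Relative increase: {share_2013 / share_1996:.1f} times")
```

Comparing shares rather than raw counts matters here, because the number of crawled resources itself grew enormously between the two years.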

If we search UK web content relating to gender equality we will generate a great many results; for example, organisations have published their gender pay discrepancy reports online and there is much to engage with from social media accounts of both individuals and organisations relating to campaigning for gender equality. It becomes apparent when we browse this web content that gender equality means something different to each of the many presences online: charities, societies, employers, authorities, heritage centres and individuals such as social entrepreneurs, teachers, researchers and more.

The Fawcett Society: https://www.fawcettsociety.org.uk/blog/why-does-teaching-votes-for-women-matter-an-a-level-teachers-perspective

What are we collecting?

The Gender Equality special collection, which is now live on the UK Web Archive, comprises material which provides a snapshot of attitudes towards gender equality in the UK. Web material is harvested under the areas of:

  • Bodily autonomy
  • Domestic abuse/Gender based violence
  • Gender equality in the workplace
  • Gender identity
  • Parenting
  • The gender pay gap
  • Women’s suffrage

100 years on from women’s suffrage the fight for gender equality continues. The collection is still undergoing curation and growing in archival records – and you can help too!

How to get involved?

If there are any UK websites that you think should be added to the Gender Equality collection then you can take up the UK Web Archive’s call for action and nominate.

 

 

Significance & Authenticity: a Briefing

As an Ancient History graduate, I found that the significance and authenticity of source information characterised my university education. Transferring these principles to digital objects in an archival situation is a challenge I look forward to learning more about and embracing. Therefore I set off to Tate Britain on a cold Friday morning excited to explore the Digital Preservation Coalition’s briefing: Significance & Authenticity. Here are some of my reflections.

A dictionary definition is not enough

The morning started with a stimulating discussion led by Sharon McMeekin (DPC), on the definitions of these two concepts within the field of Digital Archives and the context of the varying institutions the delegates were from. Several key points were made, and further questions generated:

Authenticity

  • Authenticity clearly carries with it evidential value; if something is not what it purports to be then how can it (claim to) be authentic?
  • Chains of custody and tracking accidental/intended changes are extremely relevant to maintaining authenticity
  • Further measures such as increasing metadata fields – does this ensure authenticity?

For an archival record to retain authenticity there must be a record of the original creation or experience of the digital object; otherwise we are looking at data without context. This also has a bearing on how significant an archival record is. A suggestion was also made that perhaps as a sector too much emphasis is placed on integrity checking procedures. Questions surfaced such as: is the digital preservation community too reliant on them? And in turn, is this practical process approach to ensuring authenticity too simplistic?

Significance

  • Records are not just static evidence, they are also for appreciation, education and to use
  • Should the users and re-users (the designated community) be considered more extensively when deciding the significance of a digital object?
  • Emulation as a digital preservation action prioritises the experience of using the data: is this the way to go regarding maintaining both the significant properties together with the authenticity?

There was no doubt left in my mind that the two principles are inextricably linked. However, not only are they increasingly subjective for both the record keeper and the end user, they must also be distinguished from one another. For example, if a digital object can be interpreted as both a game and a book, yet the object was created and marketed as a book, does this make it any less significant or authentic? Or is the dispute part of what makes the object significant; the creation, characterisation and presentation of data in digital form is reflective of society today and what researchers may (or may not) be interested in in the future? We do not know and, as a fellow delegate reminded us, cannot prejudice future research needs.

Building on the open mindedness that the  discussion encouraged, we were then fortunate enough to hear and learn from practitioners of differing backgrounds regarding how they ensure significance and authenticity of their collections. One particular example had me contemplating all weekend.

Significance & Authenticity of Digital Art by Patricia Falcao & Tom Ensom (Tate)

Patricia and Tom explained that they work with time-based media art and its creators. Working (mostly) with living artists ensures a short chain of provenance; however, the nature of digital art means that applying authenticity and significance is in no way straightforward. One factor which immediately affects the criteria of significance is that it is very important that the Tate can exhibit the works, illustrating that differences between organisations will of course have a bearing on how significant a record is.

One example Tom analysed was the software-based Brutalism: Stereo Reality Environment 3 by Peruvian artist Jose Carlos Martinat Mendoza:

Brutalism: Stereo Reality Environment 3 2007 Jose Carlos Martinat Mendoza born 1974 Presented by Eduardo Leme 2007, accessioned 2011 http://www.tate.org.uk/art/work/T13251

The artwork comprises a range of components: high speed printers, paper rolls, a web search program and accompanying hardware, movement sensors and a model replica of the Peruvian government building ‘El Pentagonito’, which is a symbol of brutalist architecture. The computer is programmed to search the web for references to ‘Brutalism’ and the different extracts of information it gathers are printed from printers mounted on the sculpture, and left to fall to the floor around the replica.

Tom explained that retaining authenticity of the digital art was very much a case of the commitment to represent the artist’s work together with the arrangement and intention. One method of ensuring this is the transfer of a document from the creator called ‘Installation Parameters’. For this particular example, it contained details such as paper type and cabling needs. It also contained display specifications such as the hardware being a very visible element of the art work.

Further documentation is created and stored to preserve the original authenticity and thus unique significance of the artwork and the integrity of its ‘performance’. Provenance information such as diagrams, process metadata and the original source code is stored separately from the work itself. However, Tom acknowledged there is no doubt the work will need to change and in turn will be reinterpreted. Interestingly, the point was made that the text on the paper is itself time sensitive; live search results related to Brutalism will evolve and change.

Looking ahead, what will happen when the hardware fails? And even, what will happen when nobody uses printers anymore? Stockpiling is only a short term plan for maintaining authenticity and significance. Furthermore, even if hardware can be guaranteed then the program software itself generates different issues. Software emulation, code-change tracking systems and a binary analysis are all to be explored as a means to enable authenticity but there will always be a risk and need for alternative solutions.

Would these changes reduce the authenticity or significance? I believe authenticity is associated with intention and so perhaps if changes are communicated to the user with justifications this could be one way of maintaining this principle. Significance, on the other hand, is trickier. Without the significant and notable properties of the work, is significance automatically lost?

This case study reinforced that there is much to explore and consider when approaching the principles of authenticity and significance of digital objects. To conclude, Tom and Patricia emphasised that within the artistic context, decisions around authenticity and significance are made through collaborative dialogues with the artist/creator, which does indeed provide direction.

Workshop

After 3 more talks and a panel session the briefing ended with a workshop requiring us to evaluate the significance and authenticity of a digital object provided. As a trainee digital archivist I can be guilty of shying away from group discussions/exercises within the community of practice, so I was really pleased to jump in and contribute during the group workshop exercise.

Thank you to the DPC and all involved for a brilliant day.

Subcultures as Integrative Forces in East-Central Europe 1900 – present: a Bodleian Libraries’ Web Archive record

A problem, and a solution in action:

The ephemeral nature of internet content (the average life of a web page is 100 days – illustrating that websites do not need to be purposefully deleted to vanish) is only one contributing factor to data loss. Web preservation is a high priority; action is required. This is a driver for not only Bodleian Libraries’ Web Archive, but digital preservation initiatives on a global scale.

However, today I would like to share the solution in action, an example from BLWA’s University of Oxford Collection: Subcultures as Integrative Forces in East-Central Europe 1900 – present.

On the live web, attempts to access the site are met with automatic redirects to BLWA’s most recent archived capture (24 Jan. 2017). The yellow banner indicates it is part of our archive. Image from http://wayback.archive-it.org/2502/20170124104518/http://subcultures.mml.ox.ac.uk/home.html

Subcultures is a University of Oxford project, backed by the Arts & Humanities Research Council, which through its explorative redefinition of ‘sub-cultures’ aims to challenge the current way of understanding simultaneous identification forms in the region of Eastern Europe through a multi-disciplinary methodology of social anthropology, discourse analysis, historical studies and linguistics. The project ran from 2012-2016.

The Subcultures website is an incredibly rich record of the project and its numerous works. It held cross-continent collaborative initiatives including lectures, international workshops and seminars, as well as an outreach programme including academic publications. Furthermore, comparative micro-studies were conducted in parallel with the main collaborative project: Linguistic Identities: L’viv/Lodz, c.1900; Myth and Memory: Jews and Germans, Interwar Romania; Historical Discourses: Communist Silesia and Discursive Constructions: L’viv and Wroclaw to present. The scope and content of the project, including key questions, materials, past and present events and network information is* all hosted on http://subcultures.mml.ox.ac.uk/home.html.

Was*. The site is no longer live on the internet.

However, as well as an automatic re-direction to our most recent archival copy, a search on Bodleian Libraries’ Web Archive generates 6 captures in total:

Search results for Subcultures within BLWA. Image from https://archive-it.org/home/bodleian?q=Subcultures

The materials tab of the site fully functions in the archived capture: you are able to listen to the podcasts and download the papers on theory and case studies as PDF versions.

The use of Subcultures

To explore the importance of web-archiving in this context, let us think about the potential use(rs) of this record and the implications if the website were no longer available:

As the project comprised a wider outreach programme alongside its research, content such as PDF publications and podcasts was available for download, consultation and further research. The website platform means that these innovative collaborations and the data informed by the primary methodology are available for access. This gives the public access on a global scale for education, knowledge and interaction with important issues – without even elaborating on how academics, researchers, historians and the wider user community will benefit from the availability of the materials from this web archive. Outreach by its very nature needs an unspecified group of people to whom it can lend its services.

Listening to the podcast of the project event hosted in Krakow: ‘Hybrid Identity’ in 2014. Rationale, abstracts and biographies from the workshop can also be opened. Image from http://wayback.archive-it.org/2502/20170124104618/http://subcultures.mml.ox.ac.uk/materials/workshop-krakow-hybrid-identity-september-2014.html

Furthermore, the site provides an irreplaceable record of institutional history for the University of Oxford as a whole, as well as its research and collaborations. This is a dominant purpose of our University of Oxford collection. The role of preserving for posterity should not be underplayed. Subcultures provides data that will be used, re-used and of great importance for decades to come, and also documents decisions and projects of the University of Oxford. For example, the outline and rationale of the project is available in full through the Background Paper – Theory, available for consultation through the archived capture as it would be through the live web. Biographical details of contributors are also hosted on the captures, preserving records of the people involved and their roles for posterity and accountability.

Building on the importance of access to research: internet presence increases scholarly interaction. The scope of the project is of great relevance, and not only is data for research available from the capture of the site, but the use of internet archives as datasets is expected to become more prominent.

Participate!

Here at BLWA the archiving process begins with a nomination for archiving: if you have a site that you believe is of value for preserving as part of one of our collections then please nominate it here. The nomination form will go to the curators and web-archivists on the BLWA team for selection checks and further processing. We would love to hear your nominations.

PASIG 2017: Reflections on ‘Digital Preservation at the United Nations Mechanism for International Criminal Tribunals’

Along with my colleagues, I was incredibly grateful to be at Oxford PASIG 2017, hosted at the Oxford University Museum of Natural History from 11-13 September.

A presentation given by Angeline Takawira was affirmation indeed as to why advocacy for digital preservation is crucial worldwide. Angeline gave us an insight into the aims and challenges of digital preservation at the United Nations Mechanism for International Criminal Tribunals (UN MICT).

The Mechanism

Angeline explained that the purpose of the UN MICT is to continue the mandated and essential actions that have been carried out temporarily by two International Criminal Tribunals: Rwanda (ICTR) from 1994 until 2015 and Yugoslavia (ICTY) since 1993, which will be closing at the end of this year. UN MICT was established in 2010 by the UN Security Council, and is therefore a relatively new organisation. However, like its two predecessors, it is temporary.

We were told about the highly significant and mandated functions of MICT:

  1. To protect and support victims, witnesses and all others affected by war crimes
  2. To enforce sentences and other judicial work
  3. To preserve and manage the archives of the international tribunals.

You can find out more about the important work of the UN MICT here.

Digital Preservation at UN MICT

The Mechanism is made up of two branches: The Hague, Netherlands and Arusha, Tanzania, so the single digital repository is maintained across two continents. Currently the digital records of each of these are a hybrid of both digitised and born-digital material, with example files including emails, GIS datasets, websites and CAD files. However, the audio-visual files account for 90% of the combined volume of the digital archives.

It is so apparent that UN MICT’s preservation goals are aligned to their aims as an organisation as a whole; authenticity is imperative for all of their records. Angeline asserted that their digital preservation goals were to be trustworthy, accessible and useable and ‘demonstrably authentic’ – that is, identical to the digital original in all essential aspects. The digital archive is made up of:

  • Judicial case records – such as court decisions, judgements, court transcripts
  • Records relating to the judicial process – for example detentions of the accused and the protection of witnesses
  • Administrative records of the tribunals as an organisation (and also the Mechanism as an organisation).

Through a range of actions, the development of the digital preservation programme is achieving these aims. Angeline cited the introduction of workflows and compliance with standards, as well as records being transferred to the repository with an unbroken chain of custody, stringent access controls and fixity checks to ensure no corruption. Furthermore, work continues on defining procedures around migration plans, as the Mechanism wishes to retain an experience of authenticity – which understandably needs a focus on file format characteristics.
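For readers new to the term, a fixity check usually means recording a checksum for every file at the point of transfer and recomputing it later to confirm nothing has changed. The sketch below is a generic illustration of that idea and not a description of the Mechanism's own tooling; the function names, the choice of SHA-256 and the manifest layout are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio-visual files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Record a checksum for every file under root (hypothetical manifest layout)."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root: Path, manifest: dict) -> dict:
    """Compare files on disk with the manifest: 'ok', 'altered' or 'missing'."""
    current = build_manifest(root)
    report = {}
    for name, expected in manifest.items():
        if name not in current:
            report[name] = "missing"
        elif current[name] != expected:
            report[name] = "altered"
        else:
            report[name] = "ok"
    return report
```

In use, `build_manifest` would run once when records enter the repository, and `check_fixity` would be run on a schedule afterwards; any result other than 'ok' signals that a file needs investigating or restoring from another copy.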

Challenges

PASIG definitely taught me that authentic and usable digital preservation is always a trying undertaking, but the challenges faced when digitally preserving the UN MICT are unique due to its sensitive content and technicalities. For one, the fact that it is a temporary organisation is at odds with the long term endeavour of making these tribunal records accessible for the future and ensuring their protection. A repository transfer as a next step would need extremely careful consideration. Also, the retention schedule of different data is a factor for discussion – so that the UN MICT can fulfil its requirements of deletion in a transparent way.

One of the largest challenges to the future of digital preservation, for this and similar organisations and initiatives, is the limited funding, resources and staff available to sustain the long term commitment that digital preservation of records like these really demands.

Use

There is no doubt that the digital archive of the UN MICT would be of fundamental significance to an international user community of the global media, legal professionals, academics, researchers and education in general. Combine these user groups with the broad range of stakeholders in preserving the Mechanism, such as the international courts and the Security Council which mandated the work, and there are many to whom this cause, and the information it preserves, will be vital. I have visited 4 countries of the former Yugoslavia and the digital records of the MICT are surely just as essential to preserve and learn from as the physical and tangible evidence of conflict. The need for advocacy of digital preservation is pressing, and the UN MICT is doing urgent work.

Bountiful Harvest: Curation, Collection and Use of Web Archives

The theme for the ARA Annual Conference 2017 is: ‘Challenge the Past, Set the Agenda’. I was fortunate enough to attend a pre-conference workshop in Manchester, run by Lori Donovan and Maria Praetzellis from The Internet Archive, about the bountiful harvest that is web content, and the technology, tools and features that enable web archivists to overcome the challenges it presents.

Part I – Collections, Community and Challenges

Lori gave us an insight into the use cases of Archive-it partner organisations to show us the breadth of reasons why other institutions archive the web. The creation of a web collection can be for one of (or indeed, all) the following reasons:

  • To maintain institutional history
  • To document social commentary and the perspectives of users
  • To capture spontaneous events
  • To augment physical holdings
  • Responsibility: Some documents are ONLY digital. For example, if a repository upholds a role to maintain all published records, a website can be moved into the realm of publication material.

When asked about duplication amongst web archives, and whether it was a problem if two different organisations archive the same web content, Lori put forward the argument that duplication is not worrisome. More captures of a website are good for long term preservation in general – in some cases organisations can work together on collaborative collecting if the collection scope is appropriate.

Ultimately, the priority of crawling and capturing a site is to recreate the same experience a user would have if they were to visit the live site on the day it was archived. Combining this with an appropriate archive frequency means that change over time can also be preserved. This is hugely important: the ephemeral nature of internet content is widely attested. Thankfully, the misconception that ‘online content will be around forever’ is being confronted. Lori put forward some examples to illustrate why the archiving of websites is crucial.

In general, a typical website lasts 90-100 days before one of the following happens:

  1. The content changes
  2. The site URL moves
  3. The content disappears completely

A study was carried out on the Occupy Movement sites archived in 2012. Of 582 archived sites, only 41% were still live on the web as of April 2014. (Lori Donovan)

Furthermore, we were told about a 2014 study which concluded that 70% of scholarly articles online with text citations suffered from reference rot over time. This speaks volumes about the need to preserve copies for the sake of both authentication and academic integrity.

The challenge continues…

Lori also pointed us to the NDSA 2016/2017 survey, which outlines the principal concerns within web archiving currently: social media (70%); video (69%); and interactive media and databases (both 62%). Any dynamic content can be difficult to capture and curate, so sharing advice and guidelines amongst leaders in the web archiving community is a key factor in determining successful practice for both current web archivists and those of future generations.

Part II – Current and Future Agenda

Maria then talked us through some key tools and features which enable greater crawling technology, higher quality captures and the preservation of web archives for access and use:

  • Brozzler. Definitely my new favourite portmanteau (browser + crawler = brozzler!), brozzler is the newly developed crawler by The Internet Archive which is replacing the combination of the Heritrix and Umbra crawlers. Brozzler captures HTTP traffic as it is loaded, works with YouTube in order to improve media capture, and immediately writes and saves the data as a WARC file. Also, brozzler uses a real browser to fetch pages, which enables it to capture embedded URLs and extract links.
  • WARC. The Web ARChive file format is the ISO standard for web archives. It is a concatenated file written by a crawler, with long term storage and preservation specifically in mind. However, Maria pointed out to us that WARC files are not constructed to easily enable research (more on this below).
  • Elasticsearch. The full-text search system does not just search the HTML content displayed on the web pages; it also searches PDF, Word and other text-based documents.
  • Solr. A metadata-only search tool. Metadata can be added on Archive-it at collection, seed and document level.
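To make the idea of a "concatenated file" a little more concrete, here is a minimal sketch in Python of how WARC-style records sit one after another in a single stream. It is an illustration under stated assumptions, not how brozzler or Archive-it actually write files: the helper names `make_record` and `read_records` are hypothetical, only a simplified `resource` record is shown, and real WARC files are usually compressed and contain several other record types.

```python
import io
import uuid


def make_record(uri: str, date: str, payload: bytes) -> bytes:
    """Build one simplified WARC 'resource' record (hypothetical helper)."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {uri}",
        f"WARC-Date: {date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: text/html",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is closed by two CRLF pairs after its payload.
    return head + payload + b"\r\n\r\n"


def read_records(stream):
    """Yield (headers, payload) for every record in a concatenated stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"", b"\r\n"):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length says exactly how many payload bytes follow.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Two captures written back to back, as a crawler would append them.
archive = (
    make_record("http://example.org/", "2017-08-30T00:00:00Z", b"<p>first</p>")
    + make_record("http://example.org/about", "2017-08-30T00:00:05Z", b"<p>second</p>")
)
records = list(read_records(io.BytesIO(archive)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

The sketch also hints at why WARC is awkward for research: to find anything, you have to walk the file record by record, which is fine for preservation and replay but not for quick analysis.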

Supporting researchers now and in the future

The tangible experience and use of web archives where a site can be navigated as if it were live can shed so much light on the political and social climate of its time of capture. Yet, Maria explained that the raw captured data, rather than just the replay, is a rich area for potential research and, if handled correctly, is an invaluable research tool.

As well as the use of Brozzler as a new crawling technology, Archive-it research services offer a set of derivative data-set files which are less complex than WARC and allow for data analysis and research. One of these derivative data sets is a Longitudinal Graph Analysis (LGA) dataset file, which allows the researcher to analyse the trend in links between URLs over time within an entire web collection.
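As a rough illustration of the kind of question a link dataset lets a researcher ask, the Python sketch below counts how many inbound links each URL receives per year. The records and the function name `inbound_links_by_year` are hypothetical and made up for this example; they are not the actual LGA file layout, which has its own format.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (capture date, source URL, target URL).
links = [
    ("2015-11-24", "http://a.example/", "http://b.example/"),
    ("2015-11-24", "http://c.example/", "http://b.example/"),
    ("2016-06-01", "http://a.example/", "http://b.example/"),
    ("2016-06-01", "http://a.example/", "http://d.example/"),
]


def inbound_links_by_year(records):
    """Count inbound links per target URL, grouped by capture year."""
    counts = defaultdict(Counter)
    for date, _source, target in records:
        year = date[:4]  # ISO dates start with the four-digit year
        counts[year][target] += 1
    return counts


trend = inbound_links_by_year(links)
for year in sorted(trend):
    print(year, dict(trend[year]))
```

Comparing the counts for one target across years gives a simple view of how its place in the collection's link network changes over time.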

Maria acknowledged that there are lessons to be learnt when supporting researchers using web archives, including technical proficiency training and reference resources. The typology of the researchers who use web archives is ever growing: social and political scientists, digital humanities disciplines, computer science, and documentary and evidence-based research including legal discovery.

What Lori and Maria both made clear throughout the workshop was that the development and growth of web archiving is integral to challenging the past and preserving access on a long term scale. I really appreciated an insight into how the life cycle of web archiving is a continual process, from creating a collection, through to research services, whilst simultaneously managing the workflow of curation.

When in Manchester…

Virtual Archive, Central Library, Manchester

I couldn’t leave Manchester without exploring the John Rylands Library and Manchester’s Central Library. In the latter, this interactive digital representation of a physical archive lets you choose a box, laid out as a physical archive might be arranged, and then projects the digitised content onto the screen. A few streets away in Deansgate I had just enough time in John Rylands to learn that the fear of beards is called Pogonophobia. Go and visit yourself to learn more!

Special collections reading room, John Rylands Library, Manchester

Initiating conversation: let’s talk about web content (part 2)

Colin Harris, Superintendent of Special Collections reading rooms. Chosen site: cyndislist.com

‘I am a founding member of Oxfordshire Family History Society and I’ve long been interested in family history. As a phenomenon it surged in popularity in the 1970s. In about 1973 there was great curiosity (in OFHS) in Bicester as everyone was interested in the popular group, The Osmonds (who originated from Bicester!). Every county has a family history society and I would say it’s they who have done the lion’s share of the work. All of their work and indexing…it’s all grist to the mill in terms of recording names and events.

So the website I would like to have access to in 10 years’ time is cyndislist.com, which is one of the world’s largest databases for genealogy. In fact it’s been going for over 21 years already. This was launched on the 4th March 1996. The family history people have been right there from the very beginning, it’s been growing solidly since then; it’s fantastic. It covers 200 categories of subjects, it has links to 332,000 other websites, and it’s the starting point for any genealogical research. The ‘Cyndi’ is Cyndi Howells, an author in genealogy.

Almost every day the site is launching content that might be interesting in some particular subject. So just going back within the last couple of weeks: an article on Telling the Orphan’s story; Archive lab on how to preserve old negatives; The key to family reunion success and DNA: testing at a family reunion! Projects even go beyond individuals…they explore a Yellowstone wolf family. There is virtually nothing that is untouched. Anything with a name to it has potential for exploration.

To be honest, I haven’t been able to do any family history research since 1980, but I am hoping to do some later on this year (when I retire). All these years that have passed have meant that so much is available to be accessed over the internet.

Actually I’d love to see genealogy and family history workers and volunteers getting more recognition for the fantastic amount of industrious and tech savvy work they do. Family history is something for people from all walks of life. Our history, your history, my history is something very personal. As I say, 21 years and going strong; I’d love to see the site going stronger still in 10 years’ time.’


 

Pip Willcox, Head of the Centre for Digital Scholarship and Senior Researcher at Oxford e-Research. Chosen site: twitter.com

‘Twitter is an amazing tool that society has used to show the best of what humanity is at the moment…we share ideas, we share friendship, fun and joy, we communicate with others around the world, people help each other. But, it shows the worst of what humans can do. The news we see is just the tip of the iceberg – the levels of abuse that users, particularly minority groups, receive is appalling. Twitter is a fantastic place to meet people who think very differently from us, people who come from different backgrounds, have had different experiences, who live far from us, or close by but we might not otherwise have met. It is so rich, so full of potential, and some of what we do with it is amazing, yet some of what we do with it is appalling.

The question for the archive is “which Twitter?” There is the general feed, what you see if you don’t sign in. Then there are our individual feeds, where we curate our own filter bubbles, customizing what we see through our accounts. You can create a feed around a hashtag, an event, or slice it by time or location. All of these approaches will affect the version of Twitter we archive and leave for the future to discover.

These filter bubbles are not new: we have always lived in them, even if we haven’t called them that before. Last year there was an experiment where a series of couples who held diametrically opposing views switched Twitter accounts and I found that, and their thoughtful response to it, fascinating.

Projects like Cultures of Knowledge, for example, which is based at the History Faculty here at the University of Oxford, trace early modern correspondence. This resource lets you search for who was writing to whom, when, where, and the subjects they were discussing. It’s an enormously rich, people-centred view of the history of ideas and relationships across time and space, and of course it points readers on in interesting directions, to engage closely with the texts themselves. This is possible because the letters were archived and catalogued over the years, over the centuries by experts.

How are we going to trace the conversations of the late 20th and the early 21st centuries? The speed at which ideas flow is faster than ever and their breadth is global. What will future historians make of our age?

I’m interested from a future history as well as a community point of view. The way we are using Twitter has already changed and tracking its use, reach, and power seems to me well worth recording to help us understand it now, and to help explain an aspect of our lives to future societies. For me, Twitter makes the world more familiar, and anything that draws us together as a global community, that reinforces our understanding that we share one planet, that what we have in common vastly outweighs what divides us, and that helps us find ways to communicate is a good and a necessary thing.’

 


 

Will Shire, Library Assistant, Philosophy and Theology Faculty Library. Chosen site: wikipedia.org

‘It’s one of the sites I use the most…it has all of human knowledge. I think it’s a cool idea that anyone can edit it – unlike a normal book it’s updated constantly. I feel it’s derided almost too much by people who automatically think it’s not trustworthy…but I like the fact that it is a range of people coming together to edit and amend this resource. As a kid I bothered my mum all the time with constant questioning of “Why is this like this, why does it do that?” Nowadays if you have a question about anything you can visit wikipedia.org. It would be really interesting to take a snapshot of one article every month or week in order to see how much it changes through user editing.

Also, I studied languages and it is extremely useful for learning new vocabulary as the links at the side of the article can take you to the content in other available languages. You can quite easily look at different words or use it as a starter to take you to different articles in other languages that aren’t English.’


 

 

 

 

Why archive the web?

Here at the Bodleian Libraries’ Web Archive (BLWA), the archiving process starts with a nomination – either by our web curators or by you, the public. From the nominated URLs, the BLWA team then selects for archiving those specifically identified as being of lasting value and significance.

Not only are the sites chosen from a preservation standpoint – we are also continually seeking to build up the scope and content of our 7 collections within the BLWA: University of Oxford; University of Oxford colleges; University of Oxford museums, libraries and archives; social sciences; arts and humanities; international; and science, medicine and technology. Exactly like the use of a physical collection, the sites belonging to the web collection will be used for research, fact checking, discovery and collaboration. There can be no denying that the web is the platform on which so much of contemporary society occurs. In the future then, and indeed now, web archives are providing an insight into our history.

Anti-Apartheid Movement Archives – http://www.aamarchives.org/

The AAMA site is part of our international collection in the BLWA. Within this collection we have captured aamarchives.org 7 times since 24th November 2015. This online platform is vital for digital access to further research, cross-cultural relationships and efforts towards understanding the history of the British Anti-Apartheid Movement 1959 – 1994. This capture has preserved the navigation and functionality of the site and links still resolve; for example the user community can still browse the archive, learn about campaigns and download resources. The date and time of the capture are clearly displayed in the banner at the top.

BLWA’s first capture of the online AAMA

This website can also be used and explored in conjunction with our related physical holdings. Here at the Bodleian Special Collections we have an amazing depth and range of physical material in the Anti-Apartheid Movement archive and our Commonwealth and African studies collections. You can browse the catalogue for this here.

This archived capture is fully functional, like a live site.

This is a tangible example of how digital preservation enhances and complements physical material and ensures records can reach a wider audience. How exciting it is that a researcher can consult manuscript or archived material, alongside captures of websites from the past in order to gain more of an insight and have a wider scope of substance to survey!

Web content like aamarchives.org is not as stable as you might presume. A repository of web based collections enables future discovery of internet sites that are perhaps taken for granted due to the nature of our technological society; everything is just a tap or a click away. In fact, much of the material we interact with today is only available online. The truth is that web content is ephemeral: there is a very real threat that it can rapidly change or disappear altogether. Therefore web archiving initiatives are vital to preserve these valuable resources for good. Through these captures, provenance, arrangement and content have been preserved; and arguably most importantly of all – access.

Both individual collections and the web archive as a whole can be searched for a specific site, or browsed at leisure.

Growth of open access and web based initiatives means that there is an ever increasing network of digital libraries on a global scale. There is no doubt that the practice of web archiving is a significant contribution towards ensuring knowledge for all. Access to the Internet, enabling access to an ever growing knowledge depository, is central to the integrity of educational and professional research, web archiving and, on a larger scale, digital preservation.

Browse our collections in Bodleian Libraries’ Web Archive

Get involved and help preserve our history! Nominate a site to archive