Tag Archives: Archives & Modern Manuscripts

Collecting Space: The Inaugural Science and Technology Archives Group Conference

On Friday 17th of November I attended the inaugural Science and Technology Archives Group (STAG) conference, held at the fantastic Dana Library and Research Centre. The theme was ‘Collecting Space’ and the event brought together a variety of people working in or with science and technology archives relating to the topic of ‘Space’. The day consisted of a range of talks (with topics as varied as the Cassini probe and UFOs), a tour of the Skylark exhibition and a final discussion on the future direction of STAG.

What is STAG?

The Science and Technology Archives Group is a recently formed group (September 2016) that aims to celebrate and promote scientific archives and to engage anyone with an interest in the creation, use and preservation of such archives.

The keynote presentation was by Professor Michele Dougherty, who gave us a fascinating insight into the Cassini project, aided by some amazing photos. 

Colour-coded version of an ISS NAC clear-filter image of Enceladus’ near surface plumes at the south pole of the moon. From Porco et al. 2006, doi: 10.1126/science.1123013

Her concern with regard to archiving data was context. We were told that her raw data could be given to an archive, but it would be almost meaningless without the relevant contextual information, for example calibration parameters. Without it, the data could be misinterpreted.

Dr James Peters from the University of Manchester told us of the unique challenges of the Jodrell Bank Observatory Archive, also called the ‘sleeping giant’. They have a vast amount of material that has yet to be accessioned but requires highly specialised scientific knowledge to understand, which highlights the importance of the relationship between the creator of an archive and the repository. Promoting use of the archive was of particular concern, one shared by Dr Sian Prosser of the Royal Astronomical Society archives, who spoke of the challenges for current collection development. I’m looking forward to finding out about the events and activities planned for their bi-centenary in 2020.

We also heard from Dr Tom Lean of the Oral History of British Science at the British Library. This was a great example of the vast amount of knowledge and history that is effectively hidden. The success of a project is typically well documented; however, the stories of the things that went wrong, or of the relationships between groups, have the potential to be lost. Whilst they may be lacking in scientific research value, they reveal the personal side of the projects and are a reminder of the people and personalities behind world-changing projects and discoveries.

Dr David Clarke spoke about the Ministry of Defence UFO files release programme. I was surprised to hear that as recently as 2009 there was a government-funded UFO desk. In 2009 the surviving records were transferred to the National Archives. All files were digitised and made available online. The demand for, and reach of, this content was huge, with millions of views and downloads from over 160 countries. Such an archive, whilst people may dismiss its scientific relevance and use, provides an amazing window into the psyche of society at that time.

Dr Amy Chambers spoke about how much scientific research and knowledge can go into producing a film and used Stanley Kubrick’s 2001: A Space Odyssey as an example. This was described as a science fiction dream + space documentary. Directors like Kubrick would delve deeply into the subject matter and speak to a whole host of professionals in both academia and industry to get the most up-to-date scientific thinking of the time, even researching concepts that would potentially never make it on screen. This was highlighted as a way of capturing scientific knowledge and the current thoughts about the future of science at that point in history. Today it is no different: Interstellar, produced by Christopher Nolan, consulted Professor Kip Thorne, and the collaboration produced a publication on gravitational lensing in the journal Classical and Quantum Gravity.

It was great to see the Dana research library and a small exhibition of some of the space related material that the Science Museum holds. There was the Apollo 11 flight plan that was signed by all the astronauts that took part and included a letter from the Independent Television News, as they used that book to help with the televised broadcast.

Apollo 11 flight plan

We also got to see the recently opened Skylark exhibition, celebrating British achievements in space research.

Scale model of the Skylark rocket at the exhibition entrance at the Science Museum, London

The final part of the conference was an open discussion focusing on the challenges and future of science and technology archives and how these could be addressed.

Awareness and exposure

From my experience of being a chemistry graduate, I can speak first hand of the lack of awareness of science archives. I feel that I was not alone, as during the course of a science degree, especially for research projects, archives are never really needed compared to other disciplines as most of the material we needed was found in online journals. Although I completed my degree some time ago, I feel this is still the case today when I speak to friends who study and work in the science sector. It seems that promotion of science and technology archives to scientists (at any stage of their career, but especially at the start) will make them aware of the rich source of material out there that can be of benefit to them, and subsequently they will become more involved and interested in creating and maintaining such archives.

Content

For an archivist with little to no knowledge of a particular area of science, understanding the vastly complex data and material in science and technology archives is a potentially impossible job. The nomenclature used in scientific disciplines can be highly specialised and specific, which can make deciphering the material extremely difficult.

This problem could be resolved in one of two ways. Firstly, the creator of the material or a scientist working in that area can be consulted. Whilst this can be time consuming, it is a necessity, as the highly specialised nature of certain topics can mean there are only a handful of people who can understand the work. Secondly, when the material is created, the creator should be encouraged to explain and store data in a way that will allow future users to understand and contextualise the data better.

As science and technology companies can be highly secretive entities, problems with exploiting sensitive material arise. One suggestion was to seek the advice of other specialist archive groups that have dealt with highly sensitive archives.

It appears that there is still a great deal of work to do to promote access, exploitation and awareness of current science and technology archives (for both creators and users). STAG is a fantastic way to get like minds together to discuss and implement solutions. I’m really looking forward to seeing how this develops and hopefully I will be able to contribute to this exciting, worthwhile and necessary future for science and technology archives.

Subcultures as Integrative Forces in East-Central Europe 1900 – present: a Bodleian Libraries’ Web Archive record

A problem, and a solution in action:

The ephemeral nature of internet content (the average life of a web page is 100 days – illustrating that websites do not need to be purposefully deleted to vanish) is only one contributing factor to data loss. Web preservation is a high priority; action is required. This is a driver not only for Bodleian Libraries’ Web Archive, but for digital preservation initiatives on a global scale.

However, today I would like to share the solution in action, an example from BLWA’s University of Oxford Collection: Subcultures as Integrative Forces in East-Central Europe 1900 – present.

On the live web, attempts to access the site are met with automatic redirects to BLWA’s most recent archived capture (24 Jan. 2017). The yellow banner indicates it is part of our archive. Image from http://wayback.archive-it.org/2502/20170124104518/http://subcultures.mml.ox.ac.uk/home.html
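The archived address in the caption above appears to encode both the collection and the moment of capture. As a minimal sketch, assuming the path follows a collection/timestamp/original-URL pattern and that the 14-digit segment is a YYYYMMDDhhmmss timestamp (an assumption consistent with the 24 Jan. 2017 capture date mentioned above), it could be unpacked like this. The function name `parse_capture_url` is hypothetical and not part of any archive tooling.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_capture_url(capture_url):
    """Split an archived-capture URL into (collection id, capture time, original URL).

    Assumes a path of the form /<collection>/<YYYYMMDDhhmmss>/<original URL>.
    """
    path = urlparse(capture_url).path.lstrip("/")
    # Split only twice so the slashes inside the original URL are left intact.
    collection_id, timestamp, original_url = path.split("/", 2)
    captured_at = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return collection_id, captured_at, original_url


collection, captured_at, original = parse_capture_url(
    "http://wayback.archive-it.org/2502/20170124104518/"
    "http://subcultures.mml.ox.ac.uk/home.html"
)
print(collection, captured_at.date(), original)
```

Reading the capture date straight from the address is a quick way to check which snapshot of a vanished site you are looking at.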

Subcultures is a University of Oxford project, backed by the Arts & Humanities Research Council, which through its explorative redefinition of ‘sub-cultures’ aims to challenge the current way of understanding simultaneous identification forms in the region of Eastern Europe through a multi-disciplinary methodology of social anthropology, discourse analysis, historical studies and linguistics. The project ran from 2012-2016.

The Subcultures website is an incredibly rich record of the project and its numerous works. The project held cross-continent collaborative initiatives including lectures, international workshops and seminars, as well as an outreach programme including academic publications. Furthermore, comparative micro-studies were conducted in parallel with the main collaborative project: Linguistic Identities: L’viv/Lodz, c.1900; Myth and Memory: Jews and Germans, Interwar Romania; Historical Discourses: Communist Silesia; and Discursive Constructions: L’viv and Wroclaw to present. The scope and content of the project, including key questions, materials, past and present events and network information is* all hosted on http://subcultures.mml.ox.ac.uk/home.html.

Was*. The site is no longer live on the internet.

However, as well as an automatic re-direction to our most recent archival copy, a search on Bodleian Libraries’ Web Archive generates 6 captures in total:

Search results for Subcultures within BLWA. Image from https://archive-it.org/home/bodleian?q=Subcultures

The materials tab of the site fully functions in the archived capture: you are able to listen to the podcasts and download the papers on theory and case studies as PDF versions.

The use of Subcultures

To explore the importance of web-archiving in this context, let us think about the potential use(rs) of this record and the implications if the website were no longer available:

As the project comprised a wider outreach programme alongside its research, content such as PDF publications and podcasts was available for download, consultation and further research. The website platform means that these innovative collaborations and the data informed by the primary methodology remain available for access. The public can access them on a global scale for education, knowledge and interaction with important issues – and that is before considering how academics, researchers, historians and the wider user community will benefit from the availability of the materials in this web archive. Outreach, by its very nature, is directed at an unspecified group of people whom its services are intended to help.

Listening to the podcast of the project event hosted in Krakow: ‘Hybrid Identity’ in 2014. Rationale, abstracts and biographies from the workshop can also be opened. Image from http://wayback.archive-it.org/2502/20170124104618/http://subcultures.mml.ox.ac.uk/materials/workshop-krakow-hybrid-identity-september-2014.html

Furthermore, the site provides an irreplaceable record of institutional history for the University of Oxford as a whole, as well as its research and collaborations. This is a dominant purpose of our University of Oxford collection. The role of preserving for posterity cannot be underplayed. Subcultures provides data that will be used, re-used and of great importance for decades to come, and also documents decisions and projects of the University of Oxford. For example, the outline and rationale of the project is available in full through the Background Paper – Theory, available for consultation through the archived capture as it would be through the live web. Biographical details of contributors are also hosted on the captures, preserving records of the people involved and their roles for posterity and accountability.

Building on the importance of access to research: internet presence increases scholarly interaction. The scope of the project is of great relevance, and data for research is not only available from the capture of the site; the use of internet archives as datasets is also expected to become more prominent.

Participate!

Here at BLWA the archiving process begins with a nomination for archiving: if you have a site that you believe is of value for preserving as part of one of our collections then please nominate it here. The nomination form will go to the curators and web-archivists on the BLWA team for selection checks and further processing. We would love to hear your nominations.

A sympathy for strangers: Oxfam and the history of humanitarianism

On Tuesday 31st October the Oxfam Archive Assistants attended a lecture at St Antony’s College by Princeton University’s Professor Jeremy Adelman, entitled Towards a Global History of Humanitarianism. Professor Adelman’s focus was primarily the nineteenth and early twentieth centuries, but his narrative had implications for the way we might view contemporary humanitarian agencies such as Oxfam.

 

Historians have not always been kind in their assessments of international humanitarianism. Alex de Waal was broadly critical of the role such agencies have played when dealing with famine on the African continent: by supplying aid externally, he argues, they inadvertently undermine the democratic accountability of African governments, disincentivizing humanitarian intervention or crisis prevention as a way of preserving political power.[1] To an extent, Adelman spoke in a similar vein: abolitionists may have helped stimulate the rise of humanitarianism in the nineteenth century but colonial penetration itself was often justified in terms of humanitarian intervention, where the white settler was morally and ethically obliged to ‘civilize’ the unsophisticated ‘native’. Humanitarian discourse, Adelman argued, is by its nature racialized, and it invariably reinforces the self-image of Western nations as occupying the apex of a civilizational hierarchy.

 

This might seem somewhat damning of all Oxfam does and stands for. However, Adelman also spoke of a ‘sympathy for strangers’ which grew out of increasing global connectedness and integration as telegraph cables, railways and steamships curtailed the spatial and intellectual distances between disparate peoples. The camera was, according to Adelman, a fundamental technological innovation in this respect and the relationship between photography and humanitarianism has in many ways been central to the development of charities like Oxfam. Borrowing from Susan Sontag, Adelman suggested that ‘moral witnesses’ – i.e., photographers – record public memories of pain, creating a connection between the ‘victim’ – the subject of the photograph – and the viewer.

 

In the 19th century missionaries armed themselves with Kodak cameras, and by producing lantern slide shows of their experiences in foreign climes hoped to raise money for future missionary work. But in the Congo Free State, rendered a personal possession of King Leopold of Belgium in 1885, missionaries began to use their cameras to record atrocities committed against Congolese rubber plantation workers. In the face of international scrutiny – which admittedly was somewhat more self-interested than compassionate – King Leopold was forced to cede Congo as a personal asset. It could certainly be argued that such photographs exploited the pain of others, titillating public interest at home without any true empathy for or understanding of the Congolese people. According to Susan Sontag, the ‘knowledge gained through still photographs will always be some kind of sentimentalism, whether cynical or humanist’.[2]

 

 

But the power of the photograph to reinforce moral or empathetic feeling can be – and has been – used for the genuine betterment of others. From 1957 to the early 1960s Oxfam sent simple Christmas ‘appeal’ cards to its donors, featuring a simple ‘thank you’ message and photographs of individuals helped by the charity. A card from 1958 showed a huge-eyed little girl, sitting wrapped in a coat and woollen socks with a spoon stuck into a beaker of food. The caption read ‘This little Greek girl was found as a baby hungry and dying… Now she is properly fed… because Oxfam sends food, and years ago was able to plant black-currant bushes in her village which are now bearing fruit.’ This photograph does not simply broadcast the pain of strangers. It broadcasts hope, and promises resolution through charitable action. While a healthy scepticism and constructive interrogation of the conduct of international agencies is to be encouraged, we should be careful not to overlook and devalue the charitable efforts inspired by genuine ‘sympathy for strangers’.

[1] Alex de Waal, Famine Crimes: Politics and the Disaster Relief Industry in Africa (1997)

[2] Susan Sontag, On Photography (1973)

PASIG 2017: Ageing of Digital – Towards Managed Services for Digital Continuity

PASIG 2017 (Preservation and Archiving Special Interest Group) was hosted in Oxford this year at the Natural History Museum by Bodleian Libraries & Digital Preservation at Oxford and Cambridge (DPOC). I attended on all three days (11th-13th September) and, when I wasn’t working, I had the opportunity to listen to some thought-provoking talks centred around the issue of digital preservation.

One of the highlights of the conference for me was a talk given by Natasa Milic-Frayling, the founder of Intact Digital. The presentation, entitled ‘Ageing of Digital: Towards Managed Services for Digital Continuity’, demonstrated the innovative ways in which digital preservation issues are being approached.

Digital technology has a short lifespan; hardware and software become redundant and obsolete in a very short time. The result is known as ‘Legacy Software’: outdated software that no longer receives vendor support or updates.

This poses the problem: how can we manage the life-cycle of digital in the face of a dynamic and changing computing ecosystem?

Technologies are routinely changed, updated (sometimes at a cost), made redundant and retired. The value of digital assets needs to be protected. In the current climate there is an imbalance of power between the technology producers and providers and the content producers, owners and curators. The providers and producers can move on without the opinion or input of those who use the software.

How do we enable prolonged use of software to protect value of digital assets?

A case study was presented that contextualised the problem and the solution. The vendor Tamal Vista Insights provided Cut&Search, software for automated and semi-automated indexing of digitised manuscripts and digital artefacts that standard OCR cannot handle.

The software was supplied to Fo Guang Shan, an International Chinese Buddhist Monastic Order with over 200 branch temples worldwide, for use with their digitised manuscript collection. This project is made up of thousands of volunteers and spans years, beyond the provider’s expected life-cycle for their product, its primary market life-time.

Intact Digital provide a managed service that allows for digital continuity. There are several steps in the process, which then provide a number of options to software providers and the content producers:
  • Deposit
  • Hosting
  • Remote Access
  • Digital Continuity Assurance Plans

The software can be hosted in a virtual machine and accessed remotely via a browser. The implications of this are far reaching for projects like the ones undertaken by the Fo Guang Shan. They don’t need to worry about the Cut&Search software becoming redundant and their digital assets remain protected. For smaller organisations operating on ever decreasing budgets this is an important step both for asset protection and digital preservation.

Key areas to develop

Although this is an important step, there is still much work to do, and some key areas that need to be developed were highlighted. Developing them will result in a sustained use of digital.

  • Economy around “retired” software
  • Legal frameworks and sustainable business models
  • New practices to create demand
  • New services to make it efficient, economical and sustainable

Changes to the Ecosystem

Taking these steps and creating a dialogue between the technology producers/providers and the content producers changes the dynamic of the ecosystem, redressing the imbalance in control.

 

The talk ended with two very pertinent statements:

“Together we can create new practices and new models of extending the life of digital”

“Without digital continuity our digital content, information and knowledge has no future”

As a trainee I still have lots to learn but a major theme running throughout digital archiving and digital preservation is the need for communication, collaboration and dialogue. Working together, sharing ideas and the challenges is key to securing the future of digital content.

 

A complete collection of the slides relating to this topic can be found here: https://doi.org/10.6084/m9.figshare.5415040.v1. Milic-Frayling, Natasa (2017): Aging of digital: Towards managed services for digital continuity. figshare.

PASIG2017: Preserving Memory

 

The Oxford University Natural History Museum (photo by Roxana Popistasu, twitter)

This year’s PASIG conference (Preservation and Archiving Special Interest Group) brought together an eclectic mix of individuals from around the world to discuss the very exciting and constantly evolving topic of digital preservation. Held at the Oxford University Natural History Museum, the conference aimed to connect practitioners from a variety of industries with a view to promoting conversation surrounding various digital preservation experiences, designs and best practices. The presentations given comprised a series of lightning talks, speeches and demos on a variety of themes, including the importance of standards, sustainability and copyright within digital preservation.

UNHCR: Archiving on the Edge

UNHCR Fieldworkers digitally preserving refugee records (photo by Natalie Harrower, twitter)

I was particularly moved by a talk given on the third day by Patricia Sleeman, an Archivist working for the UNHCR, a global organisation dedicated to saving lives, protecting rights and building a better future for refugees, forcibly displaced communities and stateless people.

Entitled “Keep your Eyes on the Information”, Sleeman’s poignant and thought-provoking presentation discussed the challenges and difficulties faced when undertaking digital preservation in countries devastated by the violence and conflicts of war. Whilst recognising that digital preservation doesn’t immediately save lives in the way that food, water and aid can, Sleeman identified the place of digital preservation as having significant importance in the effort to retain, record and preserve the memory, identity and voice of a people which would otherwise be lost through the destruction and devastation of displacement, war and violence.

About the Archive

Sleeman and her team seek to capture a wide range of digital media, including YouTube, websites and social media, each forming a precious snapshot of history, an antidote to the violent acts of mnemocide, or the destruction of memory.

The digital preservation being undertaken is still in its early stages with focus being given to the creation of good quality captures and metadata. It is hoped in time however that detailed policies and formats will be developed to aid Sleeman in her digital preservation work.

One of the core challenges of this project has been handling highly sensitive material including refugee case files. The preservation of such delicate material has required Sleeman and her team to act slowly and with integrity, respecting the content of information at each stage.

For more information on the UNHCR  please click here.

 

PASIG 2017: Reflections on ‘Digital Preservation at the United Nations Mechanism for International Criminal Tribunals’

Along with my colleagues, I was incredibly grateful to be at Oxford PASIG 2017, hosted at the Oxford University Museum of Natural History from 11-13 September.

A presentation given by Angeline Takawira was affirmation indeed as to why advocacy for digital preservation is crucial worldwide. Angeline gave us an insight into the aims and challenges of digital preservation at the United Nations Mechanism for International Criminal Tribunals (UN MICT).

The Mechanism

Angeline explained that the purpose of the UN MICT is to continue the mandated and essential functions that have been carried out on a temporary basis by two International Criminal Tribunals: Rwanda (ICTR), from 1994 until 2015, and Yugoslavia (ICTY), since 1993, which will be closing at the end of this year. UN MICT was established in 2010 by the UN Security Council, and is therefore a relatively new organisation. However, like its two predecessors, it is temporary.

We were told about the highly significant and mandated functions of MICT:

  1. To protect and support victims, witnesses and all others affected by war crimes
  2. To enforce sentences and other judicial work
  3. To preserve and manage the archives of the international tribunals.

You can find out more about the important work of the UN MICT here.

Digital Preservation at UN MICT

The Mechanism is made up of two branches: The Hague, Netherlands and Arusha, Tanzania, so the single digital repository is maintained across two continents. Currently the digital records of each branch are a hybrid of digitised and born-digital material, with example files including emails, GIS datasets, websites and CAD files. However, audio-visual files account for 90% of the combined volume of the digital archives.

It is apparent that UN MICT’s preservation goals are aligned with its aims as an organisation as a whole; authenticity is imperative for all of its records. Angeline asserted that their digital preservation goals were for the records to be trustworthy, accessible and useable and ‘demonstrably authentic’ – that is, identical to the digital original in all essential aspects. The digital archive is made up of:

  • Judicial case records – such as court decisions, judgements, court transcripts
  • Records relating to the judicial process – for example detentions of the accused and the protection of witnesses
  • Administrative records of the tribunals as an organisation (and also the Mechanism as an organisation).

Through a range of actions, the development of the digital preservation programme is achieving these aims. Angeline cited the introductions of workflows and compliance with standards, as well as the records being transferred to the repository with an unbroken chain of custody with stringent access controls and fixity checks to ensure no corruption. Furthermore, work continues on defining procedures around migration plans, as the Mechanism wishes to retain an experience of authenticity – which understandably needs a focus on file format characteristics.
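To make the idea of a fixity check concrete: a checksum is recorded when a file is transferred, then recomputed later to confirm that the bytes have not changed. The following is a minimal sketch only. The choice of SHA-256 and the function names `sha256_of` and `fixity_ok` are assumptions for illustration and are not drawn from the UN MICT workflow described in the talk.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large audio-visual files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_checksum):
    """True if the file still matches the checksum recorded at transfer."""
    return sha256_of(path) == recorded_checksum
```

In practice the recorded checksum would be stored with the record's metadata at ingest, and a failed check would prompt restoring the file from another copy.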

Challenges

PASIG definitely taught me that authentic and usable digital preservation is always a challenging undertaking, but the challenges faced when digitally preserving the records of the UN MICT are particularly distinctive due to their sensitive content and technicalities. For one, the fact that it is a temporary organisation is at odds with the long-term endeavour of making these tribunal records accessible for the future and ensuring their protection. A repository transfer as a next step would need extremely careful consideration. Also, the retention schedule of different data is a factor for discussion, so that the UN MICT can fulfil its requirements of deletion in a transparent way.

One of the largest challenges to the future of digital preservation for similar organisations and initiatives is limited financial sustainability, resources and staff with which to sustain the long-term commitment that digital preservation of records like this really commands.

Use

There is no doubt that the digital archive of the UN MICT would be of fundamental significance to an international user community of the global media, legal professionals, academics, researchers and education in general. Combine these user groups with the broad range of stakeholders in preserving the Mechanism, such as the international courts and the Security Council that mandated the work, and there are many to whom this cause, and the information it preserves, will be vital. I have visited four countries of the former Yugoslavia, and the digital records of the MICT are surely just as essential to preserve and learn from as the physical and tangible evidence of conflict. The need for advocacy of digital preservation is pertinent, and the UN MICT is doing urgent work.

Children’s Papers: Series 1 catalogue of Opie Archive now available

The cataloguing of the first series of the Opie Archive, which comprises children’s papers, as well as related correspondence from school teachers, has now been completed. The catalogue is available to search online here.

The material in the first 13 boxes spans most of the 1950s, during which time Iona and Peter Opie were working on their book, The Lore and Language of Schoolchildren, which was published towards the end of 1959. They began by placing an advert in the Times Educational Supplement, seeking teachers willing to assist in their research. Those who responded soon put the Opies in touch with further colleagues in other schools, until they had recruited a wide network of enthusiastic teachers across the country. In order to keep track of their dizzying number of correspondents, the Opies kept meticulous notes in a series of small address books, in which each contact was assigned a reference code. The material in the first 13 boxes is, therefore, arranged in order of the reference codes of those contacts who had sent in each batch of papers. The subsequent 20 boxes, following the publication of The Lore and Language, date mostly from 1960 onwards. From this point, the material is instead arranged alphabetically, by the area the material had come from – from Aberdeen to York.

The Opie address books, which hold the key to all their many correspondents

The papers, often accompanied by colourful illustrations, list the children’s favourite counting out and skipping rhymes, describe games such as ball games, chasing games and marbles, explain slang terms and expressions currently in use, recount the latest playground fads and crazes, and outline various traditions, superstitions and other playground lore that have been passed down to them. Some of the games described would make modern-day readers flinch, such as the popular game “Knifey”, which involves throwing a pocket knife to stick in the ground near the opponent’s leg. The children’s papers are usually prefaced by a note from their teacher, often apologising for spelling mistakes in their pupils’ work, and sometimes recalling their own childhood songs and games. The teachers’ insights are often particularly interesting, such as when one teacher observes that the few English-language songs and rhymes known to the children in their predominantly Welsh-speaking school in Ruthin, north Wales, appear to be the legacy left by children from Liverpool, who had been evacuated there during the war.

The series also includes a sub-section of material received from sources other than schools, such as from fellow researchers working in the same field as the Opies, or a collection of local rhymes and songs from across Scotland, gathered by the editors of the Aberdeen Press and Journal newspaper. This section also includes ten boxes of children’s essays submitted to the Camberwell Public Libraries Essay Competition, passed on to the Opies by Camberwell’s Chief Librarian. These competition entries provide a fascinating glimpse into the children’s thoughts and lives. The essays are very clearly rooted in their time, which is apparent not only through the 1950s and ’60s hairstyles and fashions, discernible in some of the charming, childish illustrations, but also in the children’s responses to essay topics such as “What I want to be when I leave school”, in which all the girls aspire to be nurses, dressmakers and typists, while their male counterparts seek to become firemen, policemen and train drivers. Other interesting responses were elicited by the 1955 essay title “A visit to the moon” – some children setting their stories firmly in the realm of fantasy, imagining being transported to the moon by fairies or goblins, while others wrote of rocket ships, but set their stories in the far distant year 3000, little imagining that the moon landing could become a reality in just over a decade’s time.

Shiny, new, archive boxes, all labelled up and barcoded!

To begin with, the bundles of papers were mostly still packaged in the same old, brown envelopes in which they had been stored by the Opies. Part of our task, in order to preserve the material long-term, was to remove all the harmful fasteners that could cause damage to the papers over time, such as rusty paperclips, pins and staples, as well as brittle, dried-up elastic bands. The papers could then be repackaged into standard, acid-free archive folders and boxes. In those instances where whole batches of papers had been folded or rolled up within their envelopes, the process of unfurling and flattening them to lie safely and neatly in their archive folders was rather time-consuming.

Some of the rusty fasteners, removed from the Opie schools material

Our final task was foliation – which means physically numbering all the individual leaves (or “folios”) in each box, in pencil, so that the original order of the pages will never become muddled. The foliation process demanded sustained concentration, as it was all too easy to either miscount or accidentally skip a page, especially given that the leaves in each bundle were all different sizes. Once such an error is discovered, all the subsequent numbers in the sequence are then, of course, likewise out of sync – a highly frustrating occurrence which we sought to avoid! In total, we numbered over 24 and a half thousand leaves across 46 boxes.

The Opie cataloguing project is generously funded by the Wellcome Trust. While the catalogue of this first series has now been completed, please note that work on the remaining Opie Archive is still ongoing, and sequences of the Opie Archive will continue to become temporarily unavailable whilst preservation, cataloguing and digitisation work is being carried out. We will try to accommodate researchers’ urgent requests for access wherever possible. However, if you need to consult material from the Opie Archive before June 2018, please do ensure that you contact us with as much advance notice as possible, so that we can advise on the availability of the material in question and make any necessary arrangements.

Supported by the Wellcome Trust

Oxfam archive inspires potential University of Oxford students

Nineteen year-12 students recently attended a seminar in the Weston Library’s impressive Bahari Room as part of a summer school organised by Wadham College.

The programme allows students from schools with low application/entry rates into higher education to experience university life through a four-day residential. During the visit, students attended lectures, seminars and tutorials, giving them a taste of what it is like to be an undergraduate at the University of Oxford.

The theme for this year was ‘The Politics of Immigration’ and in the seminar, students had the chance to handle a selection of material taken from the Oxfam archive. They were then asked to discuss the representation of Palestinian refugees in the archival documents dating from the 1960s. The material used was taken from the Communications section of the archive – i.e. records of Oxfam’s external communication with the public – and is just a very small example of the material available to the public in the extensive Oxfam archive (the Communications catalogue is online here).

An example of some of the material that the students were using from the Communications section of the Oxfam archive.

Though the students were initially hesitant, we were pleased when two of them eagerly volunteered to open up the archival boxes and find the files that were needed. After being carefully handled by our volunteers, all the files were laid out for the students to analyse in groups.

Dr Tom Sinclair and a student unpacking an archival box.

The students then took it in turns to give examples of how Palestinian refugees were represented in the Oxfam material. One of the excellent examples that students spotted was how Oxfam was able to remain politically neutral (a constitutional necessity for charities) by not specifying why the refugees were displaced. Students also remarked that Oxfam preferred to focus on individual stories in their communications – for instance, that of a displaced teenager with aspirations to be an engineer – which the students suggested helped humanise a crisis that could be difficult for the public to comprehend.

The students studied selected material from the Oxfam archive and gave examples of how Palestinian refugees were represented.

Overall, the ‘Politics of Immigration’ seminar was a great success that gave the students a good feel for what it would be like to use the archives to complete research for a dissertation or other academic project.

Dr Tom Sinclair, who organised the summer school, said: “It was such a privilege to be in that lovely room and have such free access to the archives… I really think that a couple of the students were inspired, and I hope they’ll be future Oxford undergraduates visiting the archives again in a few years’ time.”

Bountiful Harvest: Curation, Collection and Use of Web Archives

The theme for the ARA Annual Conference 2017 is: ‘Challenge the Past, Set the Agenda’. I was fortunate enough to attend a pre-conference workshop in Manchester, run by Lori Donovan and Maria Praetzellis from The Internet Archive, about the bountiful harvest that is web content, and the technology, tools and features that enable web archivists to overcome the challenges it presents.

Part I – Collections, Community and Challenges

Lori gave us an insight into the use cases of Archive-it partner organisations to show us the breadth of reasons why other institutions archive the web. The creation of a web collection can be for one of (or indeed, all) the following reasons:

  • To maintain institutional history
  • To document social commentary and the perspectives of users
  • To capture spontaneous events
  • To augment physical holdings
  • Responsibility: Some documents are ONLY digital. For example, if a repository upholds a role to maintain all published records, a website can be moved into the realm of publication material.

When asked about duplication amongst web archives, and whether it was a problem if two different organisations archive the same web content, Lori put forward the argument that duplication is not worrisome. Having more captures of a website is good for long-term preservation in general, and in some cases organisations can work together on collaborative collecting if the collection scope is appropriate.

Ultimately, the priority of crawling and capturing a site is to recreate the same experience a user would have if they were to visit the live site on the day it was archived. Combining this with an appropriate archive frequency means that change over time can also be preserved. This is hugely important: the ephemeral nature of internet content is widely attested. Thankfully, the misconception that ‘online content will be around forever’ is being confronted. Lori put forward some examples to illustrate why the archiving of websites is crucial.

In general, a typical website lasts 90-100 days before one of the following happens:

  1. The content changes
  2. The site URL moves
  3. The content disappears completely
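As a rough illustration of the three outcomes listed above, here is a minimal Python sketch that compares two captures of the same page and labels what happened in between. The `Capture` record and the `classify_change` function are hypothetical names invented for this example; they are not part of Archive-it or any crawler, and a real comparison would need to be more forgiving than a byte-for-byte hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capture:
    """Hypothetical record of one visit to a page (not a real crawler type)."""
    requested_url: str
    final_url: str          # URL reached after following any redirects
    status: Optional[int]   # HTTP status code, or None if the request failed
    body: bytes = b""


def classify_change(earlier: Capture, later: Capture) -> str:
    """Label the difference between two captures of the same requested URL."""
    # Outcome 3: the content disappears completely.
    if later.status is None or later.status >= 400:
        return "disappeared"
    # Outcome 2: the site URL moves (we ended up somewhere else).
    if later.final_url != earlier.final_url:
        return "moved"
    # Outcome 1: the content changes (crude byte-level comparison).
    if hashlib.sha256(later.body).digest() != hashlib.sha256(earlier.body).digest():
        return "changed"
    return "unchanged"
```

Checking for disappearance first is a design choice: an error page usually has a different body too, so testing the hash first would mislabel a vanished site as merely "changed".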

A study was carried out on the Occupy Movement sites archived in 2012. Of 582 archived sites, only 41% were still live on the web as of April 2014. (Lori Donovan)

Furthermore, we were told about a 2014 study which concluded that 70% of scholarly articles online with text citations suffered from reference rot over time. This speaks volumes about the importance of preserving copies, for the sake of both authentication and academic integrity.

The challenge continues…

Lori also pointed us to the NDSA 2016/2017 survey, which outlines the principal concerns within web archiving currently: social media (70%), video (69%), and interactive media and databases (both 62%). Any dynamic content can be difficult to capture and curate, so sharing advice and guidelines amongst leaders in the web archiving community is a key factor in determining successful practice for both current web archivists and those of future generations.

Part II – Current and Future Agenda

Maria then talked us through some key tools and features which enable greater crawling technology, higher quality captures and the preservation of web archives for access and use:

  • Brozzler. Definitely my new favourite portmanteau (browser + crawler = brozzler!), Brozzler is the crawler newly developed by The Internet Archive, which is replacing the combination of the Heritrix and Umbra crawlers. Brozzler captures HTTP traffic as it is loaded, works with YouTube in order to improve media capture, and immediately writes and saves the data as a WARC file. Also, Brozzler uses a real browser to fetch pages, which enables it to capture embedded URLs and extract links.
  • WARC. The Web ARChive file format is the ISO standard for web archives. It is a concatenated file written by a crawler, with long-term storage and preservation specifically in mind. However, Maria pointed out to us that WARC files are not constructed to easily enable research (more on this below).
  • Elasticsearch. The full-text search system does not just search the HTML content displayed on the web pages; it also searches PDF, Word and other text-based documents.
  • Solr. A metadata-only search tool. Metadata can be added on Archive-it at collection, seed and document level.
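To make the idea of a "concatenated file" a little more concrete, here is a small Python sketch, using only the standard library, that writes two simplified WARC-style records one after the other and reads them back. It is an illustration of the record layout (a version line, named header fields, a blank line, then a payload whose size is given by `Content-Length`), not a complete or validating implementation of the standard; `make_record` and `read_records` are names made up for this example, and real crawls would normally be read with a dedicated WARC library.

```python
import io
import uuid
from datetime import datetime, timezone


def make_record(target_uri: str, payload: bytes) -> bytes:
    """Build one simplified 'resource' record: header block plus payload."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        "Content-Type: text/plain",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is followed by two CRLFs before the next one starts.
    return head + payload + b"\r\n\r\n"


def read_records(stream):
    """Yield (headers, payload) for each record in a concatenated stream."""
    while True:
        line = stream.readline()
        if not line:
            return                      # end of file
        if not line.strip():
            continue                    # blank separator between records
        headers = {"version": line.strip().decode("utf-8")}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break                   # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is read by length, so it may itself contain blank lines.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


archive = make_record("http://example.org/a", b"first capture") \
    + make_record("http://example.org/b", b"second capture")
for headers, payload in read_records(io.BytesIO(archive)):
    print(headers["WARC-Target-URI"], len(payload))
```

Even in this toy form you can see why the format suits preservation better than research: to find anything, you have to walk the file record by record.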

Supporting researchers now and in the future

The tangible experience and use of web archives, where a site can be navigated as if it were live, can shed so much light on the political and social climate of its time of capture. Yet Maria explained that the raw captured data, rather than just the replay, is a rich area for potential research and, if handled correctly, an invaluable research tool.

As well as the use of Brozzler as a new crawling technology, Archive-it research services offer a set of derivative data-set files which are less complex than WARC and allow for data analysis and research. One of these derivative data sets is a Longitudinal Graph Analysis (LGA) dataset file, which allows the researcher to analyse the trend in links between URLs over time within an entire web collection.
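To give a feel for the kind of question a link dataset like this could answer, here is a hedged Python sketch that counts, year by year, how many captured links point at a given host. The input layout (a list of `(timestamp, source_url, target_url)` tuples) and the function name `links_per_year` are assumptions made purely for illustration; this is not the actual LGA file format.

```python
from collections import Counter
from urllib.parse import urlsplit


def links_per_year(edges, target_host):
    """Count links to target_host per capture year.

    edges: iterable of (timestamp, source_url, target_url) tuples, where the
    timestamp is a string beginning with a four-digit year (hypothetical layout).
    """
    counts = Counter()
    for timestamp, _source, target in edges:
        if urlsplit(target).hostname == target_host:
            counts[timestamp[:4]] += 1
    return dict(sorted(counts.items()))


# Made-up sample data, only to show the shape of the input.
sample = [
    ("20120105", "http://example.org/news", "http://example.com/report"),
    ("20120611", "http://example.org/blog", "http://example.com/"),
    ("20140402", "http://example.org/news", "http://example.com/report"),
    ("20140402", "http://example.org/news", "http://example.net/other"),
]
print(links_per_year(sample, "example.com"))
```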

Maria acknowledged that there are lessons to be learnt when supporting researchers using web archives, including technical proficiency training and reference resources. The typology of the researchers who use web archives is ever growing: social and political scientists, digital humanities disciplines, computer science, and documentary and evidence-based research including legal discovery.

What Lori and Maria both made clear throughout the workshop was that the development and growth of web archiving is integral to challenging the past and preserving access on a long term scale. I really appreciated an insight into how the life cycle of web archiving is a continual process, from creating a collection, through to research services, whilst simultaneously managing the workflow of curation.

When in Manchester…

Virtual Archive, Central Library, Manchester

I couldn’t leave Manchester without exploring the John Rylands Library and Manchester’s Central Library. In the latter, the Virtual Archive is an interactive digital representation of a physical archive: you choose a box, arranged as it might be in a physical archive, and the digitised content is then projected onto the screen. A few streets away in Deansgate, I had just enough time in John Rylands to learn that the fear of beards is called pogonophobia. Go and visit yourself to learn more!

Special collections reading room, John Rylands Library, Manchester

PDF/A: Challenges Meeting the ISO 19005 Standard

Anna Oates (MSLIS Candidate, University of Illinois at Urbana-Champaign and NDNP Coordinator Graduate Assistant, Preservation Services) explaining the differences between PDF and PDF/A

We were excited to attend the recent project presentation entitled: ‘A Case Study on Theses in Oxford’s Institutional Repository: Challenges Meeting the ISO 19005 Standard’ given by Anna Oates, a student involved in the Oxford-Illinois Digital Libraries Placement Programme.

The presentation focused initially on the PDF/A format. PDF/A differs from standard PDF in that it avoids common long-term access issues associated with PDF. For example, a PDF created today may look and behave differently in 50 years’ time. This is because many visual aspects of a PDF are not necessarily saved into the file itself (a PDF can link to fonts rather than embed them). The standardised PDF/A format attempts to remedy this by embedding fonts and metadata within the file and restricting certain features commonly found in PDF which could inhibit long-term preservation.

Aspects excluded from PDF/A include:

  • Audio and video content
  • JavaScript and executable files
  • All forms of PDF encryption

PDF/A is therefore better suited to the long-term preservation of digital material, as it maintains the integrity of the information included in the source files, be this textual or visual. Oates described PDF/A as having multiple ‘flavours’. PDF/A-1, published in 2005, includes conformance levels A (Accessible – maintains the structure of the file) and B (Basic – maintains the visual aspects only). Versions 2 and 3, published later in 2011 and 2012, added conformance level U (Unicode – enabling the embedding of Unicode information) alongside other features such as JPEG 2000 compression and the embedding of arbitrary file formats within PDF/A documents.

Oates specified that different types of documents benefited from different ‘flavours’ of PDF/A, for example, digitised documents were better suited to conformance level B whereas born digital documents were better suited to level A.
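A PDF/A file declares its own flavour in its XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below looks for those two properties in the raw bytes of a file and reports the flavour the file claims to be. It is only a rough check under stated assumptions: it assumes the XMP packet is stored uncompressed and in UTF-8, the function name `claimed_pdfa_flavour` is made up for this example, and reading the claim is not the same as validating it (that is what a tool such as veraPDF is for).

```python
import re

# The properties may appear as XML elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"), so both spellings are accepted.
_PART = re.compile(rb"pdfaid:part(?:>|=[\"'])\s*(\d)")
_CONF = re.compile(rb"pdfaid:conformance(?:>|=[\"'])\s*([ABUabu])")


def claimed_pdfa_flavour(pdf_bytes: bytes):
    """Return e.g. 'PDF/A-1b' if the file claims a flavour, else None."""
    part = _PART.search(pdf_bytes)
    if part is None:
        return None  # no PDF/A identification found in plain sight
    conf = _CONF.search(pdf_bytes)
    level = conf.group(1).decode("ascii").lower() if conf else ""
    return f"PDF/A-{part.group(1).decode('ascii')}{level}"
```

To try it on a real file, pass in the file's bytes, for example `claimed_pdfa_flavour(open("thesis.pdf", "rb").read())`, where `thesis.pdf` is a placeholder name.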

Whilst specifying the benefits of PDF/A, Oates also highlighted the myriad issues associated with the format. Firstly, while experimenting with creating and conforming PDF/A documents, she noted that the conformed documents had slight differences from their sources, such as changes to the colour of pixels in embedded image files (the difference was smaller with programs like PDF Studio). This amounts to a clear alteration of the original source file, which compromises its authenticity.

Oates compared source images to PDF/A converted images and found obvious visual differences.

Secondly, Oates noted that when converting files from PDF to PDF/A-1b, smart software would change the decode filter of the image (e.g. changing from JPXDecode, used for JPEG 2000, to DCTDecode, accepted by ISO 19005) in order to ensure it would conform to ISO 19005. However, she noted that, despite the benefit of avoiding non-conformance, the software had increased the file size of the PDF by 65%. The file size increase poses obvious issues with regard to storage and cost considerations for organisations using PDF/A.

Oates’ workflow for creation and conformance checking of PDF/A files using different PDF/A software

Format uptake was also discussed by Oates. She found that PDF/A had not been widely utilised by universities for long-term preservation of dissertations and theses in the UK. However, Oates provided examples of users of PDF/A for Electronic Theses and Dissertations Repositories that included: Concordia University, Johns Hopkins University, McGill University, Rutgers University, University of Alberta, University of Oulu and Virginia Tech. Alongside this it was mentioned that uptake amongst Research and Cultural Heritage Institutions included: the Archaeology Data Service (ADS), British Library, California Digital Library, Data Archiving and Networked Services (DANS), the Library of Congress and the U.S. National Archives and Records Administration (NARA).

“Adobe Preflight has failed to recognize most of the glyph errors. As such, veraPDF will remain our final tool for validation.” (Anna Oates)

Oates therefore concluded that PDF/A was not the best solution to PDF preservation. She also mentioned that the new ISO standard would cause new issues and considerations for PDF/A users.

Following the presentation the audience debated whether PDF/A should still be used. Some considered whether other solutions existed to PDF preservation; an example of a proposed solution was to keep both PDF/A and the original PDFs. However, many still felt that PDF/A provided the best solution available despite its various drawbacks.

Hopefully Oates’ findings will highlight the various areas needed for improvement in both PDF/A conversion/validation software and conformance aspects of the ISO 19005 Standard used by PDF/A to ensure it is up to the task of digital preservation.

To learn more about PDF/A have a look at Adobe’s own e-book PDF/A In a Nutshell.

Alice, Ben and Iram (Trainee Digital Archivists)