
Conference Report: Archives and Records Association Annual Conference 2021

The Archives and Records Association (ARA) Annual Conference 2021 was held 1st–3rd September 2021. In this blog post, Rachael Marsay reports on some of the highlights of the conference, held entirely online this year for the first time.



There were three themes to this year’s conference: sustainability, diversity, and advocacy. Although each day covered one theme, one of the stand-out features of the conference was just how interlinked all three strands were.

Day one’s keynote speaker was Jeff James, Chief Executive and Keeper at The National Archives. Jeff talked about environmental sustainability, as well as the sustainability of the record and of the archives sector. He explained that The National Archives at Kew is committed to lowering its carbon footprint, which has been reduced by 80% since 2009. This has been achieved by building on scientific research into buildings, bringing both financial and environmental benefits. He also spoke about records at risk, referring to the work of the Culture Recovery Fund, the Covid-19 Archives Fund and the Crisis Management Team, alongside established funding streams such as the Archives Revealed grant scheme. Digital records were flagged as being particularly at risk, and he stressed the need for the sector to work in partnership and collaboration, both within the sector and with digital giants (such as Microsoft and Google), on developing digital products. On sector skills, he emphasised the need for records professionals to gain digital skills through schemes and strategies such as Plugged In Powered Up, the Novice to Know-How online training resource created by the Digital Preservation Coalition, the Digital Archives Learning Exchange, and the Bridging the Gap traineeship programme.

The fragility of born-digital records, identified as critically endangered by the Digital Preservation Coalition, was a common theme throughout the conference. Even the most modern records are at risk (CD-Rs, for example, have a lifespan of under 10 years). The digital records discussed included oral history interviews, often seen as ‘history from below’, which record the lives of those with ‘hidden histories’ absent from mainstream records, such as women and members of the LGBTQ+ community. Challenges in preserving digital material include cost, knowledge, skills and training, technology, and resources, as well as issues surrounding ‘gatekeeping’ and access to material. Rachel MacGregor (Digital Preservation Officer at The Modern Records Centre, University of Warwick) emphasised the need to record, describe, and catalogue born-digital collections well to ensure that they can be used by researchers, and explored some of the standards and guidance currently available.

Day two’s keynote speaker was Arike Oke (Managing Director, Black Cultural Archives), who spoke about experiences with diversity, aptly described as the equitable and mindful bringing together of difference; diversity should be seen not as static but as perpetual movement, both including and evolving difference. In her talk, Arike raised the question of classifying and being classified, and several sessions across the three days addressed how language and terminology affect the use of records or archives created by or for particular communities. Historic terminology can be a barrier to access, particularly when words carry negative connotations that can cause distress to users. This was explored in several sessions in relation to LGBTQ+ records and archives (including those kept at the Parliamentary Archives of the UK Parliament), as well as colonial collections such as the Miscellaneous Reports Collection held by the Royal Botanic Gardens, Kew. Suggestions for addressing these issues included guides or notes explaining the context and why such words were used, adding modern terms or names in brackets, inviting feedback and, for events, giving participants time and space to process information.

The importance of being open to keeping more ephemeral material and objects (e.g. pin badges, leaflets and posters) was also highlighted, particularly for shedding light on lives not necessarily recorded in more traditional forms. Christopher Hilton of Britten Pears Arts gave an interesting presentation on the multitude of receipts kept by Benjamin Britten and his partner Peter Pears for tax purposes. The receipts illuminate their relationship: they show that the two men maintained clearly separate financial lives, which illustrates how important it was for their professional lives at the time that their records could provide ‘plausible deniability’ should their personal relationship be questioned. The receipts are also records of Aldeburgh businesses that are now long gone, provoking memories for older residents and providing a tangible link between the archive and the town.

Day three’s keynote speaker was Deirdre McParland, Senior Archivist at the Electricity Supply Board (Ireland), whose inspirational talk focussed on the importance of advocacy and on the message that ‘archives are for life, not just anniversaries’. Deirdre spoke of how archives should be proactive and innovative when it comes to advocacy, and how projects should be strategically planned to include promotion as standard. She was followed by Jenny Moran and Robin Jenkins from the Record Office for Leicestershire, Leicester and Rutland, and Richard Wiltshire of the Crisis Management Team, who talked about saving the archive of the travel firm Thomas Cook after the company’s sudden collapse: an excellent example of how swift action, negotiation and successful advocacy ensured the survival of an archive. The conference was brought to a fitting close by Alan and Bethan Ward, who spoke about their project Photographs from Another Place. Their talk, given from the perspective of archive users, showed how a bit of archival research revealed the names and stories behind a group of forgotten and unlabelled glass plate negatives. It was, for me at least, a timely reminder of the enduring value of archives.


A selection of further reading recommendations made by speakers and participants:

 

Conference Report: IIPC Web Archiving Conference 2021

This year’s International Internet Preservation Consortium Web Archiving Conference was held online on 15–16 June 2021, bringing together professionals from around the world to share their experiences of preserving the Web as a research tool for future generations. In this blog post, Simon Mackley reports back on some of the highlights from the conference.

How can we best preserve the World Wide Web for future researchers, and how can we best provide access to our collections? These were the questions that were at the forefront of this year’s International Internet Preservation Consortium Web Archiving Conference, which was hosted virtually by the National Library of Luxembourg. Web archiving is a subject of particular interest to me: as one of the Bodleian Library’s Graduate Trainee Digital Archivists, I spend a lot of my time working on our own Web collections as part of the Bodleian Libraries Web Archive. It was great therefore to have the chance to attend part of this virtual conference and hear for myself about new developments in the sector.

One thing that really struck me from the conference was the huge diversity in approaches to preserving the Web. On the one hand, many of the papers concerned large-scale efforts by national legal deposit institutions. For instance, Ivo Branco, Ricardo Basílio, and Daniel Gomes gave a very interesting presentation on the creation of the 2019 European Parliamentary Elections collection at the Portuguese Web Archive. This was a highly ambitious project, with the aim of crawling not just the Portuguese Web domain but also capturing a snapshot of elections coverage across 24 different European languages through the use of an automated search engine and a range of web crawler technologies (see their blog for more details). The World Wide Web is perhaps the ultimate example of an international information resource, so it is brilliant to see web archiving initiatives take a similarly international approach.

At the other end of the scale, Hélène Brousseau gave a fascinating paper on community-based web archiving at Artexte library and research centre in Canada. Within the arts community, websites often function as digital publications analogous to traditional exhibition catalogues. Brousseau emphasised the need for manual web archiving rather than automated crawling as a means of capturing the full content and functionality of these digital publications, and at Artexte this has been achieved by training website creators to self-archive their own websites using Conifer. Given that web archivists often have minimal or even no contact with website creators, it was fascinating to hear of an approach that places creators at the very heart of the process.

It was also really interesting to hear about the innovative new ways that web archives were engaging with researchers using their collections, particularly in the use of new ‘Labs’-style approaches. Marie Carlin and Dorothée Benhamou-Suesser for instance reported on the new services being planned for researchers at the Bibliothèque nationale de France Data Lab, including a crawl-on-demand service and the provision of web archive datasets. New methodologies are always being developed within the Digital Humanities, and so it is vitally important that web archives are able to meet the evolving needs of researchers.

Like all good conferences, the papers and discussions did not solely focus on the successes of the past year, but also explored the continued challenges of web archiving and how they can be addressed. Web archiving is often a resource-intensive activity, which can prove a significant challenge for collecting institutions. This was a major point of discussion in the panel session on web archiving the coronavirus pandemic, as institutions had to balance the urgency of quickly capturing web content during a fast-evolving crisis against the need to manage resources for the longer-term, as it became apparent that the pandemic would last months rather than weeks. It was clear from the speakers that no two institutions had approached documenting the pandemic in quite the same way, but nonetheless some very useful general lessons were drawn from the experiences, particularly about the need to clearly define collection scope and goals at the start of any collecting project dealing with rapidly changing events.

The question of access presents an even greater challenge. We ultimately work to preserve the Web so that researchers can make use of it, but as a sector we face significant barriers in delivering this goal. The larger legal deposit collections, for instance, can often only be consulted in the physical reading rooms of their collecting libraries. In his opening address to the conference, Claude D. Conter of the National Library of Luxembourg addressed this problem head-on, calling for copyright reform in order to meet reader expectations of access.

Yet although these challenges may be significant, I have no doubt from the range of new and innovative approaches showcased at this conference that the web archiving sector will be able to overcome them. I am delighted to have had the chance to attend the conference, and I cannot wait to see how some of the projects presented continue to develop in the years to come.

Simon Mackley

UK Web Archive mini-conference 2020

On Wednesday 19th November I attended the UK Web Archive (UKWA) mini-conference 2020, my first conference as a Graduate Trainee Digital Archivist. It was hosted by Jason Webber, Engagement Manager at the UKWA, and, as is usual in these COVID times, it took place on Zoom (my first ever Zoom experience!).

The conference started with an introduction to and demonstration of the UKWA by Jason Webber. Since it began in 2005, the UKWA’s mission has been to collect the entire UK webspace at least once per year and to preserve the websites for future generations. I have used the UKWA as part of my traineeship, but it was interesting to hear about the other functions and collections it provides. As well as letting users browse different versions of UK websites, it includes over 100 curated collections on themes ranging from Food to Brexit to Online Enthusiast Communities in the UK. It also features the SHINE tool, developed as part of the ‘Big UK Data Arts and Humanities’ project, which contains over 3.5 billion items that have been full-text indexed so that every word is searchable. SHINE allows users to perform searches and trend analysis on subjects across a huge range of websites, and all you need to use it is a bit of Python knowledge. My Python knowledge is fairly basic, but Caio Mello, during his researcher talk, provided a useful link to online Python tutorials aimed at helping historians with their research.
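To make ‘trend analysis’ a little more concrete, here is a minimal Python sketch of the general idea, assuming search results have already been exported as hypothetical (year, text) pairs. It does not use SHINE’s own interface; the function name term_trend and the sample records are illustrative only. For each year it reports the share of captured documents that mention a term, because raw counts would also rise simply as the archive itself grows.

```python
import re
from collections import Counter


def term_trend(records, term):
    """Share of documents per year that mention `term` (case-insensitive, whole word).

    `records` is an iterable of (year, text) pairs: a hypothetical stand-in
    for full-text search results exported from a web archive.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    totals = Counter()  # documents captured per year
    hits = Counter()    # documents per year that mention the term
    for year, text in records:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    # Normalise by the number of documents each year so that growth in the
    # size of the archive is not mistaken for growth in interest.
    return {year: hits[year] / totals[year] for year in sorted(totals)}


sample = [
    (2012, "Olympic legacy promised for east London"),
    (2012, "Ticket sales open"),
    (2016, "What happened to the Olympic legacy?"),
    (2016, "Legacy debate continues"),
]
print(term_trend(sample, "legacy"))  # {2012: 0.5, 2016: 1.0}
```

Normalising by documents per year is a design choice: it gives up raw volume in exchange for comparability between years with very different crawl sizes.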

In his talk, Caio Mello (School of Advanced Study, University of London) discussed how he used the SHINE tool as part of his work for the CLEOPATRA Project. He was specifically looking at the Olympic legacy of the 2012 Olympics, how it was defined and how the view of the legacy changed over time. He explained the process he used to extract the information and the ways the information can be used for analysis, visualisation and context. My background is in mathematics and the concept of ‘Big Data’ came up frequently during my studies so it was fascinating to see how it can be used in a research project and how the UKWA is enabling research to be conducted over such a wide range of subjects.

The next researcher talk, by Liam Markey (University of Liverpool and the British Library), showed a different approach to using the UKWA in his research project on how Remembrance in 20th-century Britain has changed. He explained how he analysed archived newspaper articles, using specific search terms to identify articles focused on commemoration, which he could then use to examine how attitudes changed over time. The UKWA enabled him to find websites focused on the war and compare them with mainstream newspapers to see how they differed.
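The search-term approach described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researcher’s actual method: the search terms, the source labels and the dictionary layout of each article are all hypothetical. Articles matching any term are counted per source type and year, so that war-focused websites and mainstream newspapers can be compared over time.

```python
from collections import Counter

# Hypothetical search terms; the terms actually used in the project are not given here.
SEARCH_TERMS = ("remembrance", "poppy", "armistice", "cenotaph")


def is_commemorative(text, terms=SEARCH_TERMS):
    """True if the text contains any search term (simple case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def commemoration_counts(articles):
    """Count matching articles per (source type, year).

    `articles` is an iterable of dicts with hypothetical 'year', 'source' and 'text' keys.
    """
    counts = Counter()
    for article in articles:
        if is_commemorative(article["text"]):
            counts[(article["source"], article["year"])] += 1
    return counts


articles = [
    {"year": 2014, "source": "newspaper", "text": "Poppy appeal launches"},
    {"year": 2014, "source": "war history site", "text": "Remembering the Armistice"},
    {"year": 2018, "source": "newspaper", "text": "Crowds gather at the Cenotaph"},
    {"year": 2018, "source": "newspaper", "text": "Budget announced"},
]
counts = commemoration_counts(articles)
print(counts[("newspaper", 2018)])  # 1
```

Plain substring matching is deliberately crude (it would also match ‘poppyseed’), so in practice the search terms would need refining and a sample of the matches checking by hand.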

The keynote speaker was Paul Gooding (University of Glasgow), whose talk was about the use and users of Non-Print Legal Deposit Libraries. His research as part of the Digital Library Futures Project, with the Bodleian Libraries and Cambridge University Library as case study partners, looked at how academic deposit libraries were affected by e-Legal Deposit. It was an interesting discussion of some of the issues with the system, such as balancing commercial rights with access for users, and how highly restrictive access conditions are at odds with more recent legislation, such as the provision for disabled users and the 2014 copyright exception for text and data mining for non-commercial purposes.

Being new to the digital archiving world, I found my first conference a great introduction to web archiving, and it provided context for the work I am doing. Thank you to the organisers and speakers for giving me insight into a few of the different ways the web archive is used. I have come away with a greater understanding of the scope and importance of digital archiving (as well as a list of blog posts and tutorials to delve into!).

Some Useful Links:

https://www.webarchive.org.uk/

https://programminghistorian.org/

https://blogs.bl.uk/webarchive/2020/11/how-remembrance-day-has-changed.html

http://cleopatra-project.eu/

 

#WeMissiPRES: Preserving social media and boiling 1.04 x 10^16 kettles

This year the annual iPRES digital preservation conference was understandably postponed and in its place the community hosted a 3-day Zoom conference called #WeMissiPRES. As two of the Bodleian Libraries’ Graduate Trainee Digital Archivists, Simon and I were in attendance and blogged about our experiences. This post contains some of my highlights.

The conference kicked off with a keynote by Geert Lovink. Geert is the founding director of the Institute of Network Cultures and the author of several books on critical Internet studies. His talk was wide-ranging and covered topics from the rise of so-called ‘Zoom fatigue’ (I guarantee you know this feeling by now) to how social media platforms affect all aspects of contemporary life, often in negative ways. Geert highlighted the importance of preserving social media in order to allow future generations to be able to understand the present historical moment. However, this is a complicated area of digital preservation because archiving social media presents a host of ethical and technical challenges. For instance, how do we accurately capture the experience of using social media when the content displayed to you is largely dictated by an algorithm that is not made public for us to replicate?

After the keynote I attended a series of talks about the ARCHIVER project. João Fernandes from CERN explained that the goal of this project is to improve archiving and digital preservation services for scientific and research data. Preservation solutions for this type of data need to be cost-effective, scalable, and capable of ingesting data volumes in the petabyte range. There were several further talks from companies submitting proposals for the design phase of this project, including Matthew Addis from Arkivum. Matthew’s talk focused on the ways that digital preservation can be conducted on the industrial scale required to meet the brief, and he explained that Arkivum is collaborating with Google to achieve this, because Google’s cloud infrastructure can be leveraged for petabyte-scale storage. He also noted that while the marriage of preserved content with robust metadata is important in any digital preservation context, it is essential for repositories dealing with very complex scientific data.

In the afternoon I attended a range of talks that addressed new standards and technologies in digital preservation. Linas Cepinskas (Data Archiving and Networked Services (DANS)) spoke about a self-assessment tool for the FAIR principles, which is designed to assess whether data is Findable, Accessible, Interoperable and Reusable. Later, Barbara Sierman (DigitalPreservation.nl) and Ingrid Dillo (DANS) spoke about TRUST, a new set of guiding principles that are designed to map well onto FAIR and to assess the reliability of data repositories. Antonio Guillermo Martinez (LIBNOVA) gave a talk about his research into Artificial Intelligence and machine learning applied to digital preservation. Through case studies, he identified that AI is especially good at tasks such as anomaly detection and automatic metadata generation. However, he found that regardless of how well the AI performs, it needs to generate better explanations for its decisions, because it’s hard for human beings to build trust in automated decisions that we find opaque.
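As a rough illustration of how a checklist-style self-assessment can report results principle by principle, the Python sketch below scores a dataset against FAIR. The questions are hypothetical paraphrases of individual FAIR criteria (persistent identifiers, rich metadata, licences, provenance and so on), not the questions asked by the DANS tool, and the fair_scores function is illustrative only.

```python
# Hypothetical checklist: illustrative paraphrases of FAIR criteria, not the
# questions used by the DANS self-assessment tool.
CHECKLIST = {
    "Findable": ["Has a persistent identifier", "Described with rich metadata"],
    "Accessible": [
        "Retrievable via a standard protocol",
        "Metadata remain available if data are withdrawn",
    ],
    "Interoperable": ["Uses open, documented formats", "Uses shared vocabularies"],
    "Reusable": ["Has a clear licence", "Provenance is documented"],
}


def fair_scores(answers):
    """Return the fraction of checklist items answered 'yes' for each principle.

    `answers` maps a checklist question to True/False; unanswered questions count as 'no'.
    """
    return {
        principle: sum(answers.get(q, False) for q in questions) / len(questions)
        for principle, questions in CHECKLIST.items()
    }


answers = {
    "Has a persistent identifier": True,
    "Has a clear licence": True,
    "Provenance is documented": True,
}
print(fair_scores(answers))
# {'Findable': 0.5, 'Accessible': 0.0, 'Interoperable': 0.0, 'Reusable': 1.0}
```

Reporting a score per principle, rather than a single overall figure, shows a repository where to focus its effort; here, for example, access and interoperability would be the obvious gaps.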

Paul Stokes from Jisc3C gave a talk on calculating the carbon costs of digital curation and unfortunately concluded that not much research has been done in this area. The need to improve the environmental sustainability of all human activity could not be more pressing and digital preservation is no exception, as approximately 3% of the world’s electricity is used by data centres. Paul also offered the statistic that enough power is consumed by data centres worldwide to boil 10,400,000,000,000,000 kettles – which is the most important digital preservation metric I can think of.

This conference was challenging and eye-opening because it gave me an insight into (complicated!) areas of digital preservation that I was not familiar with, particularly surrounding the challenges of preserving large quantities of scientific and research data. I’m very grateful to the speakers for sharing their research and to the organisers, who did a fantastic job of bringing the community together to bridge the gap between 2019 and 2021!

#WeMissiPRES: A Bridge from 2019 to 2021

Every year, the international digital preservation community meets for the iPRES conference, an opportunity for practitioners to exchange knowledge and showcase the latest developments in the field. With the 2020 conference unable to take place due to the global pandemic, digital preservation professionals instead gathered online for #WeMissiPRES to ensure that the global community remained connected. Our graduate trainee digital archivist Simon Mackley attended the first day of the event; in this blog post he reflects on some of the highlights of the talks and what they tell us about the state of the field.

How do you keep the global digital preservation community connected when international conferences are not possible? This was the challenge faced by the organisers of #WeMissiPRES, a three-day online event hosted by the Digital Preservation Coalition. Conceived as a festival of digital preservation, the aim was not to try to replicate the regular iPRES conference in an online format, but instead to serve as a bridge for the digital preservation community, connecting the efforts of 2019 with the plans for 2021.

As might be expected, the impact of the pandemic loomed large in many of the talks. Caylin Smith (Cambridge University Library) and Sara Day Thomson (University of Edinburgh) for instance gave a fascinating paper on the challenge of rapidly collecting institutional responses to coronavirus, focusing on the development of new workflows and streamlined processes. The difficulties of working from home, the requirements of remote access to resources, and the need to move training online likewise proved to be recurrent themes throughout the day. As someone whose own experience of digital preservation has been heavily shaped by the pandemic (I began my traineeship at the start of lockdown!) it was really useful to hear how colleagues in other institutions have risen to these challenges.

I was also struck by the different ways in which responses to the crisis have strengthened digital preservation efforts. Lynn Bruce and Eve Wright (National Records of Scotland) noted, for instance, that the experience of the pandemic has led to increased appreciation of the value of web archiving from stakeholders, as the need to capture rapidly changing content has become more apparent. Similarly, Natalie Harrower (Digital Repository of Ireland) made the excellent point that the crisis had not only highlighted the urgent need for the sharing of medical research data, but also the need to preserve it: coronavirus data may one day prove essential to fighting a future pandemic, and there is therefore a moral imperative for us to ensure that it is preserved.

As our keynote speaker Geert Lovink (Institute of Network Cultures) reminded us, the events of the past year have been momentous quite apart from the pandemic, with issues such as the distorting impacts of social media on society, the climate emergency, and global demands for racial justice all having risen to the forefront of society. It was great therefore to see the role of digital preservation in these challenges being addressed in many of the panel sessions. A personal highlight for me was the presentation by Daniel Steinmeier (KB National Library of the Netherlands) on diversity and digital preservation. Steinmeier stressed that in order for diversity efforts to be successful, institutions needed to commit to continuing programmes of inclusion rather than one-off actions, with the communities concerned actively included in the archiving process.

So what challenges can we expect from the year ahead? Perhaps more than ever, this year it has been a difficult question to answer. Nonetheless, a key theme that struck me from many of the discussions was that the growing challenge of archiving social media platforms was matched only by the increasing need to preserve the content hosted on them. As Zefi Kavvadia (International Institute of Social History) noted, many social media platforms actively resist archiving; even when preservation is possible, curators are faced with a dilemma between capturing user experiences and capturing platform data. Navigating this challenge will surely be a major priority for the profession going forward.

While perhaps no substitute for meeting in person, #WeMissiPRES nonetheless succeeded in bringing the international digital preservation community together in a shared celebration of the progress being made in the field, successfully bridging the gap between 2019 and 2021, and laying the foundations for next year’s conference.

 

#WeMissiPRES was held online from 22nd-24th September 2020. For more information, and for recordings of the talks and panel sessions, see the event page on the DPC website.

WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time, the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age

The first day, after a warm welcome by Ingibjörg Sverrisdóttir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) How do we avoid a Digital Dark Age? (Vint Cerf) What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery).

Brewster Kahle, What Do We Do Now?

On the second day, the conference continued with two separate tracks: one discussing policies, practices and strategies for the capture and preservation of web material, the other looking at the user side of web archives and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.

The third day was the hands-on day, with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive, DIY web archiving tools such as webrecorder.io, and the open-source platform Warcbase for analysing web archive data, as well as a discussion of the future of the WARC archive format.

There was plenty of time for Q&A and discussion between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations and networking with web archiving colleagues during coffee and lunch breaks, and on visits such as the tour of the National and University Library of Iceland.

The National and University Library of Iceland

Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaboration.

For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing FCCN, owing to the scope, size and structure of the collections and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain-scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialised archives curated and captured by university libraries such as North Carolina State University. Researchers bring different questions to the UK Government Web Archive than they do to archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, bringing together the perspectives of archivists, curators, IT engineers and researchers, seems to be the best way forward, and the web archiving community certainly embraces the practice of sharing knowledge and experience and of openly discussing problems. A particular project might have ‘failed’ in terms of achieving its intended outcome, but it can still provide valuable lessons for the next project elsewhere and, in the long run, for developing best practice, policies and standards for web archiving as a discipline.

Mistakes are only wrong if you - and others - don't learn from them!

Curators might be slightly overwhelmed by the technical details discussed by web crawl engineers (I certainly was!), and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data in different ways from historians and social scientists.

International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, fostering discussion and knowledge sharing, and providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.

Harvesting social media: Overview…

…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a Friday late afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?

Some very clean UK web archivists after the conference debrief

 

 

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day and the unifying theme was a contrary message of dispersal and amalgamation (and butterflies).

The first thing that has been dispersed or discarded is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. In response, archives have over the past few decades collected more literary papers from outside the traditional canon but, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.

Swallowtail caterpillar (Schwalbenschwanz, Raupe)

Caterpillars and collection development [By Eric Steinert – photo taken by Eric Steinert at Paussac, France, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=338409]

As Catherine noted, it used to be easy to find the avant-garde – pretty much whoever was hanging out on the Left Bank – but now it’s up to archivists not only to collect this material but to track it down in the first place, rather than defaulting to the temptingly easy path of collecting only the papers of the tiny sliver of authors considered publishable by mainstream publishers.

Web Archives as Scholarly Sources: Issues, Practices and Perspectives

RESAW conference in Aarhus, 8–10 June 2015

Web archiving has been part of Special Collections work at the Bodleian Library for quite a while now, both in cooperation with other UK Legal Deposit Libraries within the electronic Legal Deposit framework (since 2013), and through the Bodleian Libraries’ own Web Archive.
But while the amount of archived web material – at the Bodleian and elsewhere – is constantly growing, use of these new resources has so far been quite low. It seems that scholars are largely unaware of the potential of web archives as sources for research, or lack the knowledge and skills to work with such material, while web archiving institutions lack the resources to promote their web archive collections and support their use.

The Research Infrastructure for the Study of Archived Web Materials (RESAW) network aims to promote the establishment of a collaborative European research infrastructure for the study of archived web materials. This means collaborating internationally as well as across disciplines to meet the challenges – and the opportunities – that archived web materials bring, and to develop new methods and approaches in research and teaching.

One of the topics: how to archive social media content? And how to use archived social media content as scholarly sources?

Tweets from the conference have been collected via Storify. Thanks to Jane Winters from the Institute of Historical Research, University of London, for setting this up.

The 2015 RESAW conference, hosted by the University of Aarhus in Denmark, was the third in a series of conferences: the first, in 2001, focused on how to preserve web content; the second, in 2008, on web archive theory; and this year’s third conference on the actual use of web archives in research.
Participants included over 80 web archivists, curators, researchers, and IT experts from various disciplines and from Canada, Denmark, Finland, France, Germany, Italy, Israel, the Netherlands, Russia, the UK, the United Arab Emirates, and the USA, representing public and private archives, state and university libraries, research institutions, IT service providers and web archiving consultants.

For an intense three days, keynote speeches and short and long papers alternated with panel discussions, as speakers and presenters reported on their approaches to, and practical experiences in, archiving websites and using archived web material for research.
Although the individual case studies came from very different backgrounds – focusing on YouTube or social media; exploring possible new tools and methodologies for web archiving and web archive analysis; dealing with the use of Big Data or small datasets in research disciplines from anthropology and linguistics to international relations and migration studies; looking at academic websites, popular culture, internet governance, citizen involvement and even troll communities – it soon became clear that the individual results led to common conclusions:

Archived web materials are ‘different’ from both traditional paper-based resources and the live web. Existing research theories and traditional approaches to collecting and curating are therefore often of little use when dealing with web materials; new methodologies need to be developed and new questions asked. On a practical level, there is a great need for new tools to deal with the sheer amount of data available for research, for example to filter and analyse web archive collections and to visualise results.
Archiving web materials, curating collections and using them as scholarly sources requires substantial resources – staff/time, knowledge and expertise, technical infrastructure and tools. To use the existing resources as efficiently as possible, archivists, curators, researchers of different disciplines, IT experts and service providers need to collaborate. Pooling resources across institutions and creating (international) networks to share knowledge and experience seems to be the way forward.

Anna Perricci, Columbia University, on the importance of building web archiving collaborations

Communication and openness are key! Archivists and curators should make web archiving processes transparent and explain to scholars what type of material and information they can realistically expect to find in web archives (and what is likely not to be included!). Researchers should clearly express their needs and expectations, but at the same time be willing to engage with a new type of resource that requires new approaches and at least basic IT skills. IT experts should develop easy-to-use and transparent tools, and share technical knowledge that helps to interpret archived web materials. Users should feed their experience back to curators and developers to help improve web archive selection, metadata/description and discovery tools.

Web archiving is still a young discipline – and research based on archived web material is an even younger one. There are no golden ‘how to’ rules, standards or ‘ultimate authorities’ yet; everyone is still learning. Individual projects that encounter problems, or even ‘fail’ to achieve the desired outcome, can still provide valuable lessons for others. Successes, e.g. in developing and using methodologies and tools for web archiving and for using web archives, can be the starting point for developing best-practice guidelines in the medium to long term. Again, this requires communication and collaboration within and across institutions, professions, disciplines and countries.

Gareth Millward sharing his experience from the BUDDAH project

A case study of using web archives as scholarly sources: Gareth Millward sharing his experience exploring the evolution of disability organisation websites through the UK Domain Data Archive.

Apart from giving web archiving professionals and web archive users the opportunity to present their recent and ongoing projects and, in many cases, to ask the other conference participants for input and advice, the conference’s big strength was certainly that it brought together people concerned with web archives from a great variety of backgrounds, enabling the exchange of ideas, debate and networking. There were many eye-opening moments of discovering that someone else, in a different institution in a different country, had been working on similar topics or had encountered similar problems.

Knowing how, and with what results, archived web materials have been used in other institutions will be very valuable if and when the Bodleian Libraries decide to promote their own web archive collections. At the same time, getting in touch with web archiving colleagues in the UK and internationally offers much potential for collaboration on future projects.
For example, Tomsk State University in Russia is currently trying to establish a web archive similar to the Bodleian Libraries’ Web Archive, whilst research projects run at the Institute of Historical Research of the University of London as part of the Big UK Domain Data for the Arts and Humanities project, in cooperation with the British Library, could serve as examples to promote the scholarly use of the UK (Legal Deposit) Web Archive in Oxford.

Special Collections in the Danish Netarkivet

Special Collections in the Danish Web Archive, which is run by the State and University Library in Aarhus and the Royal Library in Copenhagen. Since 2005, the collection and preservation of the .dk internet has been included in the Danish Legal Deposit Law.

At the end of the conference, everyone was buzzing with enthusiasm and new ideas, and agreed that the event had been a great success – not least thanks to the flawless organisation and wonderful Danish hospitality, which included a reception celebrating the anniversary of the Danish web archive netarkivet.dk, lots of Smørrebrød (delicious Danish open sandwiches) and a memorable conference dinner, all adding to the friendly and sociable character of the event.

A similar conference is now envisaged to be held in 2016 or 2017 in London, an opportunity not to be missed to catch up with the latest in Web Archiving and strengthen old and new – forgive the pun – links!

Balisage 2010: The Markup Conference

Balisage 2010: The Markup Conference was preceded by the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML, which opened with:

A brief history of markup of social science data: from punched cards to “the life cycle” approach, covering the “25-year process of historical evolution leading to DDI, the Data Documentation Initiative, which unites several levels of metadata in one emerging standard.”

Sustainability of linguistic resources revisited looked at some of the difficulties facing language resources over the long-term.

Report from the field: PubMed Central, an XML-based archive of life science journal articles provided insight into the processes deployed to give public access to the full text of more than two million articles.

Portico: A case study in the use of XML for the long-term preservation of digital artifacts discussed some practices that can help assure the semantic stability of digital assets.

The Sustainability of the Scholarly Edition in a Digital World explored the need for “tools to make XML encoding easier, to encourage collaboration, to exploit social media, and to separate transcriptions of texts from the editorial scholarship applied to them”.

A formal approach to XML semantics: implications for archive standards examined whether “The application of Montague semantics to markup languages may make it possible to distinguish vocabularies that can last from those which will not last”.

Metadata for long term preservation of product data discussed the “valuable lessons to be learned from the library metadata and packaging standards and how they relate to product metadata”.

The day concluded with Beyond eighteen wheels: Considerations in archiving documents represented using the Extensible Markup Language (XML) which contemplated “strategies for extending the useful life of archived documents”.

Sessions in the main 2010 conference covered topics such as:

gXML, a new approach to cultivating XML trees in Java, which proposed: “A single unified Java-based API, gXML, can provide a programming platform for all tree models for which a ‘bridge’ has been developed. gXML exploits the Handle/Body design pattern and supports the XQuery Data Model (XDM)”.

Java integration of XQuery — an information unit oriented approach explored “a novel pattern of cooperation between XQuery and Java developers”. A new API, XQJPLUS, makes it possible to let XQuery build ‘information units’ collected into ‘information trays’.

XML pipeline processing in the browser discussed the benefits of a JavaScript-based implementation of XProc, which would offer comprehensive client-side portability for XML pipelines specified in XProc.

Where XForms meets the glass: Bridging between data and interaction design explored using XForms, which offers a model-view framework for XML, within the conventions of existing Ajax frameworks such as Dojo, as a way to bridge two differing development approaches: data-centric versus starting from the user interface.

A packaging system for EXPath demonstrated how to adapt conventional ideas of packaging to work well in the EXPath environment. “EXPath provides a framework for collaborative community-based development of extensions to XPath and XPath-based technologies (including XSLT and XQuery)”.

A streaming XSLT processor: Michael Kay (editor of the XSLT 2.1 specification) showed how he has been implementing streaming features in his Saxon XSLT processor.

Processing arbitrarily large XML using a persistent DOM covered moving the DOM out of memory and into persistent storage, using a newly developed, efficient binary representation of the XML document with a supporting Java API, to offer another processing option for large documents.

Scripting documents with XQuery: virtual documents in TNTBase presented a virtual-document facility integrated into TNTBase, an XML database with support for versioning. The virtual documents can be edited, and changes to elements in the underlying XML repository are propagated automatically back to the database.

XQuery design patterns illustrated the benefits that might follow from applying meta design patterns to XQuery.
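Two of the sessions above dealt with XML documents too large to hold comfortably in memory. Purely as a point of comparison (this is neither Saxon's streaming XSLT nor the persistent-DOM approach presented at Balisage), here is a minimal Python sketch of the common streaming technique, using the standard library's `xml.etree.ElementTree.iterparse`. The function name `count_elements_streaming` and the sample `<articles>` document are hypothetical illustrations, and the sketch assumes the records of interest are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def count_elements_streaming(source, tag):
    """Count elements named `tag` without keeping the whole tree in memory.

    Hypothetical helper: iterparse reports each element as its end tag is
    read, so processed records can be discarded instead of accumulating.
    """
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the children the root has accumulated so far, so the
            # in-memory tree does not grow with the size of the document.
            root.clear()
    return count


if __name__ == "__main__":
    # A small synthetic document standing in for a very large one.
    doc = (
        b"<articles>"
        + b"".join(b"<article id='%d'><title>T</title></article>" % i for i in range(1000))
        + b"</articles>"
    )
    print(count_elements_streaming(io.BytesIO(doc), "article"))  # prints 1000
```

The trade-off is the usual one for streaming: memory use stays small, but each record can only be inspected as it passes, which is exactly the restriction that a persistent DOM tries to avoid.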

-Renhart Gittens