
Highlights and Takeaways from the Association of Internet Researchers Annual Conference (AoIR) 2024

At the end of October, I had the opportunity to attend the 2024 Association of Internet Researchers (AoIR) conference, which took place in the lovely city of Sheffield. This was my first time attending an AoIR conference and I was grateful to join such a vibrant meeting of Internet researchers from all over the world. As a Curatorial and Policy Research Officer for the Algorithmic Archive Project, currently exploring the ways in which social media and algorithmic data are being used across disciplines, this was a unique opportunity for me to engage with a diverse range of research on the web and social platforms.

This year’s AoIR conference was hosted by the University of Sheffield, with the Student Union building serving as the main venue. This impressive structure spans five floors and includes a cosy lounge area on the third floor, offering attendees a space to relax and network between sessions in a packed four-day programme. The main theme of AoIR2024 was “industry”, inviting the research community to reflect on and discuss the relationship between the internet and industry. With over thirteen parallel sessions scheduled for each time block, choosing just one to attend proved rather challenging.

A view of the University of Sheffield, Student Union where some of the AoIR2024 conference sessions took place between 30 October – 2 November 2024. Photo taken by B. Cannelli

One aspect of the conference that really stood out to me was the diverse range of research involving information generated on social media platforms, spanning creator economy dynamics, news polarization, AI applied in the context of online communities and content moderation, online pop culture, and disinformation across various platforms. There were several panels discussing platform governance – the set of rules, policies and decision-making processes that shape how content is collected, accessed and used within a platform – shedding light on the power dynamics that influence user experience. From an archival perspective, understanding how platforms regulate access to data and the consumption of content is crucial, with significant implications for how this content can be archived by memory institutions.

Among the many sessions exploring virality phenomena and cultures on social media, it is worth mentioning the one reflecting on “mediated memory”. It examined how social platforms like TikTok serve, for instance, as spaces to remember displaced cultures, and how they facilitate the transmission of cultural aspects to younger generations, helping to perpetuate them through time and space. Additionally, the session titled “Times and Transformations” provided some excellent examples of research conducted with web-archived content from research libraries, along with insightful reflections on the epistemology of web archiving.

Firth Court, a Grade II listed Edwardian building that constitutes part of the Western Bank Campus of the University of Sheffield. Photo taken by B. Cannelli

Overall, the conference highlighted the crucial role social media data play in today’s communication landscape and underscored the value of platforms’ user-generated content as a key resource for researchers across a wide range of disciplines. The interplay of light and shadows explored in various panels on platform governance further emphasised the enormous power platforms hold over this user-generated data, as well as the pressing need for support to enable researchers to access and preserve these data over time. 

I left the AoIR2024 conference with so much food for thought! It has also been a fantastic opportunity for networking, which will be important for the scoping phase of the Algorithmic Archive Project.

ARA Conference 2024: A New Professional’s Experience

At the end of August, members of the Archives and Records Association gathered in Birmingham for the conference, and I was grateful to be among them for a day. I chose to attend the first day, partly for that early-days energy, but mostly because one of the themes was ‘Digital Recordkeeping and the Cloud’. As a trainee in digital archiving, this seemed too on-the-nose to miss.

The venue was as practical as the location, but the sun shone in on a cosy array of sponsor stands and bleary-eyed delegates as we shuffled to find our preferred caffeinated medium, then took our seats for the keynote. Alistair Brown shared with us his observations on the archive – and wider heritage sector – from the perspective of a funder. He touched on key challenges ahead, particularly climate change and ecosystem concerns which intersect with issues of digitalisation and data use, as well as giving us an overview of the National Lottery’s Heritage 2033 strategy.

The views of Birmingham were quite striking, as can be seen here in this evening view of the Birmingham skyline – featuring a statue of Queen Victoria. Photo taken by E. Morris

Alistair’s keynote foreshadowed the themes of the day: Climate Advocacy and Education, Conservation, Passive and Sustainable Storage, and, of course, Digital Recordkeeping and the Cloud. The last of these proved much more popular than perhaps even the organisers were anticipating and a full lecture room meant that for the first session I instead dipped into a talk focussed on converting existing buildings into suitable archives. A thought-provoking offering from Oberlanders Architects, and an attractive option for those with buildings of note to make use of, potentially the only option for those of limited means, and potentially a greener solution that pleases the local planning committee.

I left lunch early to get a spot in the Digital theme for the afternoon and left the tea-break even earlier so I might get a spot on a chair rather than the floor for the second afternoon session! Across these, six speakers brought their take on topics such as: the carbon footprint of our data, the tension between what to keep and what to delete, encouraging better data storage practice, and of course what methods we might use to achieve these aims.

Anne Grzybowski (Heriot-Watt University) reminded us all of the carbon footprint incurred by the ROT – the redundant, obsolete, or transitory documents and records we haphazardly accumulate unless management is routine and effective. ROT has always been a challenge for records managers, but are we more tempted to seek ways to simply increase our data budget than to sort it out? The digital sphere has the potential to be “out of sight, out of mind” in a way that physical records cannot hope to be, but those charged with managing those records need to have a holistic view of the costs of storage, above and beyond the financial. Laura Peaurt took this further, looking at the options considered by the University of Nottingham for digital storage and how sustainable these were.

Buzzing in my mind as I stretched my legs around Birmingham’s canals between the talks and supper were a couple of thoughts: foremost of these is the trust organisations, particularly archives, must now place in external commercial organisations for the safe-keeping of their records and materials. Very few speakers started from a position other than a subscription to Microsoft 365. We spoke at length about the Cloud – a storage reality that means remote infrastructure, potentially residing in entirely different nations. While the available options are not entirely within our control, it would be naive to think that recordkeeping or archiving will be exempt from issues such as the mass outage Microsoft saw at the end of July. I was surprised that, rather than this being discussed at all, it seemed taken for granted that we would pin our digital preservation hopes on commercial cloud servers (and such like).

Not far behind this thought was “how will we sort it all out?!” We know that we are creating veritable digi-tonnes of data every day, both as individuals and organisations. Across the speakers I had heard, many attested to the truth we all suspect: many of us are poor at organising our digital lives, wasteful with the space we use, irresponsible with what we keep and what we don’t. So, what will the archivists of the future inherit? As I have discovered in my own work, the best intentions of archivists-past can leave archivists-present scratching their heads (or worse, shaking their fists).

Supper was served in the Banqueting Suite of Birmingham’s Council Chambers. A gorgeous space to reflect and network, or just stare up at the ceiling! Photo taken by E. Morris

If nothing else the ARA conference has inspired me to keep thinking big, and encourage those around me to do so as well. To forge future-oriented solutions, not simply plug the gaps now. With half an eye on what AI might do in this sphere, the time is ripe for us to build systems that just might cause Archivists of the future to say “I’m glad they thought of that”.

Conference Report: Archives and Records Association Annual Conference 2021

The Archives and Records Association (ARA) Annual Conference 2021 was held 1st–3rd September 2021. In this blog post, Rachael Marsay reports on some of the highlights of the conference, held entirely online this year for the first time.


Logo for the Archives and Records Association 2021 Virtual Conference

There were three themes to this year’s conference: sustainability, diversity, and advocacy. Though each day of the conference covered one theme, one of the stand-outs of the conference was just how interlinked all three strands were.

Day one’s keynote speaker was Jeff James, Chief Executive and Keeper at The National Archives. Jeff talked about environmental sustainability, as well as the sustainability of the record and of the archives sector. He mentioned how The National Archives at Kew are committed to lowering their carbon footprint, which has been reduced by 80% since 2009. This has been achieved by building on scientific research with regard to buildings, bringing both a financial and an environmental benefit. He also spoke of records at risk, referring to the work of the Culture Recovery Fund, the Covid-19 Archives Fund for records at risk and the Crisis Management Team alongside already established funding streams such as the Archives Revealed grant scheme. Digital records were flagged as records at risk and he stressed the need for the sector to work in partnership and collaboration, both together and with digital giants (such as Microsoft and Google), with regard to developing digital products. He also highlighted the need for records professionals to gain digital skills through schemes and strategies such as Plugged In Powered Up, the Novice to Know-How online training resource created by the Digital Preservation Coalition, the Digital Archives Learning Exchange, and the Bridging the Gap traineeship programme.

The fragility of born-digital records, identified as critically endangered by the Digital Preservation Coalition, was a common theme throughout the conference. Even the most modern of records are at risk (CD-Rs, for example, have a lifespan of under 10 years). Particular digital records discussed related to oral history interviews, often seen as ‘history from below’, recording the lives of those with ‘hidden histories’ absent from mainstream records, such as women and members of the LGBTQ+ community. Challenges to preserving digital material include cost, knowledge, skills and training, technology, and resources, as well as issues surrounding ‘gatekeeping’ and access to material. Rachel MacGregor (Digital Preservation Officer at The Modern Records Centre, University of Warwick) emphasised the need to record, describe, and catalogue born-digital collections well in order to ensure that they can be utilised by researchers, and explored some of the standards and guidance currently available.

Day two’s keynote speaker was Arike Oke (Managing Director, Black Cultural Archives) who spoke about experiences with diversity, aptly described as the equitable and mindful bringing together of difference; diversity should not be seen as static, but as a perpetual movement, both including and evolving difference. In her talk, Arike raised the point of classifying and being classified, and several sessions across the three days referred to how language and terminology impacted the use of records or archives created by or for particular communities. The use of historic terminology can be a barrier to access, particularly when words hold negative connotations that can cause distress to users. This was explored in several sessions in relation to LGBTQ+ related records and archives (including those kept at the Parliamentary Archives of the UK Parliament), as well as colonial collections such as the Miscellaneous Reports Collection held by the Royal Botanic Gardens in Kew. Thoughts on how to address the issues included guides or notes explaining the context and why such words were used, including modern terms or names in brackets, inviting feedback, and for events, giving participants time and space to process information.

The importance of being open to keeping more ephemeral material and objects (e.g. pin badges, leaflets and posters) was also highlighted, particularly in shedding light on lives not necessarily recorded in more traditional forms. Christopher Hilton of Britten Pears Arts gave an interesting presentation on the multitude of receipts kept by Benjamin Britten and his partner Peter Pears for tax purposes. The receipts were important in shedding light on their relationship by providing evidence that they maintained clearly separate financial lives, demonstrating how important it was for their professional lives at that period that their records could be used to demonstrate a ‘plausible deniability’ should their personal relationship be questioned. The receipts were also records of businesses in Aldeburgh which are now long gone, provoking memories for older residents and providing a tangible link between the archive and the town.

Day three’s keynote speaker was Deirdre McParland, Senior Archivist at the Electricity Supply Board (Ireland) whose inspirational talk focussed on the importance of advocacy and that ‘archives are for life, not just anniversaries’. Deirdre spoke of how archives should be pro-active and innovative when it comes to advocacy, and that projects should be strategically planned to include promotion as standard. Deirdre’s talk was followed by a talk by Jenny Moran and Robin Jenkins from the Record Office for Leicestershire, Leicester and Rutland, and Richard Wiltshire of the Crisis Management Team. Jenny, Robin and Richard talked about saving the archive of the travel firm Thomas Cook after the company’s sudden collapse: an excellent example of how swift action, negotiation and successful advocacy led to the ensured survival of the archive. The conference was nicely brought to a close by a talk by Alan and Bethan Ward on their project Photographs from Another Place. Their talk, given from the perspective of the archive user, showed how a bit of archival research revealed the names and stories behind a group of forgotten and unlabelled glass plate negatives. It was, for me at least, a timely reminder of the enduring value of archives.


A selection of further reading recommendations made by speakers and participants:

Conference Report: IIPC Web Archiving Conference 2021

This year’s International Internet Preservation Consortium Web Archiving Conference was held online from 15th to 16th June 2021, bringing together professionals from around the world to share their experiences of preserving the Web as a research tool for future generations. In this blog post, Simon Mackley reports back on some of the highlights from the conference.

How can we best preserve the World Wide Web for future researchers, and how can we best provide access to our collections? These were the questions that were at the forefront of this year’s International Internet Preservation Consortium Web Archiving Conference, which was hosted virtually by the National Library of Luxembourg. Web archiving is a subject of particular interest to me: as one of the Bodleian Libraries’ Graduate Trainee Digital Archivists, I spend a lot of my time working on our own Web collections as part of the Bodleian Libraries Web Archive. It was great therefore to have the chance to attend part of this virtual conference and hear for myself about new developments in the sector.

One thing that really struck me from the conference was the huge diversity in approaches to preserving the Web. On the one hand, many of the papers concerned large-scale efforts by national legal deposit institutions. For instance, Ivo Branco, Ricardo Basílio, and Daniel Gomes gave a very interesting presentation on the creation of the 2019 European Parliamentary Elections collection at the Portuguese Web Archive. This was a highly ambitious project, with the aim of crawling not just the Portuguese Web domain but also capturing a snapshot of elections coverage across 24 different European languages through the use of an automated search engine and a range of web crawler technologies (see their blog for more details). The World Wide Web is perhaps the ultimate example of an international information resource, so it is brilliant to see web archiving initiatives take a similarly international approach.

At the other end of the scale, Hélène Brousseau gave a fascinating paper on community-based web archiving at Artexte library and research centre, Canada. Within the arts community, websites often function as digital publications analogous to traditional exhibition catalogues. Brousseau emphasised the need for manual web archiving rather than automated crawling as a means of capturing the full content and functionality of these digital publications, and at Artexte this has been achieved by training website creators to self-archive their own websites using Conifer. Given that in many cases web archivists have minimal or even no contact with website creators, it was fascinating to hear of an approach that places creators at the very heart of the process.

It was also really interesting to hear about the innovative new ways that web archives were engaging with researchers using their collections, particularly in the use of new ‘Labs’-style approaches. Marie Carlin and Dorothée Benhamou-Suesser for instance reported on the new services being planned for researchers at the Bibliothèque nationale de France Data Lab, including a crawl-on-demand service and the provision of web archive datasets. New methodologies are always being developed within the Digital Humanities, and so it is vitally important that web archives are able to meet the evolving needs of researchers.

Like all good conferences, the papers and discussions did not solely focus on the successes of the past year, but also explored the continued challenges of web archiving and how they can be addressed. Web archiving is often a resource-intensive activity, which can prove a significant challenge for collecting institutions. This was a major point of discussion in the panel session on web archiving the coronavirus pandemic, as institutions had to balance the urgency of quickly capturing web content during a fast-evolving crisis against the need to manage resources for the longer term, as it became apparent that the pandemic would last months rather than weeks. It was clear from the speakers that no two institutions had approached documenting the pandemic in quite the same way, but nonetheless some very useful general lessons were drawn from the experiences, particularly about the need to clearly define collection scope and goals at the start of any collecting project dealing with rapidly changing events.

The question of access presents an even greater challenge. We ultimately work to preserve the Web so that researchers can make use of it, but as a sector we face significant barriers in delivering this goal. The larger legal deposit collections, for instance, can often only be consulted in the physical reading rooms of their collecting libraries. In his opening address to the conference, Claude D. Conter of the National Library of Luxembourg addressed this problem head-on, calling for copyright reform in order to meet reader expectations of access.

Yet although these challenges may be significant, I have no doubt from the range of new and innovative approaches showcased at this conference that the web archiving sector will be able to overcome them. I am delighted to have had the chance to attend the conference, and I cannot wait to see how some of the projects presented continue to develop in the years to come.

Simon Mackley

UK Web Archive mini-conference 2020

On Wednesday 19th November I attended the UK Web Archive (UKWA) mini-conference 2020, my first conference as a Graduate Trainee Digital Archivist. It was hosted by Jason Webber, Engagement Manager at the UKWA and, as normal in these COVID times, it was hosted on Zoom (my first ever Zoom experience!)

The conference started with an introduction and demonstration of the UKWA by Jason Webber. Started in 2005, the UKWA aims to collect the entire UK webspace at least once per year and preserve the websites for future generations. As part of my traineeship I have used the UKWA but it was interesting to hear about the other functions and collections it provides. Along with being able to browse different versions of UK websites it also includes over 100 curated collections on themes ranging from Food to Brexit to Online Enthusiast Communities in the UK. It also features the SHINE tool, which was developed as part of the ‘Big UK Data Arts and Humanities’ project and contains over 3.5 billion items which have been full-text indexed so that every word is searchable. It allows users to perform searches and trend analysis on subjects over a huge range of websites; all you need to use this tool is a bit of Python knowledge. My Python knowledge is a bit basic but Caio Mello, during his researcher talk, provided a useful link for online Python tutorials aimed at historians to aid in their research.
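To give a flavour of what "trend analysis" can mean in practice, here is a minimal Python sketch. It does not use SHINE or any UKWA interface: the `term_trend` function and the sample records are invented purely for illustration, and the sketch assumes the archived pages are already available as simple (year, text) pairs.

```python
from collections import Counter


def term_trend(records, term):
    """Return {year: share of that year's documents mentioning `term`}.

    `records` is an iterable of (year, text) pairs. This layout is an
    assumption made for this sketch, not a SHINE export format.
    """
    totals = Counter()  # number of documents seen per year
    hits = Counter()    # number of documents per year containing the term
    needle = term.lower()
    for year, text in records:
        totals[year] += 1
        if needle in text.lower():  # simple case-insensitive substring match
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Invented sample data, standing in for archived page text.
sample = [
    (2010, "Plans for the Olympic park were announced."),
    (2012, "The Olympic legacy is debated across London."),
    (2012, "Local football results."),
    (2014, "What became of the Olympic legacy?"),
]

print(term_trend(sample, "legacy"))
# {2010: 0.0, 2012: 0.5, 2014: 1.0}
```

Reporting a share per year rather than a raw count is a deliberate choice here: the number of archived pages varies from year to year, so raw counts alone could suggest a trend that is really just a change in how much was collected.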

In his talk, Caio Mello (School of Advanced Study, University of London) discussed how he used the SHINE tool as part of his work for the CLEOPATRA Project. He was specifically looking at the Olympic legacy of the 2012 Olympics, how it was defined and how the view of the legacy changed over time. He explained the process he used to extract the information and the ways the information can be used for analysis, visualisation and context. My background is in mathematics and the concept of ‘Big Data’ came up frequently during my studies so it was fascinating to see how it can be used in a research project and how the UKWA is enabling research to be conducted over such a wide range of subjects.

The next researcher talk by Liam Markey (University of Liverpool and the British Library) showed a different approach to using the UKWA for his research project into how Remembrance in 20th Century Britain has changed. He explained how he conducted an analysis of archived newspaper articles, using specific search terms, to identify articles that focused on commemoration which he could then use to examine how the attitudes changed over time. The UKWA enabled him to find websites that focused on the war and compare these with mainstream newspapers to see how these differ.

The keynote, given by Paul Gooding (University of Glasgow), was about the use and users of Non-Print Legal Deposit libraries. His research as part of the Digital Library Futures Project, with the Bodleian Libraries and Cambridge University Library as case study partners, looked at how academic deposit libraries were impacted by e-Legal Deposit. It was an interesting discussion around some of the issues of the system, such as balancing commercial rights with access for users and how highly restrictive access conditions are at odds with more recent legislation, such as the provision for disabled users and the 2014 copyright exception for data and text mining for non-commercial uses.

Being new to the digital archiving world, my first conference was a great introduction to web archiving and provided context to the work I am doing. Thank you to the organisers and speakers for giving me insight into a few of the different ways the web archive is used and I have come away with a greater understanding of the scope and importance of digital archiving (as well as a list of blog posts and tutorials to delve into!)

Some Useful Links:

https://www.webarchive.org.uk/

https://programminghistorian.org/

https://blogs.bl.uk/webarchive/2020/11/how-remembrance-day-has-changed.html

http://cleopatra-project.eu/

#WeMissiPRES: Preserving social media and boiling 1.04 x 10^16 kettles

This year the annual iPRES digital preservation conference was understandably postponed and in its place the community hosted a 3-day Zoom conference called #WeMissiPRES. As two of the Bodleian Libraries’ Graduate Trainee Digital Archivists, Simon and I were in attendance and blogged about our experiences. This post contains some of my highlights.

The conference kicked off with a keynote by Geert Lovink. Geert is the founding director of the Institute of Network Cultures and the author of several books on critical Internet studies. His talk was wide-ranging and covered topics from the rise of so-called ‘Zoom fatigue’ (I guarantee you know this feeling by now) to how social media platforms affect all aspects of contemporary life, often in negative ways. Geert highlighted the importance of preserving social media in order to allow future generations to be able to understand the present historical moment. However, this is a complicated area of digital preservation because archiving social media presents a host of ethical and technical challenges. For instance, how do we accurately capture the experience of using social media when the content displayed to you is largely dictated by an algorithm that is not made public for us to replicate?

After the keynote I attended a series of talks about the ARCHIVER project. João Fernandes from CERN explained that the goal of this project is to improve archiving and digital preservation services for scientific and research data. Preservation solutions for this type of data need to be cost-effective, scalable, and capable of ingesting amounts of data within the petabyte range. There were several further talks from companies who are submitting to the design phase of this project, including Matthew Addis from Arkivum. Matthew’s talk focused on the ways that digital preservation can be conducted on the industrial scale required to meet the brief and explained that Arkivum is collaborating with Google to achieve this, because Google’s cloud infrastructure can be leveraged for petabyte-scale storage. He also noted that while the marriage of preserved content with robust metadata is important in any digital preservation context, it is essential for repositories dealing with very complex scientific data.

In the afternoon I attended a range of talks that addressed new standards and technologies in digital preservation. Linas Cepinskas (Data Archiving and Networked Services (DANS)) spoke about a self-assessment tool for the FAIR principles, which is designed to assess whether data is Findable, Accessible, Interoperable and Reusable. Later, Barbara Sierman (DigitalPreservation.nl) and Ingrid Dillo (DANS) spoke about TRUST, a new set of guiding principles that are designed to map well with FAIR and assess the reliability of data repositories. Antonio Guillermo Martinez (LIBNOVA) gave a talk about his research into Artificial Intelligence and machine learning applied to digital preservation. Through case studies, he identified that AI is especially good at tasks such as anomaly detection and automatic metadata generation. However, he found that regardless of how well the AI performs, it needs to generate better explanations for its decisions, because it’s hard for human beings to build trust in automated decisions that we find opaque.

Paul Stokes from Jisc3C gave a talk on calculating the carbon costs of digital curation and unfortunately concluded that not much research has been done in this area. The need to improve the environmental sustainability of all human activity could not be more pressing and digital preservation is no exception, as approximately 3% of the world’s electricity is used by data centres. Paul also offered the statistic that enough power is consumed by data centres worldwide to boil 10,400,000,000,000,000 kettles – which is the most important digital preservation metric I can think of.

This conference was challenging and eye-opening because it gave me an insight into (complicated!) areas of digital preservation that I was not familiar with, particularly surrounding the challenges of preserving large quantities of scientific and research data. I’m very grateful to the speakers for sharing their research and to the organisers, who did a fantastic job of bringing the community together to bridge the gap between 2019 and 2021!

#WeMissiPRES: A Bridge from 2019 to 2021

Every year, the international digital preservation community meets for the iPRES conference, an opportunity for practitioners to exchange knowledge and showcase the latest developments in the field. With the 2020 conference unable to take place due to the global pandemic, digital preservation professionals instead gathered online for #WeMissiPRES to ensure that the global community remained connected. Our graduate trainee digital archivist Simon Mackley attended the first day of the event; in this blog post he reflects on some of the highlights of the talks and what they tell us about the state of the field.

How do you keep the global digital preservation community connected when international conferences are not possible? This was the challenge faced by the organisers of #WeMissiPRES, a three-day online event hosted by the Digital Preservation Coalition. Conceived as a festival of digital preservation, the aim was not to try and replicate the regular iPRES conference in an online format, but instead to serve as a bridge for the digital preservation community, connecting the efforts of 2019 with the plans for 2021.

As might be expected, the impact of the pandemic loomed large in many of the talks. Caylin Smith (Cambridge University Library) and Sara Day Thomson (University of Edinburgh) for instance gave a fascinating paper on the challenge of rapidly collecting institutional responses to coronavirus, focusing on the development of new workflows and streamlined processes. The difficulties of working from home, the requirements of remote access to resources, and the need to move training online likewise proved to be recurrent themes throughout the day. As someone whose own experience of digital preservation has been heavily shaped by the pandemic (I began my traineeship at the start of lockdown!) it was really useful to hear how colleagues in other institutions have risen to these challenges.

I was also struck by the different ways in which responses to the crisis have strengthened digital preservation efforts. Lynn Bruce and Eve Wright (National Records of Scotland) noted for instance that the experience of the pandemic has led to increased appreciation of the value of web-archiving from stakeholders, as the need to capture rapidly-changing content has become more apparent. Similarly, Natalie Harrower (Digital Repository of Ireland) made the excellent point that the crisis had not only highlighted the urgent need for the sharing of medical research data, but also the need to preserve it: coronavirus data may one day prove essential to fighting a future pandemic, and so there is a moral imperative for us to ensure that it is preserved.

As our keynote speaker Geert Lovink (Institute of Network Cultures) reminded us, the events of the past year have been momentous quite apart from the pandemic, with issues such as the distorting impacts of social media on society, the climate emergency, and global demands for racial justice all having risen to the forefront of society. It was great therefore to see the role of digital preservation in these challenges being addressed in many of the panel sessions. A personal highlight for me was the presentation by Daniel Steinmeier (KB National Library of the Netherlands) on diversity and digital preservation. Steinmeier stressed that in order for diversity efforts to be successful, institutions needed to commit to continuing programmes of inclusion rather than one-off actions, with the communities concerned actively included in the archiving process.

So what challenges can we expect from the year ahead? Perhaps more than ever, this has been a difficult question to answer this year. Nonetheless, a key theme that struck me from many of the discussions was that the growing challenge of archiving social media platforms was matched only by the increasing need to preserve the content hosted on them. As Zefi Kavvadia (International Institute of Social History) noted, many social media platforms actively resist archiving; even when preservation is possible, curators are faced with a dilemma between capturing user experiences and capturing platform data. Navigating this challenge will surely be a major priority for the profession going forward.

While perhaps no substitute for meeting in person, #WeMissiPRES nonetheless succeeded in bringing the international digital preservation community together in a shared celebration of the progress being made in the field, successfully bridging the gap between 2019 and 2021, and laying the foundations for next year’s conference.

 

#WeMissiPRES was held online from 22nd-24th September 2020. For more information, and for recordings of the talks and panel sessions, see the event page on the DPC website.

WARC Files and Blue Lagoons: The IIPC Web Archiving Conference, 13-15 April 2016 in Reykjavik

The International Internet Preservation Consortium (IIPC) is the leading international organisation dedicated to improving the tools, standards and best practices of web archiving, promoting international collaboration and the broad access and use of web archives for research and as cultural heritage.

This year, for the first time, the IIPC’s annual General Assembly in Reykjavik was accompanied by a three-day conference, bringing together web archivists, curators, IT specialists and researchers to discuss challenges related to acquiring, preserving, making available and using web archives. With over 150 participants, including leading experts – most prominently the internet pioneer Vint Cerf – the conference provided a unique opportunity to learn about web archiving strategies and projects around the world, and to keep up to date with emerging trends in research and the latest technological developments.

Vint Cerf, Avoiding a Digital Dark Age

The first day, after a warm welcome by Ingibjörg Sverrisdóttir, Iceland’s National Librarian, was dedicated to the ‘big questions’ of web archiving: What’s worth saving? (Hjalmar Gislason) and how to avoid a Digital Dark Age? (Vint Cerf). What might new services look like, and which tools and strategies for preservation are available (Emulation!) or being developed? Or, in the words of Brewster Kahle, founder of the Internet Archive: ‘20 years of Web Archiving – What do we do now?’ (video of his talk introducing the ‘National Library of Atlantis’ prototype for integrated web archive discovery)

Brewster Kahle, What Do We Do Now?

On the second day, the conference continued with two separate tracks, one discussing policies, practices and strategies for the capture and preservation of web material, the other looking more at the user side of web archives and at how web archive data can be accessed, searched, analysed and visualised as a resource for research.
The third day was the hands-on day, with workshops exploring search interfaces such as the SHINE interface developed at the British Library for the UK Web Archive, DIY web archiving tools such as webrecorder.io, and the open-source platform Warcbase for analysing web archive data, as well as a discussion of the future of the WARC archive format.

There was plenty of time for Q&A and discussions between and after the talks and presentations, and the open, friendly atmosphere of the conference encouraged informal conversations with web archiving colleagues and networking during coffee and lunch breaks, and on visits like the tour of the National and University Library of Iceland.

The National and University Library of Iceland

Once again it became clear that web archiving practice is at the same time extremely diverse and dependent on joint efforts and collaborations:
For example, the priorities in curating a relatively small collection of Electronic Literature at the German Literary Archive Marbach are very different from those in capturing and preserving the .EU domain at the Portuguese National Foundation for Scientific Computing FCCN, owing to the scope, size and structure of the collections, and the resources available to build and maintain them. Similarly, quality assurance policies and workflows differ considerably between national domain scale archives, such as the Legal Deposit UK Web Archive containing millions of websites, and specialized archives curated and captured by university libraries like the North Carolina State University. Researchers approach the UK Government Web Archive with different research questions than those they would use to look at archived Twitter data.

But no matter the size and scope of the web archive, the resources available at a web archiving institution, or the focus of a particular project, the underlying challenges are very similar:

  • How do we decide what to capture?
  • How to capture it?
  • How to preserve it for the future?
  • Metadata?
  • How to provide access and facilitate discovery?
  • How to use web archives for research?

Working collaboratively and across disciplines, including perspectives from archivists, curators, IT engineers and researchers, seems to be the best way forward, and the practice of sharing knowledge and experience and of openly discussing problems is certainly embraced by the web archiving community. A particular project might have ‘failed’ in terms of achieving the intended outcome, but it can still provide valuable lessons for the next project elsewhere, and in the long run, for developing best practice, policies and standards for web archiving as a discipline.

Mistakes are only wrong if you – and others – don’t learn from them!

Curators might be slightly overwhelmed by technical details discussed by web crawl engineers (I certainly was!) and ‘the IT guys (and girls)’ might sometimes be confused by the curatorial way of thinking; web archiving cultures in North America seem to differ considerably from the approaches in Europe, where Legal Deposit regulations have a strong impact on collection strategies and access to archives. STEM researchers look at data in different ways than historians and social scientists.
International conferences like the IIPC Web Archiving Conference 2016 are invaluable for bringing together these different perspectives, for fostering discussion and knowledge sharing, and for providing an opportunity to establish new contacts and strengthen existing ones with web archiving colleagues in archives, (university) libraries and research institutions worldwide.

Harvesting social media: Overview…

 

…and details.

Web archivists love to produce new social media content:
The conference seen through the participants’ Tweets: #iipcwac16.
(Now we just have to archive that!)

Not least, the Reykjavik conference provided a rare opportunity to meet web archiving colleagues from other UK Legal Deposit Libraries outside the usual committees and institutional settings. One of the conference lunch breaks was turned into an ad-hoc UK Legal Deposit Web Archive meeting, discussing user interface redevelopment – and where else but in Iceland can you have a Friday late afternoon conference debrief whilst soaking in a giant outdoor geothermal bathtub (aka the Blue Lagoon)?

Some very clean UK web archivists after the conference debrief

 

 

Catching butterflies

Archival Uncertainties: International Conference on Literary Archives at the British Library – 4 April 2016

This one-day conference focused on digital humanities, with papers from a spectrum of interested parties including academics working on digitisation projects, authors, translators, archivists and curators. I attended three panels on the day and the unifying theme was a contrary message of dispersal and amalgamation (and butterflies).

The first thing that has been dispersed or discarded is any idea of a literary canon. As plenary speaker and archivist Catherine Hobbs pointed out, scholarship now focuses less on established set texts and more on themes like “environmental literature”. Over the past few decades, in response to this, archives have collected more non-traditionally canonical literary papers but, Catherine reminded us, as archivists we can’t stop paying attention to the ways that literature continues to change. We need to keep tabs on what is going on in the literary world in order to document it, and this will include tackling new forms of experimental, avant-garde and self-published writing.

Caterpillars and collection development [By Eric Steinert – photo taken by Eric Steinert at Paussac, France, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=338409]
As Catherine noted, it used to be easy to find the avant-garde – pretty much whoever was hanging out on the Left Bank – but now it’s up to archivists to not only collect this material, but to track it down in the first place, and not to default to the temptingly easy path of collecting only the papers of that tiny sliver of authors considered publishable by mainstream publishers.


Web Archives as Scholarly Sources: Issues, Practices and Perspectives

RESAW conference in Aarhus, 8-10 June 2015

Web archiving has been part of Special Collections work at the Bodleian Library for quite a while now, both in cooperation with other UK Legal Deposit Libraries within the electronic Legal Deposit framework (since 2013), and through the Bodleian Libraries’ own Web Archive.
But whereas the amount of archived web material – at the Bodleian and elsewhere – is constantly growing, use of these new resources has so far been quite low. Scholars, it seems, are largely unaware of the potential web archives have as sources for research, or lack the knowledge and skills to work with such material, while web archiving institutions lack the resources to promote their web archive collections and support their use.

The Research Infrastructure for the Study of Archived Web Materials (RESAW) network aims to promote the establishment of a collaborative European research infrastructure for the study of archived web materials. This means collaborating internationally as well as across disciplines to meet the challenges – and the opportunities – that archived web materials bring, and to develop new methods and approaches in research and teaching.

One of the topics: How to archive Social Media content?  And how to use archived Social Media content as scholarly sources?

Tweets from the conference have been collected via Storify. Thanks to Jane Winters from the Institute of Historical Research, University of London, for having set this up.

The 2015 RESAW conference, hosted by the University of Aarhus in Denmark, was the third in a series of conferences: the first conference in 2001 focused on how to preserve web content, the second in 2008 on web archives theory, and this year’s third conference on the actual use of web archives in research.
Participants included over 80 web archivists, curators, researchers, and IT experts from various disciplines from Canada, Denmark, Finland, France, Germany, Italy, Israel, the Netherlands, Russia, the UK, the United Arab Emirates, and the USA, representing public and private archives, state and university libraries, research institutions, IT service providers and web archiving consultants.

For an intense three days, keynote speeches and short and long papers alternated with panel discussions, with speakers and presenters reporting on their approaches to and practical experiences in archiving websites and in using archived web material for research.
Whereas the individual case studies came from very different backgrounds – focusing on YouTube or social media, exploring possible new tools and methodologies for web archiving and web archives analysis, dealing with the use of Big Data or small datasets in research disciplines from anthropology and linguistics to international relations and migration studies, looking at academic websites, popular culture, internet governance, citizen involvement and even troll communities – it soon became clear that the individual results would lead to common conclusions:

Archived web materials are ‘different’ from both traditional paper-based resources and from the live web. Therefore, existing research theories and traditional approaches to collecting and curating are often not useful when dealing with web materials; new methodologies need to be developed, new questions to be asked. On a practical basis, there is a big need for new tools to deal with the sheer amount of data available for research,  for example to filter and analyze web archive collections, and to visualize results.
Archiving web materials, curating collections, and using them as scholarly sources requires a great amount of resources – staff/time, knowledge and expertise, technical infrastructure and tools. To use the existing resources as efficiently as possible, archivists, curators, researchers from different disciplines, IT experts and service providers need to collaborate. Pooling resources across institutions and creating (international) networks to share knowledge and experience seems to be the way forward.

Anna Perricci, Columbia University, on the importance of building web archiving collaborations

Communication and openness are key! Archivists and curators should make web archiving processes transparent and explain to scholars what type of material and information they can realistically expect to find in web archives (and what is likely not to be included!).  Researchers should clearly express their needs and expectations, but at the same time, be willing to engage with a new type of resource, requiring new approaches, and at least basic IT skills. IT experts should develop easy to use and transparent tools, and share technical knowledge that helps to interpret archived web materials. Users should feed their experience back to curators and developers to help improve web archives selection, metadata/description and discovery tools.

Web archiving is still a young discipline – and research based on archived web material is an even younger one. There are no golden ‘how to’ rules, standards or ‘ultimate authorities’ yet; everyone is still learning. Individual projects encountering problems, or even ‘failing’ to achieve the desired outcome, can still provide valuable lessons for others to learn from. Successes, e.g. in developing and using methodologies and tools for web archiving and using web archives, can be the starting point for developing best practice guidelines in the medium to long term. Again, this requires communication and collaboration within and across institutions, professions, disciplines and countries.

Gareth Millward sharing his experience from the BUDDAH project
A case study of using Web archives as scholarly sources: Gareth Millward sharing his experience exploring the evolution of  disability organisation websites through the UK Domain Data Archive.

The conference’s big strength, apart from giving web archiving professionals and web archive users the opportunity to present their recent and ongoing projects and – in many cases – to ask the other conference participants for input and advice, was certainly that it brought together people concerned with web archives from a great variety of backgrounds, thus enabling exchange of ideas, debate and networking. There were many eye-opening moments in terms of discovering that someone else, in a different institution in a different country, has been working on similar topics or has encountered similar problems.

Knowing how, and with what results, archived web materials were used in other institutions will be very valuable if and when the Bodleian Libraries decide to promote their own web archive collections. At the same time, getting in touch with web archiving colleagues in the UK and internationally offers much potential for collaborations in future projects.
For example, the Tomsk State University in Russia is currently trying to establish a web archive similar to the Bodleian Libraries’ Web Archives, whilst research projects run at the Institute of Historical Research of the University of London as part of the Big UK Domain Data for the Arts and Humanities Project in cooperation with the British Library could be used as examples to promote the scholarly use of the UK (Legal Deposit) Web Archive in Oxford.

Special Collections in the Danish Web Archive (Netarkivet), which is run by the State and University Library in Aarhus and The Royal Library in Copenhagen. Since 2005 the collection and preservation of the .dk internet has been included in the Danish Legal Deposit Law.

At the end of the conference, everyone was buzzing with enthusiasm and new ideas, and agreed that the event was a great success – not least thanks to the flawless organisation and wonderful Danish hospitality, which included a reception celebrating the anniversary of the Danish Web Archive, netarkivet.dk, lots of Smørrebrød (delicious Danish open sandwiches) and a memorable conference dinner, all adding to the friendly and sociable character of the event.

A similar conference is now envisaged to be held in 2016 or 2017 in London, an opportunity not to be missed to catch up with the latest in Web Archiving and strengthen old and new – forgive the pun – links!