Tag Archives: #SkillsForTheFuture

What’s it like to be a trainee? Marjolein Platjee, Graduate Trainee Digital Archivist, 2018-2020

Having now worked as a trainee with the Bodleian Libraries for a little over a month, I can honestly say that the job was 100% worth the move from The Netherlands to the UK. As my predecessors have written, the job is incredibly varied, interesting and very rewarding.

Like the majority of my fellow trainees, I do not have a background in libraries or archives. However, I do have a background in research and in using archives, from work on my PhD focussing on British Popular Literature. Whilst writing my dissertation I was also working as an Information and Process coordinator. In this role I managed a number of IT-related projects, including the implementation of a knowledge management base. It was this project that got me interested in knowledge and records management. So when I stumbled upon this traineeship with the Bodleian, it seemed the ideal combination of my love for technology and research. As it turns out, it is indeed.

Although I have only been working at the Bodleian for a short while I have already been given the opportunity to do and learn so much. I have already been taught how to manipulate XML files, how to archive websites and how to digitize cassette tapes and other media. I am currently also being trained to assist in the reading rooms and I have been assigned my very own cataloguing project. Working on the latter has been especially exciting and surprising, as next to documents and books I have also been cataloguing merchandise which includes such ‘exotic’ items as t-shirts, jackets, corkscrews – they come in five different colours and have even been engraved – and temporary tattoos.

The distance study at Aberystwyth really prepares me for the tasks that I face in my work, and so helps me to gain a better understanding of my job, its importance and the history behind it. It does take some self-discipline to keep up with the course work whilst working 4.5 days a week, but if you manage your time wisely it really is doable.

I look forward to learning even more over the course of the coming two years, and I am sure I will look back upon my decision to apply for this job as one of the best ones I could have made.

Marjolein Platjee, Nov 2018

Attending the ARA Annual Conference 2018

ARA Annual Conference 2018, Grand Central Hotel, Glasgow

Having been awarded the Diversity Bursary for BME individuals, sponsored by Kevin J Bolton Ltd., I was able to attend the ARA Annual Conference 2018 held in Glasgow in August.

Capitalising on the host city’s existing ubiquitous branding of People Make Glasgow, the Conference Committee set People Make Records as this year’s conference theme. This was then divided into three individual themes, one for each day of the conference:

  • People in Records
  • People Using Records
  • People Looking After Records

Examined through the lens of the above themes over the course of three days, this year’s conference addressed three key areas within the sector: representation, diversity and engagement.

Following an introduction from Kevin Bolton (@kevjbolton), the conference kicked off with Professor Gus John (@Gus_John) delivering the opening keynote address, entitled “Choices of the Living and the Dead”. With People Make Records the theme for the day, Professor John gave a powerful talk discussing how people are impacting the records and recordkeeping of African (and other) diaspora in the UK, enabling the airbrushing of the history of oppressed communities. Professor John noted that, yes, people make records, but we also determine what to record, and what to do with it once it has been recorded.

Noting the ignorance surrounding racial prejudice and violence, citing the Notting Hill race riots, the Windrush generation and Stephen Lawrence as examples, Professor John illustrated how the commemoration of historical events is selective: while in 2018 the 50th anniversary of the Race Relations Act received much attention, in comparison the 500th anniversary of the start of the Transatlantic Slave Trade was largely ignored, by the sector and the media alike. This culture of oppression and omission, he said, is leading to ignorance amongst young people about major defining events, contributing to the removal of context for historically oppressed groups.

In response to questions from the audience, Professor John noted that one of the problems facing the sector is the failure to interrogate the ‘business as usual’ climate, and that it may be ‘too difficult to consider what an alternative route might be’. Professor John challenged us to question the status quo: ‘Why is my curriculum white? Why isn’t my lecturer black? What does “de-colonising” the curriculum mean? This is what we must ask ourselves’.

Following Professor John’s keynote and his ultimate call to action, there was a palpable atmosphere of engagement amongst the delegates, with myself and those around me eager to spend the next three days learning from the experiences of others, listening to new perspectives and extracting guidance on the actions we may take to develop and improve our sector, in terms of representation, diversity and engagement.

Various issues relating to these areas were threaded throughout many of the presentations, and as a person of colour at the start of my career in this sector, and recipient of the Diversity Bursary, I was excited to hear more about the challenges facing marginalised communities in archives and records, including some I could relate to on a personal and professional level, and, hopefully, also take away some proposed solutions and recommendations.

I attended an excellent talk by Adele Patrick (@AdelePAtrickGWL), of Glasgow Women’s Library, who discussed the place for feminism within the archive, noting GWL’s history in resistance, and insistence on a plural representation, when women’s work, past and present, is eclipsed. Dr Alan Butler (@AButlerArchive), Coordinator at Plymouth LGBT Community Archive, discussed his experiences of trying to create a sense of community within a group that is inherently quite nebulous. Nevertheless, Butler illustrated the importance of capturing LGBTQIA+ history, as people today are increasingly removed from the struggles that previous generations have had to overcome, echoing a similar point Professor Gus John made earlier.

A presentation which particularly resonated with me came from Kirsty Fife (@DIYarchivist) and Hannah Henthorn (@hanarchovist), on the issue of diversity in the workforce. Fife and Henthorn presented the findings from their research, including their survey of experiences of marginalisation in the UK archive sector, highlighting the structural barriers to diversifying the archive sector workforce. Fife and Henthorn identified several key themes which are experienced by marginalised communities in the sector, including: the feeling of isolation and otherness in both workplaces and universities; difficulties in gaining qualifications, perhaps due to ill health, disability, financial barriers or other commitments; feeling unsafe and underconfident in professional spaces; and a frustration at the lack of diversity in leadership roles.

As a Graduate Trainee Digital Archivist, I couldn’t abandon my own focus on digital preservation and digital archiving, and as such attended various digital-related talks, including “Machines make records: the future of archival processing” by Jenny Bunn (@JnyBn1), discussing the impact of taking a computational approach to archival processing, “Using digital preservation and access to build a sustainable future for your archive” led by Ann Keen of Preservica, with presentations given by various Preservica users, as well as a mini-workshop led by Sarah Higgins and William Kilbride, on ethics in digital preservation, asking us to consider if we need our own code of conduct in digital preservation, and what this could look like.

William Kilbride and Sarah Higgins running their workshop “Encoding ethics: professional practice digital preservation”, ARA Annual Conference 2018, Glasgow

I have only been able to touch on a very small amount of what I heard and learnt at the many and varied talks, presentations and workshops at the ARA conference. However, one thing I took away from the conference was the realisation that archivists and recordkeepers have the power to challenge structural inequalities, and must act now in order to become truly inclusive. As Michelle Caswell (@professorcaz), the second keynote speaker, said, we must act with sensitivity, acknowledge our privileges and, above all, empower, not marginalise. This conference felt like a call to action to the archive and recordkeeping community, in order to include the ‘hard to reach’ communities, or alternatively, as Adele Patrick noted, the ‘easy to ignore’. As William Kilbride (@William Kilbride) said, this is an exciting time to be in archives.

I want to thank Kevin Bolton for sponsoring the Diversity Bursary, which enabled me to attend an enriching, engaging and informative event, which otherwise would have been inaccessible for me.

________________________________________
Because every day is a school day, as homework for us all, I made a note of some of the recommendations made by speakers throughout the conference, compiled into this very brief list which I thought I would share:

Reading list

Higher Education Archive Programme Network Meeting on Research Data Management

On 22nd June 2018 I attended the Higher Education Archive Programme (#HEAP) network meeting on Research Data Management (RDM) at The National Archives at Kew. This allowed me to learn about some of the current thinking in research data management from colleagues and peers working in this area, through hearing about their own personal experiences.

The day consisted of a series of talks from presenters with a variety of backgrounds (archivists, managers, PhD students) giving their experiences of RDM from their different perspectives (design and implementation of systems, use). I will aim to briefly summarise the main message from a few of them. The talks were followed by a question and answer session, and the day concluded with a workshop run by John Kaye from JISC.

Having had very little exposure to RDM in my career, it was a great way for me to understand what it is and what is being done in this sector. I have undertaken quantitative research myself during my PhD and so have an understanding of how research data is created, but until my recent move into the archival profession, I rather foolishly gave little thought as to how this data is managed. Events like this help to make people aware of the challenges archivists, information professionals and researchers face.

What is HEAP?

The Higher Education Archive Programme (#HEAP) is part of The National Archives’ continuing programme of engagement and sector support with particular archival constituencies. It is a mixture of strategic and practical work encompassing activity across The National Archives and the wider sector including guidance and training, pilot projects and advocacy. They also run network meetings for anyone involved in university archives, special collections and libraries with a variety of themes.

What is Research Data Management?

Susan Worrall, from the University of Birmingham, started the day by explaining what research data management is and why it is of interest to archivists. Put simply, it is the organisation, structuring, storage, care and use of data generated by research. It is important to archivists because these are all common themes of digital archiving and digital preservation, and so it suffers from similar issues, such as:

  • Skills gap in the sector
  • Fear of the unknown
  • Funding issues
  • Training

She presented a case study using a brain imaging experiment, which highlighted the challenges of consent and of managing huge amounts of highly specialised data. There are, however, opportunities for archivists: RDM and digital archiving are two sides of the same coin, and digital archivists already carry out many RDM processes and so have many transferable skills. Online training is also available: the University of Edinburgh and the University of North Carolina at Chapel Hill collaborated to create a course on Coursera.

A Digital Archivist’s Perspective

Jenny Mitcham, from the University of York, gave us an insight into RDM from her experience as a digital archivist. She highlighted how RDM requires skills from the Library, Archival and IT sectors. Within a department you may have all of these skills; however, the roles and responsibilities are not always clear, which can cause issues. She described a fantastic project called ‘Filling the Digital Preservation Gap’ which explored the potential of Archivematica for RDM. It was a finalist in the 2016 Digital Preservation Awards and more information about the project can be found on the blog.

Planning, Designing and Implementing an RDM system

Laurain Williamson, from the University of Leicester, spoke about how to plan and implement a research data management service. Firstly, she described the current situation within the university and what the project brief involved. Any large-scale project requires a large amount of preparation and planning, and she noted that certain elements, such as considering all viable technical solutions, were incredibly time-consuming but essential to getting the best fit for the institution. Through interviews and case studies they analysed the thoughts and wants of a variety of stakeholders.

Their research community wanted:

  • Expertise
  • Knowledge about copyright/publishing
  • Bespoke advice and a flexible service.

Challenges faced by the RDM team were:

  • To manage expectations (they will never be able to do everything, so they must collaborate and prioritise their resources)
  • Last minute requests from researchers
  • Liaising with researchers at an early stage of the project is vital (helping researchers think about file formats early on to aid the preservation process).

Conclusion

Whilst RDM to a layperson may seem simple at first (save it on the cloud or a hard drive) when you delve into the archival theories of correct digital preservation, this becomes an absurdly simplified view. Managing large amounts of data from such specialised experiments (producing niche file formats) requires a huge amount of knowledge, collaboration and expertise.

(CC BY 4.0) Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2018. Incentives for Building University RDM Services. The Realities of Research Data Management, Part 3. Dublin, OH: OCLC Research. doi:10.25333/C3S62F.

Data produced by universities can be seen as a commodity. The growth of scholarly norms for open science and data sharing puts a higher emphasis on RDM. It is important for the institutions and individuals creating the data (if there is any potential future scholarly or financial gain) and also for scientific integrity (allowing others in the community to review and confirm the results). But not everyone will want to make their data open, and not all of it has to be, or should be, open; creating a system and workflow that accounts for both is vital.

An OCLC research report recently stated ‘It would be a mistake to imagine that there is a single, best model of RDM service capacity, or a simple roadmap to acquiring it’. As with most things in the digital sector, this is a fast moving area and new technologies and theories are continually being developed. It will be exciting to see how these will be implemented in the future.

 

Building collections on Gender Equality at the UK Web Archive

The Bodleian is one of the 6 legal deposit libraries in the UK. One of my projects this year as a graduate trainee digital archivist on the Bodleian Libraries’ Developing the Next Generation Archivist programme is to help curate special collections in the UK Web Archive. Since May I’ve been working on the Gender Equality collection. Please note, this post also appears on the British Library UK Web Archive blog.

Why are we collecting?

2018 is the centenary of the Representation of the People Act 1918. UK-wide memorials and celebrations of this journey, and of the victory of women’s suffrage, are all evident online, from events and exhibitions to commemorations and campaigns. Popular topics being discussed at the moment include the hashtags #timesup and #metoo, gender pay disparity and the recent referendum on the 8th Amendment in the Republic of Ireland. These discussions produce a lot of ephemeral material, and without web archiving this material is at risk of moving or even disappearing. Although gender equality is being discussed a lot in the media at present, these discussions have been developing over many years.

Through the UK Web Archive SHINE interface we can see that matching text for the phrase ‘gender equality’ increased from a result of 0.002% (24 out of 843,204) of crawled resources in 1996, to 0.044% (23,289 out of 53,146,359) in 2013.
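As a rough check on those proportions, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name `share_percent` is hypothetical, and the counts are simply the ones quoted above, not values fetched from SHINE.

```python
def share_percent(matches: int, total: int) -> float:
    """Return matches as a percentage of the total crawled resources."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * matches / total


# Counts as quoted in the paragraph above (not retrieved from SHINE itself).
share_1996 = share_percent(24, 843_204)
share_2013 = share_percent(23_289, 53_146_359)

# How many times larger the 2013 share is than the 1996 share.
growth = share_2013 / share_1996

print(f"1996: {share_1996:.4f}%")
print(f"2013: {share_2013:.4f}%")
print(f"Relative growth: about {growth:.1f}x")
```

Worked through this way, the 1996 share comes to roughly 0.0028% (so the 0.002% quoted above looks truncated rather than rounded) and the 2013 share to roughly 0.0438%, which is about a fifteen-fold increase in the proportion of crawled resources mentioning the phrase.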

SHINE user interface

If we search UK web content relating to gender equality we generate a great many results; for example, organisations have published their gender pay discrepancy reports online, and there is much to engage with on the social media accounts of both individuals and organisations campaigning for gender equality. When we browse this web content, it becomes apparent that gender equality means something different to each of the many presences online: charities, societies, employers, authorities, heritage centres and individuals such as social entrepreneurs, teachers, researchers and more.

The Fawcett Society: https://www.fawcettsociety.org.uk/blog/why-does-teaching-votes-for-women-matter-an-a-level-teachers-perspective

What are we collecting?

The Gender Equality special collection, which is now live on the UK Web Archive, comprises material which provides a snapshot of attitudes towards gender equality in the UK. Web material is harvested under the areas of:

  • Bodily autonomy
  • Domestic abuse/Gender based violence
  • Gender equality in the workplace
  • Gender identity
  • Parenting
  • The gender pay gap
  • Women’s suffrage

100 years on from women’s suffrage the fight for gender equality continues. The collection is still undergoing curation and growing in archival records – and you can help too!

How to get involved?

If there are any UK websites that you think should be added to the Gender Equality collection then you can take up the UK Web Archive’s call for action and nominate.

 

 

The UK Web Archive: Online Enthusiast Communities in the UK

The beginnings of the Online Enthusiast collection of the UK Web Archive can be traced back to November 2016 and a task to scope out the viability and write a proposal for two potential special collections with a focus on current web use: Mental Health, and Online Enthusiasts.

The Online Enthusiasts special collection was intended to show how people within the UK are using the internet to aid them in practising their hobbies, for example discussing their collections of objects or coordinating their bus spotting. If it was something a person could enthuse about and it was on the internet within the UK then it was in scope. Where many UK Web Archive Special Collections are centred on a specific event and online reactions, this was more an attempt to represent the way in which people are using the internet on an everyday basis.

The first step toward a proposal was to assess the viability of the collection, and this meant searching out any potential online enthusiast sites to judge whether this collection would have enough content hosted within the UK to validate its existence. As it turns out, UK hobbyists are very active in their online communities and finding enough content was, if anything, the opposite of an issue. Difficulty came with trying to accurately represent the sheer scope of content available – it’s difficult to google something that you weren’t aware existed 5 minutes ago. After an afternoon among the forums and blogs of ferry spotters, stamp collectors, homebrewers, yarn-bombers, coffee enthusiasts and postbox seekers, there was enough proof of content to complete the initial proposal stating that a collection displaying the myriad uses hobbyists in the UK have for the internet is not only viable but also worthwhile. Eventually that proposal was accepted and the Online Enthusiast collection was born.

The UKWA Online Enthusiast Communities in the UK collection provides a unique cultural insight into how communities interact in digital spheres. It shows that with the power of the internet people with similar unique hobbies and interests can connect and share and enthuse about their favourite hobbies. Many of these communities grow and shrink at rapid paces and therefore many years of content can be lost if a website is no longer hosted.

With the amount of content on the internet, finding websites had a domino effect, where one site would link to another site for a similar enthusiast community, or we would find lists including hobbies we’d never even considered before. This meant that before long we had a wealth of content that we realised would need categorising. Our main approach to categorising the content was along thematic lines. After identifying what we were dealing with, we created a number of sub-collections, examples of which include: animal-related hobbies, collecting-focused hobbies, observation hobbies, and sports.

The approach to selecting content for the collection was mainly focused around identifying UK-centric hobbies and using various search terms to identify active communities. The majority of these communities were forums. These forums provided enthusiasts with a platform to discuss various topics related to their hobbies whilst also providing the opportunity for them to share other forms of media such as video, audio and photographic content. Other platforms such as blogs and other websites were also collected. The blogs were often based on readers submitting content to the blog owner, who would then filter it and post related content to the community.

As of May 2018 the collection has over 300 archived websites. We found that the most filled categories for hobbies were sports, collecting and animal-related hobbies.

A few examples of websites related to hobbies that were new to us include:

  • UK Pidgeon Racing Forum: An online enthusiast forum concerned with pigeon racing.
  • Fighting Robots Association Forum: An online enthusiast forum for those involved with the creation of fighting robots.
  • Wetherspoon’s Carpets (Tumblr): A Tumblr blog concerned with taking photographs of the unique carpets inside the Wetherspoon’s chain of pubs across the UK.
  • Mine Exploration and History Forum: An online enthusiast community concerned with mine exploration in the UK.
  • Chinese Scooter Club Forum: An online enthusiast community concerned with all things related to Chinese scooters.
  • Knit The City (now Whodunnknit): A website belonging to a graffiti-knitter/yarnbomber from the UK

The Online Enthusiast Communities in the UK collection is accessible via the UK Web Archive’s new beta interface

The UK Web Archive: Mental Health, Social Media and the Internet Collection

The UK Web Archive hosts several Special Collections, curating material related to a particular theme or subject. One such collection is on Mental Health, Social Media and the Internet.

Since the advent of Web 2.0, people have been using the Internet as a platform to engage and connect, amongst other things, resulting in new forms of communication, and consequently new environments to adapt to – such as social media networks. This collection aims to illustrate how this has affected the UK, in terms of the impact on mental health. This collection will reflect the current attitudes displayed online within the UK towards mental health, and how the Internet and social media are being used in contemporary society.

We began curating material in June 2017, archiving various types of web content, including: research, news pieces, UK based social media initiatives and campaigns, charities and organisations’ websites, blogs and forums.

Material is being collected around several themes, including:

Body Image
Over the past few years, there has been a move towards using social media to discuss body image and mental health. This part of the collection curates material relating to how the Internet and social media affect mental health issues relating to body image. This includes research about developing theory in this area, news articles on various individuals’ experiences, as well as various material posted on social media accounts discussing this theme.

Cyber-bullying
This theme curates material, such as charities and organisations’ websites and social media accounts, which discuss, raise awareness and tackle this issue. Furthermore, material which examines the impact of social media and Internet use on bullying such as news articles, social media campaigns and blog posts, as well as online resources created to aid with this issue, such as guides and advice, are also collected.

Addiction
This theme collects material around gaming and other Internet-based activities that may become addictive, such as social media, pornography and gambling. It includes recent UK-based research, studies and online polls, social media campaigns, online resources, blogs and news articles from individuals and organisations. Discourse, discussions, opinion and actions regarding different aspects of Internet addiction are all captured and collected under this overarching catchment term of addiction, including social media addiction.

The Mental Health, Social Media and the Internet Special Collection is available via the new UK Web Archive Beta Interface!

Co-authored with Carl Cooper

New Catalogue: The Archive of Hilary Bailey

The catalogue of the archive of Hilary Bailey is now available online here.

Hilary Bailey (1936 – 2017) was a writer and editor whose career spanned many decades and genres. Her early output largely focussed on science fiction, with many of her short stories, including The Fall of Frenchy Steiner (1964), published in the science fiction publication New Worlds during the 1960s. During this time she also co-authored The Black Corridor (1969) with her then husband, the science fiction writer Michael Moorcock. Bailey served as editor of New Worlds from 1974 to 1976.

Her social circle contained a number of science fiction writers who were fellow contributors to New Worlds, including Graham Hall, another science fiction writer and editor of New Worlds whose papers are also included within the archive.

Hilary Bailey’s post-New Worlds output tended not to fall under the genre of science fiction. Her first solo full length novel, Polly Put The Kettle On (1975), was the first Polly Kops novel she wrote, and the character would later feature in Mrs Mulvaney (1978) and As Time Goes By (1988) – novels focussing on a woman in London through the 1960s to the 1980s.

Indeed, much of Bailey’s work had a focus on women, including her retellings of and sequels to classic novels: Frankenstein’s Bride (1995), an alternate telling wherein Victor Frankenstein agrees to build the monster a wife rather than spurning the suggestion, and Mrs Rochester (1997), which imagines Jane Eyre’s life a number of years into her marriage to Edward Rochester. Women were also the focus of her historical fiction novel The Cry From Street To Street (1992), which imagined the life of a victim of Jack the Ripper, and of Cassandra (1993), a retelling of the fall of Troy. She also authored a biography of Vera Brittain.

Draft artwork for the book jacket of As Time Goes By (1988)

Her most recent work ranged from the speculative fiction Fifty-first State (2008), a novel set in the then near-future of 2013, looking at politics within the United Kingdom, to imagining Sherlock Holmes’ sister in The Strange Adventures of Charlotte Holmes (2012).

The archive comprises a large amount of correspondence, both personal (with family, friends and other writers) and professional (with publishers and literary agents), as well as artwork for book jackets, early draft manuscripts for novels and assorted miscellanea.

Bailey’s archive also includes a small series at the end consisting of correspondence and draft writings belonging to Graham Hall (1947-1980), a friend of Bailey’s and fellow New Worlds contributor, editor, science fiction writer and general science fiction enthusiast. As Hall’s writing career was cut short by his death in 1980, aged just 32, his name is perhaps not as easily recognisable as those of his correspondents. His correspondence contains interesting information regarding science fiction enthusiasts in the 1960s, from Hall’s early involvement with fanzines and hopes to compile bibliographies for the work of more well-known science fiction writers, to his involvement with the scene and time as editor of New Worlds. Hall’s illness and death are chronicled in Michael Moorcock’s novel, Letters from Hollywood (1986).

DPC Email Preservation: How Hard Can It Be? Part 2

Source: https://lu2cspjiis-flywheel.netdna-ssl.com/wp-content/uploads/2015/09/email-marketing.jpg

In July last year my colleague Miten and I attended a DPC Briefing Day titled Email Preservation: How Hard Can It Be?, which introduced me to the work of the Task Force on Technical Approaches to Email Archives, and we were lucky enough to attend the second session last week.

Arranging a second session gave Chris Prom (@chrisprom), University of Illinois at Urbana-Champaign, and Kate Murray (@fileformatology), Library of Congress, co-chairs of the Task Force, the opportunity to reflect upon the issues raised in the first session and add them to the Task Force Report. It also provided the event attendees with an update on their progress overall, in anticipation of their final report, scheduled to be published some time in April.

“Using Email Archives in Research”

The first guest presentation was given by Dr. James Baker (@j_w_baker), University of Sussex, who was inspired to write about the use of email archives within research by two key texts: Born-digital archives at the Wellcome Library: appraisal and sensitivity review of two hard drives (2016), an article by Victoria Sloyan, and Dust (2001), a book by Carolyn Steedman.

These texts led Dr. Baker to think of the “imagination of the archive”, as he put it: the mystique of archival research, stemming from the imagery of 19th century research processes. He expanded on this idea, stating “physically and ontologically unique; the manuscript, is no longer what we imagine to be an archive”.

However, despite this new platform for research, Dr. Baker stated that very few people outside of archive professionals know that born-digital archives exist, let alone use them. This is an issue, as archives require evidence of use; we therefore need to encourage use.

To address this, Dr. Baker set up a Born-Digital Access Workshop, at the Wellcome Library in collaboration with their Collections Information Team, where he gathered people who use born-digital archives and the archivists who make them, and provided them with a set of 4 varying case-studies. These 4 case-studies were designed to explore the following:

A) the “original” environment; hard drive files in a Windows OS
B) the view experience; using the Wellcome’s Viewer
C) levels of curation; comparing reformatted and renamed collections with unaltered ones
D) the physical media; asking does the media hold value?

Several interesting observations came out of this workshop, which Dr. Baker organised into three areas:

  1. Levels of description; filenames are important, and are valuable data in themselves to researchers. Users need a balance between curation and an authentic representation of the original order.
  2. “Bog-standard” laptop as access point; using modern technology that is already used by many researchers as the mode of access to email and digital archives creates a sense of familiarity when engaging with the content.
  3. Getting the researcher from desk to archive; there is a substantial amount of work needed to make researchers aware of the resources available to them and how to reach them: can they access collections remotely, and how much collection-level description is necessary?

Dr. Baker concluded that even with outreach and awareness events such as the one we were all attending, born-digital archives are not yet accessible to researchers. This made me realise that the digital preservation community must push for access solutions and get these out to users, to enable researchers to gain the insights they might from our digital collections.

“Email as a Corporate Record”

The third presentation of the day was given by James Lappin (@JamesLappin), Loughborough University, who discussed the issues involved in applying archival policies to emails in a governmental context.

His main point concerned the routine deletion of email that happens in governments around the world. He said there are no civil servants’ email accounts scheduled to be saved past the next 3 – 4 years, but they may be available via a different structure: a kind of records management system. However, Lappin pointed out the crux in this scenario: government departments have no budget to move and save many individuals’ email accounts, and no real idea of the numbers: how much should be saved, and how much can be saved?

“email is the record of our age” – James Lappin

Lappin suggested an alternative: keep the emails of the senior staff only. However, this raises the question: how do we filter out sensitive and personal content?

Lappin posits that auto-deletion is the solution, aiming to spare institutions from unmanageable volumes of email and the consequent breaches of data protection.
Auto-deletion encourages:

  • governments to kickstart email preservation action,
  • the integration of tech for records management solutions,
  • actively considering the value of emails for long-term preservation

But how do we transfer emails to an EDRMS, what structures do we use, how do we separate individuals, and how do we enforce the transfer of emails? These issues are still to be worked out, and can be, Lappin argues, if we implement auto-deletion as a tool to make email preservation less daunting. At the end of the day, the current goal is to retain the “important” emails, which will make both government departments and historians happy, and in turn, this makes archivists happy. This does indeed seem like a positive scenario for us all!

However, it was particularly interesting when Lappin made his next point: what if the very nature of email, as intimate and immediate, makes governments uncomfortable with the idea of saving and preserving governmental correspondence? Governments must therefore be more active in their selection processes, and save something rather than nothing, which is where the implementation of auto-deletion could, again, prove useful!

To conclude, Lappin presented a list of characteristics which could justify the preservation of an individual’s government email account, which included:

  • The role they play is of historic interest
  • They expect their account to be permanently preserved
  • They are given the chance to flag or remove personal correspondence
  • Access to personal correspondence is prevented except in case of overriding legal need

I, personally, feel this is fair and thorough, but only time will tell what route various governments take.

On a side note: Lappin runs an excellent comic-based blog on Records Management which you can see here.

Conclusions
One of the key issues that stood out for me today was, maybe surprisingly, not to do with the technology used in email preservation, but how to address the myriad issues email preservation brings to light, namely the feasibility of data protection, sensitivity review and appraisal, which are particularly pressing when dealing with such vast quantities of material.

Email can only be preserved once we have defined what constitutes ‘email’ and how to proceed ethically, morally and legally. Then we can move forward with implementing technical frameworks designed to meet our pre-defined requirements. These will enable access to historically valuable, information-rich email archives that will yield much in the name of research.

In the tweet below, Evil Archivist succinctly reminds us of the importance of maintaining and managing our digital records…

Email Preservation: How Hard Can It Be? 2 – DPC Briefing Day

On Wednesday 23rd of January I attended the Digital Preservation Coalition briefing day titled ‘Email Preservation: How Hard Can It Be? 2’ with my colleague Iram. As I attended the first briefing day back in July 2017, it was a great opportunity to see what advances and changes had been achieved. This blog post will briefly highlight what I found particularly thought-provoking and focus on two of the talks about e-discovery from a lawyer’s viewpoint.

The day began with an introduction by the co-chair of the report, Chris Prom (@chrisprom), informing us of the work that the task force had been doing. This was followed by a variety of talks about the use of email archives and some of the technologies used for large-scale processing, from the perspective of researchers and lawyers. The day concluded with a panel discussion (for a twist, we the audience were the panel) about the pending report and the next steps.

Update on Task Force on Technical Approaches to Email Archives Report

Chris Prom told us how the report had taken on the comments from the previous briefing day and also from consultation with many other people and organisations. This led to clearer and more concise messages. The report itself does not aim to provide hard rules but to give an overview of the current situation, along with some recommendations for people or organisations that are involved with, interested in or considering email preservation.

Reconstruction of Narrative in e-Discovery Investigations and The Future of Email Archiving: Four Propositions

Simon Attfield (Middlesex University) and Larry Chapin (attorney) spoke about narrative and e-discovery. It was a fascinating insight into a lawyer’s requirements for use of email archives. Larry used the LIBOR scandal as an example of a project he worked on and the power of emails in bringing people to justice. From his perspective, the importance of e-discovery lies in helping to create a narrative and tell a story, something a computer cannot do at the moment. Emails ‘capture the stuff of story making’ as they have the ability to reach into the crevices of things and detail the small. He noted how emails contain slang and, interestingly, the language of intention and desire. These subtleties show the true meaning of what people are saying, and that is important in the quest for the truth. Simon Attfield presented his research on the coding aspect to aid lawyers in assessing and sorting through these vast data sets. The work he described here was too technical for me to truly understand; however, it was clear that collaboration between archivists, users and the programmers/researchers will be vital for better preservation and use strategies.

Jason Baron (@JasonRBaron1) (attorney) gave a talk on the future of email archiving detailing four propositions.

Slide detailing the four propositions for the future of email archives. By Jason R Baron 2018

The general conclusion from this talk was that automation and technology will be playing an even bigger part in the future to help with acquisition, review (filtering out sensitive material) and searching (aiding access to larger collections). As one of the leads of the Capstone project, he told us how that particular project saves all emails for a short time and some forever, dispelling the misconception that all emails are going to be saved forever. Analysis of how successful Capstone has been in improving the signal-to-noise ratio (so only capturing email records of permanent value) will be important going forward.

The problem of scale, which permeates most aspects of digital preservation, arose again here. Lawyers must review any and all information, which when looking at email accounts can be colossal. The analogy that was given was of finding a needle in a haystack: lawyers need to find ALL the needles (100% recall).
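To make the “100% recall” requirement concrete, here is a minimal sketch of how recall (and its companion measure, precision) is usually calculated. The message IDs and the function names are made up for illustration; they do not come from any tool mentioned on the day.

```python
def recall(relevant: set, retrieved: set) -> float:
    """Share of the truly relevant emails that the search actually found."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(relevant & retrieved) / len(relevant)


def precision(relevant: set, retrieved: set) -> float:
    """Share of the emails found that were truly relevant."""
    if not retrieved:
        return 1.0  # nothing returned, so nothing returned was wrong
    return len(relevant & retrieved) / len(retrieved)


# Hypothetical example: four "needles" exist, the search finds three of them
# plus one irrelevant message.
relevant = {"msg-001", "msg-002", "msg-003", "msg-004"}
retrieved = {"msg-001", "msg-002", "msg-003", "msg-099"}

print(recall(relevant, retrieved))     # 3 of 4 needles found
print(precision(relevant, retrieved))  # 3 of 4 results were needles
```

A search with 100% recall is one where the first function returns 1.0, i.e. no needle is left in the haystack, even if some hay comes along with it.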

Current predictive coding for discovery requires human assistance. Users have to tell the program whether the recommendations it produced were correct; the program learns from this process and hopefully becomes more accurate. Whilst a program can efficiently and effectively sort personal information such as telephone numbers, dates of birth, etc., it cannot currently sort textual content that requires prior knowledge, or non-textual content such as images.
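As a rough illustration of why structured personal information is the easy part, the sketch below flags anything that looks like a telephone number using a simple pattern. The pattern and the function name are my own assumptions and far cruder than what a real review tool would use; the point is only that this kind of data has a recognisable shape, whereas content that needs prior knowledge does not.

```python
import re

# Assumed, deliberately simple pattern: an optional "+", then at least nine
# characters made up of digits, spaces or hyphens, starting and ending on a digit.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s-]{7,}\d")


def find_phone_numbers(text: str) -> list:
    """Return every substring of the text that looks like a telephone number."""
    return PHONE_PATTERN.findall(text)


print(find_phone_numbers("Call me on 01865 277000 tomorrow"))
print(find_phone_numbers("The report is due in 2018."))
```

The first call picks out the number; the second finds nothing, because a four-digit year is too short to match. A sentence such as “ring me on the usual number” would slip straight past a pattern like this, which is exactly the kind of content that still needs a human reviewer.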

Panel Discussion and Future Direction

The final report is due to be published around May 2018. Email is a complex digital object and the solution to its preservation and archiving will be complex also.

The technical aspects of physically preserving emails are available, but we still need to address the effective review and selection of the emails to be made available to the researcher. The tools currently available are not accurate enough for large-scale processing. However, as artificial intelligence becomes better and more advanced, it appears this technology will be part of the solution.

Tim Gollins (@timgollins) gave a great overview of the current use of technology within this context, and stressed the point that the current technology is here to ASSIST humans. The tools for selection, appraisal and review need to be tailored for each process and quality test data is needed to train the programs effectively.

The non-technical aspects further add to the complexity, and might be more difficult to address. As a community we need to find answers to:

  • Whose email to capture (particularly interesting when an email account is linked to a position rather than a person)
  • How much to capture (entire accounts such as in the case of Capstone or allowing the user to choose what is worthy of preservation)
  • How to get persons of interest engaged (effectiveness of tools that aid the process e.g. drag and drop into record management systems or integrated preservation tools)
  • Legal implications
  • How to best present the emails for scholarly research (bespoke software such as ePADD or emulation tools that recreate the original environment or a system that a user is familiar with) 

Like most things in the digital sector, this is a fast-moving area with ever-changing technologies and trends. It might be frustrating that there is no hard guidance on email preservation, but when the Task Force on Technical Approaches to Email Archives report is published it will be an invaluable resource and a must-read for anyone interested or actively involved in email preservation. The takeaway message was, and still is, that emails matter!

Web-Archiving: A Short Guide to Proxy Mode

Defining Proxy Mode:

Proxy Mode is an ‘offline browsing’ mode which provides an intuitive way of checking the quality and comprehensiveness of any web-archived content captured. Proxy Mode enables you to view documents within an Archive-It collection and ascertain which page elements have been captured effectively and which are still being ‘pulled’ from the live site.

Why Use Proxy Mode?

Carrying out QA (Quality Assurance) without Proxy Mode could lead to false reassurance about the data that has been captured, since some of the page elements displayed may actually be taken from the live site rather than from the archival capture. Proxy Mode should therefore be employed as part of the standard QA process, since it prevents these live-site redirects from occurring and provides a true account of the data captured.

Using Proxy Mode:

Proxy Mode is easy to set up and involves simply downloading an add-on that can be accessed here. There is also an option to set up Proxy Mode manually in Firefox or Chrome.

Potential Issues and Solutions:

Whilst using Proxy Mode, a couple of members of the BLWA team (myself included) had issues viewing certain URLs, often receiving a ‘server not found’ error message. After corresponding with Archive-It I discovered that Proxy Mode often has trouble loading https URLs. With this in mind, and with Proxy Mode enabled, I removed the ‘s’ from https in the same URL and reloaded the page. This seemed to rectify the issue.

There was one particular instance, however, where this fix didn’t work and the same ‘server not found’ error message returned, much to my dismay! Browsers can sometimes save a specific version of the URL as the preferred version and will direct to it automatically. I discovered it was just a case of clearing the browser’s cache, cookies, offline website data and site preferences. Once this had been done I was able to load the site once again using Proxy Mode #bigachievements.
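If you have a long list of seed URLs to check, removing the ‘s’ by hand gets tedious. The snippet below is a small sketch of the same workaround using Python’s standard library; the function name and the example address are my own and are not part of Archive-It’s tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def to_http(url: str) -> str:
    """Return the URL with an https scheme swapped for http; leave others alone."""
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        return url
    # Only the scheme changes; host, path, query and fragment are kept as they are.
    return urlunsplit(parts._replace(scheme="http"))


# Hypothetical example address
print(to_http("https://www.example.org/about?page=2"))
```

The result can then be pasted into the browser once Proxy Mode is switched on.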