Category Archives: Event

PASIG 2017: Smartphones within the changing landscape of digital preservation

I recently volunteered at the PASIG 2017 Conference in Oxford, which was a great opportunity to learn more about the archives sector. Many of the talks at the conference focused on the current trends and influences affecting the trajectory of the industry.

One presentation that covered some of these trends in detail was given by Somaya Langley from Cambridge University Library (Polonsky Digital Preservation Project) in the ‘Future of DP theory and practice’ session: ‘Realistic digital preservation in the near future: How do we get from A to Z when B already seems too far away?’. Somaya’s presentation considered how we preserve the digital content we receive from donors on smartphones, with a focus on iOS.

Langley, Somaya (2017): Realistic digital preservation in the near future: How to get from A to Z when B seems too far away?. figshare. https://doi.org/10.6084/m9.figshare.5418685.v1 Retrieved: 08:22, Sep 22, 2017 (GMT)

Somaya’s presentation discussed how, in the field of digital preservation, ingest suites have long been used for dealing with CDs, DVDs, floppy disks and HDDs. However, they are not sufficiently prepared for ingesting smartphones or tablets, or for the various issues associated with these devices. We must realise that smartphones potentially hold a wealth of information for archives:

‘With the design of the Apple Operation System (iOS) and the large amount of storage space available, records of emails, text messages, browsing history, chat, map searching, and more are all being kept’.

(Forensic Analysis on iOS Devices, Tim Proffitt, 2012. https://uk.sans.org/reading-room/whitepapers/forensics/forensic-analysis-ios-devices-34092)

Why iOS? What about Android?

Unlike the rest of Europe, the UK smartphone market shows a close split between the two platforms: in November 2016, iOS accounted for 48.3% of UK sales against 49.6% for Android. This contrasts with Apple’s global market share of 12.1% in Q3 of 2016.

Whatever side of the fence you stand on, it is clear that smartphones, be they Android or iOS, will play an important role in our collections. The skills required to extract content differ across platforms, so we as digital archivists will have to learn both methods of extraction and leave our consumer preferences at the door.

So how do we get the data off the iPhone?

iOS has long been known as a ‘locked-down’ operating system, and Apple have always had an anti-tinkering stance with many of their products. Therefore it should come as no surprise that locating files on an iPhone is not very straightforward.

As Somaya pointed out in her talk, after spending six hours in the Apple Shop ‘Genius Bar’ she was no closer to understanding from Apple employees what the best course of action would be to locate backups of notes from a ‘bricked’ iPhone. Therefore she used her own method of retrieving the notes, using iExplorer to search through the backups from the iPhone.

She noted, however, that due to the limitations of iOS it was very challenging to locate these files. In some cases it even required the command line to reach the backup storage location, as it is hidden by default in OS X (now macOS, the main operating system used on Apple computers).
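
As a rough illustration of what locating those hidden backups involves, here is a minimal Python sketch. It assumes the default macOS backup location (~/Library/Application Support/MobileSync/Backup) and that the backups have not been moved elsewhere; the function name list_backups is hypothetical, and this is not the method Somaya described, only a way to see which backup folders exist and how large they are.

```python
from pathlib import Path

# Assumption: the default macOS location for iTunes device backups.
# The ~/Library folder is hidden in Finder, which is why a script or the
# command line is often the easiest way to reach it.
DEFAULT_BACKUP_ROOT = (
    Path.home() / "Library" / "Application Support" / "MobileSync" / "Backup"
)


def list_backups(root: Path = DEFAULT_BACKUP_ROOT) -> list[dict]:
    """Return one summary record per backup folder found under root."""
    if not root.is_dir():
        return []
    records = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        files = [f for f in folder.rglob("*") if f.is_file()]
        records.append(
            {
                "name": folder.name,  # folder name chosen by the backup software
                "file_count": len(files),
                "total_bytes": sum(f.stat().st_size for f in files),
            }
        )
    return records


if __name__ == "__main__":
    for record in list_backups():
        print(record)
```

The sketch only reads file names and sizes; it does not open, decrypt or interpret the backup contents.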

Many tools exist for extracting information from iPhones. The SANS Institute white paper on Forensic Analysis on iOS Devices by Tim Proffitt outlines four main methods:

  1. Acquisition via iTunes backups (requires the original PC last used to sync the iPhone)
  2. Acquiring backup data with iPhone Analyzer (a free Java-based program; issues exist when dealing with encrypted backups)
  3. Acquisition via logical methods (uses a synchronisation method built into iOS to recover data, e.g. programs like iPhone Explorer)
  4. Acquisition via physical methods (obtaining a bit-by-bit copy, e.g. with the Lantern 2 forensics suite)

Encryption is a challenge for retrieving data from the iPhone, especially since iTunes includes a backup encryption feature when syncing. Proffitt suggests using a password cracker or jailbreaking as solutions to this issue; however, these solutions might not be fully compatible with our archive situations.

Another issue with smartphone digital preservation is platform and version locking. Although the above methods work for data extraction at the moment, it is very possible that future versions of iOS could make them defunct, requiring software developers to continually update their programs or look for new approaches.

Final thoughts

One final consideration that can be raised from Somaya’s talk is that of privacy. As with the arrival of computers into our archives, phones will pose similar moral questions for archivists:

Do we ascribe different values to information stored on smartphones?
Do we consider the material stored on phones more personal than data stored on our computers?

As mentioned previously, our phones store everything from emails and geo-tagged photos to phone call information and now, with the growing popularity of smart wearable technology, health data (including the user’s heart rate, daily activity, weight, etc.). We as digital archivists will be dealing with very sensitive personal information and need to be prepared to take responsibility for safeguarding it appropriately.

There is no doubt that soon enough we in the archive field will be receiving more and more smartphones and tablets into our archives from donors. Hopefully talks like Somaya’s will start the ball rolling towards the creation of better standards and approaches to smartphone digital curation.

Email Preservation: How Hard Can it Be? DPC Briefing Day

On Thursday 6th July 2017 I attended the Digital Preservation Coalition briefing day on email preservation, held in partnership with the Andrew W. Mellon Foundation and titled ‘Email preservation: how hard can it be?’. It was hosted at The National Archives (TNA); this was my first visit to TNA and it was fantastic. I didn’t know a great deal about email preservation prior to this, so I was really looking forward to learning about the topic.

The National Archives, Photograph by Mike Peel (www.mikepeel.net)., CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=9786613

The aim of the day was to engage in discussion about some of the current tools, technologies and thoughts on email preservation. It was oriented around the ‘Task Force on Technical Approaches to Email Preservation’ report, which is currently in its draft phase. We also got to hear about interesting case studies from the British Library, TNA and Preservica, each presenting their own unique experiences in relation to this topic. It was a great opportunity to learn about this area and hear from the co-chairs (Kate Murray and Christopher Prom) and the audience about their thoughts on the current situation and possible future directions.

We heard from Jonathan Pledge from the British Library (BL). He told us about the forensic capture expertise gained by the BL and its use of EnCase to capture email data from hard drives, CDs and USB drives. We also got an insight into how they are deciding which email archive tool to use. Aid4Mail fits better with their workflow; however, ePADD, with its holistic approach, was something they were considering. During ingest they separate the emails from the attachments. They found that after the time-consuming process of removing emails that would violate data protection laws, there was very little usable content left, as entire threads would often have to be redacted because of one message. This is not the most effective use of an archivist’s time and is something they are working to address.

We also heard from Anthea Seles, who works with government collections at TNA. Their research found that approximately 1TB of data in an organisation’s own electronic document and records management system is linked to 10TB of related data in shared drives. Her focus was on discovery and data analytics. For example, one way to increase efficiency and focus the curator’s attention is to batch emails. If an email was sent from TNA to a vast number of people, there is a high chance that it does not contain sensitive information. However, if it was sent to a high-profile individual, there is a higher chance that it will contain sensitive information, so the curator can focus their attention on those messages.
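
The batching idea can be sketched in a few lines of Python using the standard library’s email package. This is only an illustration under stated assumptions: the function names and the threshold of 50 recipients are hypothetical, not anything presented by TNA, and a real workflow would also need to recognise high-profile recipients rather than rely on recipient counts alone.

```python
from email import message_from_string
from email.utils import getaddresses


def recipient_count(raw_message: str) -> int:
    """Count the distinct addresses in the To, Cc and Bcc headers."""
    msg = message_from_string(raw_message)
    fields = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    addresses = {addr.lower() for _name, addr in getaddresses(fields) if addr}
    return len(addresses)


def triage(raw_messages, bulk_threshold=50):
    """Split messages into (review_first, likely_low_risk).

    Assumption: a message sent to bulk_threshold or more recipients is a
    broadcast and less likely to be sensitive, so it is reviewed later.
    """
    review_first, likely_low_risk = [], []
    for raw in raw_messages:
        if recipient_count(raw) >= bulk_threshold:
            likely_low_risk.append(raw)
        else:
            review_first.append(raw)
    return review_first, likely_low_risk
```

The split only orders the curator’s work; nothing is released or discarded on the basis of the count.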

Hearing from Preservica was interesting as it gave an insight into the commercial side of email archiving. In their view, preservation was not an issue. Their attention was focused on issues such as identifying duplicate or unwanted emails efficiently, developing tools for whole-collection email analysis and, interestingly, solving the problem of acquiring emails via a continuous transfer.
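
To make the duplicate problem concrete, here is a minimal Python sketch of one possible approach. It is not Preservica’s method: the choice to trust the Message-ID header first, and to fall back to a hash of the sender, date, subject and whitespace-normalised body, is an assumption made for illustration, as are the function names.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message: str) -> str:
    """Return a key that should be equal for copies of the same email."""
    msg = message_from_string(raw_message)
    message_id = (msg.get("Message-ID") or "").strip().lower()
    if message_id:
        return "id:" + message_id
    # Fallback for messages without a Message-ID: hash a few stable fields.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else msg.as_string()
    key = "\n".join(
        [
            (msg.get("From") or "").strip().lower(),
            (msg.get("Date") or "").strip(),
            (msg.get("Subject") or "").strip(),
            " ".join(body.split()),  # collapse whitespace differences
        ]
    )
    return "sha256:" + hashlib.sha256(key.encode("utf-8")).hexdigest()


def find_duplicates(raw_messages):
    """Return (first_index, duplicate_index) pairs for repeated messages."""
    seen = {}
    duplicates = []
    for index, raw in enumerate(raw_messages):
        key = fingerprint(raw)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates
```

Reporting index pairs rather than deleting anything leaves the appraisal decision with the archivist.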

Email is not going to be the main form of communication forever (the rise in the popularity of instant messaging is clear to see); however, we learnt that its use is still expected to grow in the near future.

One of the main issues that was brought up was the potential size of future email archives and the issues that come with effective and efficient appraisal. What is large in academic terms, e.g. 100,000 emails, is not large in government terms. The figure of over 200 million emails at the George W. Bush Presidential Library is a phenomenal amount, and the Obama administration’s is estimated at 300 million. This requires smart solutions, and we learnt how artificial intelligence and machine learning could help.

Continuous active learning was highlighted as a way to improve searches. An example of searching for ‘Miami dolphins’ was given. The Miami Dolphins are an American football team; however, someone might also be looking for information about dolphins in Miami. Initially the computer would present different search results and the user would choose which result is more relevant. Over time it will learn what the user is looking for in cases where searches can be ambiguous.

Another issue that was highlighted was how to make sure that you have found the correct person. How do you avoid false positives? At TNA the ‘Traces Through Time’ project aimed to do that, initially with World War One records. This technology, which uses big data analytics, can also be applied to email archives. There is also work on mining the email signature as a way to better determine ownership of the message.

User experience was also discussed. Emulation is an area of particular interest. The positive of this is that it recreates how the original user would have experienced the emails. However this technology is still being developed. Bit level preservation is a solution to make sure we capture and preserve the data now. This prevents loss of the archive and allows the information and value to be extracted in the future once the tools have been developed.
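
One common building block of bit-level preservation is fixity checking: recording a checksum for every file at ingest and re-checking it later to confirm that nothing has changed. The Python sketch below is a minimal illustration under that assumption; the function names are hypothetical, SHA-256 is simply one reasonable choice of algorithm, and nothing here was presented on the day.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large mailboxes do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the relative paths whose content is missing or has changed."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built at ingest and verified on a schedule shows whether the preserved bits are still intact, even before tools exist to interpret them.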

It was interesting to hear how policy could affect how easy it would be to acquire email archives. The new General Data Protection Regulation, which will come into effect in May 2018, will mean that anyone in breach of it will face heavier penalties, up to 4% of annual worldwide turnover. This means that companies may err on the side of caution with regard to keeping personal data such as emails.

Whilst email protocols are well standardised, allowing emails to be sent from one client to another (e.g. from an AOL account of the early 1990s to a Gmail account of today), their acquisition is not. When archivists get hold of email archives, they are left with the remnants of whatever the email client or user has done to them. This means metadata may have been added or removed and formats can vary. This adds a further level of complexity to the whole process.

The day was thoroughly enjoyable. It was a fantastic way to learn about archiving emails. As emails are now one of the main methods of communication, for government, large organisations and personal use, it is important that we develop the tools, techniques and policies for email preservation. To answer the question ‘how hard can it be?’ I’d say very. Emails are not simple objects of text; they are highly complex entities comprising attachments, links and embedded content. The solution will be complex, but there is a great community of researchers, individuals, libraries and commercial entities working on solving this problem. I look forward to hearing the update in January 2018 when the task force is due to meet again.

iPRES 2016

Last month, I attended the 13th International Conference on Digital Preservation, this year hosted in Bern, Switzerland. The four days of papers, panels, posters and workshops were an intensive and exciting opportunity to meet with colleagues working in digital preservation around the world, share ideas, and hear about innovative projects and approaches. The topics ranged widely from technical systems and practices, to quality and risk assessment, and stewardship and sustainability. What follows are just a couple of highlights from a really fascinating week.

The post-it note networking wall: What do you know? What do you want to know?

Net-based and digital art

As email, digital documents and social media replace traditional forms of communication, it is crucial to be able to preserve born-digital material and make it accessible. An area which I hadn’t previously considered was the realm of net-based art. Here, the internet is used as an artistic medium, which of course has implications (and complications) for digital preservation.

In her keynote speech, Sabine Himmelsbach from the House of Electronic Arts in Basel introduced us to this exciting field, showing artwork such as Olia Lialina’s ‘Summer’, 2013, shown below.

Screenshot of Summer, Olia Lialina, 2013. Available at https://www.youtube.com/watch?v=SxvHoXdC4Uk

The artwork features an animated loop of Lialina swinging from the browser bar. Each frame is hosted by a different website, and the playback therefore depends on your connection speed. This creative use of technology creates enormous challenges for preservation. Here, rather than preserving artefacts, it is the preservation of behaviours which is crucial, and these behaviours are extremely vulnerable to obsolescence.

Marc Lee’s ‘TV Bot’ is another net-based artwork, which is automated to broadcast current news stories with live TV streams, radio streams and webcam images from around the world. Because it relies on technical infrastructure in this way, developments such as the shift from Real Player to Adobe Flash Player prevented ‘TV Bot’ from functioning. The artist then not only worked on technical migration, but re-interpreted the artwork, modernising the look and feel, resulting in ‘TV Bot 2.0’ in 2010. This process soon happened again, this time including a Twitter stream, in ‘TV Bot 3.0’, 2016. In this way, the artist is working against cultural as well as technical obsolescence.

Marc Lee, ‘TV Bot 2.0’, 2010. Image from http://ceaac.org/en/artistes/marc-lee

The heavy involvement from the artist in this case has helped preserve the artwork, but this process cannot be sustained indefinitely. Himmelsbach ended her speech by stressing the need for collaboration and dialogue, which emerged as a central theme of the conference.

A new approach to web archiving

Another highlight was the workshop on Webrecorder led by Dragan Espenschied from Rhizome. He introduced their new tool, which departs from the usual crawling method to capture web content ‘symmetrically’, resulting in incredibly high-fidelity captures. The demonstration of how the tool can capture dynamic and interactive content sparked gasps of amazement from the group!

Webrecorder not only captures social media, embedded video and complex JavaScript (often tricky with current tools), but can actually capture the essence of an individual’s interaction with the web content.

How it works: Webrecorder records all the content you interact with during the recording session. Users are then able to interact with the content themselves, but anything that was not viewed during the recording session will not be available to them.
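
The record-then-replay behaviour described above can be pictured with a toy Python model. This is purely conceptual and is not how Webrecorder is implemented: the RecordingSession class and the fetch callable are hypothetical, and real captures involve far more than a URL-to-content mapping.

```python
class RecordingSession:
    """Toy model: only what was visited while recording can be replayed."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable that returns content for a URL,
        # standing in for the live web during the recording session.
        self._fetch = fetch
        self._captured = {}

    def visit(self, url):
        """Load a page from the live source and keep a copy of it."""
        content = self._fetch(url)
        self._captured[url] = content
        return content

    def replay(self, url):
        """Return the recorded copy, or None if the page was never visited."""
        return self._captured.get(url)


if __name__ == "__main__":
    live_web = {
        "https://example.org/": "home page",
        "https://example.org/about": "about page",
    }
    session = RecordingSession(live_web.__getitem__)
    session.visit("https://example.org/")
    print(session.replay("https://example.org/"))       # recorded copy
    print(session.replay("https://example.org/about"))  # None: never visited
```

The second replay returning nothing is the point of the model: the archive contains the archivist’s path through the site, not the whole site.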

Current web archiving strategies aren’t able to capture the personalised nature of web use. How to use this functionality is still a big question, as a web recording made in this way would be personal to the web archivist, showing what they decided to explore, unless a systematic approach was designed by an institution. This itself would be very resource-intensive, and is arguably not where the potential of Webrecorder lies, which is in its ability to capture dynamic content such as net-based artworks. However, the possibility of preserving not only web content, but our interaction with it, is a very exciting development.

iPRES 2016 was a fantastic opportunity to gain insight into projects happening around the world to further digital preservation. It showed me that often there are no clear answers to ‘which file format is best for that?’ or ‘how do I preserve this?’ and that seeking advice from others, and experimenting, is often the way forward. What was really clear from attending was that the strength and support of the community is the most valuable digital preservation tool available.

 

Event: Women in Science in the Archives, 8 September 2016

As part of the FitzGerald cataloguing project, we are organising an event around women in science in the archives, to take place on Thursday 8 September, at the Weston Library (Lecture Theatre) from 9.00am to 1.00pm.

The half-day seminar will look at women’s engagement with science in the past through the Bodleian’s historical archives, trace the changing nature of their role, discuss the experiences of female scientists in the 21st century, and explore the challenges of preserving their archives in the future.

Women in science, 1780-2016

Web Archiving Micro-internship – Part 2

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the second of two guest blog posts on the micro-internship.

The most central aspect of modern life is now the proliferation of digital technology. Since the 1990s, it has become a central mode of communication which is often taken for granted. At the start of this micro-internship, we were introduced to the concept of the digital ‘black hole’, a term used to describe the irrevocable loss of this information. Unlike physical correspondence and materials (the letters, writs, and manuscripts of earlier centuries), so much of what we write is fragile and evanescent. To stem the loss of this digital history, we were shown how the Bodleian Libraries and other legal deposit libraries use domain crawls to capture online content at pre-determined intervals using the W3ACT tool. This then preserves a screen grab of the website on the Internet Archive, namely the Wayback Machine, before the website is updated.

Web archiving micro-interns working in the Centre for Digital Scholarship, Weston Library, March 2016.

The right of legal deposit libraries to a copy of electronic and other non-print publications, such as e-journals and CD-ROMs, only came into existence on 6th April 2013. This meant that libraries were able to create an archive of all websites with domains based in the United Kingdom. The recent ‘right to be forgotten’ law adopted by the EU nevertheless signals that the legal status of digital archives is becoming increasingly complicated, particularly when compiling archives of events receiving international commentary, like the upcoming EU referendum. Each of us focused on a different aspect of the EU referendum, reflecting our individual interests, ranging from national newspapers and student newspapers to the blogs of Scottish MSPs, Welsh AMs, and MEPs, and the blogs of solicitors and legal firms’ websites offering advice to businesses and refugees in the event of a ‘Brexit’.

One of the trickier views to archive was that of British expats living abroad. In this situation, unless the site can be proven to be based in the UK, we would have to write to the owner of the domain to request permission to archive the website. If permission was given but the person expressing those views subsequently wished to erase this history under the ‘right to be forgotten’ law, and the UK had voted to leave the EU, the archived material would be left in a tricky legal position. We learned during the internship that this would most likely result in the relevant archived material being deleted. However, this is exactly what the archive was set up to prevent, so the tension between the right to privacy and freedom of information on a public platform presents considerable problems for the aim of web archives to be fully comprehensive, aggravated further by the omission of websites with paywalls.

After finding this material and ensuring it was covered by the legal deposit law, it was necessary to classify the site accurately, identifying the main language and providing titles and descriptions. For newspaper articles, this was relatively straightforward, but for Welsh- and Irish-language publications produced by political parties (languages which I am studying at Jesus College), this was more complicated, as the only languages available to select from were German or English, a testament to the nascent stage of the web archive’s development. In addition, classifying material was very much up to our own individual discretion and the descriptions to our own style. To complicate things further, the order in which searched-for material should be presented raises further issues, which we discussed at the end of the micro-internship: namely, whether results should be arranged by ‘most popular’, by date of publication, or by any other criterion. The discussions and practical experience offered by this internship gave us an opportunity to help address the legal and administrative challenges facing web archivists.

Daniel Taylor

Web Archiving Micro-internship – Part 1

On 14 and 15 March eight Oxford University students took part in a web archiving micro-internship at the Weston Library’s Centre for Digital Scholarship. Working with the UK Legal Deposit Web Archive, they contributed to the curation of a special collection of websites on the UK European Referendum. This is the first of two guest blog posts on the micro-internship.

During a micro-internship at the Bodleian’s legal deposit web archives, focusing on the EU referendum collection, we have had an occasion to reflect on the meaning of such an archive, and particularly on its potential for creating meaning.

Web archiving micro-interns on the roof of the Weston Library, March 2016.

A web archive’s potential document base is clearly much wider than a paper collection’s. No material criteria, such as donations and physical availability, play a defining factor in the content archived. The main restriction placed on this particular archive is that of legal permission, which allows only UK domains to be easily archived. Even so, the scope remains incredibly wide.

Therefore, archiving the web implies a deliberate narrowing of choices on the archivist’s side. Much is left to their discretion.

A lot of what we know of history is defined by the material that is preserved. It is difficult to learn about the working class or women in the past from original sources, as material by and about such people is conspicuously absent from our collections. A contemporary web archivist has the chance to select material that can most broadly represent society. This will make it impossible for future historians to ignore the history of many groups, and will enable research into a variety of thoughts and experiences.

This was reflected and magnified in the approaches that the group of interns took, which evidences the importance of having a range of different people cooperate on the gathering of knowledge. One woman, for example, concentrated on the representation of the Brexit referendum in media specific to certain ethnic and religious groups, such as Judaism. Another made sure to include the views of Scottish, Gaelic and Irish media and organisations, in order to avoid an England-only approach. One of the interns chose to gather information about the way the referendum is seen in small communities, enriching the archive with small local publications. On the first day, I concentrated on the views and representation of immigrants, whose lives will be strongly affected by the referendum. On the second day, I preserved information about women’s roles and views.

Such a wide range of approaches contributes to the broadening and deepening of historical studies. It also positively contributes to contemporary social science. This can happen in two main ways. Firstly, it places virtual documents in a setting that makes their analysis easier. It thus enables social scientists to observe internet trends throughout the years, and compare them to each other. For this purpose, a wide range of archived material is essential, and again the archivist has a role in creating the foundational understanding of British society.

Secondly, and perhaps more interestingly (as the first function can be fulfilled by tools on the live web), web archives allow social scientists to track trends in academia. A web archive describes what subjects and focuses contemporary academia considers to be salient. It points out what we, as researchers, think is worth being saved from the internet black hole.

The defining potential of this is striking, and this internship allowed us to understand the social, political and historical role of archiving.

Zad El Bacha

Event: Exploring the UK Web, 11 December 2015

 

Exploring the UK Web:
An introduction to web archives as scholarly resources

11 December 2015
2.00pm – 4.00pm

Venue: Lecture Theatre, Weston Library

Speakers: Jason Webber, Prof Jane Winters, Dr Gareth Millward, Prof Ralph Schroeder

‘The Web’, in the 25 years of its existence, has become deeply ingrained in modern life: it is where we find information, communicate, research, share ideas, shop, get entertained, set and follow trends and, increasingly, live our social lives.
As much as we rely on traditional paper archives today to find out about the past, for anyone trying to understand life in the late 20th and early 21st century, archived websites will be an invaluable resource.

Join us and our expert panel for an afternoon of exploring the archives of the UK web space, focusing on their potential use for research and teaching. Short presentations will introduce the resources and tools available for web archives research in the UK, and the opportunities (and challenges) they come with in theory and practice: from web archives curation, preservation and research tool development at the British Library, to current research in the Big UK Domain Data for the Arts and Humanities (BUDDAH) Project and at the Oxford Internet Institute.
Afterwards there will be plenty of time for questions and discussion – your chance to ask everything you ever wanted to know about web archives and to contribute your thoughts and ideas to an emerging discipline.

Admission free. All welcome.
To secure a place, please complete our booking form via What’s on

Jason Webber is the Web Archiving Engagement and Liaison Manager at the British Library, working with the UK Web Archive and the Legal Deposit Web Archive.
Jane Winters is Professor of Digital History at the Institute of Historical Research, and Principal Investigator in the BUDDAH Project.
Gareth Millward is a Research Fellow at the London School of Hygiene and Tropical Medicine and one of the BUDDAH Project bursary holders.
Ralph Schroeder is a Senior Research Fellow at the Oxford Internet Institute.