All posts by benjaminpeirsonsmith

PASIG 2017: Smartphones within the changing landscape of digital preservation

I recently volunteered at the PASIG 2017 conference in Oxford. It was a great opportunity to learn more about the archives sector. Many of the talks focused on the current trends and influences affecting the trajectory of the industry.

One presentation that covered some of these trends in detail was given by Somaya Langley of Cambridge University Library (Polonsky Digital Preservation Project) in the ‘Future of DP theory and practice’ session: ‘Realistic digital preservation in the near future: How do we get from A to Z when B already seems too far away?’. Somaya’s presentation considered how we preserve the digital content we receive from donors on smartphones, with a focus on iOS.

Langley, Somaya (2017): Realistic digital preservation in the near future: How to get from A to Z when B seems too far away?. figshare. Retrieved: 08:22, Sep 22, 2017 (GMT)

Somaya’s presentation discussed how ingest suites in digital preservation have long been used to deal with CDs, DVDs, floppy disks and HDDs. However, they are not sufficiently prepared for ingesting smartphones or tablets, or for the various issues associated with these devices. We must realise that smartphones potentially hold a wealth of information for archives:

‘With the design of the Apple Operation System (iOS) and the large amount of storage space available, records of emails, text messages, browsing history, chat, map searching, and more are all being kept’.

(Forensic Analysis on iOS Devices, Tim Proffitt, 2012)

Why iOS? What about Android?

The UK smartphone market (unlike the rest of Europe) shows a much closer split: in November 2016, iOS accounted for 48.3% of UK sales against 49.6% for Android. This contrasts with Apple’s global market share of 12.1% in Q3 of 2016.

Whichever side of the fence you stand on, it is clear that smartphones, be they Android or iOS, will play an important role in our collections. The skills required to extract content differ across platforms, so we as digital archivists will have to learn both methods of extraction and leave our consumer preferences at the door.

So how do we get the data off the iPhone?

iOS has long been known as a ‘locked-down’ operating system, and Apple have always had an anti-tinkering stance with many of their products. Therefore it should come as no surprise that locating files on an iPhone is not very straightforward.

As Somaya pointed out in her talk, after spending six hours in the Apple Shop ‘Genius Bar’ she was no closer to understanding from Apple employees what the best course of action would be to locate backups of notes from a ‘bricked’ iPhone. Therefore she used her own method of retrieving the notes, using iExplorer to search through the backups from the iPhone.

She noted, however, that the limitations of iOS made it very challenging to locate these files. In some cases it even required the command line to reach the backup storage location, which is hidden by default in OS X (now macOS, the main operating system used on Apple computers).
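As a rough illustration of reaching that hidden location without Finder, here is a minimal Python sketch. It assumes the default folder that iTunes has used for device backups on macOS (~/Library/Application Support/MobileSync/Backup); this path is my assumption, not something stated in the talk, and list_backups is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the default folder iTunes uses for device backups on macOS.
# The ~/Library folder is hidden in Finder by default, but scripts can read it.
DEFAULT_BACKUP_ROOT = (
    Path.home() / "Library" / "Application Support" / "MobileSync" / "Backup"
)


def list_backups(root: Path = DEFAULT_BACKUP_ROOT) -> list[str]:
    """Return the sorted names of the backup folders found under root."""
    if not root.is_dir():
        # No backups on this machine, or the folder lives somewhere else.
        return []
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


if __name__ == "__main__":
    for name in list_backups():
        print(name)
```

Each folder listed corresponds to one device backup. The files inside are not stored under human-readable names, which is part of why tools like iExplorer are needed to browse them.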

Many tools exist for extracting information from iPhones. The SANS Institute white paper Forensic Analysis on iOS Devices by Tim Proffitt outlines four main methods:

  1. Acquisition via iTunes Backups (requires original PC last used to sync the iPhone)
  2. Acquiring Backup Data with iPhone Analyzer (free Java-based program; issues exist when dealing with encrypted backups)
  3. Acquisition via Logical Methods (uses a synchronisation method built into iOS to recover data, e.g. programs like iPhone Explorer)
  4. Acquisition via Physical Methods (obtaining a bit-by-bit copy, e.g. Lantern 2 forensics suite)

Encryption is a challenge when retrieving data from the iPhone, especially since iTunes includes an option to encrypt backups when syncing. Proffitt suggests using a password cracker or jailbreaking as solutions to this issue. However, these solutions might not be appropriate in an archival setting.

Another issue with smartphone digital preservation is platform and version locking. The fact that the above methods work for data extraction at the moment does not mean they always will: future versions of iOS could well make them defunct, requiring software developers to consistently update their programs or look for new approaches.

Final thoughts

One final consideration that can be raised from Somaya’s talk is that of privacy. As with the arrival of computers into our archives, phones will pose similar moral questions for archivists:

Do we ascribe different values to information stored on smartphones?
Do we consider the material stored on phones more personal than data stored on our computers?

As mentioned previously, our phones store everything from emails and geo-tagged photos to phone call information and, with the growing popularity of smart wearable technology, health data (including the user’s heart rate, daily activity, weight etc.). We as digital archivists will be dealing with very sensitive personal information and need to understand our responsibility to safeguard it appropriately.

There is no doubt that soon enough we in the archive field will be receiving more and more smartphones and tablets into our archives from donors. Hopefully talks like Somaya’s will start the ball rolling towards the creation of better standards and approaches to smartphone digital curation.

‘Getting Started with Digital Preservation’ Workshop

On the 17th of May I attended the Digital Preservation Coalition’s (DPC) ‘Getting Started with Digital Preservation’ workshop in London.

The one-day event was a great opportunity to gain clear insights into starting in the digital preservation sector, and provided a useful platform for networking with other archivists. The event consisted of lectures from DPC members on various topics related to starting digital preservation. It also included group exercises that were aimed at putting these ideas into practice.

The day started with a brief overview of digital preservation. The DPC team began by asking us to identify the main aspects of traditional archival preservation for physical documents: a document’s physical, robust and tangible nature; its ability to be understood independently, without relying on technology; the existence of well-established approaches to its preservation; and a well-established understanding of how to assess the value of these documents.

This was used as a springboard to introduce many of the issues we would face in transitioning to digital: the ephemeral and intangible nature of digital material (1s and 0s can’t be held in your hands); the need for technology and software for documents to be understood (e.g. a PDF file requires software to open it); obsolescence (e.g. new hardware and software making older files redundant); and the lack of value-assessment experience in the field (how do we assess the value of a set of data?).

These areas helped us to understand that digital preservation presents its own set of unique challenges that have to be understood within their own context. The question ‘Why Digitise?’ was then put to the attendees. The responses covered legal, research, cultural heritage, funding, efficiency, contingency and access reasons for digitising. This shows us that digital preservation cannot be seen as a simple solution to a single problem but as a complex solution to many.

Bit-level preservation was covered in detail at the workshop. This section focused on the potential dangers that could affect data and how to prevent them. The three main areas were media obsolescence, where a media type is no longer used or the hardware to support it no longer exists; media failure or decay, where the media itself reaches the end of its life cycle or breaks; and natural or human-made disaster, such as fire or earthquakes. These dangers are mitigated by backing up the data at least two or three times (the actual number of copies needed is a subject of debate), storing these copies in different geographical locations, and periodically migrating the data to new storage media.

The workshop also looked at integrity checks and the role they play in bit-level preservation. Integrity checking is the process of creating a ‘checksum’ or ‘hash value’ (a number computed from a file’s contents by an integrity-checking tool such as Fixity or ACE; registries such as COPTR list further tools). This number is effectively unique to that data, like a fingerprint, and can be recomputed later to check whether the data has changed or become corrupted through bit-rot or other data corruption.
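To make the checksum idea concrete, here is a minimal Python sketch using the standard library’s hashlib module. It is not how Fixity or ACE work internally; the choice of SHA-256 and the helper names sha256_of and is_unchanged are my own assumptions for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

In practice the checksum would be recorded at ingest and stored alongside the file, then recomputed on a schedule. A mismatch signals that the copy should be replaced from one of the other backups.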


Later in the workshop characterisation tools were demonstrated. The tool showcased was DROID (Digital Record Object Identification), an open-source tool that identifies the file types and formats on a system by matching files against signatures from PRONOM, a registry of file formats. The presentations stressed that the databases these tools rely on are important and need regular updating to remain accurate. Other characterisation tools mentioned were C3PO, JHOVE, Tika and FITS.
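The idea behind signature-based format identification can be sketched in a few lines of Python. This is a toy illustration only: the small SIGNATURES table below is hand-written from well-known file headers, not taken from PRONOM, and identify is a hypothetical helper rather than anything DROID provides.

```python
# Toy signature table: (leading bytes, format name). Real tools use far
# larger, regularly updated signature databases.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(path):
    """Guess a file's format from its first bytes, ignoring its extension."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

Because the check looks at the file’s contents rather than its name, a PDF saved with the wrong extension is still recognised. It also shows why an out-of-date signature table matters: any format missing from the table comes back as unknown.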


The presentation on departmental readiness provided useful insights into preparing for digital preservation projects. It focused on the way that maturity models can be used to benchmark your department’s readiness for digital preservation. The two main models discussed were the Digital Preservation Capability Maturity Model and the NDSA Levels of Digital Preservation. These models aim to identify gaps in an institution’s readiness for digital preservation, whilst also highlighting aspects of best practice that it could aim to achieve.


A risk assessment exercise also formed part of the workshop. Those attending were asked to consider how various risks would affect the digital archival process. The risks would then be ranked on their likelihood of occurring, and the potential damage that they might cause. We would then propose potential solutions to help mitigate these risks, and prevent further ‘explosive’ risks from occurring. This was followed by assessing whether the scores for both criteria had improved.
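The scoring in an exercise like this can be sketched in Python as follows. The 1 to 5 scales, the choice to multiply likelihood by impact, and the example risks are all my assumptions for illustration; the workshop may have used a different scheme.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Assumption: overall score is likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank(risks):
    """Order risks from highest to lowest overall score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


def improved(before, after):
    """True if neither criterion got worse and the overall score dropped."""
    return (
        after.likelihood <= before.likelihood
        and after.impact <= before.impact
        and after.score < before.score
    )


if __name__ == "__main__":
    # Hypothetical example: a second copy lowers the impact of a media failure.
    before = Risk("Media failure on single copy", likelihood=4, impact=5)
    after = Risk("Media failure with second copy", likelihood=4, impact=2)
    print(before.score, after.score, improved(before, after))
```

Ranking by a single score makes it easy to see which risks to tackle first, while comparing the two criteria separately shows whether a mitigation reduced the chance of a problem, its consequences, or both.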

The last presentation was on digital asset registers. It focused on the importance of creating and managing a detailed spreadsheet of an institution’s digital assets, with the aim of having one organised and accessible source of information on a digital collection. The presentation showed how this register could be shared with all members of staff to promote a better understanding of the collection. This would remove the problem of having one staff member as the sole specialist on a collection, and promote transparency throughout the digital preservation process. Another idea mentioned was that the register could be used to make the case for further funding for digital collections, by providing a visual representation of the digital preservation process.

I thoroughly enjoyed the DPC workshop and look forward to attending similar workshops.