Tag Archives: archival interfaces

Advisory Board Meeting, 18 March 2011

Our second advisory board meeting took place on Friday. Thanks to everyone who came along and contributed to what was a very useful meeting. For those of you who weren’t there, here is a summary of the meeting and our discussions.

The first half of the afternoon took the form of an overview of different aspects of the project.

Overview of futureArch’s progress

Susan Thomas gave us a reminder of the project’s aims and objectives and the progress being made to meet them. After an overview of the percentage of digital accessions coming into the library since 2004 and the remaining storage space we currently have, we discussed the challenge of predicting the size of future digital accessions and collecting digital material. We also discussed what we think researcher demand for born-digital material is now and will be in the future.

Born Digital Archives Case Studies

Bill Stingone presented a useful case study about what the New York Public Library has learnt from the process of making born-digital materials from the Straphangers Campaign records available to researchers.

After this, Dave Thompson spoke about some of the technicalities of making all content in the Wellcome Library (born-digital, analogue and digitised) available through the Wellcome Digital Library. Since the project is so wide-reaching, a number of questions followed about the practicalities involved.

Web archiving update

Next we returned to the futureArch project and Susan gave an overview of the scoping, research and decisions that have been made regarding the web archiving pilot since the last meeting. I then gave an insight into how the process of web archiving will be managed using a tracking database. Some very helpful discussions followed about the practicalities of obtaining permission for archiving websites and the legal risks involved.

After breaking for a well-earned coffee we reconvened to look at systems.

Systems for Curators

Susan explained how the current data capture process works for digital collections at the Bodleian, including an overview of the required metadata which we enter manually at the moment. Renhart moved on to talk about our intention to use a web-based capture workbench in the future and gave us a demo of the RAP workbench. Susan also showed us how FTK is used for appraisal, arrangement and description of collections and the directions we would like to take in the future.

Researcher Interface

To conclude the systems part of the afternoon, Pete spoke about how the BEAM researcher interface has developed since the last advisory board meeting, the experience of the first stage of testing the interface and the feedback gained so far. He then encouraged everyone to get up and have a go at using the interface for themselves and to comment on it.

Training the next generation of archivists?

With the end of the meeting fast approaching, Caroline Brown from the University of Dundee gave our final talk. She addressed the extent to which different archives courses in the UK cover digital curation and the challenges faced by course providers aiming to include this kind of content in their modules.

With the final talk over we moved on to some concluding discussions around the various skills that digital archivists need. Those of us who were able to stay continued our discussions over dinner.

-Emma Hancox

What I learned from the word clouds…

Now, word clouds are probably a bit out of fashion these days. Like a Google Map, they just seem shiny but most of the time quite useless. Still, that hasn’t stopped us trying them out in the interface – because I’m curious to see what interesting (and simple to gather) metadata n-grams & their frequency can suggest.

Take for instance the text of “Folk-Lore and Legends of Scotland” [from Project Gutenberg] (I’m probably not allowed to publish stuff from a real collection here and chose this text because I’m pining for the mountains). It generates a “bi-gram”-based word cloud that looks like this:

Names (of both people and places) quickly become obvious to human readers, as do some subjects (“haunted ships” is my favourite). To make it more useful to machines, I’m pretty sure someone has already tried cross-referencing bi-grams with name authority files. I also imagine someone has used the bi-grams as facets. Theoretically a bi-gram like “Winston Churchill” may well turn up in manuscripts from multiple collections. (Anyone know of any successes doing these things?)

Still, for now I’ll probably just add the word clouds of the full-texts to the interface, including a “summary” of a shelfmark, and then see what happens!

I made the (very simple) Java code available on GitHub, but I take no credit for it! It is simply a Java reworking of Jim Bumgardner’s word cloud article using Jonathan Feinberg’s tokenizer (part of Wordle).
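
For anyone who wants to play with the idea without the Java, here is a minimal sketch in Python of the general technique: count adjacent word pairs and scale a font size by frequency. It is not the code on GitHub and does not use the Wordle tokenizer; the stop-word list, the function names and the sample sentence are all made up for illustration, and the counting ignores sentence boundaries.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run needs a fuller one.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "was", "it", "that"}


def bigram_counts(text, stop_words=STOP_WORDS):
    """Count adjacent word pairs, skipping pairs that contain a stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    pairs = zip(words, words[1:])
    return Counter(
        f"{first} {second}"
        for first, second in pairs
        if first not in stop_words and second not in stop_words
    )


def cloud_weights(counts, top_n=5, min_size=10, max_size=40):
    """Map the top_n bi-grams to font sizes scaled linearly by frequency."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    return {
        phrase: min_size + (max_size - min_size) * count / highest
        for phrase, count in top
    }


# Invented sample text, not a quotation from the Gutenberg book.
sample = (
    "The haunted ships sailed past the haunted ships of legend, "
    "and the old fisherman watched the haunted ships vanish."
)
counts = bigram_counts(sample)
weights = cloud_weights(counts)
```

The most frequent bi-gram gets the largest font size and everything else is scaled relative to it, which is about all a basic word cloud needs before layout.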

-Peter Cliff

Wot I Lernd At DrupalCon

I spent last week in the lovely city of Copenhagen immersed in all things Drupal. It was a great experience, not just because of the city (so many happy cyclists!), but because I’d not seen a large scale Open Source project up close before and it is a very different and very interesting world!

I’m going to pick out some of my highlights here as to cover it all would take days, but if you want to know more I’d encourage you to check out the conference Web site and the presentation videos on archive.org.

So, wot did I lernd?

Drupal Does RDF
OK, so I knew that already, but I didn’t know that from Drupal 7 (release pending) RDF support will be part of the Drupal core, showing a fairly significant commitment in this area. Even better, there is an active Semantic Web Drupal group working on this stuff. While “linked data” remains something of an aside for us (99.9% of our materials will not make their way to the Web any time soon) the “x has relationship y with z” structure of RDF is still useful when building the BEAM interfaces – for example Item 10 is part of shelfmark MS Digital 01, etc. There is also no harm in trying to be future proof (assuming the Semantic Web is indeed the future of the Web! ;-)) for when the resources are released into the wild.
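
To show what I mean by the “x has relationship y with z” structure, here is a minimal sketch in plain Python (no RDF library, no Drupal). The identifiers and the `isPartOf` predicate are made-up placeholders rather than a real vocabulary, and `match` is a hypothetical helper.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers and predicate names here are illustrative assumptions.
triples = {
    ("item:10", "isPartOf", "shelfmark:MS Digital 01"),
    ("item:11", "isPartOf", "shelfmark:MS Digital 01"),
    ("shelfmark:MS Digital 01", "isPartOf", "collection:example"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Which things are part of shelfmark MS Digital 01?
items = [
    s for s, _, _ in match(triples, predicate="isPartOf", obj="shelfmark:MS Digital 01")
]
```

The same pattern-matching question can be asked in any direction (what is this item part of, what does this shelfmark contain), which is the bit that is handy for building an interface.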

Projects like Islandora and discussions like this suggest growing utility in the use of Drupal as an aspect of an institutional repository, archives or even Library catalogues (this last one being my (pxuxp) experiment with Drupal 6 and RDF).

Speaking of IRs…

Drupal Does Publishing
During his keynote, Dries Buytaert (the creator of Drupal) mentioned “distributions”. Much like Linux distributions, these are custom builds of Drupal for a particular market or function. (It is testament to the software’s flexibility that this is possible!) Such distributions already exist and I attended a session on OpenPublish because I wondered what the interface would look like and also thought it might be handy if you wanted to build, for instance, an Open Access Journal over institutional repositories. Mix in the RDF mentioned above and you’ve a very attractive publishing platform indeed!

Another distro that might be of interest is OpenAtrium which bills itself as an Intranet in a Box.

Drupal Does Community
One of my motivations in attending the conference was to find out about Open Source development and communities. One of the talks was entitled “Come for the Software, Stay for the Community” and I think part of Drupal’s success is its drive to create and maintain a sharing culture – the code is GPL’d for example. It was a curious thing to arrive into this community, an outsider, and feel completely on the edge of it all. That said, I met some wonderful people, spent a productive day finding my way around the code at the “sprint” and think that a little effort to contribute will go a long way. This is a good opportunity to engage with a real life Open Source community. All I need to do is work out what I have to offer!

Drupal Needs to Get Old School
There were three keynotes in total, and the middle one was by Rasmus Lerdorf of PHP fame, scaring the Web designers in the audience with a technical performance analysis of the core Drupal code. I scribbled down the names of various debugging tools, but what struck me the most was the almost bewildered look on Rasmus’ face when considering that PHP had been used to build a full-scale Web platform. He even suggested at one point that parts of the Drupal core should be migrated to C rather than remain as “a PHP script”. There is something very cool about C. I should dig my old books out! 🙂

HTML5 is Here!
Jeremy Keith gave a wonderful keynote on HTML5, why it is like it is and what happened to XHTML 2.0. Parts were reminiscent of the RSS wars, but mostly I was impressed by the HTML5 Design Principles which favour a working Web rather than a theoretically pure (XML-based) one. The talk is well worth a watch if you’re interested in such things and I felt reassured and inspired by the practical and pragmatic approach outlined. I can’t decide if I should start to implement HTML5 in our interface or not, but given that HTML5 is broadly compatible with the hotchpotch of HTMLs we all code in now, I suspect this migration will be gentle and as required rather than a brutal revolution.

Responsive Design
I often feel I’m a little slow at finding things out, but I don’t think I was the only person in the audience to have never heard about responsive Web design, though when you know what it is, it seems the most obvious thing in the world! The problem with the Web has long been the variation in technology used to render the HTML. Different browsers react differently and things can look very different on different hardware – from large desktop monitors, through smaller screens to phones. Adherence to standards like HTML5 and CSS3 will go a long way to solving the browser problem, but what of screen size? One way would be to create a site for each screen size. Another way would be to make a single design that scales well, so things like images disappear on narrower screens, multiple columns become one, etc.

Though not without its problems, this is the essence of responsive design and CSS3 makes it all possible. Still not sure what I’m on about? dconstruct was given as a good example. Using a standards compliant browser (i.e. not IE! (yet)), shrink the browser window so it is quite narrow. See what happens? This kind of design, along with the underlying technology and frameworks, will be very useful to our interface so I probably need to look more into it. Currently we’re working with a screen size in mind (that of the reading room laptop) but being more flexible can only be a good thing!

There were so many more interesting things but I hope this has given you a flavour of what was a grand conference.

-Peter Cliff

building castles 1: the problem

It has been an odd couple of days. You know how it is. A problem that needs solving. A seemingly bewildering array of possible solutions and lots of opinions and no clear place to start. In an attempt to bring some shape to the mist, I’m going to start at the start, with the basics.

The Raw Materials
  • A collection of things.
  • A set of born digital items – mostly documents in antique formats.
  • EAD for the collection – hierarchical according to local custom and ISAD(G).
  • A spreadsheet – providing additional information about the digital items, including digests.
The Desired Result
A browser-based reader interface to the digital items that maintains the connections to the analogue components and remains faithful to the structure of the finding aid and presents that structure in such a way as to not confuse the reader. Ideally the interface should also support aspects of a collaborative Web, where people can annotate and comment, as well as offer “basket”-like functionality (“basket” is the wrong term), maybe requests for copies and maybe even the ability to arrange the collection how they’d like to use it.
(I imagine you’ve all got similar issues! :-))
We put together a sketch for the interface to the collection for the Project Advisory Board and got some very useful feedback from that. Our Graduate Trainee Victoria has also done some great research on interfaces to existing archives and some commercial sites which provides some marvellous input on what we should and could build.
But this is where things get misty…
We have some raw materials, we have a vision of the thing we want to build (though that vision is in parts hazy and in parts aiming high! (why not eh?)), so where do we go from here?
(To put it another way, there are the foundations of a “model”, a vision of a “view”; now we need to define the “controller” – the thing that brings the first two together).
  • We could build a database and put all the metadata into it and run the site off that
  • We could build a set of resources (the items, the sub[0,*]series, the collection, the people), link all that data together and run the site off that.
  • We could build a bunch of flat pages which, while generated dynamically once, don’t change once the collection is up.
There is a strong contender for how it’ll be done (the middle one!) and in the next exciting episode I’ll hopefully be able to tell you more about the first tentative steps, but for now I’m open to suggestions – either for alternatives or technologies that’ll help and if you have already built what we’re after then please get in touch… 😉
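
To make the middle option a little more concrete, here is a minimal sketch in Python of linked resources (collection, series, sub-series, item) that a site could walk to build its pages. It is only an illustration under my own assumptions, not a design decision: the `Resource` class, its fields and the example titles are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    """A node in the finding-aid hierarchy: collection, (sub)series or item."""
    identifier: str
    kind: str
    title: str
    # Excluded from repr/compare so parent <-> child links do not recurse.
    parent: Optional["Resource"] = field(default=None, repr=False, compare=False)
    children: List["Resource"] = field(default_factory=list)

    def add(self, child):
        """Link a child to this resource and return the child."""
        child.parent = self
        self.children.append(child)
        return child

    def breadcrumb(self):
        """Walk up the parents to show where this resource sits."""
        node, trail = self, []
        while node is not None:
            trail.append(node.title)
            node = node.parent
        return " > ".join(reversed(trail))

    def items(self):
        """Yield every item below this resource, however deeply nested."""
        if self.kind == "item":
            yield self
        for child in self.children:
            yield from child.items()


# Made-up example data; any number of sub-series levels (including none) works.
collection = Resource("c1", "collection", "Example Papers")
series = collection.add(Resource("s1", "series", "Correspondence"))
subseries = series.add(Resource("s1.1", "subseries", "Drafts"))
item = subseries.add(Resource("i10", "item", "Item 10"))
loose_item = series.add(Resource("i11", "item", "Item 11"))
```

Calling `item.breadcrumb()` gives the path from the collection down to the item, which is one way of keeping the reader oriented within the structure of the finding aid.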
-Peter Cliff

Advisory board meeting, 24 Sept. 2009

Thanks to everyone who came along and contributed to the project’s first advisory board meeting last Thursday.

We started with some introductory discussions around the Library’s hybrid collections and the futureArch project’s aims and activities. This discussion was wide ranging, touching on a number of subjects including the potential content sources for ‘digital manuscripts’: from mobile phones, to digital media, to cloud materials.

In the past year, we’ve made progress on developing, and beginning to implement, the technical architecture for BEAM (Bodleian Electronic Archives & Manuscripts). Pete Cliff (futureArch Software Engineer) kicked off our session on ‘systems’ with an overview of the architecture, drawing on some particular highlights; it’s worth a look at his slides if you’re interested in finding out more.

Next, a demo of two ingest tools:
1. Renhart Gittens demonstrated the BEAM Ingester, our means of committing accessions (under a collection umbrella) to BEAM’s preservation storage.

2. Dave Thompson (Wellcome Library Digital Curator) demonstrated the XIP creator. This tool does a similar job to the BEAM Ingester and forms part of the Tessella digital preservation system being implemented at the Wellcome Library.

Keeping with technical architecture, Neil Jefferies (OULS R&D Project Officer) introduced Oxford University Library Service’s Digital Asset Management System (or DAMS, as we’ve taken to calling it). This is the resilient preservation store upon which BEAM, and other digital repositories, will sit.

How will researchers use hybrid archives?
Next we turned our attention to the needs of the researchers who will use the Library’s hybrid archives. Matt Kirschenbaum (Assoc. Prof. of English & Assoc. Director of MITH at the University of Maryland) got us off to a great start with an overview of his work as a researcher working with born-digital materials. Matt’s talk emphasised digital archives as ‘material culture’, an aspect of digital manuscripts that can be overlooked when the focus becomes overly content-driven. Some researchers want to explore the writer’s writing environment; this includes seeing the writer’s desktop, and looking at their MP3 playlist, as much as examining the word-processed files generated on a given computer. Look out for the paper Matt has co-authored for iPRES this year.

Next we broke into groups to critique the ‘interim interface’ which will serve as a temporary access mechanism for digital archives while a more sophisticated interface is developed for BEAM. Feedback from the advisory board critique session was helpful and we’ve come away with a to-do list of bug fixes and enhancements for the interim interface as well as ideas for developing BEAM’s researcher interfaces. We expect to take work on researcher requirements further next year (2010) through workshops with researchers.

Finally, we heard from Helen Hockx-Yu (British Library’s Web Archiving Programme Manager) on the state of the art in web archiving. Helen kindly agreed to give us an overview of web archiving processes and the range of web archiving solutions available. Her talk covered all the options, from implementing existing tool suites in-house to outsourcing some/all of the activity. This was enormously useful and should inform conversations about the desired scope of web archiving activity at the Bodleian and the most appropriate means by which this could be supported.

Some of us continued the conversation into a sunny autumn evening on the terrace of the Fellows’ Garden of Exeter College, and then over dinner.




-Susan Thomas