Tag Archives: workshop

What is ‘The Future of the Past of the Web’?

‘The Future of the Past of the Web’
Digital Preservation Coalition Workshop
British Library, 7 October 2011
Chrissie Webb and Liz McCarthy

In his keynote address to this event – organised by the Digital Preservation Coalition , the Joint Information Systems Committee and the British Library – Herbert van der Sompel described the purpose of web archiving as combating the internet’s ‘perpetual now’. Stressing the importance to researchers of establishing the ‘temporal context’ of publications and information, he explained how the framework of his Memento Project uses a ‘timegate’ implemented via web plugins to show what a resource was like at a particular date in the past. There is a danger, however, that not enough is being archived to provide the temporal context; for instance, although DOIs provide stable documents, the resources they link to may disappear (‘link rot’).

The Memento Project Firefox plugin uses a sliding timeline (here, just below the Google search box) to let users choose an archived date

A session on using web archives picked up on the theme of web continuity in a presentation by The National Archives on the UK Government Web Archive, where a redirection solution using open source software helps tackle the problems that occur when content is moved or removed and broken links result. Current projects are looking at secure web archiving, capturing internal (e.g. intranet) sources, social media capture and a semantic search tool that helps to tag ‘unstructured’ material. In a presentation that reinforced the reason for the day’s ‘use and impact’ theme, Eric Meyer of the Oxford Internet Institute wondered whether web archives were in danger of becoming the ‘dusty archives’ of the future, contrasting their lack of use with the mass digitisation of older records to make them accessible. Is this due to a lack of engagement with researchers, their lack of confidence with the material or the lingering feeling that a URL is not a ‘real’ source? Archivists need to interrupt the momentum of ‘learned’ academic behaviour, engaging researchers with new online material and developing archival resources in ways that are relevant to real research – for instance, by helping set up mechanisms for researchers to trigger archiving activity around events or interests, or making more use of server logs to help them understand use of content and web traffic.

One of the themes of the second session on emerging trends was the shift from a ‘page by page’ approach to the concept of ‘data mining’ and large scale data analysis. Some of the work being done in this area is key to addressing the concerns of Eric Meyer’s presentation; it has meant working with researchers to determine what kinds and sources of data they could really use in their work. Representatives of the UK Web Archive and the Internet Archive described their innovations in this field, including visualisation and interactive tools. Archiving social networks was also a major theme, and Wim Peters outlined the challenges of the ARCOMEM project, a collaboration between Sheffield and Hanover Universities that is tackling the problems of archiving ‘community memory’ through the social web, confronting extremely diverse and volatile content of varying quality for which future demand is uncertain. Richard Davis of the University of London Computer Centre spoke about the BlogForever project, a multi-partner initiative to preserve blogs, while Mark Williamson of Hanzo Archives spoke about web archiving from a commercial perspective, noting that companies are very interested in preserving the research opportunities online information offers.

The final panel session raised the issue of the changing face of the internet, as blogs replace personal websites and social media rather than discrete pages are used to create records of events. The notion of ‘web pages’ may eventually disappear, and web archivists must be prepared to manage the dispersed data that will take (and is taking) their place. Other points discussed included the need for advocacy and better articulation of the demand for web archiving (proposed campaign: ‘Preserve!: Are you saving your digital stuff?’), duplication and deduplication of content, the use of automated selection for archiving and the question of standards.

Open Development: Building an Engaged Community

I had an interesting day on Monday at the OSS Watch workshop Open Development: Building an Engaged Community (the slides are available from there).

The days aims were

  • Understand how open development works and know the common community structures
  • Be familiar with the skills and processes that encourage community participation
  • Develop ideas for improving the community friendliness of a specific project

I’ve been using open source software all my career but I’ve always been 1) a bit sketchy about how and 2) very nervous of getting involved in any open source software development so this sounded perfect!

From the offset Steve Lee of OSS Watch, in his introduction to the day, made it was clear that open development practice was key to open source software rather than simply access to the source code. (A couple of the presenters said “don’t just throw it over the wall”, referring to the practice of putting your source code some where public and walking away – a very common practice in our field – as this would not lead to a sustainable software product).

The rest of the day supported this ideal… Sebastian Brännström of the Symbian Foundation spoke of how Symbian hoped to make as much of the Symbian operating system (for phones) open source as soon as possible, and outlined the large (and quite formal) organisational structure required to support the 40 million lines of code. For a software project that large, this shouldn’t come as a surprise, but clearly shows that “open sourcing” (I mean, the process to make software open source rather than sourcing work in an open way – though both are valid!) might not always be cheap or a free (beer) option. Indeed, he hoped that there would be a full-time, paid, community leader whose sole role would be to maintain and manage one of the 134 software packages that make up the Symbian OS.

Next up, Sander van der Waal of OSS Watch took us through the developer experience of taking part in an open source project – both from being part of a commercial company in the Netherlands and also working on the OSS Watch project SIMAL. It was very interesting to hear how his team had gone about contributing to Apache Felix & Jackrabbit (Two products very much of interest to our community!). He suggested it was very important to make use of the usual cluster of open source development tools – not just version management, but also mailing lists, bug tracking systems, wikis and the like – and that this was important if you were a “one man band” developer or a whole team. In many ways his experience here helped ease my nerves of contributing to projects.

The final speaker was Mark Johnson, of Taunton’s College, giving his experiences and tips on being involved with the open source course management system, Moodle. In a past life I’ve developed for Moodle, so this was interesting to hear about. His advice was broadly similar to that of the other two speakers, though from a different perspective and here there was evidence of useful reinforcement of ideas rather than repetition, which is always a good thing.

A workshop isn’t complete without a bit of group work and we were asked to complete a questionnaire designed, I think, to get us thinking about the sustainability of our open source projects by highlighting areas we should be considering – licensing, use of standards, documentation, etc.

This was a very useful tool and the questions got me thinking about all sorts of things. The results for futureArch were bad – all “red” (for danger) expect the section of use of standards – but that didn’t come as much of a surprise. I think it would be fair to say that futureArch isn’t an “Open Source Project” per se. Rather we’re avid users of open source software. We, like many, do not have the resources to run a community around anything we build (who has funding for a full-time community manager?) and it would probably be inappropriate to try. But we can and will contribute to other projects and the workshop helped me see that this was both pretty easy (assuming everyone is nice) and desirable.

And, of course, anything we build here – the ingest tool for example or the metadata manager – will probably be “thrown over the wall” and people will be able to find it and others, if they get the urge, will be able to found a community, which I guess shows there is value in simple publication of source code in addition to the (far more preferable and more likely to succeed) development of a community around a product. (The revelation that community building is essential for a sustained software product, probably so obvious to many, sheds light on the reasons behind things like Dev8D too).

Just some final thought then as it grows ever darker and it is good not to cycle too late home!

Firstly, it struck me as people talked, that while open source could be seen as less formal than closed software development, it clearly is not. Development of communities and the subsequent control and management of those communities, requires formal structures making open source anything but an easy option.

Secondly, fascinating were the reasons given to contribute to an open source project. Someone mentioned how by taking part you felt you were not alone, but the overwhelming reason given was “recognition”. By contributing you could get your name (and that of your employer) in lights, that participating in a community could lead to job offers, or other personal success. As most projects are on a meritocratic basis – the more good you do, the more say you have – that success could be to become the community leader or at least one of the controllers of the code – the fabled “commiters”. This is a curious thing – the reason to participate in a “community” is the “selfish” urge to self-promote. Something jars there, but I’m not quite sure what.

-Peter Cliff