Creating Wikipedia articles from research data

Posted on 27 October 2017 by poulterm

Hillfort images shared on Wikimedia Commons

The Atlas of Hillforts of Britain and Ireland is a collaboration between the Universities of Oxford, Edinburgh and Cork, funded by the Arts and Humanities Research Council. It provides a definitive list of hillfort sites in the British Isles: more than four thousand in total. As well as publishing extensive fieldwork by expert archaeologists, the site uses crowdsourcing: some of the sites were visited by volunteer investigators. The site also invites users, expert or amateur, to submit their own photographs of the hillforts.

The Atlas launched in June 2017 and generated national media coverage. An issue for any newly-launched site is how to get incoming links from other sites; how to plumb the site into the existing paths by which people find information. This case study describes how, by sharing selected data from the Atlas, we were able to create thousands of incoming links from Wikipedia and related apps and sites, and to encourage the creation and use of hillfort articles in Wikipedia.

Turning a historical book into a data set

Posted on 24 October 2017 by poulterm

A series of books published around the turn of the 20th century are crucial to modern bibliographic research: they are biographical dictionaries of booksellers and printers, including addresses, dates and significant works printed. Some of these books are out of copyright and available as scanned pages, allowing us not only to copy them into new formats, but also to adapt them into new kinds of resource.

These scanned books could be made more useful to researchers in a number of ways. Text could be meaningfully segmented, by dictionary entry rather than by page or paragraph. The book’s internal and external citations can become links, for instance linking a proper name to identifiers for the named person. The book can even have an open data representation which other data sets can hook on to, for example to say that a person is described in the book.
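As a rough illustration of segmenting by dictionary entry rather than by page, here is a minimal sketch in Python. It assumes that each entry opens with a surname in capitals followed by forenames in brackets; that layout, the function name `segment_entries` and the sample text are all invented for illustration and are not taken from the project's actual workflow, which used Wikisource and custom wiki-code.

```python
import re

# Assumption: an entry begins at the start of a line with a capitalised
# surname followed by forenames in brackets, e.g. "SMITH (JOHN), ...".
ENTRY_START = re.compile(
    r"^(?P<surname>[A-Z][A-Z'\-]+) \((?P<forenames>[^)]+)\)",
    re.MULTILINE,
)


def segment_entries(text):
    """Split transcribed text into one record per dictionary entry."""
    starts = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(starts):
        # An entry runs until the next headword, or to the end of the text.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        # Collapse line breaks left over from the printed page.
        body = " ".join(text[match.start():end].split())
        entries.append(
            {
                "surname": match.group("surname"),
                "forenames": match.group("forenames"),
                "text": body,
            }
        )
    return entries


# Invented sample text, not quoted from any real book.
SAMPLE = """SMITH (JOHN), bookseller in London, 1641-1650. An invented
entry that runs over a line break.
JONES (MARY), printer in Oxford, 1660."""

if __name__ == "__main__":
    for entry in segment_entries(SAMPLE):
        print(entry["surname"], "|", entry["text"])
```

Once each entry is a separate record, it becomes possible to attach an identifier to the person it describes, which is the step that lets other data sets refer to the entry.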

This case study describes the transformation of one of these books, Henry Plomer’s A Dictionary of the Booksellers and Printers who Were at Work in England, Scotland and Ireland from 1641 to 1667 using Wikisource, part of the Wikimedia family of sites. As a collaborative platform, Wikisource allowed Bodleian staff to work with Wikisource volunteers. We benefited from many kinds of volunteer labour, from correcting simple errors in the text to creating custom wiki-code to speed up the process.

A lot of important data sets only currently exist in the form of printed books, including catalogues, dictionaries and encyclopedias. We adopted a process that has already been used on some large, multi-volume works and could be used for many more.

Making Sense of Negotiated Text at Scale: a workshop

Posted on 19 October 2017 by Pip Willcox

What: Making Sense of Negotiated Text at Scale: a workshop

When: 11:30—14:30, Thursday 30 November 2017

Where: Centre for Digital Scholarship, Weston Library (map)

Open to all

Free

Registration is required: please email Pip Willcox with your name, email address, and access and dietary requirements

How do we evaluate the relationship between different iterations of ideas in text form?

Speakers

Nicholas Cole and Alfie Abdul-Rahman: The Quill Project
Radoslaw Zubek, David Doyle, and Abhishek Dasgupta: Measuring Government Policy with Text Analysis project
David Price: DebateGraph—Exploring the Intention to Withdraw from the Union
Félix Krawatzek: Buying Words? The impact of donations on political language

This workshop brings together experts from four projects which are using digital methods to analyze, understand, and re-present negotiated texts. Taking UK government policy documents, the creation of the American Constitution, current political debate, and the economic cost of political language as their subject matter, each speaker will outline the motivation for their work and the approaches they have taken towards answering questions such as:

Are government regulations becoming more or less business friendly?
Which State’s representatives contributed the most successful proposals to the American Constitution?
What common threads of agreement are there in differing political viewpoints?
How much money does it take to change the language in the US Congress?

This workshop will be of interest to people working in history, politics, computational linguistics, visualization, or the application of digital innovation to research.

Alfie Abdul-Rahman is a Research Associate at the University of Oxford e-Research Centre where she develops web-based visualization tools for humanities scholars, including for the Quill Project.

Nicholas Cole is a Senior Research Fellow at Pembroke College Oxford, specializing in the history of political thought and American Constitutional History, and directs the Quill Project.

Abhishek Dasgupta is a doctoral student at Exeter College, studying Foundations, Logic, and Structures in the Department of Computer Science.

David Doyle is an Associate Professor of Latin American Politics in the Department of Politics and International Relations at the University of Oxford, a Fellow of St Hugh’s College, and co-investigator of the Fell-funded Measuring Government Policy with Text Analysis project.

Félix Krawatzek is a British Academy Postdoctoral Fellow based at the University of Oxford’s Department of Politics and International Relations and a Research Fellow at Nuffield College.

David Price co-founded DebateGraph with the former Australian cabinet minister Peter Baldwin and has led DebateGraph’s projects with, amongst others, the UK Prime Minister’s Office, the White House Office of Science and Technology Policy, CNN, the European Commission, and the Bill & Melinda Gates Foundation.

Radoslaw Zubek is an Associate Professor of European Politics, a Tutorial Fellow at Hertford College, and principal investigator of the Fell-funded Measuring Government Policy with Text Analysis project.

This workshop is convened by:

Registration

To register, please email Pip Willcox (pip.willcox@bodleian.ox.ac.uk) with:

Your name
Your email address
Access or dietary requirements

Image credit: Global Academic Forum

Digital Approaches to the History of Science: a successful workshop

Posted on 17 October 2017 by Pip Willcox

‘Digital Approaches to the History of Science’, the first of two planned workshops on this topic, was held at the History Faculty in Oxford on 28 September 2017. Nearly sixty attendees assembled to hear presentations from a selection of the most exciting current projects in this field from around the UK.

Professor Rob Iliffe, representing the Newton Project, addressed the ongoing challenges and complexity of digitizing and presenting the manuscript writings of Isaac Newton, and Alison Pearn spoke of the related issues faced by the digital side of the ongoing Darwin Correspondence Project. Lauren Kassell, of the Casebooks Project, introduced a very different type of material and spoke of the need to find new ways of representing, encoding and searching the mass of information contained in early modern medical-astrological casebooks.

After lunch, two speakers discussed from complementary perspectives the opportunities represented by the very rich archive of The Royal Society. Louisiane Ferlier discussed the digitization of Royal Society journals and the work needed to clean and link the metadata about the articles in them. Pierpaolo Dondio described his work modelling and visualising the network of authors, editors and referees who controlled the content of those papers, and provided examples of the kinds of research outcomes such work can produce. A final talk turned to the use of digital humanities resources in the university classroom: Kathryn Eccles and Howard Hotson described the Cabinet Project, which has made a rich ecology of digital images and objects available to students on a growing list of Oxford undergraduate papers.

Rich discussions took place both around the individual presentations and over lunch and coffee, and this sell-out event has certainly stimulated interest and ongoing discussion about the distinctive opportunities for history of science created by digital scholarship and resources.

Reflections on discussion topics during the workshop by Pip Willcox

The event was supported by the Centre for Digital Scholarship (Bodleian Libraries), ‘Reading Euclid’, The Royal Society and the Newton Project, and was organized jointly by the Centre for Digital Scholarship and ‘Reading Euclid’. The date for the second workshop will be announced shortly.

—Benjamin Wardhaugh, ‘Reading Euclid’

Top image credit: René Descartes, Principia philosophiae (Amsterdam, 1644), ‘Cartesian network of vortices of celestial motion’, p. 110. Bodleian Library Savile T 22. Edited in Photoshop by Yelda Nasifoglu.

SWORDV3 stakeholder call

Posted on 13 October 2017 by Emma Stanford

The SWORDV3 project team are looking for expressions of interest from potential stakeholders as they develop a new technical standard and community and governance mechanisms for this updated version of SWORD. From the DPC announcement:

Expressions of interest are sought to become stakeholders in the project: to make suggestions, review activities and meet as required over the coming months.

In particular, the project team is interested in making contact with people who may wish to develop SWORD V3 libraries for their preferred platforms or languages since the aim is to provide some support for such activities during the project. Please contact one of the project team (ideally by mid-October) if you are interested in participating, and indicate if you are interested in the technical or community aspects of the project (or both!).

On the technical side, the project is creating a document that brings together the change requests and new use cases that have collected since the release of SWORDV2, culled from the github site, message posts and preliminary discussions with some stakeholders earlier this year. This has also suggested a way forward that breaks with SWORD’s AtomPub roots in order to provide a more up-to-date and flexible protocol. This will be circulated to stakeholders soon.

On the community side, a similar document outlining possible models for developing the SWORD community in the future will be circulated soon. This is a much more open set of choices since the SWORD user-base has expanded considerably since its first conception, and we are open to further suggestions! The final arrangements must be aligned with community wishes in order to be an effective sustainable solution.

More at http://www.dpconline.org/news/swordv3-project-stakeholder-call.

Working with Spreadsheets: a workshop

Posted on 12 October 2017 by Pip Willcox

Image of hand-drawn spreadsheet

What: Working with Spreadsheets: a workshop

When: 10:00—16:30, Tuesday 21 November

Where: Centre for Digital Scholarship, Weston Library (map)

Access: open to all members of the University

Admission: free

Trainers: Iain Emsley and Pip Willcox

Registration is required: please see below

This workshop is designed for anyone who works with spreadsheets and wants to learn how to explore that data more efficiently and consistently. No prior experience is required. The hands-on workshop teaches basic concepts, skills, and tools for working more effectively and reproducibly with your data.

We will cover data organization in spreadsheets and OpenRefine for managing data.

By the end of the workshop participants will be able to manage and analyze data effectively and be able to apply the tools and approaches directly to their ongoing research.

The workshop draws on lessons prepared by Data Carpentry and adapted by the trainers for use with Early English Books Online Text Creation Partnership data.

The methods that you will learn will be applicable to work in any field that uses spreadsheets. The EEBO-TCP subject matter we will use may be of particular interest to people working with library or early modern data.

Registration

To register, please email Pip Willcox (pip.willcox@bodleian.ox.ac.uk) with:

Your name
Your ox.ac.uk email address
Your departmental affiliation

This workshop is run in collaboration with the Centre for Digital Scholarship and the Reproducible Research Oxford project.

For announcements about future workshops and related activities run by Reproducible Research Oxford, please see the project website, subscribe to the mailing list, and follow the project on Twitter @RR_Oxford.

Equipment

Participants are requested to bring a laptop. To work with spreadsheets, you will need an application such as Microsoft Excel, Mac Numbers, or OpenOffice.org. If you don’t have a suitable program installed, you might like to use LibreOffice, a free, open source spreadsheet program.

You will also need OpenRefine (formerly Google Refine) and a web browser, and to have Java installed.

If you cannot bring a laptop with you, please let us know before the day.

Trainers

Iain Emsley works for the University of Oxford e-Research Centre on digital library and museums projects. Having recently finished an MSc in Software Engineering, he has started a PhD in Digital Media at Sussex University.

Pip Willcox is the Head of the Centre for Digital Scholarship at the Bodleian Libraries and a Senior Researcher at the University of Oxford e-Research Centre.

Image credit: Stockbyte/Getty Images.

Research Uncovered—Beyond reading: understanding the book through computer vision

Posted on 12 October 2017 by Pip Willcox

What: Research Uncovered—Beyond reading: understanding the book through computer vision

Who: Giles Bergel

When: 13:00—14:00, Thursday 2 November 2017

Where: Weston Library Lecture Theatre (map)

Access: open to all

Admission: free

Registration is required

This talk showcases Oxford’s cutting-edge research at the intersection of book history and computer vision. It aims to make images of books as easy to search, compare and annotate as their texts.

The University’s Visual Geometry Group has a long track record of working with University researchers and collections, building tools to help researchers analyse everything from classical art to fifteenth-century printed books and English broadside ballads, as well as numerous applications in the sciences. Several of these tools have now been openly released for all to use and adapt.

The talk reveals how computer vision, far from detracting from understanding books as material objects, offers a fresh pair of eyes on what remains one of humanity’s most sophisticated inventions and richest forms of heritage.

Dr Giles Bergel is Digital Humanities Research Officer in the University of Oxford’s Visual Geometry Group. He works on printed books, printing materials and the history of the book trade.

Book tickets: http://www.bodleian.ox.ac.uk/whatson/whats-on/upcoming-events/2017/nov/beyond-reading

Reconciling database identifiers with Wikidata

Posted on 12 October 2017 by poulterm

Charles Grey, former Prime Minister, has an entry in Electronic Enlightenment. How do we find his UK National Archives ID, British Museum person ID, History of Parliament ID, National Portrait Gallery ID, and 22 other identifiers? By first linking his Wikidata identifier.

In a previous blog post I stressed the advantage of mapping the identifiers in databases and catalogues to Wikidata. This post describes a few different tools that were used in reconciling more than three thousand identifiers from the Electronic Enlightenment (EE) biographical dictionary.

The advantages to the source database include:

Maintaining links between Wikipedia and the source database. EE and Early Modern Letters Online (EMLO) are two biographical projects that maintain links to Wikipedia. As Wikipedia articles get renamed or occasionally deleted, links can break. It is also easy to miss the creation of new Wikipedia articles. As EE and EMLO links are added to Wikidata, a simple database query gets a list of Wikipedia article links and their corresponding identifiers. Thus we can save work by automatically maintaining the links.
Identifying the Wikipedia articles of individuals in the source database. These are targets for improvement by adding citations of the source database.
Identifying individuals in the source database who lack Wikipedia articles, or who have articles in other language versions of Wikipedia, but not English. New articles can raise the profile of those individuals and can link to the source database. We raised awareness among the Wikipedian community with a project page and blog post. We also arranged with Oxford University Press to give free access to EE for active Wikipedia editors who requested it, via OUP’s existing Wikipedia Library arrangement.
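The "simple database query" mentioned in the first point can be sketched as a SPARQL query against the Wikidata Query Service. This is a minimal sketch under stated assumptions, not the query the project used: the property ID `P3429` is assumed here to be the Electronic Enlightenment identifier and should be checked on Wikidata before use, and the helper names `build_query`, `request_url` and `parse_bindings` are hypothetical. The code only builds the request and parses a response; it does not contact the network.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(property_id, wiki="https://en.wikipedia.org/"):
    """Return SPARQL listing source identifiers and their Wikipedia articles.

    property_id is the Wikidata property holding the source database's
    identifier (assumed, for illustration, to be "P3429" for EE).
    """
    return f"""
SELECT ?item ?sourceId ?article WHERE {{
  ?item wdt:{property_id} ?sourceId .
  ?article schema:about ?item ;
           schema:isPartOf <{wiki}> .
}}
""".strip()


def request_url(query):
    """Build a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(result):
    """Map each source identifier to its Wikipedia article URL.

    Expects the standard SPARQL JSON results structure.
    """
    return {
        row["sourceId"]["value"]: row["article"]["value"]
        for row in result["results"]["bindings"]
    }


# Invented placeholder response, shaped like SPARQL JSON results.
SAMPLE_RESULT = {
    "results": {
        "bindings": [
            {
                "item": {"value": "http://www.wikidata.org/entity/Q0"},
                "sourceId": {"value": "example-id-1"},
                "article": {"value": "https://en.wikipedia.org/wiki/Example"},
            }
        ]
    }
}

if __name__ == "__main__":
    print(request_url(build_query("P3429")))
    print(parse_bindings(SAMPLE_RESULT))
```

Re-running such a query periodically would pick up renamed or newly created articles, because the link is stored against the Wikidata item rather than the article title.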


Bodleian Digital Library

A Bodleian Libraries blog

Monthly Archives: October 2017
