Reconciling database identifiers with Wikidata

Charles Grey, former Prime Minister, has an entry in Electronic Enlightenment. How do we find his UK National Archives ID, British Museum person ID, History of Parliament ID, National Portrait Gallery ID, and 22 other identifiers? By first linking his Wikidata identifier.

In a previous blog post I stressed the advantage of mapping the identifiers in databases and catalogues to Wikidata. This post describes a few of the tools that were used in reconciling more than three thousand identifiers from the Electronic Enlightenment (EE) biographical dictionary.

The advantages to the source database include:

  • Maintaining links between Wikipedia and the source database. EE and Early Modern Letters Online (EMLO) are two biographical projects that maintain links to Wikipedia. As Wikipedia articles get renamed or occasionally deleted, links can break. It is also easy to miss the creation of new Wikipedia articles. As EE and EMLO links are added to Wikidata, a simple database query gets a list of Wikipedia article links and their corresponding identifiers. Thus we can save work by automatically maintaining the links.
  • Identifying the Wikipedia articles of individuals in the source database. These are targets for improvement by adding citations of the source database.
  • Identifying individuals in the source database who lack Wikipedia articles, or who have articles in other language versions of Wikipedia, but not English. New articles can raise the profile of those individuals and can link to the source database. We raised awareness among the Wikipedian community with a project page and blog post. We also arranged with Oxford University Press to give free access to EE for active Wikipedia editors who requested it, via OUP’s existing Wikipedia Library arrangement.
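
The "simple database query" mentioned in the first point can be sketched in a few lines. This is a minimal illustration rather than the query actually used in the project: it assumes the Wikidata Query Service endpoint at https://query.wikidata.org/sparql, it assumes that P3429 is the Wikidata property for the Electronic Enlightenment ID (worth checking on Wikidata before relying on it), and the function names build_query and parse_results are invented for this example. The code only builds the query text and reads a response in the standard SPARQL JSON results format; sending the request is left to whichever HTTP client you prefer.

```python
import json

# Assumed public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(property_id, wiki="https://en.wikipedia.org/"):
    """Build a SPARQL query pairing each identifier with its Wikipedia article.

    property_id is a Wikidata property such as "P3429" (assumed here to be
    the Electronic Enlightenment ID). wiki selects the language version.
    """
    return f"""
SELECT ?identifier ?article WHERE {{
  ?item wdt:{property_id} ?identifier .
  ?article schema:about ?item ;
           schema:isPartOf <{wiki}> .
}}
""".strip()


def parse_results(response_text):
    """Turn a SPARQL JSON response into a dict of identifier -> article URL."""
    data = json.loads(response_text)
    links = {}
    for row in data["results"]["bindings"]:
        links[row["identifier"]["value"]] = row["article"]["value"]
    return links


if __name__ == "__main__":
    # Print the query so it can be pasted into the query service.
    print(build_query("P3429"))
```

Run periodically, a query like this would pick up renamed and newly created articles without anyone checking links by hand. Changing the wiki argument to another language version, for example https://fr.wikipedia.org/, would list the individuals who have articles there.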

Continue reading

Report from Wikimania

Last month I had the privilege to attend the Wikimania conference in Montreal, Canada, where 900 people from around the world gathered for two days of hacking and building and then three days of conference sessions. The conference scope includes not just the Wikimedia projects but also the big themes of open education, open access, community building, and privacy and rights in the digital age. One blog post by one attendee is only going to capture a sliver of what went on, so here I summarise some big projects of most relevance to university research projects and GLAMs.

This time round, Wikidata rather than Wikipedia was generating the most excitement. Wikidata, the free structured knowledge base, is going through a period of explosive growth, helped in small part by data shared from partner institutions including Oxford University, and the conference brought together many people using Wikidata to document cultural heritage and current knowledge.

The author and hundreds of other Wikimedians. Photo by Victor Grigas of the Wikimedia Foundation, CC-BY-SA 4.0

Continue reading

A step forward in the sharing of open data about theses

Title page of Marie Curie’s doctoral thesis; Yale University via Wikimedia Commons; Public Domain

Theses, particularly doctoral theses, are an important part of the scholarly record. Some are published and become influential books in their own right. As well as demonstrating the author’s ability to do original research, a thesis gives a snapshot of its author’s intellectual development at a formative time. This post reports on work sharing open data about thousands of theses, with links back to their full text in a repository.

The Oxford Research Archive (ORA) has 3237 Oxford doctoral theses on open access for anyone to download and read. Some of the authors have gone on to highly accomplished careers, such as the psychologist Professor Dorothy Bishop or the economist Sir John Vickers. During the confirmation hearings that eventually saw Neil Gorsuch appointed to the US Supreme Court, the interest in his background was such that TIME magazine wrote an article analysing his thesis and linking to ORA. This may well have been prompted by our linking the thesis from the top Google hit about Gorsuch: his Wikipedia biography. Continue reading

Resource discovery and Wikidata

How can I find reference materials about Jane Austen? This query could potentially take me to dozens of different sites and databases, each with different types of material. Project Gutenberg has transcribed text of her works. Librivox has audiobooks. Find A Grave has images of her memorial stone in Winchester Cathedral. The Huygens database of Women Writers has citations for modern research about her. The Stanford project Kindred Britain has her family tree. Across the Wikimedia family of sites, there are articles about Austen in 103 language versions of Wikipedia, quotations in 27 language versions of Wikiquote, and various images in Wikimedia Commons.

Portrait of Jane Austen by her sister, Cassandra. From the National Portrait Gallery via Wikimedia Commons

Title page of a first edition of Pride and Prejudice. Public Domain via Wikimedia Commons

Coat of arms of the Austen family. Public Domain via Wikimedia Commons

How do we capture the fact that all these different resources are about the same person? How do we make a path to these and similar sources, bypassing all the irrelevant links that would come up in a web search? Continue reading