A global collection of astrolabes in linked open data

I previously wrote about how easy it is to describe a GLAM collection item in Wikidata: it’s quicker than writing a blog post in WordPress, and the resulting data are endlessly reusable. This time I’ll go into more detail about using Wikidata’s interface to describe items from museum collections, and announce a new tool to browse the aggregated collection.

The Museum of the History of Science recently shared catalogue data about its outstanding collection of 165 astrolabes on Wikidata. Although Wikidata already had the power to describe astrolabes, very few had been entered, so this donation is a huge leap forward. If nothing comes to mind when I say “astrolabes”, here’s an image gallery generated by a query on Wikidata.
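Galleries like that one are generated by the Wikidata Query Service. As a rough sketch of what sits behind them, the snippet below builds a query that asks for instances of a class together with their images, using the `#defaultView:ImageGrid` directive. The QID is an assumption on my part; verify the astrolabe item on wikidata.org before relying on it.

```python
# Sketch: build the SPARQL behind a Wikidata image gallery.
# The QID below is an assumption -- verify the astrolabe item on wikidata.org.
ASTROLABE = "Q165896"  # assumed QID for "astrolabe"

def gallery_query(class_qid: str) -> str:
    """Return a WDQS query listing instances of a class with their images."""
    return f"""#defaultView:ImageGrid
SELECT ?item ?itemLabel ?image WHERE {{
  ?item wdt:P31 wd:{class_qid} .   # instance of (P31) the given class
  ?item wdt:P18 ?image .           # image (P18)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""

print(gallery_query(ASTROLABE))
```

Pasting the printed query into query.wikidata.org renders the results as an image grid rather than a table.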

I’m going to take a random entry from David A. King’s “A Catalogue of Medieval Astronomical Instruments” and describe it in Wikidata. Having checked that it isn’t already there, I click “Create new item” on the left-hand side of any Wikidata page. First, I’ll be asked for a name and a one-line description in my chosen language.


Translating a blog post into structured data

Timur Beg Gurkhani (1336-1405) plays a small role in our story. Public domain image via Wikimedia Commons

Recently my Bodleian colleague Alasdair Watson posted an announcement about an illuminated manuscript that is newly available online. To get the most long-term value out of the announcement, I decided to express it as Linked Open Data by representing its content in Wikidata. This blog post goes through that process.

Some ways Wikidata can improve search and discovery

I have written in the past about how Wikidata enables entity-based browsing, but search is still necessary, and it is worth considering how a semantic web database can be useful to a search engine index.

This post is about three ways Wikidata could help search and discovery applications, without replacing them: 1) providing more or less specific terms (hypernyms and hyponyms), 2) providing synonyms for a search term, and 3) structuring a thesaurus of topics to provide meaningful connections. I end with the real-world example of Quora.com, which uses Wikidata to manage a huge user-generated topic list.
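The synonym case is the simplest to sketch. Each Wikidata item carries alternative labels (aliases) that a search index could treat as synonyms; they come back from the `wbgetentities` API action. The function below pulls them out of that response shape. The sample response is hand-made for illustration, not a live API result.

```python
# Sketch: extract alternative labels (aliases) for a topic from a
# wbgetentities-style response, so a search index can treat them as synonyms.
# The sample below is hand-made for illustration, not a live API result.

def aliases_for(entity_json: dict, qid: str, lang: str = "en") -> list[str]:
    """Extract alias strings for one entity in one language."""
    entity = entity_json.get("entities", {}).get(qid, {})
    return [a["value"] for a in entity.get("aliases", {}).get(lang, [])]

sample = {
    "entities": {
        "Q956": {  # QID used here purely as an example
            "aliases": {"en": [{"language": "en", "value": "Peking"},
                               {"language": "en", "value": "Peiping"}]}
        }
    }
}

print(aliases_for(sample, "Q956"))  # -> ['Peking', 'Peiping']
```

A query expansion step could then search for any of these strings alongside the preferred label.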

Hypernyms and hyponyms


A Reconciliation Recipe for Wikidata

We have a list of names of things, plus some idea of what type of things they are, and we want to integrate them into a database. I have been working on place names in Chinese, but it could just as well have been a list of author names in Arabic. This post reports on a procedure to get Wikidata identifiers — and thereby lots of other useful information — about the things in the list.
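The lookup step of such a procedure can be sketched as a call to Wikidata’s search API. The snippet below only builds the request URL for the `wbsearchentities` action; a real run would fetch it, then check each candidate against the expected type before accepting a match.

```python
# Sketch of the lookup step in a reconciliation run: ask Wikidata's search
# API for candidate QIDs for a name, in a chosen language. Parameters follow
# the MediaWiki wbsearchentities action.
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def search_request(name: str, language: str = "zh") -> str:
    """Build a wbsearchentities request URL for one name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "format": "json",
    }
    return API + "?" + urlencode(params)

print(search_request("北京"))
```

Each candidate returned by the API still needs filtering, for example by requiring that its “instance of” statement matches the type of thing in the list.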

To recap a couple of problems with names covered in a previous post:

  • Things share names. “Cancer”, for example, names a disease, a constellation, an academic journal, a genus of crabs, an astrological sign and a death metal band.
  • Things have multiple names. One place is known to English speakers as “Beijing”, “Peking” or “Peiping”. Similarly, there are multiple names for that place even within a single variant of Chinese.

There are also some problems specific to historic names for places in China.

Deletion is not the end: making an academic article stick on Wikipedia

Identity fusion is a concept central to a lot of research in social psychology and cognitive anthropology. So it is understandable that a member of an anthropology research group wrote an explanation of this concept for Wikipedia, explaining the idea to the widest possible audience and citing the key papers.

Unfortunately, writing an article and getting it accepted by Wikipedia are different things. The draft was rejected multiple times and eventually deleted, wiping out hours of work. Many academics have had, or at least heard of, a similar experience, and it can be very discouraging. However, these stories can have a happy ending. We were able to restore the draft and publish it as an article, where it became one of the top two search engine hits for its topic. This article is about that process, and what academics can do to make sure their articles are accepted by Wikipedia.

Semantic data and the stories we’re not telling

One of my earliest memories of television was James Burke’s series Connections. It was fascinating yet accessible: each episode explored technology, history, science and society, jumping across topics based on historical connections or charming coincidences. One episode started with the stone fireplace and ended with Concorde.

In a digital utopia, we would each be our own James Burke, creating and sharing intellectual journeys by following the connections that interest us. We are not there yet. Many very valuable databases exist online, but the connections between them are obscured rather than celebrated, and this is an obstacle for anyone using those data in education or research. In a previous post I described the problems that come from the fact that things have different names in different databases, and described a semantic web approach to link them together.

Building on this approach, web applications can help people create their own stories; choosing their own path through sources of reliable information, building unexpected connections. In this post I describe three design principles behind these applications. Let’s start with a story.


Creating Wikipedia articles from research data

Hillfort images shared on Wikimedia Commons

The Atlas of Hillforts of Britain and Ireland is a collaboration between the Universities of Oxford, Edinburgh and Cork, funded by the Arts and Humanities Research Council. It provides a definitive list of hillfort sites in the British Isles: more than four thousand in total. As well as publishing a lot of fieldwork done by expert archaeologists, the site uses crowdsourcing, in that some of the sites were visited by volunteer investigators. The site invites users—expert or amateur—to submit their own photographs of the hillforts.

The Atlas launched in June 2017 and generated national media coverage. An issue for any newly-launched site is how to get incoming links from other sites; how to plumb the site into the existing paths by which people find information. This case study describes how, by sharing selected data from the Atlas, we were able to create thousands of incoming links from Wikipedia and related apps and sites, and to encourage the creation and use of hillfort articles in Wikipedia.

Turning a historical book into a data set

A series of books published around the turn of the 20th century is crucial to modern bibliographic research: biographical dictionaries of booksellers and printers, including addresses, dates and significant works printed. Some of these books are out of copyright and available as scanned pages, allowing us not only to copy them into new formats, but to adapt them into new kinds of resource.

These scanned books could be made more useful to researchers in a number of ways. Text could be meaningfully segmented by dictionary entry rather than by page or paragraph. The book’s internal and external citations could become links, for instance connecting a proper name to identifiers for the named person. The book could even have an open data representation which other data sets can hook on to, for example to say that a person is described in the book.
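That last step can be done in bulk. One common route is to generate rows for the QuickStatements tool, asserting that each person is described in the book via Wikidata’s “described by source” property (P1343). The snippet below is a minimal sketch; the QIDs in it are placeholders for illustration, not real identifiers.

```python
# Sketch: emit QuickStatements rows asserting that people are described in
# the book, via P1343 ("described by source"). All QIDs here are
# placeholders for illustration, not real identifiers.

BOOK_QID = "Q00000001"  # placeholder QID for the book's Wikidata item

def described_by_rows(person_qids: list[str], book_qid: str = BOOK_QID) -> list[str]:
    """One tab-separated QuickStatements row per person."""
    return [f"{qid}\tP1343\t{book_qid}" for qid in person_qids]

for row in described_by_rows(["Q00000002", "Q00000003"]):
    print(row)
```

Each printed row, pasted into QuickStatements, would add one statement linking a person’s item to the book’s item.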

This case study describes the transformation of one of these books, Henry Plomer’s A Dictionary of the Booksellers and Printers who Were at Work in England, Scotland and Ireland from 1641 to 1667, using Wikisource, part of the Wikimedia family of sites. As a collaborative platform, Wikisource allowed Bodleian staff to work with Wikisource volunteers. We benefited from many kinds of volunteer labour, from correcting simple errors in the text to creating custom wiki-code to speed up the process.

A lot of important data sets only currently exist in the form of printed books, including catalogues, dictionaries and encyclopedias. We adopted a process that has already been used on some large, multi-volume works and could be used for many more.

Reconciling database identifiers with Wikidata

Charles Grey, former Prime Minister, has an entry in Electronic Enlightenment. How do we find his UK National Archives ID, British Museum person ID, History of Parliament ID, National Portrait Gallery ID, and 22 other identifiers? By first linking his entry to its Wikidata identifier.

In a previous blog post I stressed the advantage of mapping the identifiers in databases and catalogues to Wikidata. This post describes a few different tools that were used to reconcile more than three thousand identifiers from the Electronic Enlightenment (EE) biographical dictionary.

The advantages to the source database include:

  • Maintaining links between Wikipedia and the source database. EE and Early Modern Letters Online (EMLO) are two biographical projects that maintain links to Wikipedia. As Wikipedia articles get renamed or occasionally deleted, links can break. It is also easy to miss the creation of new Wikipedia articles. As EE and EMLO links are added to Wikidata, a simple database query gets a list of Wikipedia article links and their corresponding identifiers. Thus we can save work by automatically maintaining the links.
  • Identifying the Wikipedia articles of individuals in the source database. These are targets for improvement by adding citations of the source database.
  • Identifying individuals in the source database who lack Wikipedia articles, or who have articles in other language versions of Wikipedia, but not English. New articles can raise the profile of those individuals and can link to the source database. We raised awareness among the Wikipedian community with a project page and blog post. We also arranged with Oxford University Press to give free access to EE for active Wikipedia editors who requested it, via OUP’s existing Wikipedia Library arrangement.
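The “simple database query” in the first point can be sketched as a Wikidata Query Service query pairing each item that carries the external identifier with its English Wikipedia article. P3429 is my reading of the Electronic Enlightenment ID property; verify the property number on Wikidata before using this.

```python
# Sketch: the kind of WDQS query that lists English Wikipedia articles
# alongside Electronic Enlightenment identifiers. P3429 is assumed to be
# the EE ID property -- verify the property number on Wikidata first.

EE_PROPERTY = "P3429"  # assumed: Electronic Enlightenment ID

def sitelink_query(prop: str) -> str:
    """Pair each item holding the identifier with its enwiki article."""
    return f"""SELECT ?item ?eeId ?article WHERE {{
  ?item wdt:{prop} ?eeId .
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
}}"""

print(sitelink_query(EE_PROPERTY))
```

Because the query reads live data, renamed or newly created Wikipedia articles show up automatically the next time it runs, which is what makes the link maintenance free.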
