I’ve been experimenting with a way to show how Wikidata represents knowledge, specifically how it makes pathways out of relationships between things. In a previous post I wrote about how Wikidata’s representation enables new pathways between entities. Since those pathways link into a giant web, they offer new ways to discover existing collection objects. Now that I have been describing Oxford’s GLAM collections on Wikidata, we can show concrete examples of this expanding knowledge graph.
Normally with Wikidata we specify properties and get results that are identifiable things. For example, if we ask for “female historians born in the 1730s with a biography in Electronic Enlightenment”, we get Catherine Macaulay. Here I’m doing the reverse: the queries specify a group of things and request the properties connecting them. The result is a tiny fragment of the Wikidata knowledge graph (which right now has just over 54 million people, places, publications, objects and concepts). We can see how different kinds of data (biographical, bibliographic, and catalogue data) are combined in the same model. I’ve captured these graphs as screenshots, but I recommend clicking through to the live query, where you get a draggable, stretchy graph.
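To give a feel for what “specify a group of things and request the properties connecting them” means in practice, here is a minimal sketch in Python that assembles a SPARQL query of that general shape. It is not the query behind the screenshots. The function name build_graph_query is made up for illustration, the item identifiers in the example are placeholders rather than the items discussed in this post, and I am assuming the Wikidata Query Service conventions of a "#defaultView:Graph" comment and an ?edgeLabel variable for labelling the connecting lines.

```python
import re

# Wikidata item identifiers look like "Q" followed by digits.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")
# Very rough check on a language code such as "en" or "fa".
LANGUAGE_PATTERN = re.compile(r"[a-z]{2,3}(-[a-z0-9]+)*")


def build_graph_query(qids, language="en"):
    """Build a SPARQL query asking which direct properties link the given items.

    Hypothetical helper: it only assembles text and does not contact Wikidata.
    """
    if len(qids) < 2:
        raise ValueError("need at least two items to look for connections")
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    if not LANGUAGE_PATTERN.fullmatch(language):
        raise ValueError(f"unexpected language code: {language!r}")

    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "#defaultView:Graph\n"
        "SELECT ?item ?itemLabel ?edgeLabel ?linkTo ?linkToLabel WHERE {\n"
        # Both ends of every connection must come from the same group of items.
        f"  VALUES ?item {{ {values} }}\n"
        f"  VALUES ?linkTo {{ {values} }}\n"
        # Any direct property joining two members of the group.
        "  ?item ?directProperty ?linkTo .\n"
        # Map the direct property back to its property entity so it can be labelled.
        "  ?edge wikibase:directClaim ?directProperty .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )


if __name__ == "__main__":
    # Placeholder identifiers, chosen only to show the shape of the query.
    print(build_graph_query(["Q1", "Q2", "Q3"]))
```

The printed text can be pasted into the Wikidata Query Service. Notice that the query never names a property: the properties are what we are asking for.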
First, some Treasures of the Bodleian. “With a guitar, to Jane” is a manuscript poem by Percy Bysshe Shelley about a guitar (which is now known as “Shelley’s guitar”). Both the guitar and the poem were gifts from Percy to his muse, Jane Williams. Mary Shelley was of course a literary titan in her own right, and among the depictions of Mary held by the Bodleian is an oil painting. If we were to query for “literary works inspired by works of Shakespeare”, “paintings depicting authors of science fiction” or “present locations of objects originally owned by the Shelley family”, this graph indicates some of the results we would get. (Here’s the live query)
Note that these graphs exclude a lot of information for the sake of visual clarity. There are times and dates (so Wikidata knows that the Bodleian acquired the guitar after Jane Williams owned it). There are identifiers for the items (e.g. shelfmarks for the collection items; the dozens of authority file identifiers for Mary and Percy Shelley).
Here’s a query linking items in two GLAM institutions. If you were interested in the 4th-century calligrapher Wang Xianzhi, perhaps after hearing about a work displayed in the Palace Museum, Beijing, you might like to know that the Ashmolean Museum has three works connected to him: a work by him, a work depicting him and a work depicting his brother Wang Huizhi. (Here’s the live query)
This is an example of the data used by Astrolabe Explorer. Some of the astrolabes associated with Jean Fusoris are in the History of Science Museum here in Oxford, but others are held by other GLAM institutions. Luckily, these institutions have all freely shared images of the astrolabes, which appear as thumbnail images in this graph. (Here’s the query)
In a previous post I translated a blog post by my colleague Alasdair Watson into structured data. This is a view of the resulting graph. The Shahnameh node represents the literary work and the Shahnamah of Ibrahim Sultan is a manuscript exemplifying it. There are other exemplars known to Wikidata, so I’ve included the Windsor Shahnameh as an example. (Here’s the query)
This is the same graph as before, but this time I requested labels in Farsi. This makes clear that Wikidata represents things in a language-independent way, and adds labels according to the user’s preferences. A few things and properties in this graph lack labels in Farsi, as is to be expected, since Wikidata depends on human users to enter labels. There’s no machine translation here. Still, we see that Wikidata is capable of expressing to Farsi readers that Ibrahim Sultan’s manuscript was at one point owned by Sir Gore Ouseley and is now in the collection of the Bodleian Library.
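The idea of language-independent things with labels attached can be sketched in a few lines of Python. This is a toy model and not how Wikidata is implemented. The Item class, the label_for function and the identifiers are all invented for illustration. The behaviour it imitates is that an item with no label in any requested language is shown by its bare identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A thing identified by a language-neutral ID, with optional labels."""
    identifier: str
    labels: dict = field(default_factory=dict)  # language code -> label


def label_for(item, preferred_languages):
    """Return the first available label in the reader's preferred languages.

    Falls back to the bare identifier when nobody has entered a label
    in any of those languages. Nothing is machine-translated.
    """
    for language in preferred_languages:
        if language in item.labels:
            return item.labels[language]
    return item.identifier


if __name__ == "__main__":
    # Placeholder identifiers, not real Wikidata IDs.
    literary_work = Item("ITEM-1", {"en": "Shahnameh", "fa": "شاهنامه"})
    unlabelled = Item("ITEM-2", {"en": "an item with no Farsi label yet"})

    for item in (literary_work, unlabelled):
        print(label_for(item, ["fa"]), "|", label_for(item, ["fa", "en"]))
```

The statements connecting the items refer only to the identifiers, so the same graph can be drawn for any reader by swapping the list of preferred languages.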
Remember that, if we fully extend them out, all these fragmentary graphs will ultimately connect to each other. Wikidata is one graph connecting many millions of things together, and it in turn is part of the web of Linked Open Data, which connects millions more entities and properties.
These graphs are a way to show what we mean by a knowledge graph, and what we’re doing when we combine separate data sets into a semantic web representation. They are not a navigational tool for the end user, but they illustrate the problem we confront when we build interfaces to assist discovery: how do we make the relationships in a data set (the ones that are interesting to users) salient and easily navigable?
Post by Martin Poulter, Wikimedian In Residence
This post is licensed under a CC-BY-SA 4.0 license