Welcome Japan Search to the web of Linked Open Data

Bodleian Libraries MS. Jap. c.4(R) depicts the character Urashima Taro, known to Japan Search as 水江浦島子

Japan Search is an aggregator, holding metadata on 17 million items from 38 databases related to Japanese cultural institutions. It is like a Japanese counterpart to Europeana. Hosted by the National Diet Library, it is currently in beta phase – not officially released yet – but already an impressive service. It’s of interest to a Wikimedian working in Oxford because it has been designed in an open way, with connection to other databases and applications built in from the outset.

Part of its database is a table of nearly 8000 named entities: these are artists, depicted entities and sometimes locations. Japan Search has its own system of identifiers based on Japanese names, but thankfully they have incorporated identifiers from other systems, including VIAF, BnF, British Museum, DBpedia, and Wikidata.

Also helpful for joining up these data is a SPARQL endpoint. I think of SPARQL endpoints as genies or oracles, each able to answer any question about its domain, but only very pedantically and only if asked in the SPARQL language. The Japan Search genie can answer questions about its world of 17 million works and related entities. The Wikidata genie knows about its overlapping world of 56 million objects and concepts. They use different names for things, but luckily each genie is aware of the existence of other genies and other naming systems.

Here is what I did to link up the two databases:

  • Created an entry on Wikidata for Japan Search, including a pointer to its SPARQL endpoint.
  • Requested that Wikidata support federated search with Japan Search, adding it to the list (Europeana, the Smithsonian and dozens of others) of endpoints that Wikidata can send queries to. The WD genie now speaks to the JS genie. This does not mean that they understand each other straight away, as we’ve seen that they use different names for things.
  • Asked the JS genie for the WD identifiers it knows about, and the corresponding JS identifiers. For example, what is known to WD as Q20078 is known to JS as chname:歌川広重.
  • Requested the creation of a new property on WD, Japan Search Name ID.
  • Added the JS identifiers to WD, using this new property. Now when I ask a question of the WD genie, it can pass along questions to the JS genie, using the JS genie’s names for things.
  • Asked the JS genie for Japanese labels for entities where WD lacked a Japanese label, then added these labels to WD.
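
As a rough illustration of the third and fifth steps, here is a minimal Python sketch that turns one row of the JS genie's answer into a pair ready to be recorded on WD. The expansion of the chname: prefix and the row format are assumptions made for this example, not details confirmed in this post.

```python
from urllib.parse import unquote

WD_PREFIX = "http://www.wikidata.org/entity/"
# Assumed expansion of the chname: prefix used by Japan Search.
CHNAME_PREFIX = "https://jpsearch.go.jp/entity/chname/"


def to_pair(js_uri: str, wd_uri: str) -> tuple[str, str]:
    """Turn one result row into (Wikidata Q-number, Japan Search name ID)."""
    if not js_uri.startswith(CHNAME_PREFIX) or not wd_uri.startswith(WD_PREFIX):
        raise ValueError("unexpected URI form")
    # unquote() copes with names that arrive percent-encoded.
    return wd_uri[len(WD_PREFIX):], unquote(js_uri[len(CHNAME_PREFIX):])


# The example pairing mentioned in the list above.
row = (CHNAME_PREFIX + "歌川広重", WD_PREFIX + "Q20078")
print(to_pair(*row))  # ('Q20078', '歌川広重')
```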

We now have at least two ways to pass an identifier from one genie to the other. The WD genie can show JS an identifier such as Q20078 and ask “What is your name for this?” Alternatively, WD can use its own knowledge of JS identifiers, asking “What do you know about the thing you call chname:歌川広重?” In my experiments, the former seemed to be faster, but there may well be tricks that I’m missing.
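
The two approaches might look something like the queries built below. This is only a sketch: the endpoint address, the property number P6698 for the Japan Search name ID, the chname: prefix expansion, and the owl:sameAs and schema:creator predicates on the Japan Search side are all assumptions, and the queries rely on the prefixes that the Wikidata query interface defines by default.

```python
# All of these values are assumptions made for illustration.
JPS_ENDPOINT = "https://jpsearch.go.jp/rdf/sparql/"
CHNAME_PREFIX = "https://jpsearch.go.jp/entity/chname/"
NAME_ID_PROP = "wdt:P6698"  # assumed: Japan Search name ID


def ask_by_wikidata_id(qid: str) -> str:
    """WD shows JS a Q-number; JS finds its own name for it."""
    return f"""\
SELECT ?work WHERE {{
  SERVICE <{JPS_ENDPOINT}> {{
    ?jsName owl:sameAs wd:{qid} .      # assumed predicate
    ?work schema:creator ?jsName .     # assumed predicate
  }}
}}"""


def ask_by_js_name(qid: str) -> str:
    """WD looks up the JS name it has stored, then asks JS about that name."""
    return f"""\
SELECT ?work WHERE {{
  wd:{qid} {NAME_ID_PROP} ?nameId .
  BIND(IRI(CONCAT("{CHNAME_PREFIX}", ?nameId)) AS ?jsName)
  SERVICE <{JPS_ENDPOINT}> {{
    ?work schema:creator ?jsName .     # assumed predicate
  }}
}}"""


print(ask_by_wikidata_id("Q20078"))
print(ask_by_js_name("Q20078"))
```

In the first form the matching of identifiers happens inside the SERVICE block, on the Japan Search side; in the second, Wikidata resolves the name before handing over.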

What can we do with this? Let’s ask for works by Hiroshige, with dates and preview images if possible. 2882 works, from multiple museums, get returned. (SPARQL fans: note that for each of these queries you can see the query code by clicking the link in the bottom left of the screen.) This is an example of a federated query. I’m using Wikidata’s query interface, but Wikidata is passing the query on to Japan Search, which returns results which are then processed further by Wikidata. So I benefit from Wikidata’s colourful interface, download options, and APIs.

Let’s look at how Wikidata’s multi-lingual platform can help the discoverability of Japan Search. Japan Search only has human-readable names for things in Japanese (although the contributing databases sometimes have labels in other languages). Wikidata can tell us what the things in Japan Search are called in French. It has French labels for 5093 of those identifiers; 64% of the total. What about Korean? 2705 Japan Search IDs have a corresponding Korean name (not necessarily a name in Korean script: sometimes things are known in Asian languages by a Western name). How do other languages compare in terms of completeness? We can request a table of languages, ordered by how many Japan Search entities they can name.
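
The figures above are enough for a quick back-of-envelope comparison. The sketch below assumes that the 64% is a rounded value, so the implied total is approximate, and that the Korean count is measured against the same total.

```python
french_count = 5093
french_share = 0.64   # rounded figure quoted above
korean_count = 2705

# Implied number of Japan Search identifiers linked from Wikidata (approximate).
implied_total = round(french_count / french_share)

# Korean coverage, assuming the same total applies.
korean_share = korean_count / implied_total

print(implied_total)            # 7958, consistent with "nearly 8000"
print(f"{korean_share:.0%}")    # 34%
```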

As well as identifiers, Wikidata holds basic biographical data including gender, birthplace and date, and citizenship. We can ask it which British artists are mentioned in Japan Search: it turns out there are 34 at present. For a given British artist, let’s say Antony Gormley, what works does Japan Search know of? It returns one item, in the National Museum of Modern Art, Tokyo. It is in principle possible to ask about multiple artists, e.g. “all works by artists with British citizenship”, but in practice this times out. See the Post Scriptum below.

And now to my ulterior motive! I’m looking for overlaps between Oxford GLAM collections and the collections indexed by Japan Search. The Ashmolean has an impressive collection of Asian art, and a lot of it is described in Wikidata. I can ask for works in the Ashmolean that are connected in some way to entities known to Japan Search. Because some entities are places like Tokyo or Iran, this throws up a large number of tenuous connections. So I cut it down to just two types of relationship: creator and depicts. In other words, we are looking at objects in the Ashmolean that were either created by, or depict, a person or other entity in Japan Search.

This presently gives 380 connections between the Ashmolean and Japan Search. In theory, a Japanese cultural aggregator — or anybody looking for a more complete list of works by a Japanese artist — could harvest this information, or query it directly from Wikidata. The titles of the art works will likely be in English, but Wikidata can express a lot of its properties in Japanese and can point to the catalogue record and image on the Ashmolean site.

Post Scriptum

When this blog post was published last week, I hadn’t got the impressive queries I was hoping to show. For instance, rather than specify “works by Antony Gormley”, I wanted to ask for “works by British sculptors”. Since then, I’ve identified a couple of bottlenecks.

  1. I wanted a human-readable page about each artwork, and was using ?work jps:sourceInfo/schema:relatedLink ?link. While this works, it seems to slow down the query a lot. A quicker way to get a human-readable catalogue entry for the artwork is to take the data link (which gives an overview of data available about the item) and replace /data/ with /item/, yielding something that looks more like a museum catalogue entry.
  2. Some instructions such as FILTER and BIND run more quickly outside the SERVICE block, i.e. if Wikidata does these commands itself rather than giving them to Japan Search to do.
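
The link trick in point 1 is a plain string substitution. Here it is as a minimal Python sketch; the example address is hypothetical and only illustrates the shape of a data link.

```python
def catalogue_link(data_url: str) -> str:
    """Swap the first /data/ path segment for /item/."""
    return data_url.replace("/data/", "/item/", 1)


# Hypothetical identifier, for illustration only.
example = "https://jpsearch.go.jp/data/example-123"
print(catalogue_link(example))  # https://jpsearch.go.jp/item/example-123
```

Inside a query, the same substitution can be written with SPARQL's REPLACE function and, following point 2, placed outside the SERVICE block so that Wikidata does the work.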

Now I can do the sort of queries I want, so here are works in Japan Search by sculptors from the United Kingdom, not just Antony Gormley. The Wikidata genie prepares a list of sculptors, taking care to only include those that the Japan Search genie knows about (i.e. that have a Japan Search name ID). The WD genie passes that list to the JS genie, who returns a list of works with titles in Japanese and English, and optionally their dates. The WD genie then does some filtering (removing Japanese titles) and processing (fixing the link with the above data/item trick) and adds the English names of the artists.
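
The overall shape of such a query might be sketched as below. This is not the exact query behind the link: the endpoint address, the property number P6698, the chname: prefix expansion, the Japan Search predicates for creator and title, and the assumption that titles carry language tags are all guesses made for illustration. It relies on the prefixes that the Wikidata query interface defines by default.

```python
JPS_ENDPOINT = "https://jpsearch.go.jp/rdf/sparql/"  # assumed endpoint address


def works_by(occupation: str, country: str) -> str:
    """Build a federated query for works by artists with the given
    occupation and citizenship (both Wikidata Q-numbers)."""
    return f"""\
SELECT ?artistLabel ?title ?link WHERE {{
  # Wikidata prepares the list, keeping only artists Japan Search knows.
  ?artist wdt:P106 wd:{occupation} ;
          wdt:P27 wd:{country} ;
          wdt:P6698 ?nameId .          # assumed: Japan Search name ID
  BIND(IRI(CONCAT("https://jpsearch.go.jp/entity/chname/", ?nameId)) AS ?jsName)
  # Japan Search returns the works and their titles.
  SERVICE <{JPS_ENDPOINT}> {{
    ?work schema:creator ?jsName ;     # assumed predicate
          rdfs:label ?title .          # assumed predicate
  }}
  # Filtering and link fixing happen back on the Wikidata side.
  FILTER(LANG(?title) != "ja")
  BIND(IRI(REPLACE(STR(?work), "/data/", "/item/")) AS ?link)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


# Q1281618 is sculptor and Q145 is the United Kingdom on Wikidata.
print(works_by("Q1281618", "Q145"))
```

Keeping FILTER and the link-fixing BIND after the closing brace of the Japan Search SERVICE block reflects the second bottleneck noted above.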

Here are works by Hiroshige known to Japan Search plus those known to Wikidata (which adds roughly 150 extra). Conceivably a work could appear in both sets.

 

Post by Martin Poulter, Wikimedian In Residence
This post is licensed under a CC-BY-SA 4.0 license
