Translating a blog post into structured data

Timur Beg Gurkhani (1336-1405) plays a small role in our story. Public domain image via Wikimedia Commons

Recently my Bodleian colleague Alasdair Watson posted an announcement about an illuminated manuscript that is newly available online. To get the most long-term value out of the announcement, I decided to express it as Linked Open Data by representing its content in Wikidata. This blog post goes through that process.

The manuscript, the Shahnamah of Ibrahim Sultan, was not represented on Wikidata, although the epic poem itself, the Shahnamah (or Shahnameh), was already present, and so were six of its exemplars.

Three of the grandsons of Tīmūr (Tamerlane) are known to have had lavish copies of Firdawsī’s Shāhnāmah or Persian Book of Kings made for them. The Shāhnāmah of […] Ibrāhīm Sulṭān [is] preserved in the Bodleian Libraries, Oxford,

Ibrahim Sultan is already represented in Wikidata as Q3147516, along with his immediate family tree, which connects him to his father Shahrukh (Q553204), who in turn is linked to his own father Timur (Q8462).

To create an item for the manuscript, I click on “Create a new item” on the left of any Wikidata page (or, from a script, request a new item through the API). Identifiers are auto-numbered, and more than 53 million have already been allocated, so the Shahnamah of Ibrahim Sultan gets Q53676578. After adding a name and one-line description, the first priority is to say what kind of thing I’m describing:

instance of: illuminated manuscript

and to link it to the item representing the poem:

exemplar of: Shahnameh. language of work or name: Persian

The Wikidata interface makes great use of auto-suggestion and auto-completion, so adding these properties doesn’t require typing the whole name of the property, just a couple of clicks and the first few letters. We can repeat the process to extract more statements from the text of the blog post.
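The scripted route for creating an item can be sketched in a few lines. This is a minimal illustration, not a complete client: it only assembles the parameters for the Wikibase `wbeditentity` API action and sends nothing. The function name `new_item_params`, the description wording and the token placeholder are assumptions made for the example. A real script would first log in, fetch a CSRF token, and POST these parameters to the Wikidata API.

```python
import json


def new_item_params(label, description, csrf_token, lang="en"):
    """Build the POST parameters that ask Wikidata to create a new item.

    Only the label and description are set here; statements such as
    "instance of" would be added afterwards.
    """
    data = {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
    }
    return {
        "action": "wbeditentity",
        "new": "item",              # ask for a fresh, auto-numbered Q identifier
        "data": json.dumps(data),   # the API expects the entity data as a JSON string
        "token": csrf_token,        # placeholder: obtained from the API after login
        "format": "json",
    }


params = new_item_params(
    "Shahnamah of Ibrahim Sultan",
    "illuminated manuscript in the Bodleian Library, Oxford",  # illustrative wording
    csrf_token="PLACEHOLDER-TOKEN",
)
print(params["data"])
```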

Thought to have been made in Shiraz…

location of final assembly: Shiraz

…sometime between 1430 and Ibrāhīm Sulṭān’s death in 1435,

inception: 1430s. earliest date: 1430. latest date: 1435

The manuscript was acquired by Sir Gore Ouseley, a Diplomat and Linguist, during travels in the East in the early 19th century, and came into the Bodleian in the 1850s along with many other of Sir Gore’s collections.

owned by: Gore Ouseley. end time: 1850s

owned by: Ibrahim Sultan. end time: 1435

It is now preserved as MS. Ouseley Add. 176.

collection: Bodleian Library. inventory number: MS. Ouseley Add. 176. start time: 1850s

Ibrāhīm Sulṭān’s Shāhnāmah is now digitally available online via Digital.Bodleian.

full work available at: [web address]

Now the Wikidata representation of the manuscript has eleven properties, and anyone creating an appropriate query in the Wikidata Query Service will get this manuscript in the results. Let’s ask for the English names of works once owned by grandchildren of Timur, along with the link to view them. That query translates into this SPARQL code:

SELECT ?itemLabel ?url WHERE {
  wd:Q8462 wdt:P40/wdt:P40 ?owner.   # Child of a child of Timur
  ?item wdt:P127 ?owner.             # Owner of an item
  ?item wdt:P953 ?url.               # Link to view the item
  # Give the item name in English
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
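The same query can also be run from a script instead of the web form. The sketch below assumes the public Wikidata Query Service endpoint at `https://query.wikidata.org/sparql` and its standard SPARQL JSON result format. The helper names `build_url`, `rows` and `run`, and the User-Agent string, are made up for the example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# The query from this post, laid out one triple pattern per line.
QUERY = """
SELECT ?itemLabel ?url WHERE {
  wd:Q8462 wdt:P40/wdt:P40 ?owner .   # child of a child of Timur
  ?item wdt:P127 ?owner .             # item owned by that grandchild
  ?item wdt:P953 ?url .               # link to view the item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
"""


def build_url(query):
    """Return the GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def rows(result):
    """Flatten a SPARQL JSON result into (label, url) pairs."""
    return [
        (binding["itemLabel"]["value"], binding["url"]["value"])
        for binding in result["results"]["bindings"]
    ]


def run(query, user_agent="manuscript-demo/0.1 (example script)"):
    """Send the query over the network and return the flattened rows."""
    request = urllib.request.Request(build_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return rows(json.load(response))
```

Calling `run(QUERY)` needs network access. Nothing is fetched when the block is merely loaded.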

Running that query gives us “Shahnamah of Ibrahim Sultan” and the Digital Bodleian link. At the moment it’s the only result, but as more digitized manuscripts come online, and more of their metadata is shared on Wikidata, the query will return more results. More realistic queries might ask for manuscripts whose language is Persian, or for Middle-Eastern manuscripts held by institutions in England. The relevant query code can be incorporated into a manuscript-browsing app for use by a general audience.
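As a rough sketch of how a browsing app might generate such queries, the function below builds a "works of a given kind in a given language" query from two item identifiers. The function name is hypothetical, and the caller supplies the Q identifiers (for example the items for "manuscript" and "Persian", which should be looked up on Wikidata rather than taken from here). P31, P279 and P407 are, to the best of my knowledge, the properties "instance of", "subclass of" and "language of work or name".

```python
import re

# A Wikidata item ID is "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9]\d*")


def works_by_class_and_language(class_qid, language_qid, limit=100):
    """Return SPARQL text listing items of a class that are in a language."""
    for qid in (class_qid, language_qid):
        if not QID.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31/wdt:P279* wd:{class_qid} .   # instance of the class or a subclass of it
  ?item wdt:P407 wd:{language_qid} .         # language of work or name
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" }}
}}
LIMIT {int(limit)}"""
```

Validating the identifiers before interpolating them keeps user input from a search box out of the query text.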

This kind of addition can be made manually through Wikidata’s online interface, or more rapidly in bulk by a script. The sooner digitising institutions put in place workflows to share these metadata, the sooner we all benefit from pathways through biographical, bibliographic and geographical knowledge to the resources we create.
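One possible shape for the bulk route is to generate tab-separated commands for the QuickStatements tool. The sketch below assumes the QuickStatements version 1 text format (item, property, value, then optional qualifier pairs) and its date notation with a trailing precision, where 9 means year and 8 means decade. The helpers `year` and `qs_line` are made up for the example. Apart from P127, which appears in the query above, the property numbers are given to the best of my knowledge and should be checked before use.

```python
def year(value, precision=9):
    """Format a year as a QuickStatements date (precision 9 = year, 8 = decade)."""
    return f"+{value:04d}-00-00T00:00:00Z/{precision}"


def qs_line(item, prop, value, qualifiers=()):
    """Join one statement and its qualifier pairs into a tab-separated command."""
    fields = [item, prop, value]
    for qualifier_prop, qualifier_value in qualifiers:
        fields += [qualifier_prop, qualifier_value]
    return "\t".join(fields)


MANUSCRIPT = "Q53676578"  # Shahnamah of Ibrahim Sultan

commands = [
    # owned by: Ibrahim Sultan. end time: 1435
    qs_line(MANUSCRIPT, "P127", "Q3147516", [("P582", year(1435))]),
    # inception: 1430s. earliest date: 1430. latest date: 1435
    qs_line(
        MANUSCRIPT,
        "P571",
        year(1430, precision=8),
        [("P1319", year(1430)), ("P1326", year(1435))],
    ),
]
print("\n".join(commands))
```

An institution could produce lines like these from its catalogue export, one per statement, and review them before submitting the batch.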

This post is licensed under a CC-BY-SA 4.0 license
