Detailed depictions with IIIF, Wikidata and Wikimedia Commons

Extract from “High Street Oxford.” Ashmolean Museum WA2016.48

The International Image Interoperability Framework (IIIF) is a standard, developed by a consortium including the Bodleian Libraries, that allows images and associated metadata to be shared across the web. It’s used by many sites including Digital Bodleian and Wikimedia’s image server, Wikimedia Commons.

As of November this year, Wikidata can point to the IIIF manifests associated with a digitised object (example near the foot of this page). However, the opportunity offered by combining Wikidata and IIIF is not just about discoverability of the IIIF data itself. Included in IIIF is the ability to address a specific rectangular region of an image with a URL. Wikidata can use this to express statements about part of an image.

Anyone familiar with Turner’s “High Street, Oxford” will recognise several landmarks included in the scene. In this sense, there is a lot of structure in the image that is obvious to humans but not naturally captured in the painting’s digital representation (image + catalogue record). My mission, should I choose to accept it, is to express in open data not just that the painting depicts the Church of St. Mary the Virgin but that a specific part of the image depicts the church.

There are already platforms that let users annotate parts of an image. Flickr is a much-used example, as is Wikimedia Commons itself (mouse over the painting in this example). The downside is that those annotations are contained within a particular system, not part of an open data set that can be queried. By creating these annotations in Wikidata we don’t commit to an interface for viewing the annotations: they can be used in whatever image-viewing applications people build in future. Wikidata also enables semantic annotation: what we’re attaching to regions of the image isn’t a text string like “Church of St. Mary” but an identifier that picks out one church from its similarly-named alternatives.

Wikidata already has a representation of the Turner painting at Q21061735 and Commons has a photo of the painting. To define reference points within the painting, I need a consistent coordinate system, so my first task is to make a cropped, corrected version of the photo without the frame and without perspective distortion.

To define the region I can use the IIIF Image Cropper tool, part of the Crotos Lab family of tools, in turn adapted from work by Liz Fisher. Clicking and dragging a rectangle over the church gives me the string pct:52.6,16.4,24.9,61 which, in the context of the IIIF standard, defines a region within the image. A “depicts” statement in Wikidata can take a relative position within image qualifier in this format, so that’s what I should add to the painting’s Wikidata item.
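To make the region string concrete, here is a minimal Python sketch of how it could be read and dropped into an IIIF Image API URL. The function names and the example.org base URL are hypothetical, invented for illustration. The path layout (region/size/rotation/quality.format) follows the IIIF Image API, and the size value "max" is an assumption that suits recent versions of the API; older servers may expect "full" instead.

```python
from typing import NamedTuple, Tuple


class PctRegion(NamedTuple):
    """A rectangle expressed as percentages of the full image."""
    x: float  # left edge, % of image width
    y: float  # top edge, % of image height
    w: float  # rectangle width, % of image width
    h: float  # rectangle height, % of image height


def parse_pct_region(value: str) -> PctRegion:
    """Parse a string such as 'pct:52.6,16.4,24.9,61'."""
    prefix = "pct:"
    if not value.startswith(prefix):
        raise ValueError(f"not a percentage region: {value!r}")
    parts = value[len(prefix):].split(",")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers in {value!r}")
    x, y, w, h = (float(p) for p in parts)
    return PctRegion(x, y, w, h)


def to_pixels(region: PctRegion, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a percentage region to whole pixels for an image of the given size."""
    return (
        round(region.x / 100 * width),
        round(region.y / 100 * height),
        round(region.w / 100 * width),
        round(region.h / 100 * height),
    )


def iiif_region_url(image_base: str, region: str, size: str = "max") -> str:
    """Build an IIIF Image API URL that returns only the given region.

    image_base is the image's base URL on an IIIF server (hypothetical here).
    """
    parse_pct_region(region)  # validate before building the URL
    return f"{image_base.rstrip('/')}/{region}/{size}/0/default.jpg"


if __name__ == "__main__":
    church = "pct:52.6,16.4,24.9,61"
    print(parse_pct_region(church))
    print(iiif_region_url("https://example.org/iiif/high-street-oxford", church))
```

Because the region is stored as percentages rather than pixels, the same qualifier value works whatever resolution the image is served at.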

Different tools will use this property in different ways. I can get the whole image with the church highlighted as an annotation, or I can get an image just of the rectangle containing the church (at the head of this blog post).

Different queries are possible with the Crotos IIIF tools. We can also get an overview of all the depictions defined in a given image, such as in The Coronation of Napoleon. There are also queries that look for candles, kisses or other items across multiple art-works from multiple collections. If you wanted to train a neural net to recognise not just photos of candles but artistic representations of candles, such a query would give you the start of a data set.
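As a rough idea of what lies behind such tools, the sketch below builds a SPARQL query for the Wikidata Query Service that lists everything a given item depicts, along with the region of the image where each thing appears. It assumes that P180 is the “depicts” property and P2677 is the “relative position within image” qualifier; both IDs are worth checking against Wikidata before relying on them. The function names are hypothetical, and the code only constructs the query text and reads the standard SPARQL JSON result format, leaving the HTTP request to whichever client you prefer.

```python
from typing import Dict, List, Tuple

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def depicted_regions_query(item_id: str) -> str:
    """Return SPARQL listing the depicted things and image regions for one item.

    Assumes P180 = "depicts" and P2677 = "relative position within image".
    """
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return f"""
SELECT ?depicted ?depictedLabel ?region WHERE {{
  wd:{item_id} p:P180 ?statement .
  ?statement ps:P180 ?depicted ;
             pq:P2677 ?region .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def rows_from_results(results: Dict) -> List[Tuple[str, str]]:
    """Pull (label, region) pairs out of a SPARQL JSON results document."""
    return [
        (binding["depictedLabel"]["value"], binding["region"]["value"])
        for binding in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Q21061735 is the Wikidata item for the Turner painting.
    print(depicted_regions_query("Q21061735"))
```

Sending this query text to the endpoint with a request for JSON results should yield one row per depiction that has a region qualifier; depictions without a region are deliberately left out of this version.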

A group including the Smithsonian Institution and Wikimedia DC recently won funding for a project capturing depictions in Wikidata by mining catalogue records and other sources. The Coronation of Napoleon example above shows that depiction data for one art-work can be quite detailed, so I will finish with a couple of examples of what could be done in the Bodleian.

Here’s a cartoon from the Curzon Collection of Political Prints, showing the House of Commons after an imagined French invasion:

“We come to recover your long lost liberties” Curzon b.17(128)

The catalogue record lists some of the depicted people, including William Wilberforce; William Wyndham; Charles James Fox; Henry Phipps, 1st Earl of Mulgrave and Henry Dundas, 1st Viscount Melville. Wikidata already represents all these people, so the potential is there to describe where in the image each appears. That would help the cartoon serve as more of an educational object, but also help us retrieve a gallery of depictions for each of those individuals.

Even without region qualifiers, the depictions themselves could support novel queries like “History of Parliament biographies of people who were caricatured by James Gillray”. A political cartoon can be better understood if we know the offices of state that the individuals held at the time of the cartoon, and here is where it helps to combine artwork data and political data on the same platform.

Another Bodleian image depicts a group of Indian gods. With detailed depictions, this one image could be an educational object that could be used by people with very little knowledge of Indian religion.

[Assembly of the Gods] Bodleian Library MS. Douce Or. a.3 fol. 38r

As with so much of Wikidata, the expressive power is there, the existing data are still very small compared to what could potentially be expressed, and yet things are changing rapidly as institutions and volunteers share data.

Post by Martin Poulter, Wikimedian In Residence
This post is licensed under a CC-BY-SA 4.0 license

Bodleian Digital Library

A Bodleian Libraries blog
