Welcome Japan Search to the web of Linked Open Data

Bodleian Libraries MS. Jap. c.4(R) depicts the character Urashima Taro, known to Japan Search as 水江浦島子

Japan Search is an aggregator, holding metadata on 17 million items from 38 databases related to Japanese cultural institutions. It is like a Japanese counterpart to Europeana. Hosted by the National Diet Library, it is currently in beta phase – not officially released yet – but already an impressive service. It’s of interest to a Wikimedian working in Oxford because it has been designed in an open way, with connection to other databases and applications built in from the outset.

Part of its database is a table of nearly 8000 named entities: these are artists, depicted entities and sometimes locations. Japan Search has its own system of identifiers based on Japanese names, but thankfully they have incorporated identifiers from other systems, including VIAF, BnF, British Museum, DBPedia, and Wikidata.

What Wikidata offers Oxford’s GLAM Digital Strategy

As part of Oxford’s GLAM Digital Strategy, there has been some interesting research into audience archetypes. This work examines the many different aims people can have when engaging with our GLAM institutions: from “have fun” to “use collections in teaching”. The technology we use in GLAMs can help users in these goals, or can throw up frustrating barriers, and this strategic work explores how it could help.

Meanwhile, open platforms like Wikipedia continue to be the principal way in which people encounter cultural heritage. A growing “GLAM-Wiki” movement involves cultural institutions and volunteers in sharing collection data and building new tools, with some of those data sets coming from Oxford University. So is there an overlap between Oxford GLAMs’ aspirations and what Wikidata enables? In this post, I draw together some of my previous posts to show Wikidata’s role in advancing some aims mentioned in the document.

Making Wikidata visible

→ This article in French

I’ve been experimenting with a way to show how Wikidata represents knowledge; specifically how it makes pathways out of relationships between things. In a previous post I wrote about how Wikidata’s representation enables new pathways between entities. Since those pathways link into a giant web they offer new ways to discover existing collection objects. Now that I have been describing Oxford’s GLAM collections on Wikidata, we can show concrete examples of this expanding knowledge graph.

Normally with Wikidata we specify properties and get results that are identifiable things. For example, if we ask for “female historians born in the 1730s with a biography in Electronic Enlightenment”, we get Catherine Macaulay. Here I’m using queries that specify a group of things and request the properties connecting them. So we get a tiny fragment of the Wikidata knowledge graph (which right now has just over 54 million people, places, publications, objects and concepts). We can see how different kinds of data (biographical, bibliographic, and catalogue data) are combined in the same model. I’ve captured these graphs as screenshots, but I recommend clicking through to the live query where you get a draggable, stretchy graph.

Translating a blog post into structured data

Timur Beg Gurkhani (1336-1405) plays a small role in our story. Public domain image via Wikimedia Commons

Recently my Bodleian colleague Alasdair Watson posted an announcement about an illuminated manuscript that is newly available online. To get the most long-term value out of the announcement, I decided to express it as Linked Open Data by representing its content in Wikidata. This blog post goes through that process.

Deletion is not the end: making an academic article stick on Wikipedia

Identity fusion is a concept central to a lot of research in social psychology and cognitive anthropology. So it is understandable that a member of an anthropology research group wrote an explanation of this concept for Wikipedia, explaining the idea to the widest possible audience and citing the key papers.

Unfortunately, writing an article and getting it accepted by Wikipedia are different things. The draft was rejected multiple times and eventually deleted, removing hours of work. Many academics have at least heard of a similar experience and it can be very discouraging. However, these stories can have a happy ending. We were able to get the draft back and post it as an article where it became one of the top two search engine hits for its topic. This article is about that process, and what academics can do to make sure their articles are accepted by Wikipedia.

Semantic data and the stories we’re not telling

One of my earliest memories of television was James Burke’s series Connections. It was fascinating yet accessible: each episode explored technology, history, science and society, jumping across topics based on historical connections or charming coincidences. One episode started with the stone fireplace and ended with Concorde.

In a digital utopia, we would each be our own James Burke, creating and sharing intellectual journeys by following the connections that interest us. We are not there yet. Many very valuable databases exist online, but the connections between them are obscured rather than celebrated, and this is an obstacle for anyone using those data in education or research. In a previous post I described the problems that come from the fact that things have different names in different databases, and described a semantic web approach to link them together.

Building on this approach, web applications can help people create their own stories; choosing their own path through sources of reliable information, building unexpected connections. In this post I describe three design principles behind these applications. Let’s start with a story.

Report from Wikimania

Last month I had the privilege to attend the Wikimania conference in Montreal, Canada, where 900 people from around the world gathered for two days of hacking and building and then three days of conference sessions. The conference scope includes not just the Wikimedia projects but also the big themes of open education, open access, community building, and privacy and rights in the digital age. One blog post by one attendee is only going to capture a sliver of what went on, and here I am summarising some big projects of most relevance to university research projects and GLAMs.

This time round, Wikidata rather than Wikipedia was generating the most excitement. Wikidata, the free structured knowledge-base, is going through a period of explosive growth, helped in small part by data shared from partner institutions including Oxford University, and the conference brought together many people using Wikidata to document cultural heritage and current knowledge.

The author and hundreds of other Wikimedians. Photo by Victor Grigas of the Wikimedia Foundation, CC-BY-SA 4.0

A step forward in the sharing of open data about theses

Title page of Marie Curie’s doctoral thesis; Yale University via Wikimedia Commons; Public Domain

Theses, particularly doctoral theses, are an important part of the scholarly record. Some are published and become influential books in their own right. As well as demonstrating the author’s ability to do original research, a thesis gives a snapshot of its author’s intellectual development at a formative time. This post reports on work sharing open data about thousands of theses, with links back to their full text in a repository.

The Oxford Research Archive (ORA) has 3237 Oxford doctoral theses on open access for anyone to download and read. Some of the authors have gone on to highly accomplished careers, such as the psychologist Professor Dorothy Bishop or the economist Sir John Vickers. During the confirmation hearings that eventually saw Neil Gorsuch appointed to the US Supreme Court, the interest in his background was such that TIME magazine wrote an article analysing his thesis and linking to ORA. This may well have been prompted by our linking the thesis from the top Google hit about Gorsuch: his Wikipedia biography.

Publicising a historic event in Wikipedia

The front page of English Wikipedia gets around five million hits per day. Highlighted sections of the page, such as “Did you know” and “In the news”, trumpet the site’s purpose: sharing knowledge for its own sake. One of these sections, “On this day…”, features five different facts each day, with links to relevant articles. These facts in turn are chosen from a large collection of roughly 100 historic events for each date. Many other language versions of Wikipedia have a similar “This day in history” section, though with different sets of facts.

As with everything else on Wikipedia, this collection of historic facts is offered freely for anyone to use for any purpose. “On this day in history” facts are ideal for sharing on social media, for example by Wikipedia’s official presence on Twitter.

Napoléon Bonaparte, listed in Wikipedia’s May 26 article for his coronation as King of Italy on 26 May 1805. Image from the Curzon Collection of political prints, CC-BY the Bodleian Libraries.

To avoid repetition from year to year, it helps to be able to draw on a large pool of historic events, so each day can showcase a variety of types of event, of locations and of eras. There is a relative shortage of events before 1800, so additions are welcome.

Being featured on the front page generates a lot of interest in the article.

  • The Alhambra Decree article typically gets about 300 views per day. When linked from the front page as a recent “On this day” item, it had nearly 10,000.
  • The Treaty of Fontainebleau (1814) article gets 70 to 80 views on a typical day, but had 5,400 when linked from the home page on its anniversary.
  • The article about Suvarnadurg, an Indian fort, usually gets around 30 views a day, but had 8,500 when the fort’s 1755 capture by the East India Company was listed on April 2.
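
To put those figures on a common scale, here is a minimal sketch in Python that turns each pair of numbers into a rough multiplier. The figures are the approximate ones quoted above; the value of 75 for the Treaty of Fontainebleau is an assumed midpoint of "70 to 80", and the names `views` and `uplift` are illustrative only.

```python
# Approximate figures quoted in the bullets above:
# (typical daily views, views on the day the article was featured).
# 75 is an assumed midpoint of the "70 to 80" range given in the text.
views = {
    "Alhambra Decree": (300, 10_000),
    "Treaty of Fontainebleau (1814)": (75, 5_400),
    "Suvarnadurg": (30, 8_500),
}


def uplift(typical: float, featured: float) -> float:
    """Return how many times more views the featured day had than a typical day."""
    return featured / typical


# Round to whole numbers, since the inputs are only approximate.
multipliers = {title: round(uplift(t, f)) for title, (t, f) in views.items()}

for title, multiplier in multipliers.items():
    print(f"{title}: roughly {multiplier} times a typical day")
```

On these rough numbers, the less-visited the article normally is, the larger the relative boost from a front-page link.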

By considering one example, we can look at how a historic event is made visible in Wikipedia.

March 31: 1492 – The Catholic Monarchs of Spain issued the Alhambra Decree, ordering all Jews to convert to Christianity or be expelled from the country.

The typical form is a single sentence, in past tense, linking multiple different Wikipedia articles, with a bold link to the one most closely connected to the fact. Not every historical event qualifies:

  • The event must have happened on a single day, so not a crisis or war, but a precipitating or concluding event such as the signing of a treaty.
  • Births and deaths have their own process for appearing on the front page, so do not qualify for this collection of facts.
  • It must be an event with notable repercussions: one notable figure marrying another, or writing a letter to another, is not always significant in itself, but can be significant by initiating other events.
  • There must be no controversy about the day on which it happened. Reputable sources should agree.
  • The fact must be backed up by at least one reliable source, which must be cited in the article. As with all Wikipedia references, paywalled sources are fine but open-access sources have an advantage because they can be checked by Wikipedians outside subscribing institutions. With software developments over the last couple of years, adding citations has become extremely easy: the Cite tool expands DOIs into full citations and normally succeeds in transforming web links into full citations.

If you have a cited fact that meets the above criteria, it can have multiple mentions in Wikipedia:

  • The fact must be stated in the “home” article, in this case Alhambra Decree.
  • It can also go in the articles about the calendar date and the year. There are English Wikipedia articles about the year 1492 and about the date March 31. Unlike most Wikipedia articles, these are essentially lists of facts under different headings.
  • It can also appear in the biographies of the people, organisations or nations involved (in this case, Isabella of Castile). Some topics have timeline articles which are essentially lists of dates, such as Timeline of Spanish history.

The articles about individual dates, such as March 30, also have lists of births and deaths. In the long term, these will probably be driven by Wikidata, which is ideal for this kind of data. These lists have the same relative paucity of dates before 1800, and the same requirement that dates should be sourced and uncontroversial.

Facts for a particular day are chosen well in advance by an administrator, working behind the scenes in an area called the Selected anniversaries project. It is accepted, even encouraged, for other users to proactively edit in their own suggestions if they know wiki-code. The listing is decided two to four days in advance, so include your suggestion further in advance than that.

The guidelines give preference to events with a significant anniversary (meaning a multiple of 25, e.g. a 325th anniversary), events that differ from the others on the list (in era or geography), and articles that have not been on the front page before. “On this day” articles do not have to be comprehensive, but should be good examples of Wikipedia articles with citations in all sections. Each day’s “staging area” has a list of events that were submitted but did not qualify. Usually the article is rejected for having insufficient citations, so by improving the articles with links to scholarly sources, we can help those links reach the front page.
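
The "multiple of 25" rule is easy to check in advance when planning a suggestion. The following is a minimal sketch in Python, not part of any Wikipedia tooling; the function names and the `step` parameter are my own, and it only encodes the anniversary rule, not the other preferences in the guidelines.

```python
def is_significant_anniversary(event_year: int, current_year: int, step: int = 25) -> bool:
    """True if current_year is a positive multiple of `step` years after event_year."""
    years = current_year - event_year
    return years > 0 and years % step == 0


def next_significant_year(event_year: int, from_year: int, step: int = 25) -> int:
    """Return the first year at or after from_year that is a significant anniversary."""
    if from_year <= event_year:
        # The first significant anniversary is one full step after the event.
        return event_year + step
    remainder = (from_year - event_year) % step
    if remainder == 0:
        return from_year
    return from_year + (step - remainder)


# Example: the Alhambra Decree was issued in 1492.
print(is_significant_anniversary(1492, 2017))  # 525 years on
print(next_significant_year(1492, 2018))
```

For the Alhambra Decree of 1492, the sketch treats 2017 (the 525th anniversary) as significant and gives 2042 (the 550th) as the next one after that.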

So there is an opportunity here for heritage organisations and historians to extend awareness of the turning points of history, and to promote the use of biographical papers or databases. We just need to succinctly describe the key events and share citations about them.

—Martin Poulter, Wikimedian in Residence

This post is licensed under a CC-BY-SA 4.0 license