What Wikidata offers Oxford’s GLAM Digital Strategy

As part of Oxford’s GLAM Digital Strategy, there has been some interesting research into audience archetypes. This work examines the many different aims people can have when engaging with our GLAM institutions: from “have fun” to “use collections in teaching”. The technology we use in GLAMs can help users in these goals, or can throw up frustrating barriers, and this strategic work explores how it could help.

Meanwhile, open platforms like Wikipedia continue to be the principal way in which people encounter cultural heritage. A growing “GLAM-Wiki” movement involves cultural institutions and volunteers in sharing collection data and building new tools, with some of those data sets coming from Oxford University. So is there an overlap between Oxford GLAMs’ aspirations and what Wikidata enables? In this post, I draw together some of my previous posts to show Wikidata’s role in advancing some aims mentioned in the document.

Making Wikidata visible


I’ve been experimenting with a way to show how Wikidata represents knowledge; specifically how it makes pathways out of relationships between things. In a previous post I wrote about how Wikidata’s representation enables new pathways between entities. Since those pathways link into a giant web they offer new ways to discover existing collection objects. Now that I have been describing Oxford’s GLAM collections on Wikidata, we can show concrete examples of this expanding knowledge graph.

Normally with Wikidata we specify properties and get results that are identifiable things. For example if we ask for “female historians born in the 1730s with a biography in Electronic Enlightenment”, we get Catherine Macaulay. Here I’m using queries that specify a group of things and request the properties connecting them. So we get a tiny fragment of the Wikidata knowledge graph (which right now has just over 54 million people, places, publications, objects and concepts). We can see how different kinds of data (biographical, bibliographic, and catalogue data) are combined in the same model. I’ve captured these graphs as screenshots, but I recommend clicking through to the live query where you get a draggable, stretchy graph.

Translating a blog post into structured data

Timur Beg Gurkhani (1336-1405) plays a small role in our story. Public domain image via Wikimedia Commons

Recently my Bodleian colleague Alasdair Watson posted an announcement about an illuminated manuscript that is newly available online. To get the most long-term value out of the announcement, I decided to express it as Linked Open Data by representing its content in Wikidata. This blog post goes through that process.

Deletion is not the end: making an academic article stick on Wikipedia

Identity fusion is a concept central to a lot of research in social psychology and cognitive anthropology. So it is understandable that a member of an anthropology research group wrote an explanation of this concept for Wikipedia, explaining the idea to the widest possible audience and citing the key papers.

Unfortunately, writing an article and getting it accepted by Wikipedia are different things. The draft was rejected multiple times and eventually deleted, removing hours of work. Many academics have at least heard of a similar experience and it can be very discouraging. However, these stories can have a happy ending. We were able to get the draft back and post it as an article where it became one of the top two search engine hits for its topic. This article is about that process, and what academics can do to make sure their articles are accepted by Wikipedia.

Semantic data and the stories we’re not telling

One of my earliest memories of television was James Burke’s series Connections. It was fascinating yet accessible: each episode explored technology, history, science and society, jumping across topics based on historical connections or charming coincidences. One episode started with the stone fireplace and ended with Concorde.

In a digital utopia, we would each be our own James Burke, creating and sharing intellectual journeys by following the connections that interest us. We are not there yet. Many very valuable databases exist online, but the connections between them are obscured rather than celebrated, and this is an obstacle for anyone using those data in education or research. In a previous post I described the problems that come from the fact that things have different names in different databases, and described a semantic web approach to link them together.

Building on this approach, web applications can help people create their own stories, choosing their own path through sources of reliable information and building unexpected connections. In this post I describe three design principles behind these applications. Let’s start with a story.


Report from Wikimania

Last month I had the privilege to attend the Wikimania conference in Montreal, Canada, where 900 people from around the world gathered for two days of hacking and building and then three days of conference sessions. The conference scope includes not just the Wikimedia projects but also the big themes of open education, open access, community building, and privacy and rights in the digital age. One blog post by one attendee is only going to capture a sliver of what went on, and here I am summarising some big projects of most relevance to university research projects and GLAMs.

This time round, Wikidata rather than Wikipedia was generating the most excitement. Wikidata, the free structured knowledge-base, is going through a period of explosive growth, helped in small part by data shared from partner institutions including Oxford University, and the conference brought together many people using Wikidata to document cultural heritage and current knowledge.

The author and hundreds of other Wikimedians. Photo by Victor Grigas of the Wikimedia Foundation, CC-BY-SA 4.0


A step forward in the sharing of open data about theses

Title page of Marie Curie’s doctoral thesis; Yale University via Wikimedia Commons; Public Domain

Theses, particularly doctoral theses, are an important part of the scholarly record. Some are published and become influential books in their own right. As well as demonstrating the author’s ability to do original research, a thesis gives a snapshot of its author’s intellectual development at a formative time. This post reports on work sharing open data about thousands of theses, with links back to their full text in a repository.

The Oxford Research Archive (ORA) has 3237 Oxford doctoral theses on open access for anyone to download and read. Some of the authors have gone on to highly accomplished careers, such as the psychologist Professor Dorothy Bishop or the economist Sir John Vickers. During the confirmation hearings that eventually saw Neil Gorsuch appointed to the US Supreme Court, the interest in his background was such that TIME magazine wrote an article analysing his thesis and linking to ORA. This may well have been prompted by our linking the thesis from the top Google hit about Gorsuch: his Wikipedia biography.

Publicising a historic event in Wikipedia

The front page of English Wikipedia gets around five million hits per day. Highlighted sections of the page, such as “Did you know” and “In the news”, trumpet the site’s purpose: sharing knowledge for its own sake. One of these sections, “On this day…”, features five different facts each day, with links to relevant articles. These facts in turn are chosen from a large collection of roughly 100 historic events for each date. Many other language versions of Wikipedia have a similar “This day in history” section, though with different sets of facts.

As with everything else on Wikipedia, this collection of historic facts is offered freely for anyone to use for any purpose. “On this day in history” facts are ideal for sharing on social media, for example by Wikipedia’s official presence on Twitter.

Napoléon Bonaparte, listed in Wikipedia’s May 26 article for his coronation as King of Italy on 26 May 1805. Image from the Curzon Collection of political prints, CC-BY the Bodleian Libraries.

To avoid repetition from year to year, it helps to be able to draw on a large pool of historic events, so each day can showcase a variety of types of event, of locations and of eras. There is a relative shortage of events before 1800, so additions are welcome.

Being featured on the front page generates a lot of interest in the article.

  • The Alhambra Decree article typically gets about 300 views per day. When linked from the front page as a recent “On this day” item, it had nearly 10,000.
  • The Treaty of Fontainebleau (1814) article gets 70 to 80 views on a typical day, but had 5,400 when linked from the home page on its anniversary.
  • The article about Suvarnadurg, an Indian fort, usually gets around 30 views a day, but had 8,500 when the fort’s 1755 capture by the East India Company was listed on April 2.
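
To put those three examples on a common scale, the uplift can be worked out as a simple ratio. This is only a rough sketch using the figures quoted above: it assumes 75 as the midpoint of “70 to 80” and treats “nearly 10,000” as 10,000, so the first ratio is an upper bound.

```python
# (article, typical daily views, views on the day it was featured).
# Figures are the approximate ones quoted in the list above; 75 is an
# assumed midpoint of "70 to 80" and 10_000 rounds up "nearly 10,000".
FEATURED = [
    ("Alhambra Decree", 300, 10_000),
    ("Treaty of Fontainebleau (1814)", 75, 5_400),
    ("Suvarnadurg", 30, 8_500),
]


def uplift(typical_views, featured_views):
    """Return how many times larger the featured-day traffic was."""
    return featured_views / typical_views


for title, typical, featured in FEATURED:
    print(f"{title}: about {uplift(typical, featured):.0f} times a typical day")
```

On these numbers the multipliers are roughly 33, 72 and 283: the quieter the article normally is, the bigger the relative boost from a front-page link.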

By considering one example, we can look at how a historic event is made visible in Wikipedia.

March 31: 1492 – The Catholic Monarchs of Spain issued the Alhambra Decree, ordering all Jews to convert to Christianity or be expelled from the country.

The typical form is a single sentence, in past tense, linking multiple different Wikipedia articles, with a bold link to the one most closely connected to the fact. Not every historical event qualifies:

  • The event must have happened on a single day, so not a crisis or war, but a precipitating or concluding event such as the signing of a treaty.
  • Births and deaths have their own process for appearing on the front page, so do not qualify for this collection of facts.
  • It must be an event with notable repercussions: one notable figure marrying another, or writing a letter to another, is not always significant in itself, but can be significant by initiating other events.
  • There must be no controversy about the day on which it happened. Reputable sources should agree.
  • The fact must be backed up by at least one reliable source, which must be cited in the article. As with all Wikipedia references, paywalled sources are fine but open-access sources have an advantage because they can be checked by Wikipedians outside subscribing institutions. With software developments over the last couple of years, adding citations has become extremely easy: the Cite tool expands DOIs into full citations and normally succeeds in transforming web links into full citations.

If you have a cited fact that meets the above criteria, it can have multiple mentions in Wikipedia:

  • The fact must be stated in the “home” article, in this case Alhambra Decree.
  • It can also go in the articles about the calendar date and the year. There are English Wikipedia articles about the year 1492 and about the date March 31. Unlike most Wikipedia articles, these are essentially lists of facts under different headings.
  • It can also appear in the biographies of the people, organisations or nations involved (in this case, Isabella of Castile). Some topics have timeline articles which are essentially lists of dates, such as Timeline of Spanish history.

The articles about individual dates, such as March 30, also have lists of births and deaths. In the long term, these will probably be driven by Wikidata, which is ideal for this kind of data. These lists have the same relative paucity of dates before 1800, and the same requirement that dates should be sourced and uncontroversial.

Facts for a particular day are chosen well in advance by an administrator, working behind the scenes in an area called the Selected anniversaries project. It is accepted, even encouraged, for other users to proactively edit in their own suggestions if they know wiki-code. The listing is decided two to four days in advance, so include your suggestion further in advance than that.

The guidelines give preference to events with a significant anniversary (meaning a multiple of 25, e.g. a 325th anniversary), events that differ from the others on the list (in era or geography), and articles that have not been on the front page before. “On this day” articles do not have to be comprehensive, but should be good examples of Wikipedia articles with citations in all sections. Each day’s “staging area” has a list of events that were submitted but did not qualify. Usually the article is rejected for having insufficient citations, so by improving the articles with links to scholarly sources, we can help those links reach the front page.
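
The “multiple of 25” rule is easy to check before suggesting an event. The sketch below is my own illustration, not part of the Selected anniversaries process; the function names are hypothetical and it assumes both years are CE.

```python
def anniversary(event_year, current_year):
    """Number of years between the event and the year being planned."""
    return current_year - event_year


def is_significant(event_year, current_year, step=25):
    """True when the anniversary is a positive multiple of `step` years."""
    years = anniversary(event_year, current_year)
    return years > 0 and years % step == 0


# The Alhambra Decree (1492) had its 525th anniversary in 2017.
print(anniversary(1492, 2017), is_significant(1492, 2017))
print(anniversary(1492, 2018), is_significant(1492, 2018))
```

An event that fails this check can still be listed; the anniversary is only one of the preferences described above.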

So there is an opportunity here for heritage organisations and historians to extend awareness of the turning points of history, and the use of biographical papers or databases. We just need to succinctly describe the key events and share citations about them.

—Martin Poulter, Wikimedian in Residence

This post is licensed under a CC-BY-SA 4.0 license

Wikimedia for public engagement

By Dr Martin Poulter, Wikimedian in Residence at the University of Oxford

A takin is a Himalayan goat-antelope with whom I feel a personal connection, and the reason goes back to an event I attended in 2011. The wildlife charity ARKive had allowed some of its descriptions of threatened species to be copied into Wikipedia. After presentations introducing both ARKive and Wikipedia, we split up the room. One table took birds, another took lizards, and I must have been among the mammals. After carefully reading what ARKive and Wikipedia said about the takin, I found a couple of sentences that could be copied from one to the other. Everyone in the room made a small but concrete improvement to their target Wikipedia article.

The trainer at the event, Andy Mabbett, thanked me afterwards with a message through Wikipedia. Making that change, and being recognised for it, connected me to the topic in a way that a film or lecture could not. Somehow the takin had become my endangered mammal. People had turned up with a general curiosity about threatened species, engaged with the question of how to describe a specific species and had a positive experience with a peer-reviewed source – the ARKive site.

How do we create similar events where people are not just informed about a topic or a resource, but engage with it in a way that makes a lasting impression? Here are some suggested requirements for a public engagement event:

  • a collaborative task around a topic;
  • that requires thinking and reading, but not expertise, so anyone can take part;
  • that can be broken down into small chunks, identified beforehand;
  • that can be done in-person or remotely;
  • with a way to track individual contributions. We want to thank and reward contributors, and it’s also useful to assess the quality of their work. For a big, long-term project we might want something like a leader-board or a participation award.

Wikimedia platforms

Wikipedia and its sister projects are ideal platforms for meeting the above criteria. They

  • cover all academic subjects;
  • support collaboration between experts and non-experts;
  • have various tools to generate lists of “targets”: things that need improvement;
  • can be accessed by anyone with an internet connection;
  • have contributor records which publicly show what changes each user has made, even allowing ‘thank you’ messages for individual changes.

Perhaps most relevant is that Wikimedia resources do not exist in isolation but are derived from something else. A fact in Wikipedia or Wikidata needs to be backed up with a citation of a reliable source. A photo in Wikimedia Commons needs a description of where it was taken, or a citation of the collection it is drawn from. A transcribed text in Wikisource needs a pointer to the page scans that were transcribed, and ultimately to the physical copy of the book. So a Wikimedia event is always necessarily about a Wikimedia project and something else: a scholarly site or database, or physical exhibits, books or artworks.

Four Wikimedia projects hold distinct types of information about the same subject.

The best-established type of public event is a Wikipedia editathon, in which visitors are invited to write Wikipedia articles. Newcomer participants in editathons usually achieve little, because a lot of time and thought is needed to get to grips with Wikipedia’s interface, with Wikipedia’s culture and norms, and with the sources they will be using. Editathons can be very productive if participants are confident wiki editors, but that confidence does not come immediately. Fortunately, the Wikimedia projects offer simpler, less demanding ways for the public to engage with a subject.

An example: a Wikisource transcribe-a-thon

Looking to create events for the Ada Lovelace Bicentennial in 2015, I read about Mary Somerville, a 19th century mathematician and scientist who tutored Lovelace and for whom Somerville College is named. I could find none of Somerville’s works in electronic form, but some were available as scanned documents in the Internet Archive. This suggested how we could engage an audience interested in women scientists.

Wikisource is a platform for sharing and connecting out-of-copyright or freely-licensed text. Wikipedia’s article about Lord Byron summarises his life, with brief mentions or quotes from his work. Wikisource, on the other hand, has a brief description of who he was and the full text of many of his poems and other works. Naturally, the two profiles link to each other. Most works on Wikisource come from scanned books which have been put through Optical Character Recognition (OCR) and then manually fixed. Each page has to be checked and approved by at least two different users before it is considered “validated” and ready for public readers.
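
The two-person rule can be stated as a tiny function. This is a toy model for planning an event, not Wikisource’s own software: the status labels other than “validated” and the user names are made up for illustration.

```python
def page_status(approvers):
    """Classify a page by how many *different* users have approved it.

    `approvers` is a list of user names (hypothetical here); repeat
    approvals by the same person only count once.
    """
    distinct = set(approvers)
    if len(distinct) >= 2:
        return "validated"
    if len(distinct) == 1:
        return "approved once"
    return "unchecked"


print(page_status([]))
print(page_status(["Ann", "Ann"]))
print(page_status(["Ann", "Bob"]))
```

The point for event planning is that one person cannot validate a page alone, so attendees need to be paired up or pages need to be swapped part-way through.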

Attendees at the event were given a shortened URL for the transcription and each got a post-it note with the page number that they should fix. They adapted to Wikisource at different rates, but that worked out fine because the quicker, more confident people checked and tweaked the work of those who made slower progress. In two hours, we got through one paper and a large proportion of a book by Somerville. After some further checking, these texts were linked from the front page of Wikisource. Feedback on the event was very good: participants recognised they were doing something important; not just learning about Somerville but helping to republish her work. The “transcribe-a-thon” format has been repeated as a conference session.

A transcription project on Wikisource: pages are yellow when they’ve been approved by one user, and green for two users.

A transcribe-a-thon needs some careful preparation in choosing the text, importing it into Wikisource and preparing it for transcription. The import process on Wikisource is documented, but not very intuitive, even for experienced wiki editors. Not all scans are suitable: if the images are poor quality, the OCR does not produce usable text and if there is non-standard text such as mathematical formalism, transcription will be too difficult for newcomers. A little more work is necessary once all the pages are validated, to assemble them into a single work.

Photographs and WikiShootMe

Wikipedia articles about a place or building usually have a geographical point, defined by a latitude/longitude pair, attached to them in a machine-readable way. For example, the article about Oxford has coordinates that correspond to the central junction at Carfax. Wikidata, another sister project, has many more entries with locations, for items such as listed buildings. On some historic streets, almost every building has a Wikidata entry.

WikiShootMe is an online mapping tool that shows these articles and Wikidata items, colour-coded according to whether they have an image. It also allows users to upload images, but they need to register an account first on Wikimedia Commons. The images do not have to be professional quality, and photos taken with a smartphone or cheap digital camera are often suitable. As more images are uploaded, red dots disappear from the map.

A WikiShootMe scan around the North end of St. Giles, Oxford

So for an event or campaign that gets people engaged with local history, public art, or architecture, the group can decide on places to photograph and describe, then go to the location, and either upload their images from home or return to a central computer room and transfer images from their devices.

A tip to monitor contributions: when uploading an image, users are prompted for image categories. If they all add the same category, then it is possible to track images uploaded with that tag using PetScan (explained below). However, categories are case-sensitive so you have to make sure people type the category tag exactly as instructed. Commons helps by colouring the text red if it does not correspond to an existing category.
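
If you collect the category names people say they typed (for instance on a sign-up sheet), a strict comparison will catch the capitalisation slips that would split uploads across different categories. This is a minimal sketch with a hypothetical helper name; it deliberately applies an exact match, in line with the “type it exactly as instructed” advice.

```python
EXPECTED = "Buildings in Oxford"


def check_category(typed, expected=EXPECTED):
    """Compare a typed category name against the agreed one, exactly."""
    if typed == expected:
        return "ok"
    if typed.casefold() == expected.casefold():
        # Same letters, different capitalisation: the usual mistake.
        return "case mismatch"
    return "different category"


for typed in ["Buildings in Oxford", "Buildings In Oxford", "Oxford buildings"]:
    print(f"{typed!r}: {check_category(typed)}")
```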

Instructions to attendees:

  • Create an account on commons.wikimedia.org
    • A Wikipedia account will work if you already have one of those.
  • Open WikiShootMe in another tab
  • Click on the red dot on the map where your photo was taken
  • Press the button to ‘Authorise uploading’
  • Click ‘Allow’. This will permit WikiShootMe to accept your photo, and return you to the map.
  • Navigate through the map to the red dot again. This time when you click the dot, the button says ‘Upload image’
  • Select the image on the computer or device
  • Give the image a title, description (say what you’ve photographed, e.g. the address of the building) and date.
  • In the Category box, type “Buildings in Oxford” with that exact capitalisation.

PetScan is a tool for customised queries. If you are running an event or campaign where people create or upload images to a given category, use this procedure to get an overview of their contributions.

  • Go to https://petscan.wmflabs.org/
  • Click ‘Commons’
  • Enter the category “Buildings in Oxford”
  • Select the Page properties tab and click the checkbox next to File.
  • Select the Output tab, then choose Sort by date, Sort order descending.
  • Click ‘Do it!’ and on the resulting page, bookmark the link ‘for the query you just ran’.

This gives you a list of images in the category, most recent additions first. Clicking on an entry in the list will take you to the full description of that file, including the user profile of the uploader.
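
For anyone who prefers a script to a bookmark, a similar “newest files in a category” listing can be requested from the MediaWiki Action API on Commons. Note that this is an alternative route, not PetScan itself. The sketch only builds the request URL; the function name and the limit of 50 are my own choices, and the category is the example one used above.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def recent_files_url(category, limit=50):
    """Build an API URL listing files in a Commons category, newest first."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": 6,           # namespace 6 is "File"
        "cmsort": "timestamp",      # order by when the file joined the category
        "cmdir": "desc",            # most recent additions first
        "cmprop": "title|timestamp",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


print(recent_files_url("Buildings in Oxford"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return JSON with one entry per file. Unlike the PetScan page, this listing does not link straight to the uploader, so PetScan remains the friendlier option for non-programmers.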

Other quick ideas

  • Use a biographical source to add individual facts, such as universities attended or birthplaces, to Wikidata entries or Wikipedia articles.
  • Examine a free image source (e.g. Europeana’s World War I collection) and find Wikipedia articles that the images can illustrate.
  • Search through audio archives for short clips that can be uploaded to Commons and embedded in Wikipedia articles about a person or event.

This post is licensed under a CC-BY-SA 4.0 license