Creating Wikipedia articles from research data

Hillfort images shared on Wikimedia Commons

The Atlas of Hillforts of Britain and Ireland is a collaboration between the Universities of Oxford, Edinburgh and Cork, funded by the Arts and Humanities Research Council. It provides a definitive list of hillfort sites in the British Isles: more than four thousand in total. As well as publishing fieldwork by expert archaeologists, the site uses crowdsourcing, in that some of the sites were visited by volunteer investigators. The site invites users, expert or amateur, to submit their own photographs of the hillforts.

The Atlas launched in June 2017 and generated national media coverage. An issue for any newly launched site is how to attract incoming links from other sites: how to plumb the site into the existing paths by which people find information. This case study describes how, by sharing selected data from the Atlas, we were able to create thousands of incoming links from Wikipedia and related apps and sites, and to encourage the creation and use of hillfort articles in Wikipedia.

What to share?

For any database that wants to join up with the open web, there is a question of what and how much to share. The Atlas has a prose description of each hillfort as well as detailed, structured data from visits. This primary research is the “meat” of the site and the links we create should drive interest towards it.

After reviewing the data fields in the Atlas, and the feasibility of sharing them with Wikidata, we settled on enough fields to answer four basic questions.

  • What is it called? This covers both names in human language and identifiers in an authority file. A single hillfort site can be known by several different names: Damerham Knoll Camp in Hampshire is also known as Damerham Camp, Knoll Camp, Rockbourne Camp or Rockbourne Knoll Camp. Many hillforts have Scheduled Ancient Monument numbers provided by heritage registers, and Wikidata already held a lot of data from these registers.
  • Where is it? Most important for locating hillforts on a map was the latitude/longitude pair, but we also imported a National Grid Reference. The Atlas records the county and parish. Wikidata has a property for the administrative territory (usually a county) in which a place is situated, and already represents the counties, cities and similar administrative regions of the UK and Ireland. For instance, it knows that Powys is a principal area in Wales which borders other counties and principal areas.
  • What is it? Each item in Wikidata needs an “instance of” property to say whether it describes a person, event, structure and so on. The Atlas uses its own classification of eight types of hillfort. Before importing the data, we created Wikidata entries for each of those types, so that new items could have, for example, the statement instance of: contour fort.
  • Where is the authoritative information about it? This was the most crucial question, and we answered it by providing the link to the relevant Atlas entry.
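The four questions translate into a handful of Wikidata statements per hillfort. The case study does not say which upload tool was used, so the sketch below assumes the QuickStatements V1 tab-separated format as one plausible route. P31 (instance of), P625 (coordinate location), P131 (located in the administrative territorial entity) and P248 (stated in, used for the source) are real Wikidata properties; the Atlas ID property and the Atlas source item are hypothetical placeholders to be replaced with the real IDs.

```python
from dataclasses import dataclass

# Hypothetical placeholders: substitute the real Wikidata IDs before use.
ATLAS_ID_PROPERTY = "P_ATLAS_ID"  # external-ID property holding the Atlas identifier
ATLAS_ITEM = "Q_ATLAS"            # item representing the Atlas, cited as the source


@dataclass
class AtlasRecord:
    atlas_id: str        # Atlas identifier for the hillfort
    name: str            # preferred English name
    aliases: list[str]   # other names the site is known by
    type_qid: str        # Wikidata item for the Atlas hillfort type
    county_qid: str      # Wikidata item for the administrative territory
    lat: float
    lon: float


def to_quickstatements(rec: AtlasRecord) -> list[str]:
    """Return QuickStatements V1 lines that create one item for an Atlas record."""
    source = f"S248\t{ATLAS_ITEM}"  # reference: "stated in" the Atlas
    # What is it called?
    lines = ["CREATE", f'LAST\tLen\t"{rec.name}"']
    lines += [f'LAST\tAen\t"{alias}"' for alias in rec.aliases]
    # What is it?
    lines.append(f"LAST\tP31\t{rec.type_qid}\t{source}")
    # Where is it? Coordinates use the @lat/lon syntax.
    lines.append(f"LAST\tP625\t@{rec.lat}/{rec.lon}\t{source}")
    lines.append(f"LAST\tP131\t{rec.county_qid}\t{source}")
    # Where is the authoritative information? External IDs are quoted strings.
    lines.append(f'LAST\t{ATLAS_ID_PROPERTY}\t"{rec.atlas_id}"')
    return lines
```

Attaching the Atlas as the source of each factual statement is what later makes it possible to select only Atlas-backed facts.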

Building links into the database

This work required one design change to the Atlas while it was being built: it had to be possible to link directly to individual entries by appending each hillfort’s identifier to a fixed URL stem. Wikidata uses this pattern to turn identifiers into active links. It also enables citation of individual Atlas entries in Wikipedia: a Wikipedia volunteer created a template that, given an Atlas ID, generates a citation to that entry in the Atlas. Bookmarkable links for each entry also make the site more compatible with social media: users can share a link to the hillfort they are interested in, rather than pointing to the whole Atlas and telling people to search for the specific hillfort.
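Wikidata expresses the “identifier onto a stem” pattern with a formatter URL, in which `$1` stands for the identifier. A minimal sketch of the expansion follows; the stem shown is a hypothetical example.org placeholder, not the Atlas’s real URL.

```python
from urllib.parse import quote

# Hypothetical stem; a real formatter URL would point at the Atlas site.
ATLAS_FORMATTER_URL = "https://example.org/atlas/hillfort/$1"


def entry_url(formatter: str, identifier: str) -> str:
    """Expand a Wikidata-style formatter URL by substituting the identifier for $1."""
    if "$1" not in formatter:
        raise ValueError("formatter URL must contain the $1 placeholder")
    # Percent-encode the identifier so unusual characters cannot break the URL.
    return formatter.replace("$1", quote(identifier, safe=""))
```

For example, `entry_url(ATLAS_FORMATTER_URL, "1234")` gives `https://example.org/atlas/hillfort/1234`. The whole scheme only works if every entry has a stable URL of this shape, which is why the design change was needed.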

The Atlas ID can also be used as an authority file identifier within another database. Wikimedia Commons has more than four thousand images of hillforts in the UK and Ireland, and the descriptions of these images are often scant. We added links from Wikimedia Commons images and categories to the Atlas, and stated the Atlas identifier in the description. We also added these images and categories to the hillforts’ entries in Wikidata. As of 24th October, there were 205 links from Wikimedia Commons to the Atlas. This work is ongoing as more Commons images are matched with Atlas records.

Commons holds tens of millions of images, with more being added all the time, and only a tiny proportion of newly uploaded images are of hillforts. To find the relevant uploads we used a tool called PETscan, which found newly uploaded images that had been appropriately tagged. We also shared instructions for uploaders which explained how to add the tags.
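PETscan’s own interface is not reproduced here. As a swapped-in alternative for the same task, the sketch below uses the MediaWiki API’s `list=categorymembers` query to list files in a Commons category, most recently added first. It assumes that “appropriately tagged” means placed in a category; the category name is hypothetical. Note that the API’s timestamp records when a file was added to the category, not its upload time.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def recent_files_url(category: str, limit: int = 50) -> str:
    """Build a MediaWiki API query listing files in a category, newest additions first."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "file",
        "cmprop": "title|timestamp",
        "cmsort": "timestamp",
        "cmdir": "desc",
        "cmlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def files_added_since(response: dict, since: str) -> list[str]:
    """Return file titles added at or after `since` (ISO 8601 UTC, e.g. 2017-10-24T00:00:00Z)."""
    members = response.get("query", {}).get("categorymembers", [])
    # Timestamps share one fixed ISO format, so string comparison orders them correctly.
    return [m["title"] for m in members if m["timestamp"] >= since]
```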

How are the data used?

We shared data about 4,147 hillforts, each with its four-digit identifier. As well as providing sources for existing articles, the shared data allowed us to see how incomplete Wikipedia’s coverage of this topic was, and what needed to be added.
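One way to measure that gap, assuming the imported items have been fetched as Wikidata entity JSON (for example from `Special:EntityData`), is to check each item’s sitelinks for the Wikipedia of interest. A minimal sketch:

```python
def coverage(entities: list[dict], site: str = "enwiki") -> tuple[int, int, list[str]]:
    """Return (items with an article on `site`, total items, IDs of items without one)."""
    # Wikidata entity JSON keeps links to Wikipedia articles under "sitelinks",
    # keyed by site ID such as "enwiki" or "cywiki".
    missing = [e["id"] for e in entities if site not in e.get("sitelinks", {})]
    return len(entities) - len(missing), len(entities), missing
```

The `missing` list doubles as a to-do list of hillforts that lack an article in that language.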

Some language versions of Wikipedia have “article placeholders”: pages that give basic information and links about a topic when no article has been written for it. These data and links are drawn from Wikidata, so hillforts from the Atlas appear in them. For example, Welsh Wicipedia lacks an article about Hen Gastell hillfort near Swansea, but its placeholder gives some basic facts, cited to the relevant entry in the Atlas. Multiple versions of Wikipedia use article placeholders, so the incoming links to the Atlas are several times the 4,147 records shared.

Wikipedia has a tool that builds a map of all the places mentioned in an article, given their latitude/longitude pairs. It also has a Nearby function, most useful in the mobile app, which shows articles about places within a few miles of the user.
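The idea behind a “nearby” feature can be sketched with the haversine great-circle distance. This is an illustration of the technique, not the Wikipedia app’s actual implementation; the site list format and default radius are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two latitude/longitude points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearby(sites, lat: float, lon: float, radius_km: float = 10.0):
    """Return (name, distance_km) for sites within radius_km of the user, nearest first.

    sites: iterable of (name, latitude, longitude) tuples (hypothetical format).
    """
    hits = [(name, haversine_km(lat, lon, slat, slon)) for name, slat, slon in sites]
    return sorted((h for h in hits if h[1] <= radius_km), key=lambda h: h[1])
```

A radius of 10 km corresponds to roughly six miles, in line with “a few miles”.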

Since this is freely reusable data, other people can build apps and tools from it. Monumental is one such application: it helps people to find officially recognised monuments around the world, either in a named location or in the vicinity of the user.

A hillfort in Monumental

In September, Wikimedia ran a global campaign called Wiki Loves Monuments, in which the public were encouraged to upload their photographs of monuments. Participants chose target locations using a site and app powered by Wikidata, so hillforts from the Atlas were included. Eight different users shared 31 hillfort photographs during the campaign. Photographs have continued to arrive since the campaign: a tool called WikiShootMe, available all year round, shows users locations that lack a photograph, and gives them an easy interface to upload one.
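Finding “locations that lack a photograph” reduces to a simple filter over Wikidata items. The sketch below assumes that lacking a photograph means having no image statement (P18), which is how Wikidata links an item to a Commons photograph; WikiShootMe’s own logic is not reproduced here.

```python
def lacking_photo(entities: list[dict]) -> list[str]:
    """Return IDs of items that have coordinates (P625) but no image statement (P18)."""
    result = []
    for e in entities:
        claims = e.get("claims", {})
        # Only mappable items are useful targets for a photographer.
        if "P625" in claims and "P18" not in claims:
            result.append(e["id"])
    return result
```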

Visualisation and reconciliation

In the data dump from the Atlas, the types of hillfort and the counties had to be identified by Wikidata’s internal codes rather than by their English names. For example, “Aberdeenshire” had to be replaced by “Q189912”. OpenRefine and Google Sheets (discussed in another case study) both helped with this. The most time-consuming task was finding whether Wikidata already held records that the imported data should be matched against. A large number of English Heritage sites had already been imported into Wikidata, and these carried Scheduled Ancient Monument numbers that could give an automatic match. However, the match was not always perfect, as discussed below.
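Both reconciliation steps can be sketched as dictionary lookups. Only the Aberdeenshire entry below comes from the text; the row layout and the SAM-number index are assumptions, and in practice the lookup tables would come from OpenRefine or a Wikidata query. Because SAM numbers do not always line up exactly with Atlas entries, a SAM match is treated as a candidate rather than a final answer.

```python
# County name to Wikidata ID. Only Aberdeenshire is taken from the text;
# the rest of the table would come from a reconciliation run.
COUNTY_QIDS = {"Aberdeenshire": "Q189912"}


def reconcile(rows: list[dict], county_qids: dict, sam_index: dict):
    """Replace county names with QIDs and look up existing items by SAM number.

    rows: dicts with 'atlas_id', 'county' and an optional 'sam' key (assumed layout).
    sam_index: maps SAM numbers to QIDs of items already in Wikidata.
    Returns (reconciled rows, rows needing manual review).
    """
    done, review = [], []
    for row in rows:
        qid = county_qids.get(row["county"])
        if qid is None:
            review.append(row)  # unrecognised name: resolve by hand
            continue
        # A SAM match is only a candidate: one register entry may cover more than a hillfort.
        candidate = sam_index.get(row.get("sam"))
        done.append({**row, "county": qid, "existing_item": candidate})
    return done, review
```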

Map of imported sites from the Atlas, colour-coded by type of hillfort

We did not want to match records automatically based on geographic location, since there are often multiple hillforts in one small area. The most helpful tool for matching records was based on a technique previously used when importing records about Biosphere nature reserves as part of UNESCO’s partnership with Wikimedia. The Wikidata query service can display a set of query results as a map. We created two maps: one for sites imported from the Atlas, the other for hillfort records in Wikidata that lacked an Atlas identifier. By zooming in on both, we could find sites with the same location and then check each record to decide whether to merge them. As the matching progressed, dots disappeared from the latter map.
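The two-map comparison was done by eye. A programmatic equivalent, sketched here as an assumption rather than the method used, produces a shortlist of nearby pairs for a person to check; like the manual process, it never merges anything itself. The 0.5 km threshold is illustrative.

```python
from math import cos, radians, sqrt


def approx_km(a: tuple, b: tuple) -> float:
    """Equirectangular distance approximation in km; adequate over a few kilometres."""
    (lat1, lon1), (lat2, lon2) = a, b
    x = radians(lon2 - lon1) * cos(radians((lat1 + lat2) / 2))
    y = radians(lat2 - lat1)
    return 6371.0 * sqrt(x * x + y * y)


def candidate_pairs(imported: dict, existing: dict, max_km: float = 0.5):
    """Pair imported Atlas sites with nearby Wikidata items lacking an Atlas ID.

    imported: Atlas ID -> (lat, lon); existing: QID -> (lat, lon).
    Proximity only nominates candidates; each pair still needs a human check,
    because several hillforts can sit close together.
    """
    pairs = []
    for atlas_id, pos in imported.items():
        for qid, other in existing.items():
            d = approx_km(pos, other)
            if d <= max_km:
                pairs.append((atlas_id, qid, round(d, 3)))
    return sorted(pairs, key=lambda p: p[2])
```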

Maps generated from Wikidata can colour-code and filter items by type. Thus we could create maps that show at a glance that promontory forts are concentrated on western coasts, while hillslope forts are concentrated in south-east Scotland.
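In the Wikidata Query Service, the `#defaultView:Map` comment shows results on a map, and a result variable named `?layer` splits the points into separately coloured, toggleable layers. The sketch below builds such a query with one layer per hillfort type; the Atlas ID property number is left as a parameter because it is not given in the text. An item with several “instance of” values would appear in several layers.

```python
from urllib.parse import quote


def hillfort_map_query(atlas_prop: str) -> str:
    """SPARQL for the Wikidata Query Service: items with an Atlas ID, one map layer per type."""
    return f"""#defaultView:Map
SELECT ?item ?itemLabel ?coord ?layer WHERE {{
  ?item wdt:{atlas_prop} ?atlasId ;
        wdt:P625 ?coord ;
        wdt:P31 ?type .
  ?type rdfs:label ?layer .
  FILTER(LANG(?layer) = "en")
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def wdqs_url(query: str) -> str:
    """Link that opens the query in the Wikidata Query Service web interface."""
    return "https://query.wikidata.org/#" + quote(query, safe="")
```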

Data complications

In the process of matching the imported data against what already existed in Wikipedia and Wikidata, we discovered some cases where the match was not exact, and which called for more sophisticated knowledge representation.

Some Scheduled Ancient Monument numbers correspond closely but not exactly to what is described in the Atlas. For instance, the Historic England entry for Ritton Castle, SAM number 1020150, covers both an ancient hillfort and a medieval castle. One Wikidata property that turned out to be useful was “has part”; another was “structure replaced by”. With these, we could connect separate records for a heritage site and for the hillfort that forms part of it, or for a prehistoric castle and a medieval castle built on the same site. A third useful property is “located on terrain feature”, for instance to distinguish a mountain from a hillfort located on it.
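The Ritton Castle case can be expressed as a few statements between separate items. This sketch again assumes QuickStatements V1 lines; P527 (has part), P167 (structure replaced by) and P706 (the property the text calls “located on terrain feature”) are Wikidata properties, while the QIDs in any call are hypothetical.

```python
# Wikidata properties named in the text.
HAS_PART = "P527"
STRUCTURE_REPLACED_BY = "P167"
LOCATED_ON_TERRAIN_FEATURE = "P706"


def relationship_lines(hillfort: str, *, site=None, successor=None, terrain=None) -> list[str]:
    """QuickStatements V1 lines relating a hillfort item to neighbouring items.

    site:      heritage-register item that covers the hillfort and more
    successor: later structure on the same spot, e.g. a medieval castle
    terrain:   hill or mountain the fort sits on
    """
    lines = []
    if site:
        lines.append(f"{site}\t{HAS_PART}\t{hillfort}")
        if successor:
            # The register entry covers the later structure too.
            lines.append(f"{site}\t{HAS_PART}\t{successor}")
    if successor:
        lines.append(f"{hillfort}\t{STRUCTURE_REPLACED_BY}\t{successor}")
    if terrain:
        lines.append(f"{hillfort}\t{LOCATED_ON_TERRAIN_FEATURE}\t{terrain}")
    return lines
```

Keeping the heritage site, hillfort, castle and hill as separate items lets each carry its own identifiers and properties.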

With Wikipedia articles, there was a similar problem of not-quite-exact matches. Wikipedia will sometimes have an article about a village, mentioning that the village includes a church and is located on the site of an ancient hillfort. In terms of structured data, the church, village, hillfort, and hill deserve separate representations. They have different properties, and likely have different identifiers in external databases. In practice, this means some “List of hillforts…” articles in Wikipedia have entries that are not hillforts, but villages or land features where a hillfort is found.
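Once each entry in a list article is linked to a Wikidata item, such mismatches can be flagged automatically by checking the items’ “instance of” values. A minimal sketch, assuming each entry has been paired with its P31 values; the set of hillfort type QIDs is hypothetical.

```python
def not_hillforts(entries: list[dict], hillfort_types: set) -> list[tuple]:
    """Return (title, types) for list entries whose item is not typed as a hillfort.

    entries: dicts with 'title' and 'instance_of' (a list of QIDs).
    hillfort_types: QIDs for "hillfort" and the Atlas's eight types.
    """
    return [
        (e["title"], e["instance_of"])
        for e in entries
        if not set(e["instance_of"]) & hillfort_types
    ]
```

Entries flagged this way are usually villages or landforms, which point to a hillfort that needs its own item.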

Creating lists in Wikipedia

Listeria is a bot written, like many of the tools in this case study, by Magnus Manske. It takes a Wikidata query and generates a list-format Wikipedia article with links. When run on the English Wikipedia, it creates an English-language article; on the Russian Wikipedia, it outputs Russian, and so on. The Wikipedia communities differ in their attitudes to it: Catalan Wikipedia, for example, uses Listeria to create lists of artists’ works and other list articles. English Wikipedia does not allow Listeria to create articles directly, but it can be used to create drafts which are then reviewed and added to the encyclopaedia by human users.

We customised the output of Listeria so that each list item included a citation of the relevant page in the Atlas. Since we were generating articles with hundreds of items, file size was another concern, so the customisation also aimed to make informative lists with minimal HTML. Each statement in Wikidata can be tagged with its source, so we could query for only those facts about a hillfort that were stated in the Atlas, as the most up-to-date and comprehensive source. List of hillforts in Ireland is one of the articles created using this method. Each item in the list links to the relevant entry in the Atlas, totalling 475 links in this case.
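Selecting “only the facts stated in the Atlas” means checking each statement’s references for a “stated in” (P248) value pointing at the Atlas. The sketch below works on Wikidata entity JSON and then renders one compact wikitext table row with an inline citation. It does not reproduce Listeria’s templates; the source QID and row layout are hypothetical.

```python
STATED_IN = "P248"


def stated_in(statement: dict, source_qid: str) -> bool:
    """True if any reference on the statement says "stated in" source_qid."""
    for ref in statement.get("references", []):
        for snak in ref.get("snaks", {}).get(STATED_IN, []):
            value = snak.get("datavalue", {}).get("value", {})
            if value.get("id") == source_qid:
                return True
    return False


def sourced_claims(entity: dict, source_qid: str) -> dict:
    """Keep only the statements backed by the given source, dropping empty properties."""
    return {
        prop: kept
        for prop, statements in entity.get("claims", {}).items()
        if (kept := [s for s in statements if stated_in(s, source_qid)])
    }


def list_row(name: str, county: str, url: str) -> str:
    """One compact wikitext table row, citing the Atlas entry inline."""
    return f"|-\n| {name} || {county}<ref>[{url} Atlas entry]</ref>"
```

Plain wikitext rows like this keep the generated page small compared with template-heavy markup.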

Behind the scenes of Wikipedia, there are many project pages used to organise work that improves a specific topic, such as archaeology, the Celts, or ancient sites in Somerset. We created a project page specifically for the hillfort data import, explaining what had been imported and suggesting activities that volunteers could take on. We also posted notices on several existing project pages to inform the volunteers who work on those articles.

As a result of this work, there are many hundreds of links to the Atlas from articles in the English Wikipedia, in addition to the thousands of incoming links from apps and other Wikipedia languages explained earlier. Only a small proportion of the hillforts have articles in English Wikipedia, but the red links mark them out as topics that ought to have articles eventually. Even without complete articles on Wikipedia, the list articles help users answer “What hillforts are in my area, and where can I find out more about them?”

 
