EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database. The general aim of EEBO-TCP is to encode one copy (usually the first edition) of every monographic English-language title published between 1473 and 1700 that is available in EEBO.
EEBO-TCP aimed to produce large quantities of textual data within the usual project constraints of time and funding. It therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the guidelines of the Text Encoding Initiative (TEI).
The EEBO-TCP project was divided into two phases. The 25,363 texts created during Phase 1 of the project were released into the public domain on 1 January 2015. Anyone can now take and use these texts for their own purposes, but we respectfully request that due credit and attribution be given to their original source.
Users should be aware of the process of creating the TCP texts, and therefore the assumptions that can be made about the data.
1. Selection of texts
Text selection was based on the New Cambridge Bibliography of English Literature (NCBEL). If an author (or, for an anonymous work, the title) appears in NCBEL, then their works are eligible for inclusion. Selection was intended to range over a wide variety of subject areas, to reflect the true nature of the print record of the period. In general, first editions of works in English were prioritized. A number of works in other languages, notably Latin and Welsh, are nonetheless included, and a second or later edition of a work was sometimes chosen when there was a compelling reason to do so.
2. Accuracy and the production process
Image sets were sent to external keying companies for transcription and basic encoding. Quality assurance was then carried out by editorial teams in Oxford and Michigan. For each text, 5% of the pages (or 5 pages, whichever is greater) were proofread for accuracy, and texts that did not meet QA standards were returned to the keyers to be redone. After proofreading, the encoding was enhanced and/or corrected, and characters marked as illegible were corrected where possible, up to a limit of 100 instances per text. Any remaining illegible characters were encoded as <GAP>s. As a result of these processes, while the overall quality of TCP data is very good, some errors will remain and some readable characters will be marked as illegible. Users should bear in mind that such instances have in all likelihood never been looked at by a TCP editor.
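The sampling rule and the correction cap described above amount to simple arithmetic, sketched below in Python. The sketch assumes that the 5% figure is rounded up to a whole page and that a text shorter than 5 pages is proofread in full; neither detail is stated above. The function names are hypothetical, not part of any TCP tooling.

```python
# Hypothetical helpers illustrating the QA rules described above.

ILLEGIBLE_CORRECTION_LIMIT = 100  # at most 100 illegibles corrected per text


def qa_sample_pages(total_pages: int) -> int:
    """Pages proofread: 5% of the text or 5 pages, whichever is greater.

    Assumption: 5% is rounded up to a whole page, and a text shorter
    than 5 pages is proofread in full.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    five_percent = (total_pages + 19) // 20  # ceil(total_pages / 20), integer-only
    return min(total_pages, max(5, five_percent))


def min_remaining_gaps(marked_illegible: int) -> int:
    """Lower bound on illegibles left as <GAP>s once the correction cap is reached."""
    return max(0, marked_illegible - ILLEGIBLE_CORRECTION_LIMIT)


print(qa_sample_pages(600))     # 30
print(min_remaining_gaps(250))  # 150
```

Under these assumptions a 600-page text would have 30 pages proofread, and a text keyed with 250 illegible characters would still carry at least 150 of them as <GAP>s.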
3. Special characters and non-Roman alphabets
Special characters such as alchemical or astronomical symbols have been captured, but material in non-Roman alphabets has not, except where it forms part or all of the title of a work or there is another practical reason to include it. Handwritten material has been excluded. Damage is usually indicated by <GAP DESC="illegible">, but no information about the cause of the damage is given.
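Because damage and other omissions are marked with <GAP> elements rather than silently dropped, a user can measure how much of a given text is affected before relying on it. The sketch below uses only the Python standard library to tally gap elements by their description. It assumes a well-formed XML file with uppercase element and attribute names as shown in this document; the fallbacks to a lowercase desc or a TEI P5-style reason attribute are assumptions about other releases, and count_gaps is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_gaps(xml_text: str) -> Counter:
    """Tally gap elements by their description attribute.

    Matches GAP or gap in any namespace and reads DESC (as written in this
    document), falling back to lowercase desc or a TEI P5-style reason
    attribute (assumed alternatives for other releases).
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for el in root.iter():
        name = el.tag.rsplit("}", 1)[-1]  # strip any "{namespace}" prefix
        if name.upper() != "GAP":
            continue
        desc = el.get("DESC") or el.get("desc") or el.get("reason") or "unspecified"
        counts[desc] += 1
    return counts


# Hypothetical fragment in the style of a TCP text
sample = """<TEXT><BODY>
<P>The <GAP DESC="illegible"/>ourse of the moon</P>
<P><GAP DESC="math"/> and <GAP DESC="illegible"/></P>
</BODY></TEXT>"""

print(count_gaps(sample))  # Counter({'illegible': 2, 'math': 1})
```

A high count of illegible gaps relative to the length of a text is a signal to check the page images before quoting it.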
4. Consistency and depth of encoding
Encoding of TCP texts is, in the main, structural. Textual divisions (and their hierarchies) are defined and given a type. Features such as lists, tables, speeches, speakers, stage directions, verse (<l> and <lg>) and prose (<p>) are encoded, as are quotations and letters (including opening and closing elements). Features that would require more significant editorial input or specialist expertise, such as complex mathematical or musical notation, are not encoded; instead, the presence of such material is indicated by, e.g., <GAP DESC="math"> or <GAP DESC="music">.
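Since the encoding is mainly structural, an inventory of division types and feature elements is a reasonable first check on how a particular text was encoded, and can help reveal differences in encoding between texts. The sketch below assumes uppercase TCP-style element names such as DIV1 with a TYPE attribute; the names in FEATURES are assumed TEI-style names for the features listed above, and structure_summary is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Numbered (DIV1, DIV2, ...) or unnumbered (DIV) divisions, matched on upper-cased names
DIV_NAME = re.compile(r"DIV\d*$")
# Assumed element names for some of the structural features described above
FEATURES = {"P", "L", "LG", "SP", "SPEAKER", "STAGE", "LIST", "TABLE", "Q", "OPENER", "CLOSER"}


def local_name(tag: str) -> str:
    """Strip any namespace and upper-case, so DIV1, div1 and {ns}div1 compare equal."""
    return tag.rsplit("}", 1)[-1].upper()


def structure_summary(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    divisions = Counter()  # (division element name, division type) -> count
    features = Counter()   # feature element name -> count
    for el in root.iter():
        name = local_name(el.tag)
        if DIV_NAME.match(name):
            div_type = el.get("TYPE") or el.get("type") or "untyped"
            divisions[(name, div_type)] += 1
        elif name in FEATURES:
            features[name] += 1
    return {"divisions": divisions, "features": features}


# Hypothetical fragment in the style of a TCP text
sample = """<TEXT><BODY>
<DIV1 TYPE="poem"><HEAD>To the Reader</HEAD>
<LG><L>First line</L><L>Second line</L></LG></DIV1>
<DIV1 TYPE="chapter"><P>Prose here.</P><P><GAP DESC="math"/></P></DIV1>
</BODY></TEXT>"""

summary = structure_summary(sample)
print(summary["divisions"])
print(summary["features"])
```

Comparing such summaries across texts from different years of the project is one way to spot the encoding differences for similar features that users should expect.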
Every effort has been made to be consistent across the project, but inevitably differences in the encoding of similar features will be found, especially when comparing texts created in the very early days of the project with those created years later, when editorial guidelines were more fully established.
Further information may be found on the Text Creation Partnership (TCP) website.