Text Encoding Initiative
The XML Version of the TEI Guidelines
18 Transcription of Primary Sources
This chapter defines an optional additional tag set intended for use in the transcription of primary sources, in particular manuscripts, and describes how some elements defined in the core tag set should be used for this work. It is expected that this tag set will also be useful in the preparation of critical editions, but the tag set defined here is distinct from that defined in chapter 19 Critical Apparatus, and may be used independently of it.
Scholars may wish to record information concerning individual readings of letters, words or larger units, both within transcriptions and within editions. They may also wish to include other editorial material within transcriptions, such as comments on the status or possible origin of particular readings, corrections, or text supplied to fill lacunae. Further, it is customary in transcriptions to register certain features of the source, such as ornamentation, underlining, deletion, areas of damage and lacunae. This chapter indicates means to record such information:
These recommendations are not intended to meet every transcriptional circumstance likely to be faced by any scholar. Rather, they should be regarded as a base which can be elaborated if necessary by different scholars in different disciplines, with distinct scholarly domains eventually developing their own document types. In time, the feature structure notation developed in chapter 16 Feature Structures may also permit scholars to tailor the encoding of complex transcriptional information in ways not here anticipated.
It should be noted that this chapter focuses primarily upon problems associated with the transcription of manuscript materials, and that consequently problems of codicology and other matters peculiar to early printed materials are not specifically addressed here. Nevertheless, many of the recommendations presented may, mutatis mutandis, also be applied in the encoding of printed matter. We are conscious that a great deal of work remains to be done in these areas, and that the encoder will need to take even more individual responsibility than usual in applying the recommendations of this chapter in such contexts, but we believe that these recommendations form a good basis for such future work.
Many of the descriptions below use terms like ‘scribe’, ‘author’, ‘editor’, ‘annotator’, ‘corrector’, ‘transcriber’, and ‘encoder’, to make clear how they apply in cases where these roles are distinct. To the extent that these roles are not distinct (for example, in authorial manuscripts where the author and the scribe are the same person) the interpretation of the markup should be adjusted appropriately. Many of the elements defined here apply (within limits) also in cases of printed materials, so ‘compositor’, etc., may also be understood as applying where appropriate.
As a rule, all elements which may be used in the course of a transcription of a single witness may also be used in a critical apparatus, i.e. within the elements proposed in chapter 19 Critical Apparatus. This can generally be achieved by nesting a particular reading containing tagged elements from a particular witness within the <rdg> element in an <app> structure.
Just as a critical apparatus may contain transcriptional elements within its record of variant readings in various witnesses, one may record variant readings in an individual witness by use of the apparatus mechanisms <app> and <rdg>. This is discussed in section 19.3 Using Apparatus Elements in Transcriptions.
The tag set defined in this chapter may be selected using the mechanisms described in section 3.3 Invocation of the TEI DTD; in a document using this tag set, the document-type-declaration subset should contain the following declaration of the parameter entity TEI.transcr, or the equivalent:
<!ENTITY % TEI.transcr 'INCLUDE' >
In an XML document using this tag set together with that for textual criticism and the base tag set for prose, the entire document type declaration might resemble the following:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
  <!ENTITY % TEI.XML 'INCLUDE' >
  <!ENTITY % TEI.prose 'INCLUDE' >
  <!ENTITY % TEI.transcr 'INCLUDE' >
  <!ENTITY % TEI.textcrit 'INCLUDE' >
]>
<!-- 18.: Transcription of Primary Sources-->
<!--Text Encoding Initiative Consortium: Guidelines for Electronic Text Encoding and Interchange. Document TEI P4, 2002. Copyright (c) 2002 TEI Consortium. Permission to copy in any form is granted, provided this notice is included in all copies. These materials may not be altered; modifications to these DTDs should be performed only as specified by the Guidelines, for example in the chapter entitled 'Modifying the TEI DTD'. These materials are subject to revision by the TEI Consortium. Current versions are available from the Consortium website at http://www.tei-c.org-->
[declarations from 18.1.4: Added and Deleted Spans inserted here ]
[declarations from 18.1.6: Cancelled Deletions inserted here ]
[declarations from 18.1.7: Supplied Text inserted here ]
[declarations from 18.2.1: Hand Shifts inserted here ]
[declarations from 18.2.3: Damage and Illegibility inserted here ]
[declarations from 18.2.5: Spaces in the source inserted here ]
[declarations from 18.3: Headers and footers inserted here ]
<!-- end of 18.-->
<!-- 18.: Attributes for Transcription of Primary Sources-->
<!--Text Encoding Initiative Consortium: Guidelines for Electronic Text Encoding and Interchange. Document TEI P4, 2002. Copyright (c) 2002 TEI Consortium. Permission to copy in any form is granted, provided this notice is included in all copies. These materials may not be altered; modifications to these DTDs should be performed only as specified by the Guidelines, for example in the chapter entitled 'Modifying the TEI DTD'. These materials are subject to revision by the TEI Consortium. Current versions are available from the Consortium website at http://www.tei-c.org-->
<!ENTITY % a.edit '
      resp IDREF %INHERITED;
      cert CDATA #IMPLIED'>
<!-- end of 18.-->
In the detailed transcription of any source, it may prove necessary to record various types of actual or potential alteration of the text: expansion of abbreviations, correction of the text (by the author, by a scribe, by a later hand, by previous editors or scholars, or by the current editor or encoder), addition, deletion, or substitution of material, and the like. The sections below describe how such phenomena may be encoded using either elements defined in the core tag set (defined in chapter 6 Elements Available in All TEI Documents) or specialized elements available only when the additional tag set described in this chapter is available.
In transcribing individual sources of any type, encoders may record their corrections, normalizations, expansions of abbreviations, additions, and omissions using the elements described in section 6.5 Simple Editorial Changes. Those particularly relevant to this chapter include:
When the additional tag set for transcription of primary sources is selected, these elements all gain two specialized attributes for specifying who is responsible for certain aspects of the interpretation and markup, and the certainty attributed to the interpretation: resp, which identifies the party responsible, and cert, which records the degree of certainty.
The following sections describe how the core elements just named may be used in the transcription of primary source materials. Examples are given of more complex applications of these core elements in scholarly transcriptions, and of their extension through linkage with the <note>, <respons>, and <certainty> elements. Where the core elements do not satisfy the needs of scholarly transcription, additional elements are defined.
The writing of manuscripts by hand lends itself to the use of abbreviation to shorten scribal labour. Commonly occurring letters, groups of letters, words or even whole phrases, may be represented by significant marks. This phenomenon of manuscript abbreviation is so widespread and so various that no taxonomy of it is here attempted. Instead, methods are shown which allow abbreviations to be encoded using the core elements mentioned above.
A manuscript abbreviation may be viewed in two ways. One may transcribe it as a particular sequence of letters or marks upon the page: thus, a ‘p with a bar through the descender’, a ‘superscript hook’, a ‘macron’. One may also interpret the abbreviation in terms of the letter or letters it is seen as standing for: thus, ‘per’, ‘re’, ‘n’. Both of these views are supported by these Guidelines. The entity reference system allows the encoder to declare whatever entities are needed, using entity names like p-underbar, sup-hook, or macron. Furthermore, each entity reference may be linked to an image of the abbreviation itself, so that the reader might see a rendering of the text's appearance. Alternatively, the encoder may transcribe the letter or letters he or she believes the abbreviation stands for, as the content of an <expan> element: thus
<expan>per</expan> <expan>re</expan> <expan>n</expan>
These two methods of coding abbreviation may also be combined. An encoder may record, for any abbreviation, both the sequence of letters or marks which constitutes it, and its sense, that is, the letter or letters for which it is believed to stand. For example, the abbreviations of ‘euery persone’ in the following fragment [137] may be transcribed as follows, using the <expan> element, with the abbr attribute to hold an entity reference for the brevigraph or other sign indicating the abbreviation in the manuscript:
eu<expan abbr="&er;" resp="mp">er</expan>y <expan abbr="&p-underbar;">per</expan>sone that loketh after heuen hath a place in this ladder
Alternatively, the abbreviations may be encoded using the <abbr> element.
eu<abbr expan="er" resp="mp">&er;</abbr>y <abbr expan="per">&p-underbar;</abbr>sone that loketh after heuen hath a place in this ladder
The choice between the <expan> and <abbr> elements is left to the encoder. As a rule, the <abbr> element should be preferred where it is wished to signify that the content of the element is an abbreviation, without necessarily indicating what the abbreviation may stand for. The <expan> element should be used where it is wished to signify that the content of the element is an expanded text, without necessarily indicating the abbreviation used in the original. The decision as to which (<abbr> or <expan>) to use may vary from abbreviation to abbreviation; there is no requirement that the one system be used throughout a transcription. However, processing may be simplified if one only of these is used throughout a transcription. The choice is likely to be a matter of editorial policy, which might be applied consistently throughout. If the highest priority is to transcribe the text literatim, while indicating the presence of abbreviations, the choice will be to use <abbr> throughout. If the highest priority is to present a reading transcription, while indicating that some letters or words are expansions of abbreviations, the choice will be to use <expan> throughout.
Further information may be attached to instances of these elements by the <note> element, on which see section 6.8 Notes, Annotation, and Indexing, and by use of the resp and cert attributes. In this instance from the English Brut, [138] a note is attached to an editorial expansion of the tail on the final d of ‘good’ to ‘goode’:
For alle the while that I had good<expan id="exp01" abbr="&tail;">e</expan> I was welbeloued
Then the note:
<note target="exp01">The stroke added to the final d could signify the plural ending (-es, -is, -ys) but the singular <hi rend="it">good</hi> was used with the meaning <q>property</q>, <q>wealth</q>, at this time (v. examples quoted in OED, sb. Good, C. 7, b, c, d and 8 spec.)</note>
The editor might declare a degree of certainty for this expansion, based on the OED examples, and state the responsibility for the expansion:
For alle the while that I had good<expan abbr="&tail;" resp="mp" cert="90">e</expan> I was welbeloued
Observe that, on the <expan> element, the cert and resp attributes may be used only to indicate, respectively, confidence in the content of the element (i.e. the expansion) and the responsibility for suggesting this expansion. On the <abbr> element, they are defined as indicating, respectively, confidence in the expansion held in the expan attribute and the responsibility for suggesting this expansion. The above example could be encoded using the <abbr> element as follows:
For alle the while that I had good<abbr expan="e" resp="mp" cert="90">&tail;</abbr> I was welbeloued
If it is desired to express aspects of certainty and responsibility for some other aspect of the use of these elements, then the mechanisms discussed in chapter 17 Certainty and Responsibility should be used. See also 18.2.2 Hand, Responsibility, and Certainty Attributes for discussion of the issues of certainty and responsibility in the context of transcription.
If more than one expansion for the same abbreviation is to be recorded, multiple notes may be supplied. It may also be appropriate to use the markup for critical apparatus; an example is given in section 19.3 Using Apparatus Elements in Transcriptions.
The <sic> and <corr> elements, defined in the core tag set, may be used to register authorial or scribal corrections within a witness. For example, in the manuscript of William James's A Pluralistic Universe, edited by Fredson Bowers (Cambridge: Harvard University Press, 1977) a sentence first written
One must have lived longer with this system, to appreciate its advantages.
has been modified by James to begin ‘But one must ...’, without the initial capital O having been reduced to lowercase. This non-standard orthography could be recorded and corrected thus:
But <sic corr="one">One</sic> must have lived ...
The same information could be conveyed by the <corr> element:
But <corr sic="One">one</corr> must have lived ...
In this example from Albertus Magnus, [139] both the manuscript error ‘angues’ and its correction ‘augens’ are registered by the <sic> element:
Nos autem iam ostendimus quod nutrimentum et <sic corr="augens">angues</sic>.
The same information could be conveyed by the <corr> element:
Nos autem iam ostendimus quod nutrimentum et <corr sic="angues">augens</corr>.
As with the choice between <expan> and <abbr>, the choice between the synonymous <sic> and <corr> elements is left to the encoder. As a rule, the <sic> element allows the encoding to retain the original text as the content of the element, while simultaneously signifying that the contents of the element require correction, but without necessarily indicating what the correction may be. The <corr> element allows the text to be corrected, possibly without recording the details of the faulty source, while still marking explicitly the fact that the contents of the element have been corrected. The choice is likely to be a matter of editorial policy, which might be applied consistently throughout or decided case by case. If the highest priority is to present an uncorrected transcription while noting perceived errors in the original, the choice will typically be to use <sic> throughout. If the highest priority is to present a reading transcription, while indicating that perceived errors in the original have been corrected, the choice will be to use <corr> throughout.
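By way of illustration only, the following Python sketch suggests how an application might derive either an uncorrected or a corrected text from the two encodings of the James sentence shown above. It is not part of these Guidelines: the function name render, the mode names, and the wrapping <seg> element used to make the fragment parseable are all assumptions of the sketch, and only <sic> and <corr> are handled.

```python
import xml.etree.ElementTree as ET


def render(fragment: str, mode: str) -> str:
    """Flatten a transcription fragment to plain text.

    mode="diplomatic" keeps the reading of the source;
    mode="reading" keeps the corrected reading.
    Only <sic corr="..."> and <corr sic="..."> are interpreted.
    """
    if mode not in ("diplomatic", "reading"):
        raise ValueError(f"unknown mode: {mode}")
    # Wrap the fragment so that it parses as one well-formed element.
    root = ET.fromstring(f"<seg>{fragment}</seg>")
    parts = [root.text or ""]
    for child in root:
        content = "".join(child.itertext())
        if child.tag == "sic":
            # Content is the source text; the correction, if any, is in corr.
            chosen = content if mode == "diplomatic" else child.get("corr", content)
        elif child.tag == "corr":
            # Content is the correction; the source text, if recorded, is in sic.
            chosen = content if mode == "reading" else child.get("sic", content)
        else:
            chosen = content
        parts.append(chosen)
        parts.append(child.tail or "")
    return "".join(parts)


with_sic = 'But <sic corr="one">One</sic> must have lived ...'
with_corr = 'But <corr sic="One">one</corr> must have lived ...'

print(render(with_sic, "diplomatic"))
print(render(with_corr, "reading"))
```

Under these assumptions both encodings yield the same pair of texts, which is the sense in which the two elements are synonymous; where the corr or sic attribute is omitted, the sketch falls back on the element content, since no alternative reading has been recorded.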
Further information may be attached to instances of these elements by the <note> element and resp and cert attributes. Here, two separate corrections in Dudo of S. Quentin [140] are assigned the same note. First the corrections, held in the attribute value of the <sic> elements:
quamuis <sic id="sic01" corr="iners">mens</sic> que nutu dei gesta sunt ... unde esset uiriliter <sic id="sic02" corr="uegetata">negata</sic>
Then the note, linked to the id of the <sic> element for each of the two corrections:
<note target="sic01 sic02">Substitution of a more familiar word which resembles graphically what the scribe should be copying but which does not make sense in the context.</note>
The cert attribute may also be used with the <corr> element to signify the conjectural status of a particular editorial reading, with the resp attribute used to identify the scholar responsible for the conjecture. In this example, editorial confidence in E. Talbot Donaldson's emendation of the Hengwrt manuscript reading ‘wight’ to ‘wright’ in line 117 of Chaucer's The Wife of Bath's Prologue may be marked as follows:
Telle me also, to what conclusioun Were membres maad, of generacioun And of so parfit wis a <corr id="c117" sic="wight" resp="ETD" cert="70">wright</corr> ywroght?
The editor might also conveniently add a note referring to Donaldson's discussion of this passage:
<note target="c117">This emendation of the Hengwrt copy text, based on a Latin source and on the reading of three late and usually unauthoritative manuscripts, was proposed by E. Talbot Donaldson in <bibl><title>Speculum</title> 40 (1965) 626–33.</bibl></note>
Alternative corrections within a transcription of a single witness may be held within an <app> structure, in the same way that alternative expansions are so grouped in the example given in section 19.3 Using Apparatus Elements in Transcriptions. Here, Donaldson's conjectured emendation of the Hengwrt manuscript may be recorded not only alongside the editorial transcription but also alongside another conjecture:
And of so parfit wis a
<app>
  <rdg wit="Hg">wight</rdg>
  <rdg wit="Ln Ry2 Ld" resp="ETD"><corr>wright</corr></rdg>
  <rdg wit="Gg" resp="PR"><corr>wyf</corr></rdg>
</app>
Observe that no resp attribute is necessary for the base transcription: by default, responsibility is assigned to the scholar(s) responsible for the transcription, as identified in the TEI header. The conjectures are held within <corr> elements, contained within the <rdg> elements. The resp attribute identifying responsibility for each correction is attached to the outer <rdg>, and inherited by the inner <corr> element. Note too that the support for these conjectures in other manuscripts can be noted in the wit attribute in the <rdg> element.
On the <corr> element, the cert and resp attributes may be used only to indicate, respectively, confidence in the content of the element (i.e. the correction) and the responsibility for suggesting this correction or conjecture. On the <sic> element, they are defined as indicating, respectively, confidence in the conjecture held in the corr attribute and the responsibility for suggesting this conjecture. The above example could be encoded using the <sic> element as follows:
And of so parfit wis a <sic corr="wright" resp="ETD" cert="70">wight</sic> ywroght?
If it is desired to express aspects of certainty and responsibility for some other aspect of the use of these elements, then the mechanisms discussed in chapter 17 Certainty and Responsibility should be used. See also 18.2.2 Hand, Responsibility, and Certainty Attributes for discussion of the issues of certainty and responsibility in the context of transcription.
As described in section 6.5 Simple Editorial Changes, the <add> element indicating material added may be used to signify manuscript additions or insertions, be they authorial or scribal. In the autograph manuscript of Max Beerbohm's The Golden Drugget, [141] the author's addition of ‘do ever’ may be recorded as follows, with the hand attribute indicating that the addition was Beerbohm's:
Some things are best at first sight. Others — and here is one of them — <add hand="mb">do ever</add> improve by recognition
Similarly, the <del> element indicating material deleted may be used to signify manuscript deletions. In the autograph manuscript of D. H. Lawrence's Eloi, Eloi, lama sabachthani, [142] the author's deletion of ‘my’ may be recorded as follows. As well as the hand attribute indicating that the deletion was Lawrence's, the rend attribute indicates that the deletion was by strike-through:
For I hate this <del rend="strikethrough" hand="dhl">my</del> body, which is so dear to me
If deletions are classified systematically, the type attribute should normally be used to indicate the classification; when they are classified by the manner in which they were effected, or by their appearance, however, this will lead to a certain arbitrariness in deciding whether to use the type or the rend attribute to hold the information. In general, it is recommended that the rend attribute be used for description of the appearance or method of deletion, and that the type attribute be reserved for higher level or more abstract classifications.
Further characteristics of the addition and deletion, e.g. the date, or ink, may be needed for detailed transcription of manuscripts. Such characteristics may conveniently be recorded as attributes of the <add> or <del> element. The specific attributes required may be added to the formal declaration of these elements by using the techniques described in chapter 29 Modifying and Customizing the TEI DTD.
The <add> and <del> elements defined in the core tag set available in all TEI documents will suffice for describing typically brief additions and deletions in the text being transcribed. On occasion, it will be necessary to record an addition or deletion which crosses a structural boundary in the text being encoded, for example the addition or deletion from a manuscript of a section containing several distinct structural subdivisions, such as poems or prose items. These are most conveniently encoded using the <addSpan> and <delSpan> elements, available in the additional tag set defined in this chapter. In this example of the use of <addSpan>, the insertion of a gathering containing four neo-Eddic poems into Landsbókasafn [143] by Helgi Ólafsson is recorded as follows. A <hand> element is first declared, within the header of the document, to associate the identifier heol with Helgi. In the body of the text, an <addSpan> element is placed to mark the beginning of the span of added text. The hand attribute ascribes the responsibility for the addition to the manuscript to Helgi, and the to attribute declares the identifier for the anchor which marks the end of the added text:
<hand id="heol" n="Helgi Ólafsson"/>
<!-- text of the original material ... -->
<addSpan type="added gathering" hand="heol" to="p025"/>
<!-- text of the four neo-Eddic poems added... -->
<anchor id="p025"/>
<!-- text of the original material continues... -->
In this example of the use of the <delSpan> element, a full two lines of Thomas Moore's autograph of the second version of Lalla Rookh [144] are marked for omission by vertical strike-through. The two lines cross the structural line division marked <l n='2'>, so it would not be possible to use a single <del> element, since it would have to span the <l> marker. The lines also themselves include a further deletion and addition. The <delSpan> element indicates the beginning of the span marked for deletion, with the to attribute giving the identifier delend01 for an <anchor> element which marks the end of the span of text so marked:
<l n="1"><delSpan rend="vertical strike" to="delend01"/>Tis moonlight <del>upon</del> <add>over</add> Oman's sky</l>
<l n="2">Her isles of pearl look lovelily<anchor id="delend01"/></l>
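Because <delSpan> and <anchor> are empty elements, the extent of the span is not expressed by element nesting and must be recovered by reading the document in order. The following Python sketch, offered only as one possible approach, collects the text lying between each <delSpan> and the <anchor> named by its to attribute. The function names walk and spanned_text are hypothetical, and the enclosing <lg> element has been added here solely so that the two lines parse as a single document.

```python
import xml.etree.ElementTree as ET


def walk(elem):
    """Yield ("start", element) and ("text", string) events in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield ("text", child.tail)


def spanned_text(root, span_tag="delSpan"):
    """Map each anchor id to the text between the span element and that anchor."""
    spans = {}     # anchor id -> list of text chunks
    open_ids = []  # anchor ids of the spans currently open
    for kind, item in walk(root):
        if kind == "start" and item.tag == span_tag:
            target = item.get("to")
            spans[target] = []
            open_ids.append(target)
        elif kind == "start" and item.tag == "anchor" and item.get("id") in open_ids:
            open_ids.remove(item.get("id"))
        elif kind == "text":
            for target in open_ids:
                spans[target].append(item)
    # Normalize whitespace, including the line break between the two <l> elements.
    return {key: " ".join("".join(chunks).split()) for key, chunks in spans.items()}


# The <lg> wrapper is an assumption of this sketch, not part of the example above.
SAMPLE = """<lg>
<l n="1"><delSpan rend="vertical strike" to="delend01"/>Tis moonlight <del>upon</del> <add>over</add> Oman's sky</l>
<l n="2">Her isles of pearl look lovelily<anchor id="delend01"/></l>
</lg>"""

print(spanned_text(ET.fromstring(SAMPLE)))
```

The sketch makes no attempt to interpret the inner <del> and <add> elements, so both ‘upon’ and ‘over’ appear in the recovered text; an application concerned with the state of the text before the larger deletion would need to resolve these separately.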
The text deleted must be at least partially legible, in order for the encoder to be able to transcribe it. If it is not legible at all, the <gap> element should be used to signal that the text was not transcribed, because it could not be; the reason attribute can give the cause of the omission from the transcription as ‘deletion, illegible’. The <gap> element may optionally be enclosed by a <del> element, if it is thought useful to record the deletion explicitly using this element. If the deleted text is partially legible, the <unclear> element described in section 18.2.3 Damage, Illegibility, and Supplied Text should be used to signal the areas of text which cannot be read with confidence; it too may be enclosed within a <del> element. See further section 18.1.7 Text Omitted from or Supplied in the Transcription and section 18.2.3 Damage, Illegibility, and Supplied Text.
<!-- 18.1.4: Added and Deleted Spans-->
<!ELEMENT addSpan %om.RO; EMPTY>
<!ATTLIST addSpan
      %a.global;
      type CDATA #IMPLIED
      place CDATA #IMPLIED
      resp IDREF %INHERITED;
      cert CDATA #IMPLIED
      hand IDREF %INHERITED;
      to IDREF #REQUIRED
      TEIform CDATA 'addSpan' >
<!ELEMENT delSpan %om.RO; EMPTY>
<!ATTLIST delSpan
      %a.global;
      type CDATA #IMPLIED
      resp IDREF %INHERITED;
      cert CDATA #IMPLIED
      hand IDREF %INHERITED;
      to IDREF #REQUIRED
      status CDATA "unremarkable"
      TEIform CDATA 'delSpan' >
<!-- end of 18.1.4-->
Substitution of one word or phrase for another is perhaps the most common of all phenomena requiring special treatment in transcription of primary textual sources. It may be simply one word overwriting another, or deletion of one word and its replacement by another written above it by the same hand at the one time; the deletion and replacement may be done by different hands at different times; there may be a long chain of substitutions on the one stretch of text, with uncertainty as to the order of substitution and as to the final reading.
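Three different methods may be used to express substitution of one stretch of text by another: (1) as a correction, using the <sic> or <corr> element; (2) as a deletion followed by an addition, using the <del> and <add> elements in sequence; (3) as variant readings, using the <del> and <add> elements within an <app> structure.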
The use of all three of these is illustrated in the following encodings of the second line of Eloi, Eloi, lama sabachthani from the Lawrence manuscript mentioned above. Lawrence first wrote ‘How it galls me, what a galling shadow’. Subsequently, he deleted ‘galls’ and wrote ‘dogs’ above the deletion.
This substitution could be registered using the first method outlined above, as a correction using the <sic> or <corr> elements. Note the use of the resp attribute on the <corr> element to assign the correction to Lawrence. (For further information on the hand and resp attributes, see section 18.2.2 Hand, Responsibility, and Certainty Attributes.)
How it <corr sic="galls" resp="DHL">dogs</corr> me, what a galling shadow
This substitution could be registered using the second method outlined above, using the <del> and <add> elements in sequence to reflect the fact that text was first deleted then other text inserted:
How it <del type="overstrike" hand="dhl">galls</del> <add place="supralinear" hand="dhl">dogs</add> me, what a galling shadow
This substitution could be registered using the third method outlined above, using the <del> and <add> elements within an <app> structure to indicate that the deleted and added texts are variants of one another. Note that within the <app> structure the hand attribute is moved from the inner <del> and <add> elements to the outer <rdg> element:
How it
<app>
  <rdg hand="dhl"><del type="overstrike">galls</del></rdg>
  <rdg hand="dhl"><add place="supralinear">dogs</add></rdg>
</app>
me, what a galling shadow
Each of these three methods has its particular advantages and disadvantages. The first method (use of <sic> or <corr>) is compact and indicates clearly that one text is a substitute for another. However, it provides no clear means of stating how the substitution is effected: whether by deletion through strike-through, or underdotting, or erasure, followed by interlinear insertion, or marginal insertion. (The global rend attribute might conceivably be used, but this may not be thought an obvious place to put such information.) In a transcription where this information is not felt to be important, however, this method will suffice to indicate simple cases of direct substitution of one text for another.
The second method (use of a <del> and <add> sequence) is also compact and provides means for exact declaration of how the deletion and insertion are effected. However, it does not indicate explicitly that one text is a substitute for another. It is left for the reader or the application to infer from the <del> and <add> sequence that the insertion is to be taken as a substitution for the deletion. In many transcriptions, the inference may be safely drawn for simple cases of direct substitution of one text for another. In other transcriptions, for example of complex authorial manuscripts, this inference may prove fragile; those who desire to express clearly that an adjacent addition and deletion are not independent but constitute a single act of substitution will therefore wish to avoid this method. Others, of course, may prefer it for precisely the same reason, namely that it avoids prejudging the issue of whether adjacent deletions and additions are independent or joined.
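The inference described in this paragraph can be made concrete. The following Python sketch shows one way an application might pair each <del> with an immediately following <add>; it is an illustration under stated assumptions, not a recommended algorithm. The function name adjacent_substitutions, the wrapping <seg> element, and in particular the rule that only whitespace may separate the two elements are choices made for the sketch.

```python
import xml.etree.ElementTree as ET


def adjacent_substitutions(fragment):
    """Return (deleted, added) pairs where an <add> directly follows a <del>.

    "Directly" means that only whitespace separates the two elements;
    this adjacency rule is an assumption of the sketch.
    """
    # Wrap the fragment so that it parses as one well-formed element.
    root = ET.fromstring(f"<seg>{fragment}</seg>")
    children = list(root)
    pairs = []
    for first, second in zip(children, children[1:]):
        if (first.tag == "del" and second.tag == "add"
                and not (first.tail or "").strip()):
            pairs.append(("".join(first.itertext()), "".join(second.itertext())))
    return pairs


line = ('How it <del type="overstrike" hand="dhl">galls</del> '
        '<add place="supralinear" hand="dhl">dogs</add> me, what a galling shadow')
chain = 'with <del>this</del> <del><add>such a</add></del> <add>a</add> system'

print(adjacent_substitutions(line))
print(adjacent_substitutions(chain))
```

For the Lawrence line the sketch recovers the single substitution of ‘dogs’ for ‘galls’. For a chain of substitutions such as the second input it recovers only the last step, the replacement of ‘such a’ by ‘a’, and misses the earlier replacement of ‘this’; this is the kind of fragility to which the paragraph above refers.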
The third method (use of the <del> and <add> elements within an <app> structure) provides means both for exact declaration of how the deletion and insertion are effected and for explicit indication that one text is a substitute for another. Further, the exact sequence of readings may also be declared by use of the varSeq attribute on the <rdg> element, as follows:
How it
<app>
  <rdg varSeq="1" hand="dhl"><del>galls</del></rdg>
  <rdg varSeq="2" hand="dhl"><add>dogs</add></rdg>
</app>
me, what a galling shadow
Here, the combination of the hand and varSeq attributes suffices to inform the reader of the authorial substitution of ‘dogs’ for ‘galls’.
Similarly, the varSeq attribute might be used in a transcription of the manuscripts of James Joyce's Ulysses to indicate the sequence of Joyce's corrections which is implicit in Hans Walter Gabler's reconstruction of the ‘overlay’ levels of Joyce's transcriptions. This third method is the most powerful and unambiguous of the three methods and enables the widest range of processing possibilities, at the expense of introducing a heavier burden of markup into the text. Production of such documents should therefore not be undertaken without markup-aware editors. Applications of some sophistication may be needed to make full use of all the information that may be held within an <app> structure. In the absence of such applications, scholars may feel that the present cost of the more informative coding using <app> structures outweighs the future benefits. In making such decisions, it should however be kept in mind that the capabilities of software at the time a project begins will often be wholly irrelevant when the project is completed some years later.
The Lawrence example above shows the three methods used for encoding a single substitution of one reading for another. The same three methods may also be used to encode longer sequences of substitutions. In the example from William James, first written out by James as ‘One must have lived longer with this system, to appreciate its advantages’ the word ‘this’ is first replaced by ‘such a’ and this is then replaced by ‘a’. [145] This may be encoded using the first method, with the sequence of substitutions shown by the nesting of <corr> elements:
One must have lived longer with <corr sic="this"><corr sic="such a">a</corr></corr> system, to appreciate its advantages.
It may be encoded using the second method, as a sequence of <del> and <add> elements:
One must have lived longer with <del>this</del> <del><add>such a</add></del> <add>a</add> system, to appreciate its advantages.
Note the nesting of an <add> element within a <del> to record text first added, then deleted in the source.
It may also be encoded using the third method, with the <del> and <add> elements held within an <app> structure and the order of the readings given by the varSeq attribute:
One must have lived longer with
<app>
  <rdg varSeq="1"><del>this</del></rdg>
  <rdg varSeq="2"><del><add>such a</add></del></rdg>
  <rdg varSeq="3"><add>a</add></rdg>
</app>
system, to appreciate its advantages.
The three encodings of this slightly more complex example illustrate the general truth that the more information involving substitutions there is to be encoded, the clearer become the advantages of the use of the <app> method over the other two methods. As a rule, it is recommended that the <app> method be used for encoding substitutions of any complexity. It is also desirable that the one method be used throughout any one transcription. Accordingly, the <app> method is recommended for text-critical transcription of primary textual materials requiring encoding of instances of other than straightforward substitution.
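As a small illustration of the processing which the <app> method makes possible, the following Python sketch orders the readings of an <app> structure by their varSeq values and so recovers the chain of substitutions in the James example. It is a sketch only: the function name reading_sequence is hypothetical, every <rdg> is assumed to carry a numeric varSeq, and the readings are deliberately listed out of order in the sample to show that the result does not depend on their position in the document.

```python
import xml.etree.ElementTree as ET


def reading_sequence(app_xml):
    """Return the text of each <rdg> in an <app>, ordered by varSeq."""
    app = ET.fromstring(app_xml)
    # Assumes every <rdg> has a varSeq attribute holding an integer.
    readings = sorted(app.findall("rdg"), key=lambda rdg: int(rdg.get("varSeq")))
    return ["".join(rdg.itertext()).strip() for rdg in readings]


# Readings are shuffled on purpose; varSeq, not document order, fixes the sequence.
APP = """<app>
  <rdg varSeq="3"><add>a</add></rdg>
  <rdg varSeq="1"><del>this</del></rdg>
  <rdg varSeq="2"><del><add>such a</add></del></rdg>
</app>"""

sequence = reading_sequence(APP)
print(" > ".join(sequence))
```

The last item of the returned list is the final reading of the passage. An application wishing to distinguish readings which were deleted from those left standing would additionally need to inspect the <del> and <add> elements within each <rdg>, which this sketch flattens.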
An author or scribe may mark a word or phrase in some way, and then on reflection decide to cancel the marking. For example, text may be marked for deletion and the deletion then cancelled, thus restoring the deleted text. Such cancellation may be indicated by the <restore> element:
Presume that Lawrence decided to restore ‘my’ to the phrase of Eloi, Eloi, lama sabachthani first written ‘For I hate this my body’, with the ‘my’ first deleted then restored by writing ‘stet’ in the margin. This may be encoded:
For I hate this <restore hand="dhl" desc='marginal "stet"'><del>my</del></restore> body
<!-- 18.1.6: Cancelled Deletions--> <!ELEMENT restore %om.RO; %phrase.seq;> <!ATTLIST restore %a.global; desc CDATA #IMPLIED cert CDATA #IMPLIED type CDATA #IMPLIED resp IDREF %INHERITED; hand IDREF %INHERITED; TEIform CDATA 'restore' > <!-- end of 18.1.6-->
Where text is not transcribed, whether because of damage to the original, or because it is illegible, or because of editorial policy, the <gap> core element should be used to register the omission; where text not present in the source is supplied (whether conjecturally or from other witnesses) to fill an apparent gap in the text, it should be marked using the <supplied> element provided by the tag set defined in this chapter.
By its nature, the <gap> element must have no content. It should be used wherever an authorial or scribal erasure is so successful, or the text is so illegible, that nothing can be read. In the Beerbohm manuscript of The Golden Drugget cited above, for example, the author has erased several passages by inking them over completely:
Others <gap reason="cancelled" hand="mb" extent="10cm"/>—and here is one of them...
In an autograph letter of Sydney Smith in the Pierpont Morgan library,146 three words in the signature are quite illegible:
I am dr Sr yr <gap reason="illegible" hand="ss" extent="3 words"/> Sydney Smith
It is possible, but not always necessary, to provide measurements precise to the millimeter or even to the printer's point. The degree of precision attempted will vary with the purpose of the encoding and the nature of the material.
In cases where there is damage, or a degree of illegibility, but the text is nevertheless legible and is transcribed, the <gap> element should not be used. Instead, the passage should be marked using one or more of the elements <damage> and <unclear>, which are described in section 18.2.3 Damage, Illegibility, and Supplied Text.
If the source text is completely illegible or missing, and new text is supplied to fill the gap, it should be marked as <supplied>. If another (imaginary) copy of the letter above preserved the signature as reading ‘I am dear Sir your very humble Servt Sydney Smith’, the text illegible in the autograph might be supplied in the transcription:
I am dr Sr yr <supplied reason="illegible" resp="RW" source="amanuensis copy">very humble Servt</supplied> Sydney Smith
Both <gap> and <supplied> may be used in combination with <unclear>, <damage>, and other elements; for discussion, see section 18.2.4 Use of the Gap, Del, Damage, Unclear and Supplied Tags in Combination.
<!-- 18.1.7: Supplied Text--> <!ELEMENT supplied %om.RO; %paraContent;> <!ATTLIST supplied %a.global; reason CDATA #IMPLIED resp CDATA %INHERITED; hand IDREF %INHERITED; agent CDATA #IMPLIED source CDATA #IMPLIED TEIform CDATA 'supplied' > <!-- end of 18.1.7-->
This section describes methods for recording a number of non-linguistic characteristics of the source text which are often of particular interest in the transcription of primary sources: points at which one scribe takes over from another, or at which ink, pen, or other characteristics of the writing change; points at which the source is damaged or imperfectly legible; and unusual spaces or lines in the source. A discussion of the usage of the hand, resp, and cert attributes is also included. Methods for recording page breaks, column breaks, and line breaks in the source are described in section 6.9 Reference Systems.
For many text-critical purposes it is important to signal the person responsible (the hand) for the writing of a whole document, a stretch of text within a document, or a particular feature within the document. The hand may be of a known and named scribe or author, as ‘DHL’, or may be described by an anonymous formula, as ‘hand one’. Where the hand is associated with a particular feature tagged within a document, this may be indicated by the value of the hand attribute on that feature. The examples given above of the use of the hand attribute with coding of additions and deletions illustrate this.
In other cases, it may be necessary to identify a document hand without there being any association of that hand with any specific tagged document feature. The <handList> and <hand> elements are used in the TEI header (in the <profileDesc> element) to define each unique hand or scribe distinguished by the encoder in the document. One such element must appear within the header for each hand distinguished in the text, and each such element should bear a distinct identifier as the value of its global id attribute.147 Each location where a change of hands occurs may then be marked in the text by the empty <handShift> element, which specifies the hand concerned by giving the same identifier.
The attributes old and new on the <handShift> element refer to the order of the text in the transcription: ‘old’ is the material before the <handShift>, ‘new’ the material following. This will ordinarily, but not necessarily, be the order in which the material was originally written. Neither attribute is required but both are recommended where there is a new hand, as opposed to a new writing style in the one hand. The character attribute will be most often used to encode descriptive shifts which the transcriber perceives within a manuscript and which may or may not be associated with or denote changes in scribe or content. The particular values encoded will depend upon the needs of the transcriber. Where many values are to be encoded, feature structures provide an alternative means of encoding these.
A single hand may employ different writing styles and inks within a document, or may change character. For example, the writing style might shift from ‘anglicana’ to ‘secretary’, or the ink from blue to brown, or the character of the hand may change. Any such changes should be indicated by assigning a new value to the appropriate attribute within the <handShift> element. The one hand may employ different renditions within the one writing style, for example medieval scribes indicating a structural division by emboldening all the words within a line. These should be indicated by use of the rend attribute on an element, in the same manner as underlining, emboldening, font shifts, etc., in transcription of a printed text, rather than by introducing a new <handShift> element.
In this example148 first the document hands are declared in the header:
<teiHeader>
  <!-- ... -->
  <profileDesc>
    <!-- ... -->
    <handList>
      <hand id="h1" style="copperplate" ink="brown" character="regular" first="yes" resp="das"/>
      <hand id="h2" style="print" ink="brown" character="unschooled" resp="das"/>
    </handList>
    <!-- ... -->
  </profileDesc>
  <!-- ... -->
</teiHeader>
... and that good Order Decency and regular worship may be once more introduced and Established in this Parish according to the Rules and Ceremonies of the Church of England and as under a good Consciencious and sober Curate there would and ought to be <handShift new="h2" old="h1" resp="das"/> and for that purpose the parishioners pray
In this example149 there is a change of ink within the one hand. This is indicated by a new value for the ink attribute on the <handShift> element:
<l>When wolde the cat dwelle in his ynne</l> <handShift ink="black"/> <l>And if the cattes skynne be slyk and gaye</l>
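A change of writing style, or of rendition, within one hand can be recorded along the same lines. What follows is a minimal sketch only: the line contents are placeholders, and the values ‘secretary’ and ‘bold’ are invented for illustration. The style attribute is the one listed for <handShift> in the declaration of that element; rend is the global attribute discussed above.

```xml
<!-- hypothetical lines: the scribe stays the same, only the writing style changes -->
<l>a line written in anglicana</l>
<handShift style="secretary"/>
<l>a line written by the same scribe in secretary</l>
<!-- a change of rendition within the one style is carried by rend, not by handShift -->
<l rend="bold">a line emboldened to mark a structural division</l>
```

No new or old attribute is given on the <handShift> here, since the sketch assumes that the hand itself does not change.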
<!-- 18.2.1: Hand Shifts--> <!ELEMENT hand %om.RO; EMPTY> <!ATTLIST hand %a.global; hand CDATA #IMPLIED scribe CDATA #IMPLIED style CDATA #IMPLIED mainLang CDATA #IMPLIED ink CDATA #IMPLIED character CDATA #IMPLIED first CDATA #IMPLIED resp CDATA %INHERITED; TEIform CDATA 'hand' > <!ELEMENT handShift %om.RO; EMPTY> <!ATTLIST handShift %a.global; new IDREF #IMPLIED old IDREF #IMPLIED style CDATA #IMPLIED ink CDATA #IMPLIED character CDATA #IMPLIED resp IDREF %INHERITED; TEIform CDATA 'handShift' > <!ELEMENT handList %om.RO; (hand*)> <!ATTLIST handList %a.global; TEIform CDATA 'handList' > <!-- end of 18.2.1-->
The hand and resp attributes have similar, but not identical, meanings. Observe their distinctive uses in the following encoding of the William James passage mentioned above in section 18.1.3 Correction and Conjecture. In this example, the ‘But’ inserted by James is tagged as an <add>, and the consequent editorial correction of ‘One’ to ‘one’ treated separately:
<add place="supralinear" resp="FB" hand="WJ">But</add> <corr sic="One" resp="FB">one</corr> must have lived ...
As in this example, hand should be reserved for indicating the hand of any form of marking (here, addition, but also deletion, correction, annotation, underlining, etc.) within the primary text being transcribed. The scribal or authorial responsibility for this marking may be inferred from the value of the hand attribute. The value of the hand attribute should be one of the hand identifiers declared in the document header (see section 18.2.1 Document Hands).
As in this example, the resp attribute on a particular element should be used only to indicate the particular aspect of responsibility defined in these Guidelines as appropriate to the resp attribute for that element. In the case of the <add> element, the resp attribute is defined as signifying the responsibility for identifying the hand of the addition: here, Bowers' identification of the hand as that of William James. In the case of the <corr> element, the resp attribute is defined as signifying the responsibility for supplying the intellectual content of the correction reported in the transcription: here, Bowers' correction of ‘One’ to ‘one’.
As these examples show, the field of application of the resp attributes varies from element to element. In some cases, it applies to the content of the element (<corr> and <expan>); in others it applies to the value of a particular attribute (<sic>, <abbr>, <del>, etc.). In all cases where both the cert and resp attributes are defined for a particular element, the two attributes refer to the same aspect of the markup. The one indicates who is intellectually responsible for some item of information, the other indicates the degree of confidence in the information. Thus, for a correction, the resp attribute signifies the person responsible for supplying the correction, while the cert attribute signifies the degree of editorial confidence felt in that correction. For the expansion of an abbreviation, the resp attribute signifies the person responsible for supplying the expansion and the cert attribute signifies the degree of editorial confidence felt in the expansion.
This close definition of the use of the resp and cert attributes with each element is intended to provide for the most frequent circumstances in which encoders might wish to make unambiguous statements regarding the responsibility for and certainty of aspects of their encoding. The resp and cert attributes, as so defined, give a convenient mechanism for this. However, there will be cases where it is desired to state responsibility for and certainty concerning other aspects of the encoding. For example, one may wish in the case of an apparent addition to state the responsibility for the use of the <add> element, rather than the responsibility for identifying the hand of the addition. It may also be that one editor may make an electronic transcription of another editor's printed transcription of a manuscript text — here, one will wish to assign layers of responsibility, so as to allow the reader to determine exactly what in the final machine-readable transcription was the responsibility of each editor. In these complex cases of divided editorial responsibility for and certainty concerning the content, attributes and application of a particular element, the more general mechanisms for representing certainty and responsibility described in chapter 17 Certainty and Responsibility should be used.
The fields of reference of the resp and cert attributes for each element have been chosen to enable what are felt to be the most frequent statements an encoder may wish to make concerning the areas of responsibility and certainty related to that element. It is open to each local transcription scheme to vary the use of the resp and cert attributes on particular elements where it is felt convenient. This practice should be documented in the <encodingDesc> element in the file header. Further, it is recommended that before interchange any such local usage of these attributes be converted to conformance with the definitions of the resp and cert attributes given in these Guidelines. Use of the resp and cert attributes in interchange documents in ways not here defined may lead to unpredictable results.
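A local variation of this kind might be documented roughly as follows. This is a sketch under stated assumptions: the convention described is invented for illustration, and placing the statement as a prose paragraph within <editorialDecl> is only one plausible choice of location inside <encodingDesc>.

```xml
<encodingDesc>
  <editorialDecl>
    <!-- hypothetical local convention, stated in prose -->
    <p>In this transcription the resp attribute on the add element
    records the person who judged the passage to be an addition,
    not the person who identified the hand of the addition.</p>
  </editorialDecl>
</encodingDesc>
```

A statement of this sort also tells anyone preparing the file for interchange which attribute values need to be converted back to the usage defined in these Guidelines.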
It should be noted that the certainty and responsibility mechanisms described in chapter 17 Certainty and Responsibility replicate all the functions of the resp and cert attributes on particular elements. For example, the encoding of Donaldson's conjectured emendation of ‘wight’ to ‘wright’ in line 117 of Chaucer's Wife of Bath's Prologue (see 18.1.3 Correction and Conjecture) may be encoded as follows using the resp and cert attributes on the <corr> element:
<corr sic="wight" resp="ETD" cert="70">wright</corr>
Exactly the same information could be conveyed using the certainty and responsibility mechanisms, as follows:
<corr id="c117" sic="wight">wright</corr>
<!-- ... certainty and responsibility elements may be elsewhere -->
<certainty target="c117" locus="#gicontent" degree="70"/>
<respons target="c117" locus="#gicontent" resp="ETD"/>
The choice of which mechanism to use is left to the encoder. In transcriptions where only such statements of responsibility and certainty are made as can be accommodated within the resp and cert attributes of particular elements, it will be economical to use the resp and cert attributes of those elements. Where many statements of responsibility and certainty are made which cannot be so accommodated, it may be economical to use the <respons> and <certainty> elements throughout.
The above discussion supposes that in each case an encoder is able to specify exactly what it is that one wishes to state responsibility for and certainty about. Situations may arise when an encoder wishes to make a statement concerning certainty or responsibility but is unable or unwilling to specify so precisely the domain of the certainty or responsibility. In these cases, the <note> element may be used with the type attribute set to ‘cert’ or ‘resp’ and the content of the note giving a prose description of the state of affairs.
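Such a note might look like the following sketch, which borrows the reading from the Elder Edda example given later in this chapter. The wording of the note is invented for illustration; only the type value ‘cert’ comes from the recommendation above.

```xml
um aldr d<unclear>aga</unclear><note type="cert">The reading is
probable, but the transcriber cannot say whether the doubt attaches
to the letters themselves or to the extent of the obscured
passage.</note> yndisniota
```

A note with type set to ‘resp’ would be written in the same way, with the prose describing who is thought to be responsible and for what.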
The <gap> and <supplied> elements described above (section 18.1.7 Text Omitted from or Supplied in the Transcription) should be used with appropriate attributes where the degree of damage or illegibility in a text is such that nothing can be read and the text must be either omitted or supplied either conjecturally or from one or more other sources. In many cases, however, despite damage or illegibility, the text may yet be read with reasonable confidence. In these cases, the following elements should be used:
The following examples refer to the recto of folio 5 of the unique manuscript of the Elder Edda.150 Here, the manuscript of Vóluspá has been damaged through irregular rubbing so that letters in various places are obscured and in some cases cannot be read at all. The existence of the damage may be registered in general for this leaf by use of the <damage> element.
<damage extent="whole leaf" agent="rubbing at edges"> ... </damage>
However, in fact the damage crosses structural divisions, so the <damage> element does not nest properly within the containing <div> elements. The simplest method to solve this problem is to split the element into two fragments, one within each structural division:
<p>
  <!-- beginning of division ... -->
  <!-- page break, beginning of damage -->
  <pb n='5r'/>
  <damage agent='rubbing at edges' extent='whole leaf'>
    <!-- text continues -->
  </damage>
</p>
<p>
  <damage agent='rubbing at edges, continued' extent='whole leaf'>
    <!-- beginning of new text division ... -->
    <!-- page break, end of this damaged section -->
  </damage>
  <pb n='5v'/>
  <!-- text continues ... -->
</p>
For other techniques of handling non-nesting information, see chapter 31 Multiple Hierarchies.
um aldr d<damage>aga</damage> yndisniota
um aldr d<unclear reason="damage">aga</unclear> yndisniota
um aldr d<damage agent="rubbing"><unclear>aga</unclear></damage> yndisniota
Alternatively, the transcriber may not feel able to read the last three letters of ‘daga’ but may wish to supply them by conjecture. Note the use of the source attribute to assign the conjecture to Finnur Jónsson:
um aldr d<supplied reason="rubbing" source="FJ">aga</supplied> yndisniota
The <supplied> element may if desired be enclosed within a <damage> element:
um aldr d<damage agent="rubbing"><supplied source="FJ">aga</supplied></damage> yndisniota
&Thorn;ar k&hook-o;mr inn dimmi dreki fliugandi naþr frann neþan <gap reason="illegible" agent="rubbing" extent="4"/>
As with <supplied>, this <gap> might be enclosed by a <damage> element.
In these examples, various phenomena of illegibility and conjecture all result from the one cause, an area of damage to the text — rubbing at various points — which is not continuous in the text, affecting it at irregular points. In these cases, the <join> element may be used to indicate which tagged features are part of the same physical phenomenon. (See chapter 14 Linking, Segmentation, and Alignment for more details.)
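As a rough sketch of this use of <join>, two of the features tagged above could be given identifiers and then linked. The id values are invented, and the targets, result and desc attributes are assumed here to behave as described in chapter 14, which should be consulted for the actual definition.

```xml
um aldr d<damage id="dmg1" agent="rubbing"><unclear>aga</unclear></damage> yndisniota
<!-- ... -->
frann neþan <gap id="dmg2" reason="illegible" agent="rubbing" extent="4"/>
<!-- ... -->
<!-- hypothetical: both features belong to the one area of rubbing -->
<join targets="dmg1 dmg2" result="damage" desc="rubbing on folio 5 recto"/>
```

The separate elements remain in place in the text; the <join> merely asserts that they represent one discontinuous area of damage.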
The above examples record imperfect legibility due to damage. When imperfect legibility is due to some other reason (typically because the handwriting is ill-formed), the <unclear> element should be used without any enclosing <damage> element. In Robert Southey's autograph of The Life of Cowper,151 the final six letters of ‘attention’ are difficult to read because of the haste of the writing, though reasonably certain from the context.
and from time to time invited in like manner his att<unclear>ention</unclear>
The cert attribute on the <unclear> element may be used to indicate the level of editorial confidence in the reading contained within it.
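For instance, the Southey reading might carry a confidence value, as in this sketch. The value ‘90’ is invented and simply follows the numeric style of the cert value used for the Chaucer emendation earlier in this chapter; the reason value is likewise an illustrative assumption.

```xml
and from time to time invited in like manner his
att<unclear reason="hasty writing" cert="90">ention</unclear>
```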
<!-- 18.2.3: Damage and Illegibility--> <!ELEMENT damage %om.RO; %paraContent;> <!ATTLIST damage %a.global; type CDATA #IMPLIED extent CDATA #IMPLIED resp IDREF %INHERITED; hand IDREF %INHERITED; agent CDATA #IMPLIED degree CDATA #IMPLIED TEIform CDATA 'damage' > <!-- end of 18.2.3-->
The <unclear> element is defined in section 6.5 Simple Editorial Changes.
The <gap>, <damage>, <unclear>, <supplied>, and <del> elements may be closely allied in their use. For example, an area of damage in a primary source might be encoded with any one of the first four of these elements, depending on how far the damage has affected the readability of the text. Further, certain of the elements may nest within one another. The examples given in the last sections illustrate something of how these elements are to be distinguished in use. This may be formulated as follows:
The presence of significant space in the text being transcribed may be indicated by the <space> element. The author or scribe may have left space for a word, or for an initial capital, and for some reason the word or capital was never supplied and the space left empty. This element should not be used to mark normal inter-word space or the like.
By god if wommen had writen storyes As <space extent="7"/> han within her oratoryes
The <supplied> element discussed in the previous section may be used to supply the text presumed missing:
By god if wommen had writen storyes As <supplied reason="space" resp="ES" source="Hg">preestes</supplied> han within her oratoryes
Here, the fact of the space within the manuscript is indicated by the value of the reason attribute. The source of the supplied text is shown by the value of the source attribute as the Hengwrt manuscript; the transcriber responsible for supplying the text is ES. The <space> element is formally defined thus:
<!-- 18.2.5: Spaces in the source--> <!ELEMENT space %om.RO; EMPTY> <!ATTLIST space %a.global; dim (horizontal | vertical) #IMPLIED extent CDATA #IMPLIED resp CDATA #IMPLIED TEIform CDATA 'space' > <!-- end of 18.2.5-->
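The declaration above also provides a dim attribute, which the earlier example does not use. A sketch of how it might be added to that example follows; giving the unit within the extent value is an assumption made here for illustration, not a stated requirement.

```xml
By god if wommen had writen storyes
As <space dim="horizontal" extent="7 letters" resp="ES"/> han within her oratoryes
```

A space left down the page, such as room for an initial capital that was never supplied, would presumably take dim="vertical" with an extent expressed in lines.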
The most common form of marking of text in manuscripts is by lines written under, beside or through the text. The lines themselves may be of various types: they may be solid, dashed or dotted, doubled or tripled, wavy or straight, or a combination of these and other renderings. The line may be used for emphasis, or to mark a foreign or technical term, or to signal a quotation or a title, etc.: the elements <emph>, <foreign>, <term>, <mentioned>, <title> may be used for these. Frequently, a scholar may judge that a line is used to delete text: the <del> element is available to indicate this. In all these cases, the rend attribute may be used on these or other elements to indicate that the text is marked by a line and the style of the line. Thus, Lawrence's deletion by strike-through of ‘my’ in the autograph of Eloi, Eloi, lama sabachthani is noted:
For I hate this <del rend="strikethrough" hand="dhl">my</del> body, which is so dear to me
There will be instances, however, where a scholar wishes only to register the occurrence of lines in the text, without making any judgement as to what the lines signify. In these the <hi> element may be used, with the rend attribute to mark the style of line. In the manuscript of a letter by Robert Browning to George Moulton-Barrett,152 the underlining of the phrase ‘had obtained all the letters to Mr Boyd’ may be marked-up as follows:
I have once,—by declaring I would prosecute by law—, hindered a man's proceedings who <hi rend="underline">had obtained all the letters to Mr Boyd</hi>
The above examples presume the common case where a single word or phrase is marked by a line, with no doubt as to where the marking begins or ends and with no overlapping of the area of text with other marked areas of text. Where there is doubt, the <certainty> element may be used to record the doubt. In the Browning example cited above the underlining actually begins half-way under ‘who’, and this uncertainty could be remarked as follows:
I have once,—by declaring I would prosecute by law—, hindered a man's proceedings who <hi id="cstart1" rend="underline">had obtained all the letters to Mr Boyd</hi> <!-- ... --> <certainty target="cstart1" locus="#startloc" desc="may begin with previous word" degree="0.70"/>
Where the area of text marked overlaps other areas of text, for example crossing a structural division, one of the span mechanisms outlined in these Guidelines may be used. Where the line is thought to mark a deletion, the <delSpan> element may be used. Where it is desired simply to record the marking of a span of text in circumstances where it is not possible to surround the text with a <hi> element, the <span> element may be used with the rend attribute indicating the style of line-marking.
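A deletion crossing a paragraph boundary might be sketched as follows. The text and the identifier are invented, and the sketch assumes that <delSpan> identifies its end point through a to attribute pointing at an <anchor>; the definition of the element should be checked before relying on this.

```xml
<p>Text of the first paragraph, of which
<delSpan to="delEnd1" rend="strikethrough"/>the closing words</p>
<p>and the opening words of the next<anchor id="delEnd1"/>
have been struck through in the source.</p>
```

Because <delSpan> is empty, neither paragraph needs to be split to accommodate the deletion.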
More work needs to be done on clarifying the treatment of other textual features marked by lines which might so overlap or nest. For example, in many Middle English manuscripts (e.g. the Jesus and Digby verse collections) marginal sidebars may indicate metrical structure: couplets may be linked in pairs, with the pairs themselves linked into stanzas. Or, marginal sidebars may indicate emphasis, or may point out a region of text on which there is some annotation: in many manuscripts of Chaucer's Wife of Bath's Prologue lines 655–8 are marked with nesting parentheses against which the scribe has written ‘nota’.
At the lowest level, all such features could be captured by use of the <note> element, containing a prose description of the manuscript at this point. It is not yet clear how best to mark up such phenomena so as to obtain more usefully structured encodings. For example, in the Chaucer example just cited, one may wish to record that the ‘nota’ is written in the Hengwrt manuscript in the right margin against a single large left parenthesis bracketing the four lines, with two right parentheses in the right margin bracketing two overlapping pairs of lines: the first and third, the second and fourth. The <note> element allows us to record that the scribe wrote ‘nota’, but is not well-adapted to show that the ‘nota’ points both at all four lines and at two pairs of lines within the four lines.
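At this lowest level, the Hengwrt marking could be recorded as in the following sketch. The type value is invented for illustration, and the prose simply restates the description given above.

```xml
<!-- hypothetical placement: immediately after line 658 of the transcription -->
<note type="physical">The scribe has written "nota" in the right
margin against a single large left parenthesis bracketing lines
655 to 658, with two right parentheses bracketing two overlapping
pairs of lines: the first and third, the second and fourth.</note>
```

Such a note preserves the observation for a human reader, but software cannot recover from it which lines each parenthesis brackets.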
As a rule, matter associated with the page break (signature, catchword, page number) should be drawn into the <pb> element as attributes: see section 6.9 Reference Systems. In text-critical situations where these elements need tagging in their own right (for instance, when the catch-word presents a variant reading, or spacing in the header or footer is significant for compositor identification), the element <fw> may be used:
It should not be used for marginal glosses, annotations, or textual variants, which should be tagged using <gloss>, <note>, or the text-critical tags described in chapter 19 Critical Apparatus, respectively.
<fw type="head" place="top-centre">Poëms.</fw> <fw type="pageno" place="top-right">29</fw> <fw type="sig" place="bot-centre">E3</fw> <fw type="catch" place="bot-right">TEMPLE</fw>
<!-- 18.3: Headers and footers--> <!ELEMENT fw %om.RO; %phrase.seq;> <!ATTLIST fw %a.global; type CDATA #IMPLIED place CDATA #IMPLIED TEIform CDATA 'fw' > <!-- end of 18.3-->
We repeat the advice given at the beginning of this chapter, that these recommendations are not intended to meet every transcriptional circumstance ever likely to be faced by any scholar. They are intended rather as a base to enable encoding of the most common phenomena found in the course of scholarly transcription of primary source materials. These guidelines particularly do not address the encoding of physical description of textual witnesses: the materials of the carrier, the medium of the inscribing implement, the layout of the inscription upon the material, the organisation of the carrier materials themselves (as quiring, collation, etc.), authorial instructions or scribal markup, etc. Some of these issues may be covered in future editions of these guidelines.