The structure and encoding of ParlaMint corpora
2023-11-06


1. Introduction

This document is meant to serve as a reference for the encoding of ParlaMint corpora of parliamentary proceedings. For the ParlaMint corpora to be interoperable (i.e. so that the same scripts can be used to process them), their structure is fairly rigid, both in terms of file names and folder structure and in terms of their TEI XML encoding. This does not mean that all the corpora have to contain exactly the same information: we distinguish obligatory information, which all the corpora should contain, from optional information, which is present only in the corpora for which it could be gathered from the corpus sources.

This document is a specialisation of Parla-CLARIN, itself a customisation of the TEI Guidelines. While Parla-CLARIN gives fairly general recommendations for encoding corpora of parliamentary proceedings, ParlaMint, as mentioned, is much stricter. This document gives very specific encoding recommendations without necessarily stating the reasons for them. It covers the overall structure of ParlaMint corpora, the metadata they contain, the encoding of transcriptions and, for the linguistically annotated version, the encoding of word-level linguistic annotations, syntactic dependencies and named entities.

The document is not meant as a tutorial on TEI or ParlaMint, but as a reference to the elements, their nesting and their attributes, exemplified by snippets from existing ParlaMint corpora. Other sources can help in understanding the encoding of ParlaMint corpora:

The rest of these recommendations are structured as follows:

2. Overall corpus structure

2.1. XML structure

The parliamentary proceedings of one country or autonomous region constitute one ParlaMint corpus, which is stored as one XML document with <teiCorpus> as its top-level element. It is composed of a <teiHeader>, giving the metadata for the corpus as a whole (further detailed in the Section on Corpus metadata), followed by a series of <TEI> elements that each contain one corpus component, as illustrated below:
<!-- Corpus root -->
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>...</teiHeader>
  <TEI>...</TEI> <!-- Corpus component -->
  <TEI>...</TEI> <!-- Corpus component -->
  ...            <!-- More corpus components -->
</teiCorpus>
Each corpus component should contain at most the transcripts for one day, although several components can contain transcripts for the same day, e.g. for different (types of) meetings. Whether and how these further subdivisions into separate components are realised depends on the corpus, as the granularity of parliamentary proceedings corpora, not to mention the national rules on how the work of the parliament is structured, differ substantially.
A corpus component will thus be rooted in the <TEI> element, which then contains its metadata in its own <teiHeader>, followed by the <text> element, which contains the transcription of the particular component, as illustrated below:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>...</teiHeader>
  <text>...</text>
</TEI>

The <teiHeader> of a corpus component (further detailed in the Section on Corpus metadata) contains the metadata specific to this component (along with some redundant metadata about provenance). This metadata should be unique in the corpus, i.e. it should distinguish the component from all the other components of the corpus.
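The two-level structure of a component is easy to check mechanically. The following Python sketch uses only the standard xml.etree.ElementTree module to test that a component is rooted in a <TEI> element in the TEI namespace and that its only element children are <teiHeader> followed by <text>. The function name check_component_structure is hypothetical and not part of any ParlaMint tooling; it illustrates the rule and is no replacement for schema validation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def check_component_structure(xml_text: str) -> list[str]:
    """Return a list of structural problems in a corpus component (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{TEI_NS}}}TEI":
        problems.append(f"root is {root.tag!r}, expected TEI in the TEI namespace")
    # ElementTree drops comments by default, so only element children are listed here.
    children = [child.tag.split("}")[-1] for child in root]
    if children != ["teiHeader", "text"]:
        problems.append(f"children are {children}, expected ['teiHeader', 'text']")
    return problems
```

A component whose <text> came before its <teiHeader>, for example, would be reported with one problem.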

2.2. Use of XInclude

The fact that a corpus is one XML document does not mean that it is also stored in one file. In fact, ParlaMint requires that each corpus component is stored in a separate file, with the corpus root, i.e. the top-level <teiCorpus>, also stored as one file. Furthermore, some parts of the corpus root metadata are also stored in separate files.

To compose one XML document from many files, ParlaMint uses the XInclude mechanism: the corpus root includes its corpus component files via <include> elements in the XInclude namespace, so a corpus root is in fact encoded similarly to the following example:
<!-- Corpus root file -->
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>...</teiHeader>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="2014/ParlaMint-NL_2014-04-16.xml"/>  <!-- Corpus component file -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="2014/ParlaMint-NL_2014-04-17.xml"/>  <!-- Corpus component file -->
  ...                                            <!-- More corpus component files -->
</teiCorpus>

Apart from corpus components, some parts of the overall corpus metadata (i.e. the <teiCorpus> <teiHeader> element) are also stored as separate files, and hence also included in the corpus root using the same XInclude mechanism as explained above.
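As a rough illustration, the following sketch lists the files that a corpus root pulls in by collecting the href attributes of its xi:include elements, without resolving them. The helper name included_files is hypothetical; to actually expand the inclusions, Python's standard xml.etree.ElementInclude.include() function or another XInclude-aware processor can be used.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

def included_files(root_xml: str) -> list[str]:
    """Return the href of every xi:include element in a corpus root, in document order."""
    root = ET.fromstring(root_xml)
    return [include.get("href") for include in root.iter(XI_INCLUDE)]
```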

2.3. File names and directory structure

ParlaMint has strict rules on how to name the various files that constitute a corpus, and how to collect them in directories.

The file names have the following structure:

  • The corpus root file name should start with the string ParlaMint-, followed by the ISO 3166 country (or autonomous region) code (cf. Section on Standard values), e.g. ParlaMint-NL.xml or ParlaMint-ES-CT.xml.
  • For machine-translated corpora the ISO 639 code of the language (cf. Section on Standard values) should follow the country code, e.g. ParlaMint-NL-en.xml.
  • A corpus component file name should start with the name of the root, followed by an underscore and the ISO 8601 formatted date of the transcript, for example ParlaMint-IS_2015-01-21-54.xml. If a corpus component is further distinguished, so that there are several components with the same date, the corpus compilers are free to extend the file name by a hyphen and any suffix containing only ASCII letters, digits and hyphens, e.g. ParlaMint-NL_2018-10-30-eerstekamer-4.xml or ParlaMint-CZ_2016-04-13-ps2013-044-02-016-098.xml.
  • Certain metadata elements from the corpus root <teiHeader> are stored in separate files, in particular the list of speakers, <listPerson>, the list of political parties and other organisations, <listOrg>, and the ParlaMint structural and linguistic taxonomies, i.e. <taxonomy> elements. The names of such metadata files start with the name of the corpus root, followed by a hyphen and then the name of the element, e.g. ParlaMint-BE-listPerson.xml. Where there are several files for instances of the same element, as is the case for taxonomies, the file name should end with another hyphen followed by the ID of the particular element, e.g. ParlaMint-BE-taxonomy-UD-SYN.xml. Finally, some of the taxonomies are not corpus-specific, i.e. identical files are used by all ParlaMint corpora. In this case, the country or region code is omitted, e.g. ParlaMint-taxonomy-parla.legislature.xml.
  • The file names of the corpus as a whole or of corpus components that have been automatically converted from the source XML into some other format should be the same as those of the corpus root or components, respectively, but with the appropriate file extension, e.g. ParlaMint-IS_2015-01-21-54.txt; this is further explained in the Section on Conversions.
  • As discussed in the Chapter on Linguistic annotation, we distinguish the linguistically annotated version of the corpus from the ‘plain-text’ one, with the linguistically annotated version having the additional suffix .ana on the corpus root and component file names, e.g. ParlaMint-ES-CT.ana.xml or ParlaMint-IS_2015-01-21-54.ana.xml.
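The naming rules above can be approximated with regular expressions. The sketch below is a hypothetical helper, not the official ParlaMint validation; it assumes that subdivision codes consist of upper-case letters or digits, that the language codes of machine-translated corpora are two or three lower-case letters, and that taxonomy IDs consist of letters, digits, dots, underscores and hyphens. It checks only the shape of a date, not whether the date exists.

```python
import re

# "ParlaMint-" + country code, optional subdivision code, optional language code (MT corpora).
CORPUS = r"ParlaMint-[A-Z]{2}(?:-[A-Z0-9]{1,3})?(?:-[a-z]{2,3})?"
ROOT_RE = re.compile(rf"^{CORPUS}(?:\.ana)?\.xml$")
# Corpus name + "_" + ISO 8601 date + optional "-suffix" of ASCII letters, digits and hyphens.
COMPONENT_RE = re.compile(
    rf"^{CORPUS}_\d{{4}}-\d{{2}}-\d{{2}}(?:-[A-Za-z0-9-]+)?(?:\.ana)?\.xml$"
)
# Metadata files; the country or region code is omitted for shared taxonomies.
METADATA_RE = re.compile(
    r"^ParlaMint-(?:[A-Z]{2}(?:-[A-Z0-9]{1,3})?-)?"
    r"(?:listPerson|listOrg|taxonomy-[A-Za-z0-9._-]+)\.xml$"
)

def classify(file_name: str) -> str:
    """Return 'root', 'component', 'metadata' or 'unknown' for a ParlaMint file name."""
    if ROOT_RE.match(file_name):
        return "root"
    if COMPONENT_RE.match(file_name):
        return "component"
    if METADATA_RE.match(file_name):
        return "metadata"
    return "unknown"
```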

For distribution the complete XML corpus should be stored in a directory that has the same name prefix as the corpus root file. The directory then contains the corpus root file and its metadata files, while the corpus components should be in subdirectories, one per year, for example:

ParlaMint-BE.TEI/ParlaMint-BE.xml
ParlaMint-BE.TEI/ParlaMint-BE-listPerson.xml
ParlaMint-BE.TEI/ParlaMint-BE-listOrg.xml
ParlaMint-BE.TEI/ParlaMint-taxonomy-parla.legislature.xml
ParlaMint-BE.TEI/ParlaMint-taxonomy-speaker_types.xml
...
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-19.xml
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-06-30.xml
ParlaMint-BE.TEI/2014/ParlaMint-BE_2014-07-17.xml
...
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-06-54.xml
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-07-54.xml
ParlaMint-BE.TEI/2015/ParlaMint-BE_2015-01-08-54.xml
...

The linguistically annotated version of the corpus is stored separately, with the main directory and, as mentioned, the corpus root and component file names having the additional suffix .ana, e.g.

ParlaMint-BE.TEI.ana/ParlaMint-BE.ana.xml
ParlaMint-BE.TEI.ana/ParlaMint-BE-listPerson.xml
ParlaMint-BE.TEI.ana/ParlaMint-BE-listOrg.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-parla.legislature.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-speaker_types.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-NER.xml
ParlaMint-BE.TEI.ana/ParlaMint-taxonomy-UD.xml
...
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-19.ana.xml
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-06-30.ana.xml
ParlaMint-BE.TEI.ana/2014/ParlaMint-BE_2014-07-17.ana.xml
...
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-06-54.ana.xml
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-07-54.ana.xml
ParlaMint-BE.TEI.ana/2015/ParlaMint-BE_2015-01-08-54.ana.xml
...
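The mapping from a component file name to its place in the distribution can be sketched as follows. The function distribution_path is hypothetical and assumes that its argument is already a valid component file name, so that the four characters after the first underscore give the year.

```python
from pathlib import PurePosixPath

def distribution_path(component_file: str) -> str:
    """Return the path of a component file inside its distribution directory."""
    corpus, _, rest = component_file.partition("_")   # "ParlaMint-BE", "2014-06-19.xml"
    year = rest[:4]                                    # components are grouped by year
    suffix = ".TEI.ana" if component_file.endswith(".ana.xml") else ".TEI"
    return str(PurePosixPath(corpus + suffix, year, component_file))
```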

3. General requirements

This section gives some general requirements a ParlaMint corpus has to meet, in particular those relating to the characters in a corpus, and the use of standards. It also details the structure of the file names of the ParlaMint root and component files, as well as the attributes expected on the <teiCorpus> and <TEI> tags.

3.1. Characters

The corpus should be encoded in Unicode, using the UTF-8 character encoding, at least for European languages. Where the original contains characters from the Unicode Private Use Area, these should, if possible, be replaced by their closest Unicode equivalents or otherwise by the Unicode replacement character U+FFFD. End-of-line hyphens, if present in the source files, should be removed and the split words joined, to make the corpus easier to search and to simplify linguistic processing.

The following characters, esp. prevalent when the source documents were in Word or HTML, deserve special mention:

  • The TAB character (U+0009) helps align strings on successive lines. As ParlaMint is not interested in preserving the layout, all TAB characters should be substituted by space characters (U+0020).
  • NO-BREAK SPACE (U+00A0) prevents, with some applications, an automatic line break at its position and also collapsing such consecutive characters into a single space. As the use of this character complicates (or breaks) further processing, esp. linguistic annotation, these characters should be substituted by the normal space character (U+0020). The same holds for other variants of spaces (U+2000 - U+200A), which are, however, used much less frequently.
  • NON-BREAKING HYPHEN (U+2011), similarly to NO-BREAK SPACE, prevents a line break, in this case following its position. With a similar reasoning as above, this character should be substituted by the normal hyphen character ('-', U+002D).
  • SOFT HYPHEN (U+00AD) indicates that a word can be hyphenated at that point. Occurrences of this character should be removed from the corpus.

Text-bearing elements should also not start or end with space characters, and sequences of whitespace characters should be changed into a single space.
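A minimal Python sketch of these character rules follows. It makes simplifying assumptions: Private Use Area characters are always replaced by U+FFFD, since choosing a ‘closest equivalent’ requires knowledge of the source fonts, and joining words split by end-of-line hyphens is left out, as it cannot be done reliably without knowledge of the language. The name normalise_text is hypothetical.

```python
import re
import unicodedata

# Characters replaced by an ordinary space: TAB, NO-BREAK SPACE and U+2000..U+200A.
_SPACES = {"\t", "\u00a0", *(chr(c) for c in range(0x2000, 0x200B))}

def normalise_text(text: str) -> str:
    """Apply the character substitutions above to the content of a text-bearing element."""
    out = []
    for ch in text:
        if ch in _SPACES:
            out.append(" ")
        elif ch == "\u2011":                      # NON-BREAKING HYPHEN -> HYPHEN-MINUS
            out.append("-")
        elif ch == "\u00ad":                      # SOFT HYPHEN is removed
            continue
        elif unicodedata.category(ch) == "Co":    # Private Use Area -> U+FFFD
            out.append("\ufffd")
        else:
            out.append(ch)
    # Collapse whitespace runs into one space and trim both ends.
    return re.sub(r"\s+", " ", "".join(out)).strip()
```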

3.2. Standard values

Whenever possible, ParlaMint uses standards for information coding. In particular, the following information must be standardised:

  • As the identity of a ParlaMint corpus is determined by the country or region of the particular parliament, its code appears in many places. These codes should follow the ISO 3166 standard, in particular ISO 3166-1 alpha-2 for the two-letter country codes (for national parliaments) and ISO 3166-2 for the codes of country subdivisions (for parliaments of autonomous regions). So, for example, the country code for Spain is "ES", while the code for the Basque Autonomous Community is "ES-PV". Note that we use the term regional parliaments for such cases.
  • The codes for the languages used in the corpora (i.e. the possible values of the xml:lang attribute) should follow BCP 47 (cf. also xml:lang in XML document schemas). Essentially, this means that a language code should have two letters, following ISO 639-1, or, only if a two-letter code does not exist for a language, the three-letter ISO 639-2/T code. For example, the code for Basque is 'eu'. ParlaMint corpora use at least two languages: the language that the transcriptions are written in, which we call the local language, and English as the meta-language, which is (also) used in the metadata.
  • Temporal, i.e. time-related information is typically stored in the when, from and to attributes of various elements. To specify a date or time as the value of these attributes, formatting according to the ISO 8601 standard should be used, e.g. 2022-04-01 for the 1st of April 2022. More information on temporal attributes is given in the Section on Temporal attributes.
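The shape of these standardised values can be checked with a few lines of Python. The sketch below is hypothetical: it checks only the format (two upper-case letters with an optional subdivision suffix for regions, two or three lower-case letters for languages, a valid calendar date), not whether a code actually exists in ISO 3166 or ISO 639, and it covers neither the full BCP 47 grammar nor partial ISO 8601 dates. Note that on Python 3.11 and later date.fromisoformat() also accepts a few other ISO 8601 date forms.

```python
import re
from datetime import date

COUNTRY_RE = re.compile(r"[A-Z]{2}(?:-[A-Z0-9]{1,3})?")   # e.g. ES, ES-PV
LANGUAGE_RE = re.compile(r"[a-z]{2,3}")                    # e.g. eu, sl, mul

def is_country_code(code: str) -> bool:
    return COUNTRY_RE.fullmatch(code) is not None

def is_language_code(code: str) -> bool:
    return LANGUAGE_RE.fullmatch(code) is not None

def is_iso_date(value: str) -> bool:
    """True if value is a valid calendar date such as 2022-04-01."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```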

3.3. Attributes of top-level elements

The Chapter on Overall corpus structure introduced the top level elements of the corpus root file and of the component files (i.e. the <teiCorpus> and <TEI> elements), but did not elaborate on their attributes; these are presented in this section.

The corpus root has three required attributes, as shown below:
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0"
           xml:id="ParlaMint-FR"
           xml:lang="fr">
All three attributes can also be used on any other element, and are thus of special importance:
  • xmlns determines the namespace of the element, and this should always be the TEI namespace, i.e. http://www.tei-c.org/ns/1.0. Note that all lower level elements in the same file inherit this namespace, so it is not necessary (although it is not an error) for other elements to also define their namespace.
  • xml:id is an attribute from the (implicitly assumed) XML namespace and gives the identifier of the corpus root or component. The value of an ID should be unique in the corpus as a whole and should obey the format requirements defined by the W3C. For the corpus root, as well as for the components, this top-level identifier must be identical to the file name (without the file extension). xml:id is a global attribute, so any element can have it; while it is not required in general, it is necessary on any element that is referred to (via this ID) by some other element, as is the case for many elements in the <teiHeader>, explained in the Section on Corpus metadata. Subordinate elements in the transcription that have an ID (such as utterances and segments) should preferably use the top-level xml:id as a prefix and indicate the element name in the ID. For example, if the top-level ID is ParlaMint-GB_2021-01-06-lords, the first utterance would have the ID ParlaMint-GB_2021-01-06-lords.u1 and the first segment ParlaMint-GB_2021-01-06-lords.seg1. The number of the element should not have leading zeros.
  • xml:lang is also a global attribute and gives the language code of the text content of the element; for the corpus root this does not (just) mean the content of its TEI header, but primarily the textual content of its XIncluded components. By convention, the language of the text content of an element is given by the first xml:lang attribute found on the element itself or, failing that, on its nearest ancestor. Where the content is multilingual, the language code should be that of the majority language; when the proportions of the languages are about equal, the code mul (multiple languages) can be used.
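The inheritance convention for xml:lang can be illustrated with ElementTree, which exposes xml:lang under the key {http://www.w3.org/XML/1998/namespace}lang. Since ElementTree elements do not know their parents, the hypothetical helper below first builds a child-to-parent map and then walks upwards from the target element until it finds an xml:lang value.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_lang(root: ET.Element, target: ET.Element) -> Optional[str]:
    """Return the xml:lang value in force for target: its own, else the nearest ancestor's."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]
        node = parent_of.get(node)
    return None
```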
A corpus component also has the same three required attributes, but additionally also the ana attribute:
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xml:id="ParlaMint-FR_2017-07-04-E1001"
     xml:lang="fr"
     ana="#parla.sitting #reference">
As with the corpus root, the component sets the TEI namespace and gives the language of its textual content, while its xml:id, of course, identifies the particular component. The ana attribute is a pointing attribute; these attributes are introduced in the next section.

3.4. Pointing attributes

The ParlaMint encoding uses pointing attributes for a number of purposes, e.g. for references to taxonomy categories, to speaker metadata, or to linguistic categories.

While a few elements have dedicated pointing attributes, three are in general use. They share two characteristics: they are all used by a large number of different elements, and their value is a series of pointers, i.e. a whitespace-delimited sequence of references to the values of xml:id attributes in the corpus or, in general, to URIs. The three attributes are:

  • ana serves to provide an analysis or to classify an element according to some pre-determined vocabulary. In ParlaMint the target element will typically be a category in a taxonomy, an event or date, or an organisation.
  • corresp points to items that correspond to the current element in some way, e.g. from a page break to the (URL of the) corresponding media file.
  • ref provides an explicit reference to the full definition or identity of the entity being named. In ParlaMint it is used e.g. for connecting a person's affiliation with a particular organisation. The value of this attribute is often, but not always, a URL, e.g. when associating a place name with its GeoNames URL.
To illustrate, the example below gives some elements that contain one or more of these attributes:
<meeting ana="#parla.upper #parla.term #LEG.18">18 Legislatura</meeting>
...
<affiliation ref="#group.L-SP-PSd.Az" role="member" ana="#LEG.18" from="2018-03-27"/>
...
<placeName ref="https://www.geonames.org/2523918">Palermo</placeName>
...
<link ana="ud-syn:det" target="#ParlaMint-IT.seg1.2.6 #ParlaMint-IT.seg1.2.5"/>
The first example classifies the <meeting> element (the definitions are given in the relevant taxonomy) as a meeting of the upper house in the scope of a parliamentary term, specifically the XVIII Legislative Term. The <affiliation> example (again, the definitions are given in the elements with the pointed-to IDs) specifies that the person with this affiliation is a member of the parliamentary group ‘Lega-Salvini Premier-Partito Sardo d'Azione’ in the scope of the XVIII Legislative Term. The <placeName> example gives the definition of Palermo in the GeoNames database via its URL. Finally, the <link> example encodes a Universal Dependencies determiner (det) relation between two tokens. The link uses the TEI extended pointer syntax, further explained in the Section on Prefix definitions.

It is often difficult to decide which of these attributes to use for a particular pointer, so the usage examples given with the relevant element should always be consulted.
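To process pointing attributes, their values first have to be split into individual pointers. The following sketch, a hypothetical helper rather than ParlaMint code, separates local references (#id), absolute URIs and prefixed pointers such as ud-syn:det, which still have to be expanded according to the prefix definitions of the corpus. Telling URIs and prefixed pointers apart by the presence of "://" is a simplification.

```python
def parse_pointers(value: str) -> list[tuple[str, str]]:
    """Split a pointing-attribute value into (kind, target) pairs."""
    pointers = []
    for pointer in value.split():                      # pointers are whitespace-delimited
        if pointer.startswith("#"):
            pointers.append(("local", pointer[1:]))    # reference to an xml:id
        elif "://" in pointer:
            pointers.append(("uri", pointer))          # absolute URI, e.g. GeoNames
        elif ":" in pointer:
            pointers.append(("prefixed", pointer))     # expanded via the prefix definitions
        else:
            pointers.append(("other", pointer))        # e.g. a relative URI
    return pointers
```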

3.5. Temporal attributes

ParlaMint makes a lot of use of temporal information, e.g. to determine when a session took place or the period when a certain person was an MP. As mentioned in the Section on Standard values, the ISO 8601 format should be used to specify the dates or times.

The following attributes are used to specify temporal information:

  • The when attribute is used when the temporal information refers to a point in time, typically a date, and is used e.g. to give the date when the corpus was published, or when a change in the corpus was made.
  • The from and to attributes give the start and end date or time of an interval, e.g. the time period the corpus covers, or the period when a person was an MP. If only one of the two attributes is present, the interval is assumed to extend at least to the start (if from is missing) or beyond the end (if to is missing) of the time period that the particular ParlaMint corpus covers. Similarly, if both attributes are missing, the interval is assumed to cover the complete time period of the ParlaMint corpus.
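The interval semantics above can be expressed as a small predicate. This hypothetical sketch assumes that from and to hold full dates (YYYY-MM-DD), treats both bounds as inclusive, and treats a missing bound as open-ended, which is how the assumption above plays out for dates inside the period covered by the corpus.

```python
from datetime import date
from typing import Optional

def in_interval(day: str, from_: Optional[str] = None, to: Optional[str] = None) -> bool:
    """True if day lies within [from_, to]; a missing bound is treated as open-ended."""
    d = date.fromisoformat(day)
    if from_ is not None and d < date.fromisoformat(from_):
        return False
    if to is not None and d > date.fromisoformat(to):
        return False
    return True
```

For instance, the affiliation with from="2018-03-27" in the previous section would be checked with in_interval(day, from_="2018-03-27").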

4. Corpus metadata

As mentioned, the <teiCorpus> and <TEI> elements contain the obligatory <teiHeader> element, which stores the metadata of the corpus root or component. In this section we explain and give examples of the required and optional metadata contained in the <teiHeader>, proceeding through its various elements and distinguishing which parts and what content are appropriate for the corpus root, and which for a corpus component.

As a general remark, most metadata contains free text. ParlaMint requires this text to be given in English, to help researchers from other countries understand it, and recommends also giving it in the local language in which (the main portion of) the parliamentary transcripts is written, so that local researchers can use it in their native tongue.

A ParlaMint <teiHeader> contains three obligatory elements: the file description, <fileDesc>, the encoding description, <encodingDesc>, and the profile description, <profileDesc>, and an optional revision description, <revisionDesc>:
<teiHeader>
  <fileDesc>...</fileDesc>
  <encodingDesc>...</encodingDesc>
  <profileDesc>...</profileDesc>
  <revisionDesc>...</revisionDesc>
</teiHeader>
Below we explain each of these elements in turn.

4.1. File description

The file description, <fileDesc>, is composed of five obligatory elements, namely the title statement, <titleStmt>, the edition statement, <editionStmt>, the extent, <extent>, the publication statement, <publicationStmt>, and the source description, <sourceDesc>:
<fileDesc>
  <titleStmt>...</titleStmt>
  <editionStmt>...</editionStmt>
  <extent>...</extent>
  <publicationStmt>...</publicationStmt>
  <sourceDesc>...</sourceDesc>
</fileDesc>

4.1.1. Title statement

The title statement, <titleStmt> gives the title of the corpus root or component, along with the specification of the particular session(s) of the parliament contained, the persons responsible for compiling the corpus, and the funder(s) of the project.

This structure is exemplified by the following corpus root title statement:
<titleStmt>
  <title type="main">Slovenski parlamentarni korpus ParlaMint-SI [ParlaMint]</title>
  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title>
  <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. in 8. mandat (2014 - 2020)</title>
  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title>
  <meeting n="7" corresp="#DZ"
    ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
  <meeting n="8" corresp="#DZ"
    ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>
  <respStmt>
    <persName ref="https://orcid.org/0000-0001-6143-6877">Andrej Pančur</persName>
    <persName ref="https://orcid.org/0000-0002-1560-4099">Tomaž Erjavec</persName>
    <resp>Kodiranje ParlaMint TEI XML</resp>
    <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>
  </respStmt>
  <funder>
    <orgName>Raziskovalna infrastruktura CLARIN</orgName>
    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
  </funder>
  <funder>
    <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
    <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
  </funder>
</titleStmt>
The title statement starts with the main and the subordinate title, each given both in the local language and in English. They are distinguished by the value (main or sub) of their type attribute and by the value of their xml:lang attribute; the language code of the local-language titles can be inherited from a superordinate element.

The main title has a formulaic structure ‘<Country name> parliamentary corpus ParlaMint-<Country code> [ParlaMint]’, with an equivalent structure for the local language. Note that the corpus ‘stamp’ in square brackets can also be ‘[ParlaMint.ana]’ for the linguistically annotated version of the corpus (as explained in the Chapter on Linguistic annotation) or ‘[ParlaMint SAMPLE]’ for corpus data samples, as available on the ParlaMint GitHub repository.
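Because the main title is formulaic, it can be generated. The sketch below, with the hypothetical helper main_title, assumes that the country part is given exactly as it should appear in the title (in the example above, ‘Slovenian’) and accepts only the three stamps mentioned here.

```python
STAMPS = ("ParlaMint", "ParlaMint.ana", "ParlaMint SAMPLE")

def main_title(country: str, code: str, stamp: str = "ParlaMint") -> str:
    """Build the English main title of a corpus root following the formula above."""
    if stamp not in STAMPS:
        raise ValueError(f"unknown stamp: {stamp!r}")
    return f"{country} parliamentary corpus ParlaMint-{code} [{stamp}]"
```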

The subordinate title, in contrast to the main one, is free text, and usually formed on the basis of the source of the corpus. As with the main one, it should be given in both languages.

After the titles comes the specification of the particular sessions that the corpus contains, encoded as <meeting> elements: the two <meeting> elements in the above example state that the ParlaMint-SI corpus contains the meetings of the 7th and 8th terms of the lower house of the National Assembly of the Republic of Slovenia. The <meeting> elements can give, as the value of their n attribute, the numbers of the meetings that the corpus covers, and their text content can give a free-text description of the meetings in the local language.

The formal information on the meetings is given in the values of the corresp and ana attributes, which are pointing attributes, as explained in the Section on Pointing attributes. Here they refer to the definitions of organisations, further explained in the Section on Organisations, and to the categories of taxonomies, further explained in the Section on the Class declaration. The corresp attribute points to the body that the meeting belongs to (in this case the National Assembly of the Republic of Slovenia), while the ana attribute contains a space-delimited sequence of pointers: #parla.lower points to the definition of the lower house, #parla.term to the definition of a parliamentary term, and #DZ.7 to the definition of the seventh mandate.

Next come one or more responsibility statements, <respStmt>, each one containing one or more person names, <persName>, with an optional ref attribute, giving the URL, where more information about the person can be found, and the responsibility element <resp>, which specifies what responsibility the statement is about.

In a similar manner, the <funder> elements give information on the organisations which have financially contributed to the compilation of the corpus, with the names of the organisations given in the <orgName> elements.

A corpus component has a title statement very similar to that of the corpus root, except that certain elements specify the metadata of the component rather than of the complete corpus. It also contains some redundant metadata, in particular the responsibility statement and the funder, as illustrated in the example below:
<titleStmt>
  <title type="main">Slovenski parlamentarni korpus ParlaMint-SI, izredna seja 59 [ParlaMint]</title>
  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI, Extraordinary Session 59 [ParlaMint]</title>
  <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. mandat, 59. izredna seja, 13.4.2018</title>
  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7, Extraordinary Session 59, 13.4.2018</title>
  <meeting n="59" corresp="#DZ"
    ana="#parla.lower #parla.meeting.extraordinary">Izredna</meeting>
  <meeting n="7" corresp="#DZ"
    ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>
  <respStmt>
    <persName>Andrej Pančur</persName>
    <resp>Kodiranje TEI</resp>
    <resp xml:lang="en">TEI corpus encoding</resp>
  </respStmt>
  <funder>
    <orgName>Raziskovalna infrastruktura CLARIN</orgName>
    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>
  </funder>
  <funder>
    <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>
    <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>
  </funder>
</titleStmt>
In the example it can be seen that the main title of a corpus component is simply an extension of the corpus root title, as it also gives the name of the particular meeting that the component contains, while the subordinate title is, again, free text. Both titles must be unique in the complete corpus.

The other difference is in the <meeting> elements, which here specify the particular meeting contained in the corpus component transcription. In the example above, this is an extraordinary meeting of the lower house in the seventh term of the National Assembly of the Republic of Slovenia.

4.1.2. Edition statement

ParlaMint corpora have their edition statement, <editionStmt> both in the corpus root and components. As illustrated below, the only element it contains is <edition>:
<editionStmt>
  <edition>3.0</edition>
</editionStmt>
We use semantic versioning to specify the version of the corpus: a new major version means substantial changes to the corpus, while a new minor version is reserved for e.g. correcting errata or other minor changes. We do not use the patch number. It should be noted that, at least so far, all the ParlaMint corpora have been released together, so they are all of the same edition, i.e. have the same version number. At the time of writing, the latest version is 2.1, with the next one planned to be 3.0.

4.1.3. Extents

The <extent> element gives information on selected sizes of the complete corpus (in the corpus root) or of one corpus component, as illustrated below in the case of a corpus root extent:
<extent>
  <measure unit="speeches" quantity="75122"
    xml:lang="sl">75.122 govorov</measure>
  <measure unit="speeches" quantity="75122"
    xml:lang="en">75,122 speeches</measure>
  <measure unit="words" quantity="20190034"
    xml:lang="sl">20.190.034 besed</measure>
  <measure unit="words" quantity="20190034"
    xml:lang="en">20,190,034 words</measure>
</extent>
ParlaMint requires two sizes to be given, each in both languages: the number of speeches and the number of words, distinguished by the unit attribute. The exact quantity is given in the quantity attribute, while the text content of <measure> gives the quantity together with the unit; where possible, the number here should contain the thousands separator appropriate for the language.

It should be noted that both sizes are somewhat complex to compute and are inserted into the TEI headers in the finalisation of a corpus (cf. the Section on Finalisation of corpora) by a common script, so it is not necessary to insert the extent in the process of developing a ParlaMint corpus.
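Although the extent is inserted by a common script, the formatting of the textual content can be illustrated. This sketch is hypothetical: it groups thousands with Python's "," format specifier and then substitutes the separator for the language. Only English and Slovenian, the two languages of the example, are covered; other languages fall back to the comma, which will not be correct for all of them.

```python
# Thousands separators for the languages in the example above (assumption: others use ",").
THOUSANDS_SEPARATOR = {"en": ",", "sl": "."}

def measure_text(quantity: int, unit_label: str, lang: str) -> str:
    """Format the text content of a <measure> element, e.g. '75,122 speeches'."""
    grouped = f"{quantity:,}".replace(",", THOUSANDS_SEPARATOR.get(lang, ","))
    return f"{grouped} {unit_label}"
```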

4.1.4. Publication statement

The publication statement <publicationStmt> must appear in the corpus root as well as, in identical form, in the corpus components. As illustrated below, it contains information about the publisher of the corpus, the persistent identifier where the complete corpus can be found, under which licence it is distributed, and when it was released:
<publicationStmt>
  <publisher>
    <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName>
    <orgName xml:lang="en">CLARIN research infrastructure</orgName>
    <ref target="https://www.clarin.eu/">www.clarin.eu</ref>
  </publisher>
  <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno>
  <availability status="free">
    <licence>http://creativecommons.org/licenses/by/4.0/</licence>
    <p xml:lang="sl">To delo je ponujeno pod
      <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Priznanje avtorstva 4.0 mednarodna licenca</ref>.</p>
    <p xml:lang="en">This work is licensed under the
      <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>
  </availability>
  <date when="2021-06-11">11. 6. 2021</date>
</publicationStmt>
The <publisher> is, at least for the corpora produced in the scope of the CLARIN ParlaMint project, the CLARIN research infrastructure, and the element also gives the home page of the infrastructure. The ‘identifier number’ element, <idno>, specifies via its type and subtype attributes, with the fixed values URI and handle, that the identifier is a handle, and contains the handle under which the complete corpus of the specified version can be found. The <availability> element specifies, via its <licence> element, the fixed-value CC BY 4.0 URL, and the following paragraph gives a prose description of the licence, including its URL via the target attribute of <ref>. As usual, the textual information is given in both languages. Finally, <date> gives the date of the release: the when attribute gives the date in the ISO 8601 format, while the textual content can give it according to the conventions of the local language.

4.1.5. Source description

The source description <sourceDesc> of the corpus root encodes the original digital source of the ParlaMint corpus in the <bibl> element, as shown in the following example:
<sourceDesc>
  <bibl>
    <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title>
    <title type="main" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia</title>
    <idno type="URI">https://www.dz-rs.si</idno>
    <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date>
  </bibl>
</sourceDesc>
Apart from the bilingual <title>s, the <bibl> should also give, in an <idno> with the fixed type value URI, the government URL from which the transcripts were first harvested, while the dates of the earliest and latest transcript in the corpus are indicated by the from and to attributes of the <date> element. As usual, the values of these attributes should follow ISO 8601, while the textual content can be formatted according to the local rules for writing dates.
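The constraints on this <date> (ISO 8601 values, with from no later than to) can likewise be checked with a few lines of code. This Python sketch assumes full YYYY-MM-DD values and the TEI namespace; source_date_range and in_corpus_range are hypothetical helper names, the latter testing whether a given meeting date falls within the corpus range.

```python
import datetime
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def source_date_range(header_xml):
    """Return the (from, to) dates of sourceDesc/bibl/date in a corpus root header."""
    date = ET.fromstring(header_xml).find(f".//{TEI}sourceDesc/{TEI}bibl/{TEI}date")
    if date is None or date.get("from") is None or date.get("to") is None:
        raise ValueError("sourceDesc/bibl/date with from and to is required")
    # Assumes full YYYY-MM-DD values, which date.fromisoformat() accepts.
    start = datetime.date.fromisoformat(date.get("from"))
    end = datetime.date.fromisoformat(date.get("to"))
    if start > end:
        raise ValueError(f"from ({start}) is later than to ({end})")
    return start, end


def in_corpus_range(date_range, when):
    """Check that a meeting date (e.g. a component's date/@when) lies within the range."""
    start, end = date_range
    return start <= datetime.date.fromisoformat(when) <= end
```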
For corpus components, the source description is very similar to the one for the corpus root, except that the <title> can be modified to restrict the description to the exact meeting the component contains. The <date> element must, of course, specify the exact date when the meeting took place. If the transcription of the meeting is available on the Web, the <idno> should give its URL. Furthermore, if the audio or video of the meeting is available, this information can be given in the <recordingStmt>, as illustrated in the example below:
<sourceDesc>
  <bibl>
    <title type="main" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title>
    <title type="main" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title>
    <idno type="URI">https://www.psp.cz/eknih/2013ps/stenprot/044schuz/s044033.htm</idno>
    <date when="2016-04-13">13.04.2016</date>
  </bibl>
  <recordingStmt>
    <recording type="audio">
      <media xml:id="ps2013-044-02-000-000.audio1"
        mimeType="audio/mp3"
        source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3"
        url="2013ps/audio/2016/04/13/2016041308580912.mp3"/>
    </recording>
  </recordingStmt>
</sourceDesc>

As the example shows, the recording statement contains a <recording> element, which specifies the type of the recording (audio or video) and contains a <media> element. The <media> element gives the ID of the file (xml:id), its MIME type (mimeType), the URL of the source of the recording (source), typically the official governmental site for parliamentary proceedings, and the location of the local, possibly processed, copy of the file (url). This can be a local file, even though it will not be distributed together with the ParlaMint corpus, or, better, a Web-based file at a stable location.

4.2. Encoding description

The encoding description <encodingDesc> of the corpus root contains the following elements:
<encodingDesc>
  <projectDesc>...</projectDesc>
  <editorialDecl>...</editorialDecl>
  <tagsDecl>...</tagsDecl>
  <classDecl>...</classDecl>
</encodingDesc>

In contrast, the encoding description of a corpus component contains only two elements, namely (and redundantly) the <projectDesc> and the <tagsDecl>.

4.2.1. Project description

The project description <projectDesc> of the corpus root contains a short description of the project in the scope of which the corpus was compiled:
<projectDesc>
  <p xml:lang="sl">Glavni cilji projekta <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> so
    (1) izdelati večjezično množico na enak način kodiranih korpusov
    zapiskov parlamentarnih sej, (2) jezikoslovno označiti te korpuse; (3)
    narediti korpuse dostopne za prevzem in prek konkordančnikov; in (4)
    pripraviti primere uporabe korpusov v politologiji in digitalni
    humanistiki.</p>
  <p xml:lang="en">The <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>
    project aims to (1) create a multilingual set of uniformly encoded
    comparable corpora of parliamentary proceedings (2) process the
    corpora linguistically; (3) make the corpora available for download
    and through concordancers; and (4) build use cases in Political
    Sciences and Digital Humanities based on the corpus data.</p>
</projectDesc>
The description above was written for the CLARIN ParlaMint project, and its English part can be used as-is in the corpora produced for version 3.

4.2.2. Editorial declaration

The editorial declaration, <editorialDecl>, is used only in the corpus root and contains prose descriptions of the editorial decisions made in the process of compiling the corpus along several dimensions, in particular what types, if any, of <correction>, <normalization>, <quotation>, <hyphenation>, and <segmentation> were performed on the source texts of the corpus. The example below illustrates the use of these elements:
<editorialDecl>
  <correction>
    <p xml:lang="en">No correction of source texts was performed.</p>
  </correction>
  <normalization>
    <p xml:lang="en">Text has not been normalised, except for spacing.</p>
  </normalization>
  <hyphenation>
    <p xml:lang="en">No end-of-line hyphens were present in the source.</p>
  </hyphenation>
  <quotation>
    <p xml:lang="en">Quotation marks have been left in the text and are not explicitly marked up.</p>
  </quotation>
  <segmentation>
    <p xml:lang="en">The texts are segmented into utterances (speeches) and segments (corresponding to paragraphs in the source transcription).</p>
  </segmentation>
</editorialDecl>

4.2.3. Tags declaration

The tags declaration, <tagsDecl>, gives the count of all the XML tags used in the data part (i.e. not in the TEI header) of the whole corpus (in the corpus root) or of an individual corpus component (in the component's header). To distinguish the TEI elements from possible elements from other namespaces, a <namespace> element giving the TEI namespace in its name attribute is introduced first. Inside it, each TEI tag is listed in its own <tagUsage> element, with the gi attribute giving the name of the tag and the occurs attribute the number of its occurrences, as shown in the following example:
<tagsDecl>
  <namespace name="http://www.tei-c.org/ns/1.0">
    <tagUsage gi="text" occurs="414"/>
    <tagUsage gi="body" occurs="414"/>
    <tagUsage gi="div" occurs="414"/>
    <tagUsage gi="head" occurs="826"/>
    <tagUsage gi="u" occurs="75122"/>
    <tagUsage gi="seg" occurs="280971"/>
    <tagUsage gi="note" occurs="85525"/>
    <tagUsage gi="gap" occurs="7897"/>
    <tagUsage gi="vocal" occurs="1740"/>
    <tagUsage gi="incident" occurs="37"/>
    <tagUsage gi="kinesic" occurs="560"/>
    <tagUsage gi="desc" occurs="10234"/>
  </namespace>
</tagsDecl>
It should be noted that, similarly to the extents (as explained in the Section on Extents), the tag usage is inserted into the TEI headers during the finalisation of a corpus (cf. the Section on Validation and conversion) by a common script, so it is not necessary to compute it in the process of developing a ParlaMint corpus.
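Although the tag usage is added by a common script, the counting it involves is straightforward. As an illustration only (this is not the ParlaMint finalisation script), the Python sketch below counts the TEI elements under <text>, thus excluding the <teiHeader>, and builds a <tagsDecl> listing the tags in order of first occurrence. It assumes a parsed TEI document whose <text> elements are not nested inside each other; tag_usage and tags_decl are hypothetical names.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = f"{{{TEI_NS}}}"  # "{http://www.tei-c.org/ns/1.0}"


def tag_usage(tei_xml):
    """Count TEI elements inside <text>, i.e. in the data part, not the <teiHeader>."""
    counts = Counter()
    for text in ET.fromstring(tei_xml).iter(f"{TEI}text"):
        for elem in text.iter():  # iter() includes <text> itself
            # Skip elements from other namespaces (and non-element nodes).
            if isinstance(elem.tag, str) and elem.tag.startswith(TEI):
                counts[elem.tag[len(TEI):]] += 1
    return counts


def tags_decl(counts):
    """Build a <tagsDecl> with one <tagUsage> per TEI element, in order of first occurrence."""
    decl = ET.Element(f"{TEI}tagsDecl")
    namespace = ET.SubElement(decl, f"{TEI}namespace", name=TEI_NS)
    for gi, occurs in counts.items():
        ET.SubElement(namespace, f"{TEI}tagUsage", gi=gi, occurs=str(occurs))
    return decl
```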

4.2.4. Class declaration and taxonomies

The class declaration, <classDecl>, is used only in the corpus root and contains only the definitions of some controlled vocabularies used in ParlaMint corpora. These vocabularies, possibly hierarchically organised, are encoded using the <taxonomy> element.

The taxonomies themselves are stored in separate files and are typically ParlaMint-wide, i.e. all corpora use the same taxonomies. They are included in the corpus root with the XInclude directive, as illustrated below for the case of the Czech corpus:
<classDecl>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-taxonomy-parla.legislature.xml"/>    <!-- Common taxonomy on parliament legislature -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-CZ-taxonomy-meeting.parts.xml"/>     <!-- CZ-specific taxonomy with additional categories for meetings -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-taxonomy-speaker_types.xml"/>        <!-- Common taxonomy on types of speakers -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-taxonomy-subcorpus.xml"/>            <!-- Common taxonomy on predetermined subcorpora -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-taxonomy-politicalOrientation.xml"/> <!-- Common taxonomy on L-R political orientations of pol. parties -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
      href="ParlaMint-taxonomy-CHES.xml"/>                 <!-- Common taxonomy on CHES variables of pol. parties -->
</classDecl>
As can be seen, five of the taxonomies are common ParlaMint taxonomies, while one is corpus-specific, which is indicated by the country code CZ (followed by a hyphen) in its filename.
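Since the distinction between common and corpus-specific taxonomies is made in the file name, it can be recovered mechanically from the <classDecl>. In the Python sketch below, the pattern for corpus codes (two capital letters, optionally followed by a hyphen and two more letters for a region) is an assumption rather than a documented rule, and classify_taxonomies is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
COMMON_PREFIX = "ParlaMint-taxonomy-"
# Assumed convention: corpus-specific files put the corpus code between
# "ParlaMint-" and "-taxonomy-", e.g. ParlaMint-CZ-taxonomy-meeting.parts.xml.
SPECIFIC = re.compile(r"^ParlaMint-([A-Z]{2}(?:-[A-Z]{2})?)-taxonomy-")


def classify_taxonomies(class_decl_xml):
    """Split the XIncluded taxonomy files into common and corpus-specific ones."""
    common, specific = [], {}
    for include in ET.fromstring(class_decl_xml).iter(XI_INCLUDE):
        href = include.get("href", "")
        match = SPECIFIC.match(href)
        if href.startswith(COMMON_PREFIX):
            common.append(href)
        elif match:
            specific[href] = match.group(1)  # file name -> corpus code
        else:
            raise ValueError(f"unexpected taxonomy file name: {href!r}")
    return common, specific
```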
To illustrate the structure of a taxonomy element, we give below the simplest common taxonomy (and include only the descriptions in English), which contains the categories that define the three subcorpora of a ParlaMint corpus:
<classDecl>
  ...
  <taxonomy xml:id="ParlaMint-taxonomy-subcorpus"
    xml:lang="mul">
    <desc xml:lang="en">
      <term>Subcorpora</term>
    </desc>
    <category xml:id="reference">
      <catDesc xml:lang="en">
        <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc>
    </category>
    <category xml:id="covid">
      <catDesc xml:lang="en">
        <term>COVID</term>: COVID subcorpus, from 2020-01-31 onwards, when WHO made the
        formal declaration of PHEIC, i.e. the Public Health Emergency of International Concern for COVID-19</catDesc>
    </category>
    <category xml:id="war">
      <catDesc xml:lang="en">
        <term>War</term>: War in Ukraine subcorpus, from 2022-02-24 onwards, i.e. from
        Russia's full-scale invasion of Ukraine</catDesc>
    </category>
  </taxonomy>
</classDecl>
A <taxonomy> thus first describes, via <desc>, what it is a taxonomy of, and then lists its (possibly nested) categories in <category> elements. Crucial here are the values of their xml:id attributes, by which a category is referred to, e.g. via the ana attribute of some other element, as already explained in the Section on Attributes of top-level elements in connection with classifying a corpus component via the ana attribute of its <TEI> element. Each category then glosses its meaning bilingually in its <catDesc> elements, which should always begin with the short name of the category, encoded in the <term> element.
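Resolving an ana pointer to the short name of its category can be sketched in Python as follows. The sketch assumes that the taxonomy has already been parsed (e.g. after resolving the XInclude) and uses the TEI namespace; ana_terms is a hypothetical name. Because ana may hold several space-separated pointers into different taxonomies, pointers that do not match a category of the given taxonomy are skipped.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def ana_terms(taxonomy, ana, lang="en"):
    """Return the <term>s of the categories in `taxonomy` that `ana` points to."""
    # iter() also reaches nested categories, not only the top-level ones.
    by_id = {cat.get(f"{XML_NS}id"): cat for cat in taxonomy.iter(f"{TEI}category")}
    terms = []
    for pointer in ana.split():
        category = by_id.get(pointer.lstrip("#"))
        if category is None:
            continue  # the pointer belongs to some other taxonomy
        for cat_desc in category.findall(f"{TEI}catDesc"):
            if cat_desc.get(f"{XML_NS}lang") == lang:
                terms.append(cat_desc.findtext(f"{TEI}term"))
                break
    return terms
```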

ParlaMint requires several taxonomies to be defined in the class declaration of the corpus root (as well as additional ones for the linguistically annotated corpus, as further described in the Section on Linguistic metadata). As mentioned, these taxonomies are defined globally and are available as part of the data in the ParlaMint GitHub repository, and there is a special procedure for modifying them, in particular for inserting translations into a new language.

The five obligatory taxonomies are:

  • The subcorpus taxonomy, already given in the example
  • The taxonomy of speaker types, which distinguishes e.g. the chair of a meeting, ordinary speakers, and guest speakers.
  • The legislature taxonomy, which gives the possible organisations of a parliament, and is by far the most complex one.
  • The political orientation taxonomy, which gives the values of the (mostly) left-right political orientation of political parties and parliamentary groups.
  • The CHES variables taxonomy, which gives the variables of the Chapel Hill Expert Survey (CHES) on political parties (cf. the Section on Political parties and parliamentary groups).

Furthermore, there are two obligatory taxonomies that pertain only to the linguistically analysed version of the corpus, cf. the Section on Linguistic taxonomies.

4.3. Profile description

The profile description, <profileDesc>, is the third main division of the metadata provided by the TEI header. It contains a description of the non-bibliographic aspects of the corpus, for example the list of speakers with their metadata. In the corpus root it contains four elements, of which only the first, <settingDesc>, is also used in corpus components. The elements are listed below:
<profileDesc>
  <settingDesc>...</settingDesc>
  <textClass>...</textClass>
  <particDesc>...</particDesc>
  <langUsage>...</langUsage>
</profileDesc>
We explain the contents of each element in the following sections.

4.3.1. Setting description

The setting description, <settingDesc>, is used in both the corpus root and corpus components, and contains only one element, <setting>, which gives information on where and when the included meetings took place. The example below gives a typical corpus root setting description:
<settingDesc>
  <setting>
    <name type="place">Westminster</name>
    <name type="city">London</name>
    <name type="country" key="GB">U.K.</name>
    <date from="2015-01-01" to="2021-03-31"/>
  </setting>
</settingDesc>
As can be seen, the location of the meetings is given in <name> elements, with the type attribute specifying what kind of location this is; the country (or region, for regional parliaments) additionally has the key attribute, which gives its ISO 3166 code, as explained in the Section on Standard values. For the corpus root, the setting also contains the interval of the dates of the transcripts included in the corpus. Note that this date range is also given in the <sourceDesc>, as explained in the Section on the Source description.
The setting description is also present in corpus components and is very similar to the one in the corpus root, except that it can specify the location of the meeting more precisely and must give its exact date, as illustrated below:
<settingDesc>
  <setting>
    <name type="place">Commons Chamber</name>
    <name type="place">Westminster</name>
    <name type="city">London</name>
    <name type="country" key="GB">U.K.</name>
    <date when="2019-02-18">February 18th, 2019</date>
  </setting>
</settingDesc>

4.3.2. Text class

The text class, <textClass>, groups information that describes the nature or topic of the corpus in terms of a standard classification scheme. It is used only in the ParlaMint corpus root, where it contains the category reference element, <catRef>:
<textClass>
  <catRef scheme="#parla.legislature"
    target="#parla.bi #parla.lower #parla.upper"/>
</textClass>
The category reference specifies in the value of the scheme attribute which scheme it uses, and this will always be a pointer to the ParlaMint-wide taxonomy on legislature, as further explained in the Section on the Class declaration. The target attribute then gives pointers to the type of legislature the country or region has and to the houses whose sittings the corpus contains; in the case above, the country has a bicameral parliament, and the corpus contains the transcriptions of both the upper and lower house sittings. In general, the options are:
  • #parla.uni: Unicameral parliament
  • #parla.bi #parla.lower: Bicameral parliament, lower house only
  • #parla.bi #parla.upper: Bicameral parliament, upper house only
  • #parla.bi #parla.lower #parla.upper: Bicameral parliament, both houses
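Since only these four combinations are allowed, the target of the legislature <catRef> can be validated with a simple lookup. The Python sketch below (describe_legislature is a hypothetical name) treats the pointers as an unordered set, which assumes that their order carries no meaning; repeated pointers are rejected.

```python
# The four allowed combinations of pointers in <catRef scheme="#parla.legislature">.
LEGISLATURE_OPTIONS = {
    frozenset({"#parla.uni"}): "Unicameral parliament",
    frozenset({"#parla.bi", "#parla.lower"}): "Bicameral parliament, lower house only",
    frozenset({"#parla.bi", "#parla.upper"}): "Bicameral parliament, upper house only",
    frozenset({"#parla.bi", "#parla.lower", "#parla.upper"}):
        "Bicameral parliament, both houses",
}


def describe_legislature(target):
    """Validate the target attribute of the legislature <catRef> and describe it."""
    pointers = target.split()
    key = frozenset(pointers)
    # Order is irrelevant, but duplicated pointers are not accepted.
    if len(key) != len(pointers) or key not in LEGISLATURE_OPTIONS:
        raise ValueError(f"not an allowed legislature combination: {target!r}")
    return LEGISLATURE_OPTIONS[key]
```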

4.3.3. Participant description

The participant description, <particDesc>, gives information about the speakers whose speeches make up the corpus transcripts, as well as about the government, parliament, parliamentary groups of political parties and other ‘organisations’ relevant to the affiliations of the speakers or to the corpus in general. The <particDesc> is part of the TEI header of the corpus root and contains two dedicated types of lists, <listOrg> for organisations and <listPerson> for speakers, as shown below:
<particDesc>
  <listOrg>...</listOrg>
  <listPerson>...</listPerson>
</particDesc>
While the above gives the XML structure of the participant description, ParlaMint separates the organisation and person list into separate files (cf. the Section on File names and directory structure), so the actual encoding would be (for the CZ corpus) as follows:
<particDesc>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-CZ-listOrg.xml"/>
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-CZ-listPerson.xml"/>
</particDesc>

Given the importance that ParlaMint gives to the information on speakers and their affiliations to (political) organisations, as well as the richness of this information, the content of <listOrg> and <listPerson> is further explained in a separate Chapter on Speakers and their organisations.

4.3.4. Language usage

The language usage, <langUsage>, is the fourth and last element of the profile description of a corpus root and defines the languages that are used in the corpus. Typically, the language usage will define (bilingually) only two languages, the local language and English, the latter being the language used in the metadata, for example:
<langUsage>
  <language ident="sl" xml:lang="sl">slovenski</language>
  <language ident="en" xml:lang="sl">angleški</language>
  <language ident="sl" xml:lang="en">Slovenian</language>
  <language ident="en" xml:lang="en">English</language>
</langUsage>
In cases where the transcriptions contain more than one language, the percentage of their use can also be indicated in the usage attribute of the <language> elements, as illustrated in the example below:
<langUsage>
  <language ident="en" xml:lang="en">English</language>
  <language ident="en" xml:lang="nl">Engels</language>
  <language usage="45" ident="nl"
    xml:lang="en">Dutch</language>
  <language usage="45" ident="nl"
    xml:lang="nl">Nederlands</language>
  <language usage="55" ident="fr"
    xml:lang="en">French</language>
  <language usage="55" ident="fr"
    xml:lang="nl">Frans</language>
</langUsage>
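The text above does not state that the usage percentages must add up to 100; the Python sketch below nevertheless treats this, together with the agreement of the percentages across the label languages, as plausible sanity checks. Both rules and the function names are assumptions made for illustration, and the TEI namespace is assumed.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def usage_by_label_lang(lang_usage_xml):
    """Map the xml:lang of each label to {ident: usage} for <language>s with usage."""
    groups = defaultdict(dict)
    for language in ET.fromstring(lang_usage_xml).iter(f"{TEI}language"):
        usage = language.get("usage")
        if usage is not None:
            groups[language.get(XML_LANG)][language.get("ident")] = int(usage)
    return dict(groups)


def usage_problems(lang_usage_xml):
    """Assumed checks: percentages sum to 100 and agree across label languages."""
    groups = usage_by_label_lang(lang_usage_xml)
    reference = next(iter(groups.values()), {})
    problems = []
    for label_lang, usage in groups.items():
        total = sum(usage.values())
        if total != 100:
            problems.append(f"xml:lang={label_lang}: usage sums to {total}")
        if usage != reference:
            problems.append(f"xml:lang={label_lang}: usage differs from other labels")
    return problems
```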

4.4. Revision description

The revision description, <revisionDesc>, is the fourth and last element of the TEI header. It is an optional element that can appear in the corpus root or in a component, and documents the revisions made to the corpus or component. Its structure is illustrated below:
<revisionDesc>
  <change when="2021-06-11">
    <name>Tomaž Erjavec</name>: Finalized encoding.</change>
  <change when="2021-05-28">
    <name>Tomaž Erjavec</name>: Built corpus.</change>
</revisionDesc>
The revision description consists of a series of <change> elements, with the attribute when giving the date of the change, and the content containing the <name> of the person responsible for the change, and a free-text description of the change.

5. Speakers and their organisations

ParlaMint places considerable emphasis on including in the corpora significant information about the persons giving the speeches contained in the transcriptions. This is why, even though this information is encoded in the <particDesc> element of the <teiHeader> of the corpus root (cf. the Section on Participant description), we treat it here in a separate Chapter. Below we first discuss the information on persons, including how they are affiliated with (political) organisations, and then explain the encoding of these organisations.

5.1. Speakers

The information on speakers is given in the <listPerson> element of the <particDesc> element (cf. the Section on Participant description). This element contains a series of <person> elements, each of which gives information on an individual speaker, as the example below illustrates:
<listPerson>
  <person xml:id="AccettoMatej">
    <persName>
      <surname>Accetto</surname>
      <forename>Matej</forename>
    </persName>
    <sex value="M"/>
  </person>
  ...
</listPerson>
Each <person> must have an xml:id attribute, so that it can be referred to from the transcription. The person's name, <persName>, is further decomposed into the person's <surname>(s), <forename>(s) and possibly <addName>. A person's name can also change, typically because of marriage. In this case, the <person> should contain another (or, possibly, several) <persName> elements, each marked by the from and/or to temporal attributes, as shown in the following example:
<person xml:id="GlawischnigAnna">
  <persName to="2016-06-01">
    <surname>Glawischnig</surname>
    <forename>Anna</forename>
  </persName>
  <persName from="2016-06-02">
    <surname>Glawischnig-Piesczek</surname>
    <forename>Anna</forename>
  </persName>
  ...
</person>
Note that the to/from dates should not overlap, neither should they have gaps in them, as this would mean that the person could have either two names at once, or none.
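The no-overlap, no-gap requirement can be checked by sorting the dated names and comparing neighbours. The Python sketch below assumes that from and to are inclusive full ISO dates (so that a to of 2016-06-01 followed by a from of 2016-06-02 is contiguous, as in the example above), that a missing bound is open-ended, and that names in different xml:lang variants are checked separately; name_interval_problems is a hypothetical name.

```python
import datetime
from collections import defaultdict
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ONE_DAY = datetime.timedelta(days=1)


def _interval(elem):
    # A missing from/to is open-ended; both bounds are treated as inclusive.
    start, end = elem.get("from"), elem.get("to")
    return (datetime.date.fromisoformat(start) if start else datetime.date.min,
            datetime.date.fromisoformat(end) if end else datetime.date.max)


def name_interval_problems(person):
    """Report overlaps and gaps between the dated <persName>s of one <person>."""
    by_lang = defaultdict(list)
    for name in person.findall(f"{TEI}persName"):
        if name.get("from") or name.get("to"):
            by_lang[name.get(XML_LANG)].append(_interval(name))
    problems = []
    for lang, intervals in by_lang.items():
        intervals.sort()  # by start date
        for (_, end1), (start2, _) in zip(intervals, intervals[1:]):
            if start2 <= end1:
                problems.append(f"overlap ({lang}): next name starts {start2}")
            elif start2 - end1 > ONE_DAY:
                problems.append(f"gap ({lang}): {end1} to {start2}")
    return problems
```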

The person must also have the <sex> element, with the value attribute being one of the controlled values: M for male, F for female, O for other, N for none or U for unknown.

The person element can also contain other optional information, such as the date and place of <birth> (and <death>), their official Web page, link(s) to Wikipedia, their VIAF identifier, or a photo, as illustrated below:
<person xml:id="SayeedaWarsi">
  <persName>
    <forename>Sayeeda</forename>
    <surname>Warsi</surname>
  </persName>
  <sex value="F"/>
  <birth when="1971-03-28">
    <placeName ref="https://www.geonames.org/2651286/">Dewsbury</placeName>
  </birth>
  <idno type="URI" subtype="contact">https://members.parliament.uk/member/3839/contact</idno>
  <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Sayeeda_Warsi,_Baroness_Warsi</idno>
  <idno type="URI" subtype="wikimedia"
    xml:lang="es">https://es.wikipedia.org/wiki/Sayeeda_Warsi</idno>
  <idno type="URI" subtype="viaf">http://viaf.org/viaf/33149912470406211798</idno>
  <figure>
    <graphic url="https://api.parliament.uk/photo/Paa3j0vS.jpg?crop=CU_1:1"/>
  </figure>
</person>
Finally, the name of the person, their place of birth etc. can also be written in several languages or scripts, as illustrated below:
<person xml:id="PlevnelievRosen">
  <persName xml:lang="bg">
    <forename>Росен</forename>
    <surname>Асенов</surname>
    <surname>Плевнелиев</surname>
  </persName>
  <persName xml:lang="en">
    <forename>Rosen</forename>
    <surname>Asenov</surname>
    <surname>Plevneliev</surname>
  </persName>
  <sex value="M"/>
  <birth when="1964-05-14">
    <placeName>Гоце Делчев</placeName>
    <placeName xml:lang="bg-Latn">Gotse Delchev</placeName>
  </birth>
  <education>Висшия машинно-електротехнически институт в София, със специалност „изчислителна техника“</education>
  <education xml:lang="bg-Latn">Visshiya mashinno-elektrotehnicheski institut v Sofiya, sas spetsialnost „izchislitelna tehnika“</education>
  <occupation>политик</occupation>
  <occupation xml:lang="bg-Latn">politik</occupation>
  <affiliation from="2012-01-22"
    ref="#republic.Bulgaria" role="member" to="2017-01-21"/>
  <affiliation from="2012-01-22"
    ref="#republic.Bulgaria" role="head" to="2017-01-21">
    <roleName xml:lang="bg">4-и президент на Република България</roleName>
    <roleName xml:lang="bg-Latn">4-i prezident na Republika Balgariya</roleName>
  </affiliation>
  <idno type="URI" subtype="wikimedia">https://bg.wikipedia.org/wiki/Росен_Плевнелиев</idno>
  <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Rosen_Plevneliev</idno>
</person>

5.1.1. Speaker affiliations

An important element of a person is <affiliation>, which associates the speakers with organisations, i.e. it specifies who is a member of the government, parliament, a parliamentary group of political parties2 or political parties themselves, as well as who holds a relevant office, e.g. that they are the president, chairman, minister etc. in or of a given organisation. The following example shows the use of the <affiliation> element for specifying membership in organisations:
<person xml:id="BahŽibertAnja">
  <persName>
    <forename>Anja</forename>
    <surname>Žibert</surname>
  </persName>
  <sex value="F"/>
  <affiliation role="member"
    ref="#parliamentSI" from="2014-08-01" to="2018-06-21"
    ana="#DZ.7"/>
  <affiliation role="member"
    ref="#parliamentSI" from="2018-06-22" ana="#DZ.8"/>
</person>
The formal type of affiliation is given in the role attribute, in this case member. The ref attribute points to the ID of the organisation (cf. the Section on Organisations) in which the person has the specified role. For MPs, as is the case above, this will be the ID of the parliament or one of the two houses (cf. the Section on The parliament organisations).

The example above also shows the use of the classification attribute ana, which points to the specification of the legislative period in which the person was affiliated with the specified organisation. Such legislative periods are typically given as <event> elements inside the government or parliament organisations, as further explained in the Sections on the government and parliament organisations.

The affiliation element also has the usual from and to attributes, giving the dates from and to which the person was affiliated with the organisation.
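These temporal attributes make it possible to ask which affiliations a person had on a given day. A minimal Python sketch, assuming inclusive from and to values in full ISO format, open-ended missing bounds and the TEI namespace (active_affiliations is a hypothetical name):

```python
import datetime
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def _as_date(value, default):
    return datetime.date.fromisoformat(value) if value else default


def active_affiliations(person, on, role=None, ref=None):
    """Return the <affiliation>s of `person` valid on date `on`.

    A missing from/to is treated as open-ended, and both bounds as inclusive.
    Optionally filter by role (e.g. "member") and by ref (e.g. "#parliamentSI").
    """
    result = []
    for aff in person.findall(f"{TEI}affiliation"):
        start = _as_date(aff.get("from"), datetime.date.min)
        end = _as_date(aff.get("to"), datetime.date.max)
        if not start <= on <= end:
            continue
        if role is not None and aff.get("role") != role:
            continue
        if ref is not None and aff.get("ref") != ref:
            continue
        result.append(aff)
    return result
```

For the example person above, asking for the member affiliations on 2018-06-22 would return only the one classified as #DZ.8.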

The role attribute can have as its value one of the values given by the ParlaMint schema. For backward compatibility with ParlaMint I corpora, there are some roles that are only used by one corpus (cf. the definition of role for <affiliation>), but the main ones that should be used are:

  • member specifies that a person is a member of the organisation
  • head is used for the lead person in an organisation, regardless of how this role is named in the specific country and for the specific organisation, i.e. it can be used for the queen (head of the country organisation), president (head of the republic organisation), prime minister (head of government), minister (head of ministry), chairperson (head of committee), etc.
  • deputyHead is the deputy head of the organisation, again, regardless of how this role is named in the specific country and for the specific organisation, i.e. it is used for the vice president, deputy chairperson, etc.
  • minister is used to indicate that the person is the minister in the government. Note that this only says that they are a minister, and when. To encode what they are a minister of, the ministry organisation needs to be created, and the minister associated with it with the head role.
Because the roles are quite generic, ParlaMint also allows the exact name of the affiliation role for a particular country or region to be specified using the <roleName> element, preferably both in the local language and in English, as illustrated in the following example, which specifies that somebody is a minister, and also what they are a minister of:
<affiliation role="minister" ref="#GOV"
  from="2020-08-01">
  <roleName xml:lang="sl">Minister za obrambo</roleName>
  <roleName xml:lang="en">Minister of Defence</roleName>
</affiliation>
<affiliation role="head"
  ref="#MinistryOfDefence" from="2020-08-01">
  <roleName xml:lang="sl">Minister za obrambo</roleName>
  <roleName xml:lang="en">Minister of Defence</roleName>
</affiliation>
Finally, it is also possible to specify the name of the organisation that the person is affiliated with directly inside the <affiliation> element, using the <orgName> element3, again preferably both in the local language and in English, as illustrated in the following example, which specifies that somebody is a minister of a certain ministry:
<affiliation role="minister" ref="#GOV"
  from="2020-08-01">
  <orgName xml:lang="sl">Ministrstvo za obrambo</orgName>
  <orgName xml:lang="en">Ministry of Defence</orgName>
</affiliation>

It should be noted that ParlaMint makes no assumptions on the interconnection between various roles, e.g. we do not assume that if somebody has a minister role in the government that they are also a member of the government. Therefore it is necessary to specify all the desired affiliations with their particular roles, e.g. both as minister and as member.

It is important to give correct roles to the affiliations that associate a person with organisations. We list the most common roles and how they should be encoded, emphasising the ones that are obligatory in ParlaMint:

  • King or queen: affiliation/@role="head" → org/@role="country"
  • President (as opposed to Prime minister): affiliation/@role="head" → org/@role="republic"
  • Prime minister (or other head of government): affiliation/@role="head" & affiliation/@role="member" → org/@role="government" (cf. Section on The government organisation)
  • Deputy prime minister: affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="government"
  • Minister: affiliation/@role="minister" & affiliation/@role="member" → org/@role="government"
    If ministries are defined, then also: affiliation/@role="head" & affiliation/@role="member" → org/@role="ministry"
  • Deputy minister: affiliation/@role="deputyMinister" & affiliation/@role="member" → org/@role="government"
    If ministries are defined, then also: affiliation/@role="deputyHead" & affiliation/@role="member" → org/@role="ministry"
  • Leader of parliamentary group: affiliation/@role="head" & affiliation/@role="member" → org/@role="parliamentaryGroup" (cf. Section on Political parties and parliamentary groups)
  • Member of parliamentary group: affiliation/@role="member" → org/@role="parliamentaryGroup"
  • Leader of political party: affiliation/@role="head" & affiliation/@role="member" → org/@role="politicalParty"
  • Member of political party: affiliation/@role="member" → org/@role="politicalParty"
  • MP: affiliation/@role="member" → org/@role="parliament" (cf. Section on The parliament organisations)
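Because ParlaMint does not infer one role from another, the list above implies that, for example, a minister also needs an explicit member affiliation. The Python sketch below checks one reading of this rule: for organisations with the roles government, ministry, parliamentaryGroup and politicalParty, every non-member affiliation must be accompanied by a member affiliation with the same ref. It ignores the dates of the affiliations, which a fuller check would also compare; missing_member_affiliations is a hypothetical name, and the TEI namespace is assumed.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Organisation roles for which, following the list above, a head, deputy head,
# minister or deputy minister must also carry a separate "member" affiliation.
NEEDS_MEMBER = {"government", "ministry", "parliamentaryGroup", "politicalParty"}


def missing_member_affiliations(list_org, list_person):
    """List (person id, role, ref) triples lacking the matching member affiliation."""
    org_role = {org.get(XML_ID): org.get("role") for org in list_org.iter(f"{TEI}org")}
    problems = []
    for person in list_person.iter(f"{TEI}person"):
        affiliations = person.findall(f"{TEI}affiliation")
        member_refs = {a.get("ref") for a in affiliations if a.get("role") == "member"}
        for aff in affiliations:
            ref = aff.get("ref") or ""
            if aff.get("role") == "member":
                continue
            if org_role.get(ref.lstrip("#")) not in NEEDS_MEMBER:
                continue  # e.g. the head of a country or republic
            if ref not in member_refs:
                problems.append((person.get(XML_ID), aff.get("role"), ref))
    return problems
```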

5.2. Organisations

Information on the government, parliament, political parties or parliamentary groups of political parties, as well as on other optional parliamentary structures (e.g. ministries, committees), is given in the <listOrg> element of the corpus root <particDesc> (cf. the Section on Participant description). The <listOrg> element contains a series of <org> elements, each giving information about one organisation. The organisations are followed by the <listRelation> element giving a list of relations between organisations:
<listOrg>
  <org>...</org>
  <org>...</org>
  <org>...</org>
  ...
  <listRelation>...</listRelation>
</listOrg>
We exemplify the structure of one organisation element by giving the basic information about a parliamentary group of political parties:
<org role="parliamentaryGroup"
  xml:id="group.LN-Aut">
  <orgName full="yes" xml:lang="it">Lega Nord e Autonomie</orgName>
  <orgName full="abb">LN-Aut</orgName>
  <event from="2013-03-15">
    <label xml:lang="en">existence</label>
  </event>
</org>
First, each organisation must have an xml:id attribute, so that other elements (in particular, <person>s) can refer to it. The fact that the organisation is a parliamentary group of political parties is encoded in the role attribute, which we elaborate on below. The name of the organisation is given in the <orgName> element, which uses the full attribute to distinguish between the full name of the group and its abbreviated name.

Organisations are also created and dissolved, and this information is encoded in the <event> element, which has existence as its <label>, and where the start of the organisation's existence is given in the from attribute; as there is no to attribute, the organisation still exists.
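Whether an organisation existed on a given day follows from its existence event. In the Python sketch below (hypothetical names, TEI namespace and inclusive full ISO dates assumed), only the direct <event> children of the <org> are inspected, so that events inside a <listEvent>, such as the individual governments described in the next section, are not confused with the existence event.

```python
import datetime
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def existence_interval(org):
    """Return (from, to) of the <event> labelled "existence" among the org's own events."""
    # findall() only looks at direct children, so events nested in a <listEvent>
    # are not mistaken for the existence event.
    for event in org.findall(f"{TEI}event"):
        labels = {(label.text or "").strip() for label in event.findall(f"{TEI}label")}
        if "existence" in labels:
            start, end = event.get("from"), event.get("to")
            # A missing to attribute means that the organisation still exists.
            return (datetime.date.fromisoformat(start) if start else datetime.date.min,
                    datetime.date.fromisoformat(end) if end else datetime.date.max)
    raise ValueError("organisation has no existence event")


def exists_on(org, on):
    """True if date `on` falls within the organisation's existence interval."""
    start, end = existence_interval(org)
    return start <= on <= end
```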

The example above gives the minimal required information about an organisation but ParlaMint also allows further data to be added, as exemplified by the following example:
<org xml:id="party.PS"
  role="parliamentaryGroup">
  <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName>
  <orgName full="yes" xml:lang="en">Positive Slovenia</orgName>
  <orgName full="abb">PS</orgName>
  <event from="2011-10-22">
    <label xml:lang="en">existence</label>
  </event>
  <idno type="URI" subtype="wikimedia"
    xml:lang="sl">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno>
  <idno type="URI" subtype="wikimedia"
    xml:lang="en">https://en.wikipedia.org/wiki/Positive_Slovenia</idno>
</org>
Here the full name of the organisation is also given in English, and there are two external links giving URLs with further information about the organisation. In the example above, these are the Slovene and English Wikipedia pages, as indicated by the values of the xml:lang attributes on the <idno> elements.

Returning to the role attribute, its set of allowed values is given by the ParlaMint schema (cf. the Section on Validating ParlaMint corpora). The list is currently quite long, because we left it to the partners of ParlaMint I to determine the values. However, some values are common to all corpora, and for some of these every corpus must include organisations with that role in its organisation list. We also recommend listing the organisations in the order given below. The obligatory roles are marked as such:

  1. country: the country taken as an organisation, which can be used to specify the king or queen of the country as a <person> affiliated with the country organisation with the role head;
  2. republic: the republic, which can be used to specify the president of the country as a <person> affiliated with the republic organisation with the role head;
  3. government (obligatory): the government of the country or region;
  4. ministry: a particular ministry, which can be used to specify the minister as a <person> affiliated with the ministry organisation with the role head;
  5. parliament (obligatory): the parliament, upper or lower house of the country or region;
  6. parliamentaryGroup (obligatory): a grouping of political parties for the purpose of acting as one in the parliament;
  7. politicalParty: a political party.
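The order above is only a recommendation, so a corpus-checking script might simply report deviations from it. The sketch below uses a hypothetical function, misordered_roles. It lists the roles of <org> elements that appear after an organisation whose role comes later in the list. Roles that are not in the list, such as corpus-specific ones, are skipped.

```python
import xml.etree.ElementTree as ET

# Recommended order of the common roles, copied from the list above.
RECOMMENDED_ORDER = ["country", "republic", "government", "ministry",
                     "parliament", "parliamentaryGroup", "politicalParty"]

def misordered_roles(list_org):
    """Return the roles of <org> elements that follow an organisation whose
    role comes later in RECOMMENDED_ORDER; other roles are skipped."""
    rank = {role: i for i, role in enumerate(RECOMMENDED_ORDER)}
    highest = -1
    problems = []
    for org in list_org.findall("org"):
        role = org.get("role")
        if role not in rank:
            continue
        if rank[role] < highest:
            problems.append(role)
        highest = max(highest, rank[role])
    return problems

LIST_ORG = """<listOrg>
 <org role="government"/>
 <org role="parliament"/>
 <org role="politicalParty"/>
 <org role="parliamentaryGroup"/>
</listOrg>"""

print(misordered_roles(ET.fromstring(LIST_ORG)))  # ['parliamentaryGroup']
```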

We discuss the obligatory types of organisations and political parties in the following sections.

5.2.1. The government organisation

The government organisation, distinguished by the role government, must be present in the <listOrg> of every corpus root and is, by convention, the first organisation in the list. It gives the ID and name of the government of the country or region. It also lists the specific governments in the dedicated list element for events, <listEvent>, where each government is a separate <event> element:
<org xml:id="ParlaMint-SI-GOV" role="government">
 <orgName xml:lang="sl" full="yes">Vlada Republike Slovenije</orgName>
 <orgName xml:lang="en" full="yes">Government of the Republic of Slovenia</orgName>
 <event from="1990-05-16">
  <label xml:lang="en">existence</label>
 </event>
 <idno type="URI" subtype="wikimedia"
  xml:lang="sl">https://sl.wikipedia.org/wiki/Vlada_Republike_Slovenije</idno>
 <idno type="URI" subtype="wikimedia"
  xml:lang="en">https://en.wikipedia.org/wiki/Government_of_Slovenia</idno>
 <listEvent>
  <event xml:id="GOV.11" from="2013-03-20"
   to="2014-09-18">
   <label xml:lang="sl">11. vlada Republike Slovenije (20. marec 2013 - 18. september 2014)</label>
   <label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label>
  </event>
   ...
  <event xml:id="GOV.14" from="2020-03-13">
   <label xml:lang="sl">14. vlada Republike Slovenije (13. marec 2020 - danes)</label>
   <label xml:lang="en">14th Government of the Republic of Slovenia (March 13, 2020 - today)</label>
  </event>
 </listEvent>
</org>
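Each government in the <listEvent> has its own from and to dates, so finding the government in office on the day of a session is a simple interval test. The following is a minimal sketch. It assumes a trimmed, namespace-free copy of the example above, with dates that follow the labels, and uses the illustrative function name events_on. The same function should also work for the legislative periods of the parliament organisations described next.

```python
import xml.etree.ElementTree as ET
from datetime import date

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

GOV = """<org xml:id="ParlaMint-SI-GOV" role="government">
 <listEvent>
  <event xml:id="GOV.11" from="2013-03-20" to="2014-09-18">
   <label xml:lang="en">11th Government of the Republic of Slovenia</label>
  </event>
  <event xml:id="GOV.14" from="2020-03-13">
   <label xml:lang="en">14th Government of the Republic of Slovenia</label>
  </event>
 </listEvent>
</org>"""

def events_on(org, day):
    """Return the xml:ids of the <event>s in the org's <listEvent> that were
    in force on the given day; a missing 'to' means the event is ongoing."""
    ids = []
    for event in org.findall("listEvent/event"):
        start, end = event.get("from"), event.get("to")
        if start is not None and day < date.fromisoformat(start):
            continue
        if end is not None and day > date.fromisoformat(end):
            continue
        ids.append(event.get(XML_ID))
    return ids

gov = ET.fromstring(GOV)
print(events_on(gov, date(2014, 1, 1)))  # ['GOV.11']
print(events_on(gov, date(2016, 1, 1)))  # [] (governments 12 and 13 are not in this trimmed copy)
```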

5.2.2. The parliament organisations

The parliament organisations, distinguished by the role parliament, must also be present in the <listOrg> of every corpus root and are, by convention, the second and third organisations in the list. A bicameral parliament has two parliament organisations, at least when the corpus contains transcripts of both the upper and the lower house. A unicameral parliament, or a corpus whose transcripts cover only the lower house, has only one. Which of the three cases (unicameral parliament, lower house or upper house) an organisation encodes is given by the value of its ana attribute, which refers to the ID of the appropriate category in the legislature taxonomy (cf. the Section on Class declaration). The ana attribute should also specify whether the parliament is national or regional, using categories that are also defined in the legislature taxonomy. In short, the theoretically possible values of ana for the parliament organisations are:
  • #parla.national #parla.uni: national parliament, unicameral system
  • #parla.national #parla.lower: national parliament, lower house
  • #parla.national #parla.upper: national parliament, upper house
  • #parla.regional #parla.uni: regional parliament, unicameral system
  • #parla.regional #parla.lower: regional parliament, lower house
  • #parla.regional #parla.upper: regional parliament, upper house
Otherwise, the parliament organisations have the same structure as the government organisation. They give the ID and name of the parliament and encode the successive legislative periods as <event> elements inside a <listEvent>:
<org ana="#parla.national #parla.lower"
 role="parliament" xml:id="be_federal_parliament">
 <orgName full="yes" xml:lang="nl">Federaal Parlement van België</orgName>
 <orgName full="yes" xml:lang="en">Belgian Federal Parliament</orgName>
 <event from="1831-02-07">
  <label xml:lang="en">existence</label>
 </event>
 <listEvent>
  <head xml:lang="nl">Zittingsperiode</head>
  <head xml:lang="en">Legislative period</head>
  <event to="2007-05-02" from="2003-06-05"
   xml:id="period_51">
   <label xml:lang="nl">Zittingsperiode 51</label>
   <label xml:lang="en">Legislative period 51</label>
  </event>
   ...
  <event from="2019-06-20"
   xml:id="period_55">
   <label xml:lang="nl">Zittingsperiode 55</label>
   <label xml:lang="en">Legislative period 55</label>
  </event>
 </listEvent>
</org>
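The ana value on a parliament organisation combines two independent categories. A script can therefore split it on whitespace and look up each part separately. The sketch below uses the illustrative function name classify_parliament. It returns the level and the chamber, and rejects values that do not contain exactly one of each.

```python
import xml.etree.ElementTree as ET

LEVELS = {"#parla.national": "national", "#parla.regional": "regional"}
CHAMBERS = {"#parla.uni": "unicameral", "#parla.lower": "lower", "#parla.upper": "upper"}

def classify_parliament(org):
    """Return (level, chamber) read from the space-separated ana pointers."""
    tokens = (org.get("ana") or "").split()
    levels = [LEVELS[t] for t in tokens if t in LEVELS]
    chambers = [CHAMBERS[t] for t in tokens if t in CHAMBERS]
    if len(levels) != 1 or len(chambers) != 1:
        raise ValueError(f"expected one level and one chamber category in ana={tokens}")
    return levels[0], chambers[0]

org = ET.fromstring('<org ana="#parla.national #parla.lower" role="parliament" '
                    'xml:id="be_federal_parliament"/>')
print(classify_parliament(org))  # ('national', 'lower')
```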

5.2.3. Political parties and parliamentary groups

In the scope of ParlaMint, very important organisations are political parties (distinguished by the role politicalParty) and, even more so, parliamentary groups that represent political parties in the parliament (distinguished by the role parliamentaryGroup). These organisations are linked to <person> elements (i.e. speakers), so that it is known which political party or parliamentary group a speaker belongs to or represents at a given point in time, as further explained in the Section on Speaker affiliations.

ParlaMint requires that a corpus must use parliamentary groups, while the use of political parties is optional. Note that if political parties are used, it is also expected to encode which political parties constitute a parliamentary group; this is encoded via the <relation> element, as further explained in the Section on Relations between organisations.

The introduction to this chapter already gave examples of how organisations are encoded in general. Here we only show how to encode the additional metadata that can be associated with political parties or parliamentary groups: their political orientation on the left-to-right scale, and the variables of the Chapel Hill Expert Surveys for Europe (CHES for short). This additional metadata is encoded in one or more <state> elements, which should be the last elements in the <org>.

5.2.3.1. Encoding political orientation

Political orientation is encoded with a <state> element whose type attribute has the value politicalOrientation. Each nested <state> element then records three things. Its type attribute gives the origin of the information, either the (corpus) encoder or Wikipedia. Its source attribute gives the source, either a pointer to the ID of the encoder or the Wikipedia URL. Its ana attribute refers to the category definition in the politicalOrientation taxonomy. This is illustrated in the example below:
<org role="parliamentaryGroup" xml:id="MR">
 <orgName full="abb">MR</orgName>
 <orgName full="yes">Mouvement Réformateur</orgName>
 <idno type="URI" subtype="wikimedia">https://en.wikipedia.org/wiki/Reformist_Movement</idno>
 <state type="politicalOrientation">
  <state type="encoder"
   source="#GrietDepoorter" ana="#orientation.CRR">
   <note xml:lang="en">Orientation determined by encoder, using own knowledge of the parliamentary group.</note>
  </state>
  <state type="Wikipedia"
   source="https://en.wikipedia.org/wiki/Reformist_Movement" ana="#orientation.CR">
   <note xml:lang="en">From 1992 the Reformist Movement (MR) consisted of: FDF, MCC, PRL and PFF. In September 2001, FDF decides to leave the alliance and chooses a new name, becoming DeFI.</note>
  </state>
 </state>
</org>
Note also that a <state> may contain a <note> giving further free-text information about the orientation.
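Because the encoder's and Wikipedia's classifications can differ (as in the example, #orientation.CRR versus #orientation.CR), a script may want both. The following minimal sketch, orientations, is a hypothetical helper that maps each origin to its source and category pointer. It runs on a trimmed, namespace-free copy of the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above (names, URL and notes left out).
ORG = """<org role="parliamentaryGroup" xml:id="MR">
 <state type="politicalOrientation">
  <state type="encoder" source="#GrietDepoorter" ana="#orientation.CRR"/>
  <state type="Wikipedia"
   source="https://en.wikipedia.org/wiki/Reformist_Movement" ana="#orientation.CR"/>
 </state>
</org>"""

def orientations(org):
    """Map the type of each nested <state> (encoder, Wikipedia) to the
    (source, category pointer) pair it records."""
    result = {}
    for outer in org.findall("state[@type='politicalOrientation']"):
        for inner in outer.findall("state"):
            result[inner.get("type")] = (inner.get("source"), inner.get("ana"))
    return result

print(orientations(ET.fromstring(ORG))["encoder"])  # ('#GrietDepoorter', '#orientation.CRR')
```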

5.2.3.2. Encoding CHES variables

The second type of metadata on organisations, in particular on political parties and parliamentary groups, comes from the Chapel Hill Expert Surveys for Europe (CHES), either from the 1999-2019 edition or from the 2019 edition. Here the top-level <state> element gives the type of the state, i.e. CHES, and, in its source attribute, the URL of the CSV file the information comes from. Its <label>4 gives the abbreviation of the political party name in CHES, which can, and often does, differ from its ParlaMint abbreviation. Each subordinate <state> (of type variable) then encodes one CHES variable, identified by its ana attribute as a reference to the appropriate category in the CHES taxonomy (cf. the Section on Class declaration and taxonomies). Finally, as CHES gives the values of its variables by year, the third level of <state> (of type value) gives in its from and to attributes the period for which a value holds, and the numeric value itself in the n attribute, as illustrated in the example below:
<state type="CHES"
 source="https://www.chesdata.eu/s/1999-2019_CHES_dataset_meansv3.csv">
 <label>
  <orgName full="abb" from="2002" to="2018"
   xml:lang="en">MR</orgName>
 </label>
 <state type="variable" ana="#ches-lrgen">
  <state type="value" from="2002" to="2005" n="6.35"/>
  <state type="value" from="2006" to="2009" n="6.67"/>
  <state type="value" from="2010" to="2013" n="7.0"/>
  <state type="value" from="2014" to="2018" n="7.0"/>
 </state>
 <state type="variable" ana="#ches-lrecon">
  <state type="value" from="2002" to="2005" n="7.3"/>
  <state type="value" from="2006" to="2009" n="7.5"/>
  <state type="value" from="2010" to="2013" n="7.62"/>
  <state type="value" from="2014" to="2018" n="7.60"/>
 </state>
 ...
</state>
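To use these values, for example to attach a left-right score to speeches from a given year, a script has to find the period that contains that year. The sketch below is a hypothetical helper, ches_value. It assumes, as in the example, that from and to on the value states are plain years, and it returns None when no period matches.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example above (label and second variable left out).
CHES = """<state type="CHES" source="https://www.chesdata.eu/s/1999-2019_CHES_dataset_meansv3.csv">
 <state type="variable" ana="#ches-lrgen">
  <state type="value" from="2002" to="2005" n="6.35"/>
  <state type="value" from="2006" to="2009" n="6.67"/>
  <state type="value" from="2010" to="2013" n="7.0"/>
  <state type="value" from="2014" to="2018" n="7.0"/>
 </state>
</state>"""

def ches_value(ches, variable, year):
    """Numeric value of a CHES variable (given as its ana pointer) for the
    period containing the year, or None if there is no such period."""
    for var in ches.findall("state[@type='variable']"):
        if var.get("ana") != variable:
            continue
        for value in var.findall("state[@type='value']"):
            if int(value.get("from")) <= year <= int(value.get("to")):
                return float(value.get("n"))
    return None

ches = ET.fromstring(CHES)
print(ches_value(ches, "#ches-lrgen", 2007))  # 6.67
```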

5.2.3.3. Relations between organisations

As mentioned, the relations between various organisations, in particular, which parliamentary groups of political parties are in the coalition or in opposition and when, are encoded in the final element of <listOrg>, namely <listRelation>, which then contains <relation> elements, as shown in the example below:
<listRelation>
 <relation name="coalition"
  mutual="#parliamentaryGroup.MR #parliamentaryGroup.OpenVld" from="2014-10-11" to="2018-12-09"
  ana="#period_54"/>
 <relation name="opposition"
  active="#parliamentaryGroup.Ecolo #parliamentaryGroup.cdH" passive="#government.BE"
  from="2014-10-11" to="2018-12-09" ana="#period_54"/>
 <relation name="representing"
  active="#parliamentaryGroup.MR"
  passive="#politicalParty.CSSD.153 #politicalParty.ENO.1" from="2013-10-29" to="2017-10-26"/>
</listRelation>
The type of relation is given in the name attribute. ParlaMint allows the following values of name:
  • coalition: the pointers to the organisations (i.e. parliamentary groups or political parties) are given in the mutual attribute, because a coalition is a mutual relation between its members;
  • opposition: the pointers to the organisations are given in the active attribute, as the organisations are in an active relation to the government, the pointer to which is given in the passive attribute;
  • representing: a parliamentary group representing one or more political parties in the parliament. The parliamentary group is given as the value of the active attribute, while the political parties are given as the value of the passive attribute;
  • renaming: the two organisations (typically political parties) referred to are essentially the same organisation, which has, however, been renamed at some point in time; the reference to the old organisation is given in the passive attribute, while the reference to the new one is given in the active attribute;
  • successor: an organisation (again, typically a political party), or several of them, ceased to exist, but a successor was created; as with renaming, the previous organisation is given as the value of the passive attribute, while the new one uses the active attribute.
For relations, it is typically also necessary to specify when the relation was in force, using the from attribute and, possibly, the to attribute. It is also possible, though not required, to give in the ana attribute the legislative period and/or the government during which a particular coalition or opposition existed.
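As an illustration of how the name, mutual, active and dated attributes work together, the following minimal sketch lists the organisations that were in a given relation on a given day. It assumes a namespace-free copy of part of the example and ISO dates. The helper names in_force and related_orgs are hypothetical. Coalition members are read from mutual and, for every other relation name, the active attribute is read.

```python
import xml.etree.ElementTree as ET
from datetime import date

RELATIONS = """<listRelation>
 <relation name="coalition"
  mutual="#parliamentaryGroup.MR #parliamentaryGroup.OpenVld"
  from="2014-10-11" to="2018-12-09" ana="#period_54"/>
 <relation name="opposition"
  active="#parliamentaryGroup.Ecolo #parliamentaryGroup.cdH" passive="#government.BE"
  from="2014-10-11" to="2018-12-09" ana="#period_54"/>
</listRelation>"""

def in_force(relation, day):
    """True if the day lies within the relation's from/to (both optional)."""
    start, end = relation.get("from"), relation.get("to")
    return ((start is None or date.fromisoformat(start) <= day)
            and (end is None or day <= date.fromisoformat(end)))

def related_orgs(list_relation, name, day):
    """Pointers to the organisations in relations of the given name on the day:
    'mutual' for coalitions, 'active' for every other relation name."""
    attribute = "mutual" if name == "coalition" else "active"
    found = []
    for relation in list_relation.findall("relation"):
        if relation.get("name") == name and in_force(relation, day):
            found.extend(relation.get(attribute, "").split())
    return found

relations = ET.fromstring(RELATIONS)
print(related_orgs(relations, "coalition", date(2016, 1, 1)))
# ['#parliamentaryGroup.MR', '#parliamentaryGroup.OpenVld']
```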

6. Transcriptions

The transcriptions are encoded in the <text> element of corpus components. This element contains only the element <body>, which should then contain at least one division, <div>, as illustrated below:
<text ana="#reference">
 <body>
  <div type="commentSection">...</div>
  <div type="debateSection">...</div>
  <div type="debateSection">...</div>
   ...
 </body>
</text>
As shown, the <text> element should, like the top-level <TEI> element (as discussed in the Section on Attributes of top-level elements), use the ana attribute to indicate which subcorpus the text belongs to. The subcorpora themselves are defined in the appropriate taxonomy (cf. the Section on Class declaration).

6.1. Divisions

A text body contains a series of divisions, <div>, when the source document can be reliably split into sections, which is typically done on the basis of headings identified in the source. When this is not possible, the complete body will be just one division.

In ParlaMint we have two types of divisions, distinguished by the value of their (required) type attribute. Divisions of type debateSection must contain at least one speech. Divisions of type commentSection must not contain any speeches; they contain only transcriber (or other) comments, e.g. the table of contents or references to laws.

Inside a debateSection-type division, the main elements of interest are speeches, encoded with the utterance element, <u>. However, this type of division can (and the commentSection type must) also contain headings and transcriber notes that structure and comment on the speeches, as well as page breaks, as illustrated below:
<text ana="#reference">
 <body>
  <div type="debateSection">
   <pb n="1"/>
   <head>Child Poverty Unit</head>
   <note>Question</note>
   <note>Asked by</note>
   <u>...</u>
    ...
   <pb n="2"/>
    ...
  </div>
  <div type="debateSection">
   <head>Trade Union Act 2016 (Political Funds)</head>
   <note>Motion to Approve</note>
   <note>Moved by</note>
   <u>...</u>
    ...
  </div>
   ...
 </body>
</text>
It should be noted that the <head> element can appear only at the start of a division while notes can be interspersed with speeches anywhere inside a division, and can also appear inside speeches, i.e. inside <u> elements. It is also possible to specify the type of note, and, in fact, to use more precise elements than just <note>, which is further explained in the Section on Transcriber comments.
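The division rules above lend themselves to a simple automatic check. The following is a sketch only, with the hypothetical function name check_divisions, and it does not replace validation against the ParlaMint schema. It works on namespace-free markup. Because the example above has a <pb> before the <head>, page breaks are allowed to precede a heading.

```python
import xml.etree.ElementTree as ET

def check_divisions(body):
    """Report violations of the division rules described above for a <body>."""
    problems = []
    for i, div in enumerate(body.findall("div"), start=1):
        kind = div.get("type")
        has_speech = div.find("u") is not None
        if kind == "debateSection" and not has_speech:
            problems.append(f"div {i}: debateSection without speeches")
        elif kind == "commentSection" and has_speech:
            problems.append(f"div {i}: commentSection with speeches")
        elif kind not in ("debateSection", "commentSection"):
            problems.append(f"div {i}: unexpected type {kind!r}")
        # <head> may only open a division; page breaks before it are tolerated
        seen_content = False
        for child in div:
            if child.tag == "head" and seen_content:
                problems.append(f"div {i}: <head> after other content")
                break
            if child.tag not in ("head", "pb"):
                seen_content = True
    return problems

BODY = """<body>
 <div type="debateSection">
  <pb n="1"/>
  <head>Child Poverty Unit</head>
  <note>Question</note>
  <u>...</u>
 </div>
 <div type="commentSection">
  <u>...</u>
 </div>
</body>"""

print(check_divisions(ET.fromstring(BODY)))  # ['div 2: commentSection with speeches']
```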
Page breaks, <pb>, are optional. They can appear inside divisions, speeches and segments, as well as inside the <note> element and, in the linguistically analysed version, inside sentences. They preserve the page breaks of the digital source, optionally with the page number as the value of the n attribute. A page break can also point to the source of its page via the source attribute and to its media file via the corresp attribute (cf. the Section on Source description and esp. the Example on the source description of a component file), as illustrated in the example below:
<div type="debateSection">
 <pb n="1"
  source="https://www.psp.cz/eknih/2013ps/stenprot/001schuz/s001001.htm" corresp="#ps2013-001-01-000-000.audio1"/>
 <note type="speaker">Předsedající Miroslava Němcová</note>
 ...
</div>

6.2. Utterances

A speech is marked up using the <u> (utterance) element, as illustrated below:
<u who="#DavidPrior" ana="#regular">
 <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
 <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
</u>
The most important attribute of an utterance is who, which points to the <person> element containing the metadata of the speaker, as discussed in the Section on Speakers. This attribute is what makes it possible to analyse speeches by speaker and by speaker metadata. Nevertheless, the speaker of a speech sometimes cannot be determined, and in such cases the who attribute should be omitted.

The <u> element should also have the ana attribute, giving a pointer to the typology of speaker types. This is especially important for distinguishing the speeches of the session chair (who mostly speaks on procedural matters) from those of regular and, possibly, guest speakers. Note that we use the #regular value not only for MPs but for all other speakers who can regularly speak in a parliament, e.g. ministers, the prime minister, members of parliamentary commissions etc. There is also a special type of speaker, #interrupting, which we discuss further in the Section on Interrupted utterances.

The utterances are then segmented using the <seg> element, which encodes the paragraphs of the source transcription. Even if the source files do not contain paragraph markings, each speech should contain at least one segment.
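With who, ana and <seg> in place, simple per-speaker statistics need only a few lines of code. The sketch below is illustrative only. It runs on namespace-free markup, and the second utterance in it is hypothetical, added to show a speech whose speaker could not be determined. The helper names speech_counts and segments are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

BODY = """<body><div type="debateSection">
 <u who="#DavidPrior" ana="#regular">
  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>
  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg>
 </u>
 <u ana="#regular"><seg>...</seg></u>
</div></body>"""

def speech_counts(root):
    """Count utterances per speaker type (ana) and per speaker (who);
    utterances without who are counted under None."""
    by_type, by_speaker = Counter(), Counter()
    for u in root.iter("u"):
        for category in u.get("ana", "").split():
            by_type[category] += 1
        by_speaker[u.get("who")] += 1
    return by_type, by_speaker

def segments(u):
    """Paragraph texts of one utterance (itertext also collects text inside
    nested elements such as notes, so real corpora may need filtering)."""
    return ["".join(seg.itertext()).strip() for seg in u.findall("seg")]

root = ET.fromstring(BODY)
by_type, by_speaker = speech_counts(root)
print(by_type["#regular"], by_speaker["#DavidPrior"], by_speaker[None])  # 2 1 1
```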

Finally, an utterance (just as a division) can also contain transcriber comments (notes), as further detailed in the next section.

6.3. Transcriber comments

Transcriber comments give information on who spoke, what the time was, interruptions and the reasons for them, what is happening in the chamber, results of voting, etc. Section headings can also be taken as a kind of transcriber comment, but they serve to structure the transcription and are encoded as <head> elements, as explained at the start of this chapter, cf. the Example there. Another type of transcriber comment that is treated separately is the marking of gaps in the transcript, which are covered in the Section on Gaps.

Apart from heads and gaps, transcriber comments are encoded using the <note> element or one of several so-called ‘incident’ elements, as explained below. These elements can be placed directly inside <div>, <u>, <seg> or, in the linguistically annotated version, even <s>. They should be placed as high up the hierarchy as possible. If a comment would appear at the start or end of a segment or utterance, it should be encoded before the start or, respectively, after the end of that segment or utterance. It is especially convenient not to have them inside <seg> (which contains text), as placing these elements there leads to mixed content, which is more difficult to process further, in particular when linguistically annotating the corpus. Similarly, it is better to move them outside <s> elements. However, if a transcriber comment was placed in the middle of the text for good reason, it can be encoded inside the segment or sentence. Note, however, that utterances can also be split on transcriber comments, as is explained in the Section on Interrupted utterances.
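To apply the placement advice above, a conversion script could first find comments inside <seg> and check whether they sit at an edge of the segment, in which case they could be moved out. The following minimal sketch does this with a hypothetical function, classify_comments. The set of comment elements it looks for is an assumption based on this chapter. Only the first and last child of a segment are treated as edge positions.

```python
import xml.etree.ElementTree as ET

# Comment-like elements described in this chapter (assumed set).
COMMENT_TAGS = {"note", "vocal", "kinesic", "incident", "gap"}

def classify_comments(root):
    """For each comment inside a <seg>, report whether it sits at the start
    or end of the segment (and could be moved out) or in the middle."""
    result = []
    for seg in root.iter("seg"):
        children = list(seg)
        for i, child in enumerate(children):
            if child.tag not in COMMENT_TAGS:
                continue
            at_start = i == 0 and not (seg.text or "").strip()
            at_end = i == len(children) - 1 and not (child.tail or "").strip()
            result.append((child.tag, "start" if at_start else "end" if at_end else "middle"))
    return result

# The interruption example from the Section on Interrupted utterances,
# with a hypothetical note added at the end of the segment.
SEG = ('<seg>I propose a no-deal Brexit. <vocal type="interruption"><desc>Jeremy Corbyn: '
       'Traitor!</desc></vocal> Because England does not want any dealings with the '
       'European Union.<note>hypothetical closing note</note></seg>')

print(classify_comments(ET.fromstring(SEG)))  # [('vocal', 'middle'), ('note', 'end')]
```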

6.3.1. Notes

In general, transcriber comments are encoded using <note>, which can be further qualified via its type attribute. We do not currently specify what the valid values of this attribute are. Some comments can also be encoded using more precise TEI elements, as further explained below. The following example gives typical transcriber comments:
<note type="speaker">The president, Dr. Milan Brglez:</note>
...
<note type="vote-ayes">84 voted for the adoption of the measure.</note>
...
<note type="vote-noes">2 voted against the adoption of the measure.</note>
...
<note type="time">The session began at 10 o'clock.</note>
...
The first note simply gives the speaker of the utterance that follows it, the second and third are notes on the voting results, and the fourth gives the time when the session started. In this case the time when the session started can also be marked up explicitly, as in the following example:
<note type="time">The session began at <time when="2016-04-13T10:00:00">10 o'clock</time>.</note>
Note that, in ParlaMint, the when attribute of <time> must contain not only the time, but the date as well, so that users of the corpus do not need to infer it.
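Because a bare time such as 10:00:00, or a malformed value with a three-digit hour such as 2016-04-13T010:00:00, is easy to produce by accident, converters may want to check when values before validation. The following is a minimal sketch with the illustrative function name valid_when. It accepts only the date-plus-time shape of xs:dateTime, with an optional fraction and time zone. It is slightly stricter than the XML Schema type (for instance, it rejects the hour 24).

```python
import re
from datetime import datetime

# Lexical shape of an xs:dateTime: date, "T", time, optional fraction and time zone.
DATETIME = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(\.\d+)?(Z|[+-]\d{2}:\d{2})?")

def valid_when(value):
    """True if value has both a date and a time and both are real values."""
    match = DATETIME.fullmatch(value)
    if match is None:
        return False
    try:
        # rejects impossible values such as month 13 or hour 25
        datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True

for value in ["2016-04-13T10:00:00", "2016-04-13T010:00:00", "10:00:00"]:
    print(value, valid_when(value))
```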

6.3.2. Incidents

Some types of transcriber comments, which we term incidents, can be encoded using more specific TEI elements. These elements can also be further qualified by the type attribute, with its values determined by the ParlaMint schema. The three incident elements are:
  • <vocal> marks any vocalised but not necessarily lexical phenomenon, with the values of the (for this element) obligatory type attribute being, for example interruption, laughter, murmuring etc.
  • <kinesic> marks any communicative phenomenon, not necessarily vocalised, with the optional type values being e.g. applause, laughter, gesture.
  • <incident> marks any phenomenon or occurrence, not necessarily vocalised or communicative, with the optional type values being e.g. break, sound, action.
The example below illustrates the use of these three elements:
<vocal type="interruption">
 <desc>sounds from the chamber</desc>
</vocal>
...
<kinesic type="signal">
 <desc>signal for end of debate</desc>
</kinesic>
...
<incident type="action">
 <desc>minute of silence</desc>
</incident>
As the example shows, the original content of the transcriber comment is retained in the <desc> element. Note that in cases when the agent of an incident is known, they can be specified in the optional who attribute, just as on utterances.
While the incidents must have at least one <desc> element, they can also have several, so that the description of the incident can be, if so desired, translated into English, as illustrated in the following example:
<vocal type="interruption">
 <desc xml:lang="sl">oglašanje z dvoraner</desc>
 <desc xml:lang="en">sounds from the chamber</desc>
</vocal>
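When an incident has descriptions in several languages, a script will typically prefer one language and fall back to whatever is available. The sketch below is a hypothetical helper, description. It assumes English as the preferred language and falls back to the first <desc>. It works on namespace-free markup.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

VOCAL = """<vocal type="interruption">
 <desc xml:lang="sl">oglašanje z dvoraner</desc>
 <desc xml:lang="en">sounds from the chamber</desc>
</vocal>"""

def description(element, lang="en"):
    """Text of the element's <desc> in the requested language, falling back to
    the first <desc>; None if there is no <desc> at all."""
    descs = element.findall("desc")
    for desc in descs:
        if desc.get(XML_LANG) == lang:
            return desc.text
    return descs[0].text if descs else None

vocal = ET.fromstring(VOCAL)
print(description(vocal))  # sounds from the chamber
```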

6.4. Gaps

The transcribers can also note that a part of the speech was not transcribed, typically because it was not understood. Sometimes they also give the reason, e.g. that the microphone was not turned on, that there was noise in the chamber, or that the speaker was speaking too quietly. Such notes can be encoded with the <gap> element, marked with the reason inaudible. The original transcriber comment is kept in the <desc> element, as illustrated below:
... I would further state that <gap reason="inaudible">  <desc>speaker spoke too quietly, not understood</desc> </gap> and furthermore ...
Another reason for omitting a part of the transcription can be an editorial decision of the corpus compilers. The transcript can, for example, contain material that they do not want to include in the corpus, such as tables, or parts of the transcription that for technical reasons cannot be converted to text. In these cases, the reason given should be editorial, while the <desc> should contain what has been omitted, as illustrated below.
<gap reason="editorial">  <desc xml:lang="en">Table omitted</desc> </gap>
Sometimes a passage of the transcription is in a foreign language. Especially since the corpus is to be linguistically annotated, such a passage is best left out of the transcription proper. This can be achieved by encoding it as a gap with the reason foreign, with the <desc> containing the omitted text. In this case, therefore, the description does not give the reason for the omission but the omitted text itself. The language of the foreign passage should be indicated in the xml:lang attribute of <desc>. If the language has not been identified, the ISO 639 code for undetermined language, ‘und’, can be used, and if more than one language is used in such a passage, the ‘mul’ code for multiple languages is used. All languages used on <desc> should, of course, be documented in the <langUsage> element. An example is given below:
<gap reason="foreign">  <desc xml:lang="und">Huliniahuanngittunga</desc> </gap>
As with incidents, gaps can also contain several descriptions so that they can be, if so desired, translated:
<gap reason="editorial">
 <desc xml:lang="de">Zitierte Druckfassung entfernt</desc>
 <desc xml:lang="en">Quoted printed matter omitted</desc>
</gap>
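Two routine checks on gaps are grouping them by reason, for example to count inaudible passages, and making sure that every language used on <desc> is declared. The following is a minimal sketch. The helper names are hypothetical. The declared languages are passed in as a set of codes, whereas in a real corpus they would be read from <langUsage>.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

TEXT = """<seg>... I would further state that <gap reason="inaudible"><desc>speaker spoke too quietly, not understood</desc></gap> and furthermore ...
<gap reason="foreign"><desc xml:lang="und">Huliniahuanngittunga</desc></gap>
<gap reason="editorial"><desc xml:lang="de">Zitierte Druckfassung entfernt</desc><desc xml:lang="en">Quoted printed matter omitted</desc></gap></seg>"""

def gaps_by_reason(root):
    """Group gaps by reason; each gap becomes a list of (xml:lang, text) pairs."""
    groups = {}
    for gap in root.iter("gap"):
        descs = [(d.get(XML_LANG), d.text) for d in gap.findall("desc")]
        groups.setdefault(gap.get("reason"), []).append(descs)
    return groups

def undeclared_languages(root, declared):
    """xml:lang values on <desc> elements missing from the declared languages."""
    used = {d.get(XML_LANG) for d in root.iter("desc") if d.get(XML_LANG)}
    return used - set(declared)

root = ET.fromstring(TEXT)
print(sorted(gaps_by_reason(root)))               # ['editorial', 'foreign', 'inaudible']
print(undeclared_languages(root, {"en", "de"}))   # {'und'}
```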

6.5. Interrupted utterances

A special case occurs when a transcription note states that somebody interrupted the speaker and gives the transcript of the interruption, possibly naming who interrupted, after which the main speaker continues their speech, as in the following made-up snippet:
Boris Johnson: I propose a no-deal Brexit. /Jeremy Corbyn: Traitor!/ Because England does not want any dealings with the European Union.
The standard manner in which such interruptions are encoded is using the default <note> element, or, much better, the <vocal> element, as explained in the Section on Incidents, as below:
<u who="#BorisJohnson" ana="#regular">
 <seg>I propose a no-deal Brexit. <vocal type="interruption">
   <desc>Jeremy Corbyn: Traitor!</desc>
  </vocal> Because England does not want any dealings with the European
   Union.</seg>
</u>
This solution is relatively easy to implement and valid in ParlaMint, however, it has the disadvantage of leaving what is essentially a speech as the content of a comment. In cases where it is possible to consistently identify such ‘mini speeches’, an alternative and more useful encoding will turn this comment into a separate speech, and split the main utterance into two (or more) pieces. The example below illustrates how this is encoded:
<u who="#BorisJohnson" ana="#regular"
 xml:id="GB001.8.3" next="#GB001.8.5">I propose a no-deal Brexit.</u>
<u who="#JeremyCorbyn"
 ana="#regular #interrupting" xml:id="GB001.8.4">Traitor!</u>
<u who="#BorisJohnson" ana="#regular"
 xml:id="GB001.8.5" prev="#GB001.8.3">Because England does not want any dealings with the European Union.</u>
As can be seen, the split is indicated by the next attribute on the first part of the split utterance and by the prev attribute on the following part. The fact that an utterance interrupts another one is signalled by adding #interrupting to its ana attribute. The values of the next and prev attributes are pointers to the identifiers of the next or, respectively, the previous part of the split utterance.5

In the example, the speaker of the interrupting speech has also been identified and marked in the who attribute; where this is not possible, the attribute can be omitted. As mentioned, the interrupting utterance should also have the value #interrupting in its ana attribute; this value comes from the appropriate category of the ParlaMint speaker type taxonomy. If the speaker is identified and their status can be determined, ana should also contain the speaker type proper, i.e. whether they are a #chair, #regular or #guest speaker.
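For analyses by speaker, the parts of a split utterance usually need to be joined again. The sketch below follows the next pointers to rebuild each speech, and lists interruptions as speeches of their own. It is illustrative only. It uses the hypothetical function name whole_speeches, handles only local pointers of the form #id, and runs on a namespace-free copy of the example above.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DIV = """<div type="debateSection">
 <u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.3" next="#GB001.8.5">I propose a no-deal Brexit.</u>
 <u who="#JeremyCorbyn" ana="#regular #interrupting" xml:id="GB001.8.4">Traitor!</u>
 <u who="#BorisJohnson" ana="#regular" xml:id="GB001.8.5" prev="#GB001.8.3">Because England does not want any dealings with the European Union.</u>
</div>"""

def whole_speeches(div):
    """Return (who, text) pairs, one per speech: parts of a split utterance
    are joined by following next pointers, and continuation parts (those
    with prev) are not listed again on their own."""
    by_id = {u.get(XML_ID): u for u in div.iter("u")}
    speeches = []
    for u in div.iter("u"):
        if u.get("prev"):
            continue  # reached through the next pointer of an earlier part
        parts, seen, current = [], set(), u
        while current is not None and id(current) not in seen:  # guard against cycles
            seen.add(id(current))
            parts.append("".join(current.itertext()).strip())
            target = current.get("next")
            current = by_id.get(target.lstrip("#")) if target else None
        speeches.append((u.get("who"), " ".join(parts)))
    return speeches

for who, text in whole_speeches(ET.fromstring(DIV)):
    print(who, text)
```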

7. Linguistic annotation

This section introduces the ParlaMint linguistic annotation. An important note is that a linguistically annotated ParlaMint corpus is stored separately from its base (or plain-text) version, i.e. the version that has been discussed in the preceding sections. The encoding of the linguistically annotated version differs from the plain-text one in the following:

7.1. Linguistic markup

Linguistic annotation is added only to the text content of <seg> elements inside the speeches, i.e. <u> elements. For this text, ParlaMint requires the following additional markup to be present:

  • tokens: what is a word, and what is punctuation, with preserved information on inter-token spaces;
  • sentences: what is a sentence;
  • lemmas: what is the base form of each word;
  • Universal Dependencies (UD) part-of-speech and morphological features, and, optionally, part-of-speech tags from a different (local) tagset;
  • named entities (NE): what is a name, categorised at least into the standard four NE classes;
  • the UD dependency syntactic parse of the sentences;
  • USAS semantic annotations on words and phrases but only for the machine translated corpora.

Below, we explain the encoding of each of these levels.

7.1.1. Word-level annotation

Basic linguistic annotation comprises tokenisation, sentence segmentation, part-of-speech tagging and lemmatisation, and this mark-up is illustrated in the example below:
<s>
 <w msd="UPosTag=DET|Case=Gen|Gender=Neut|Number=Sing|PronType=Dem"
  lemma="ta">Tega</w>
 <w msd="UPosTag=PRON|PronType=Prs|Reflex=Yes|Variant=Short"
  lemma="se">se</w>
 <w msd="UPosTag=PART" lemma="sploh">sploh</w>
 <w msd="UPosTag=AUX|Mood=Ind|Number=Sing|Person=1|Polarity=Neg|Tense=Pres|VerbForm=Fin"
  lemma="biti">nisem</w>
 <w msd="UPosTag=VERB|Aspect=Perf|Gender=Masc|Number=Sing|VerbForm=Part"
  lemma="zavesti" join="right">zavedel</w>
 <pc msd="UPosTag=PUNCT">.</pc>
</s>
Sentences are marked up using the <s> element, words with the <w> element and punctuation symbols with the <pc> element. To retain linguistically significant whitespace, the join attribute with the fixed value right is used, meaning that there is no whitespace to the right of the token. Depending on the language and the annotation tool used, tokenisation can involve a further complication, which is taken up in the next Section on Syntactic words.
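Because join="right" marks the only places where tokens are not separated by a space, the surface text of a sentence can be rebuilt from the tokens alone. The following is a minimal sketch. It handles ordinary tokens, skips over other elements inside the sentence, and runs on namespace-free markup. Multi-token syntactic words, described in the next section, would come out without their internal space. The helper names tokens and sentence_text are illustrative.

```python
import xml.etree.ElementTree as ET

def tokens(elem):
    """Yield <w> and <pc> tokens in document order, descending into other
    elements (e.g. ones grouping tokens) but not into a token itself."""
    for child in elem:
        if child.tag in ("w", "pc"):
            yield child
        else:
            yield from tokens(child)

def sentence_text(s):
    """Rebuild the surface text of a sentence: tokens are separated by a
    space unless the preceding token carries join="right"."""
    out = []
    for token in tokens(s):
        out.append("".join(token.itertext()).strip())
        if token.get("join") != "right":
            out.append(" ")
    return "".join(out).rstrip()

# The example sentence above, with the msd attributes left out.
S = """<s>
 <w lemma="ta">Tega</w>
 <w lemma="se">se</w>
 <w lemma="sploh">sploh</w>
 <w lemma="biti">nisem</w>
 <w lemma="zavesti" join="right">zavedel</w>
 <pc>.</pc>
</s>"""

print(sentence_text(ET.fromstring(S)))  # Tega se sploh nisem zavedel.
```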

The base form, or lemma, of a word is given as the value of the lemma attribute, while punctuation symbols, <pc>, do not have this attribute.

The UD part of speech and morphological features are both packed into the msd attribute. The part of speech is given as the value of UPosTag and is followed by the morphological features, with all attribute-value pairs separated by vertical bars.
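Unpacking an msd value is a matter of splitting on the vertical bar and then on the first equals sign. The following is a minimal sketch with the illustrative function name parse_msd. It raises KeyError if UPosTag is missing, and splits only on the first "=" so that feature names containing brackets, as some UD features have, stay intact.

```python
def parse_msd(msd):
    """Split an msd value into the UD part of speech and a dict of features."""
    features = dict(item.split("=", 1) for item in msd.split("|"))
    upos = features.pop("UPosTag")
    return upos, features

print(parse_msd("UPosTag=AUX|Mood=Ind|Number=Sing|Person=1|Polarity=Neg|Tense=Pres|VerbForm=Fin"))
```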

ParlaMint also allows (but does not require) part-of-speech tags from some other tagset6 to be added to the linguistic annotation. Where this information is encoded depends on the type of tagset.

Synthetic tagsets, such as the Penn Treebank tagset, have atomic tags that cannot always be decomposed into attribute-value pairs (e.g. the tag ‘TO’ for the word ‘to’). Their tags should be encoded in the pos attribute on words and punctuation symbols, as shown in the example below:
<s>
 <w lemma="I"
  msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>
 <w lemma="support"
  msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>
 <w lemma="the"
  msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>
 <w lemma="amendment"
  msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>
 <pc msd="UPosTag=PUNCT" pos=".">.</pc>
</s>
For analytic tagsets, where a part-of-speech tag can always be decomposed into a set of attribute-value pairs, the pointing attribute ana should be used. The MULTEXT-East morphosyntactic specifications provide a collection of such tagsets for various languages, and the example below uses one of them:
<s>
 <w ana="mte:Vmpr1p" lemma="prehajati"
  msd="UPosTag=VERB|Aspect=Imp|Mood=Ind|Number=Plur|Person=1|Tense=Pres|VerbForm=Fin">Prehajamo</w>
 <w ana="mte:Sa" lemma="na"
  msd="UPosTag=ADP|Case=Acc">na</w>
 <w ana="mte:Ncnsa" join="right"
  lemma="odločanje"
  msd="UPosTag=NOUN|Case=Acc|Gender=Neut|Number=Sing">odločanje</w>
 <pc ana="mte:Z" msd="UPosTag=PUNCT">.</pc>
</s>
Here mte: is a prefix that is expanded via the TEI extended pointer syntax defined in the TEI header (cf. the Section on Prefix definitions). After expansion, the value of such an ana attribute points to the decomposition of the given tag into a feature structure. For example, the value mte:Vmpr1p would be expanded to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, which then resolves to the feature structure below:
<fs xml:id="Vmpr1p" xml:lang="en"
 corresp="#Ggnspm">
 <f name="CATEGORY">
  <symbol value="Verb"/>
 </f>
 <f name="Type">
  <symbol value="main"/>
 </f>
 <f name="Aspect">
  <symbol value="progressive"/>
 </f>
 <f name="VForm">
  <symbol value="present"/>
 </f>
 <f name="Person">
  <symbol value="first"/>
 </f>
 <f name="Number">
  <symbol value="plural"/>
 </f>
</fs>
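The prefix expansion can be sketched in a few lines of Python. In TEI, a <prefixDef> carries an ident, a matchPattern regular expression and a replacementPattern in which $1, $2, ... refer to the matched groups. The exact declaration below is an assumption modelled on that element and on the expansion given above; the real one is in the corpus header. The function name expand is illustrative. TEI patterns use XML Schema regular expressions, which coincide with Python's re for a simple pattern like (.+).

```python
import re
import xml.etree.ElementTree as ET

# Assumed declaration, modelled on TEI <prefixDef>; in a corpus it is in the header.
PREFIX_DEF = ET.fromstring(
    '<prefixDef ident="mte" matchPattern="(.+)" '
    'replacementPattern="https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#$1"/>')

def expand(pointer, prefix_defs):
    """Expand a prefixed pointer such as 'mte:Vmpr1p'; other values are returned unchanged."""
    prefix, sep, rest = pointer.partition(":")
    if not sep:
        return pointer
    for pdef in prefix_defs:
        if pdef.get("ident") != prefix:
            continue
        found = re.fullmatch(pdef.get("matchPattern"), rest)
        if found:
            # $1, $2, ... in the replacement refer to the groups of the match
            return re.sub(r"\$(\d)", lambda m: found.group(int(m.group(1))),
                          pdef.get("replacementPattern"))
    return pointer

print(expand("mte:Vmpr1p", [PREFIX_DEF]))
# https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p
```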

7.1.2. Syntactic words

Certain frameworks, in particular the UD one (cf. their information on Tokenization and Word Segmentation and on Words, Tokens and Empty Nodes), allow for tokens to be decomposed into several words, and it is these syntactic words, and not tokens, that are further annotated.

To allow for such mismatches between word tokens and syntactic words, we use embedded empty words with associated norm attributes and the standard attributes with linguistic annotation. For example, Czech has the word ‘abyste’ which is in UD decomposed into two syntactic words, ‘aby’ and ‘byste’. This should be encoded as in the following example7:
<w>abyste
 <w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>
 <w norm="byste" lemma="být" msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>
</w>
Note also that if such a multi-word token does not have a space following it, join="right" should be added to the top level word.
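The two-level encoding above can be read back with a small Python sketch that returns the surface token together with its syntactic words. It only covers the case shown above (one token containing empty embedded words), not the reverse case described next; the function name split_token is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def split_token(w: ET.Element):
    """Return (surface_form, syntactic_words) for a ParlaMint <w> token,
    where syntactic_words is a list of (norm, lemma, msd) triples."""
    surface = (w.text or "").strip()
    inner = [c for c in w if local_name(c.tag) == "w"]
    if inner:
        # Multi-word token: the outer text is the token, while the embedded
        # empty <w> elements carry the annotation of each syntactic word.
        return surface, [(c.get("norm"), c.get("lemma"), c.get("msd")) for c in inner]
    # Ordinary token: the token and the syntactic word coincide.
    return surface, [(surface, w.get("lemma"), w.get("msd"))]


if __name__ == "__main__":
    token = ET.fromstring(
        '<w>abyste<w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>'
        '<w norm="byste" lemma="být" '
        'msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/></w>'
    )
    print(split_token(token))
```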
While we do not have examples of such a practice yet, there could also be cases where two (or more) tokens correspond to one syntactic word. In such cases, it is the syntactic word that is on the top level, while the inner words are the actual tokens. To take an example from historical language, Slovene used to form the superlative form of adjectives with the word ‘naj’ written separately (and often as ‘nar’), while in contemporary Slovene, the ‘naj’ is a prefix of the adjective. This would be encoded as follows:
<w norm="najlepši" lemma="lep">
 <w>nar</w>
 <w>lepši</w>
</w>
In this case, if such a multi-token syntactic word does not have a space following it, join="right" should be added to the last token, i.e. ‘lepši’.

7.1.3. Named entities

ParlaMint also requires annotation of Named Entities (NE), which should be categorised into the following four types:

  • PER: person
  • LOC: location
  • ORG: organisation
  • MISC: miscellaneous

These types are also specified in a specialised taxonomy, as further explained in the Section on Linguistic taxonomies.

Identified names are marked up with the <name> element, and their type is given as the value of its type attribute, as shown in the example below:
...
<w lemma="and" msd="UPosTag=CCONJ">and</w>
<name type="ORG">
 <w lemma="Westminster" msd="UPosTag=PROPN|Number=Sing">Westminster</w>
 <w join="right" lemma="Hall" msd="UPosTag=PROPN|Number=Sing">Hall</w>
</name>
<w lemma="," msd="UPosTag=PUNCT">,</w>
...
ParlaMint also supports more complex NE annotation schemes, such as the one used for the Czech data, which introduces very detailed NE types and also allows for nested named entities. The example below gives such a case, where the top-level, ParlaMint-compatible person name contains nested names:
<name type="PER" ana="ne:p">
 <name ana="ne:pf">
  <w>Františka</w>
 </name>
 <name ana="ne:ps">
  <w>Laudáta</w>
 </name>
</name>
Here, the language-specific NE types are given as the value of the ana attribute. These values are pointers, using the TEI extended pointer syntax (cf. the Section on Prefix definitions), into a corpus-specific taxonomy that defines these local NE types (for how ParlaMint linguistic taxonomies are defined, cf. the Section on Linguistic taxonomies).
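A Python sketch of how this mark-up could be read back: it yields the type and text of every ParlaMint-compatible <name> (i.e. one carrying a type attribute), treats nested corpus-specific names that only carry ana as part of the outer name, and inserts a space after every token except those marked join="right". The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def surface(elem: ET.Element) -> str:
    """Concatenate the tokens under elem, honouring join="right"."""
    parts = []
    for tok in elem.iter():
        if local_name(tok.tag) in ("w", "pc") and (tok.text or "").strip():
            parts.append(tok.text.strip())
            if tok.get("join") != "right":
                parts.append(" ")
    return "".join(parts).strip()


def named_entities(root: ET.Element):
    """Yield (type, text) for every <name> that has a type attribute.
    Nested names with only an ana attribute are not yielded separately,
    but their words are included in the text of the enclosing name."""
    for name in root.iter():
        if local_name(name.tag) == "name" and name.get("type"):
            yield name.get("type"), surface(name)


if __name__ == "__main__":
    s = ET.fromstring(
        '<s><w>and</w><name type="ORG"><w>Westminster</w>'
        '<w join="right">Hall</w></name><pc>,</pc></s>'
    )
    print(list(named_entities(s)))
```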

7.1.4. Syntactic parses

Sentences are accompanied by a Universal Dependencies parse. These analyses are encoded inside their sentence mark-up, although in a stand-off manner. This means that each token must be given an ID, while the syntactic analysis is stored in a link group, i.e. a <linkGrp> element containing a series of <link> elements. Each link is labelled with a dependency relation and joins a head to its argument; both are tokens, except for the root relation, whose head is the sentence itself. The example below illustrates the syntactic encoding:
<s xml:id="ParlaMint-GB_2021-01-06.seg393.8">
 <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.1">I</w>
 <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.2">support</w>
 <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.3">the</w>
 <w join="right" xml:id="ParlaMint-GB_2021-01-06.seg393.8.4">amendment</w>
 <pc xml:id="ParlaMint-GB_2021-01-06.seg393.8.5">.</pc>
 <linkGrp targFunc="head argument" type="UD-SYN">
  <link ana="ud-syn:nsubj" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.1"/>
  <link ana="ud-syn:root" target="#ParlaMint-GB_2021-01-06.seg393.8 #ParlaMint-GB_2021-01-06.seg393.8.2"/>
  <link ana="ud-syn:det" target="#ParlaMint-GB_2021-01-06.seg393.8.4 #ParlaMint-GB_2021-01-06.seg393.8.3"/>
  <link ana="ud-syn:obj" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.4"/>
  <link ana="ud-syn:punct" target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.5"/>
 </linkGrp>
</s>
The example shows that each token, as well as the sentence element, should be given an xml:id attribute, that the link group comes at the end of (but inside) the sentence, that <linkGrp> has two attributes, targFunc and type, with the fixed values head argument and UD-SYN respectively, and that it contains a series of empty <link> elements.

The link elements then give, via the value of their target attribute, references to the head and argument tokens of the syntactic relation, which is itself specified in the ana attribute. By convention, the links are ordered so that the argument references follow the order of the tokens in the sentence, i.e. all the tokens in the sentence appear in order in the second position. Note that for the top-level root relation (of which there should be exactly one per sentence), the head is the reference to the sentence ID.

The relations themselves are pointers which use the ud-syn: prefix. Via the TEI extended pointer syntax as defined in the TEI header (cf. the Section on Prefix definitions), this prefix is expanded so that the value of such an ana attribute points to a category of the special UD syntactic taxonomy, which must be part of the linguistically annotated version of the corpus; how to insert this taxonomy is specified in the Section on Linguistic taxonomies. One more detail needs attention: UD allows the colon symbol, :, to appear in extended relations, e.g. acl:relcl for relative clause modifier. As the colon already separates the extended pointer prefix from the local part, the colons in the relations should be changed to underscores, e.g. ud-syn:acl_relcl. Note, however, that the relations given in the ana attribute of a <link> are just pointers and could in principle have any value; it is the UD taxonomy that actually determines which relations are valid.
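To make the stand-off encoding concrete, the Python sketch below turns a sentence's <linkGrp> into CoNLL-U-style (form, HEAD, DEPREL) triples: the sentence ID maps to head 0, tokens map to their 1-based positions, and ud-syn:acl_relcl is turned back into acl:relcl. It assumes every token with an xml:id is the argument of exactly one link; the IDs in the example are shortened and the function name ud_heads is hypothetical. This is not the ParlaMint CoNLL-U converter.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SENTENCE = """<s xml:id="s8">
 <w xml:id="s8.1">I</w> <w xml:id="s8.2">support</w> <w xml:id="s8.3">the</w>
 <w xml:id="s8.4" join="right">amendment</w> <pc xml:id="s8.5">.</pc>
 <linkGrp targFunc="head argument" type="UD-SYN">
  <link ana="ud-syn:nsubj" target="#s8.2 #s8.1"/>
  <link ana="ud-syn:root" target="#s8 #s8.2"/>
  <link ana="ud-syn:det" target="#s8.4 #s8.3"/>
  <link ana="ud-syn:obj" target="#s8.2 #s8.4"/>
  <link ana="ud-syn:punct" target="#s8.2 #s8.5"/>
 </linkGrp>
</s>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def ud_heads(s: ET.Element):
    """Return (form, head, deprel) triples in token order; head is the 1-based
    position of the head token, or 0 when the head is the sentence (root)."""
    tokens = [t for t in s.iter() if local_name(t.tag) in ("w", "pc") and t.get(XML_ID)]
    position = {"#" + s.get(XML_ID): 0}
    position.update({"#" + t.get(XML_ID): i for i, t in enumerate(tokens, start=1)})
    relations = {}
    for link in s.iter():
        if local_name(link.tag) != "link":
            continue
        head_ref, arg_ref = link.get("target").split()
        # 'ud-syn:acl_relcl' -> 'acl_relcl' -> 'acl:relcl'
        deprel = link.get("ana").split(":", 1)[1].replace("_", ":")
        relations[position[arg_ref]] = (position[head_ref], deprel)
    roots = [i for i, (head, _) in relations.items() if head == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root relation, found {len(roots)}")
    return [((t.text or "").strip(), *relations[i]) for i, t in enumerate(tokens, start=1)]


if __name__ == "__main__":
    for row in ud_heads(ET.fromstring(SENTENCE)):
        print(*row, sep="\t")
```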

7.1.5. Semantic annotation

The machine translated ParlaMint corpora (cf. the Section on Translations of corpora) have one more level of annotation, namely the semantic annotation of tokens and phrases (i.e. Multi-Word Expressions, or MWEs) with USAS semantic tags. First, in order to be able to semantically tag MWEs, a new element was introduced inside sentences, namely phrase, <phr>8. The USAS semantic tags are then encoded in two attributes of tokens or MWEs, namely function and ana, as illustrated in the example below:
<w function="Z5" ana="sem:Z5">the</w>
<w function="G1.1/S2mf,S9/S2mf" ana="sem:G1.1 sem:S2">Minister</w>
<w function="Z5" ana="sem:Z5">of</w>
<name type="ORG">
 <w function="I1" ana="sem:I1">Finance</w>
</name>
<w function="Z5" ana="sem:Z5">and</w>
<phr type="sem" function="Z1mf,Z3c" ana="sem:Z1">
 <w function="Z1mf,Z3c" ana="sem:Z1">Deputy</w>
 <w lemma="Prime" function="Z1mf,Z3c" ana="sem:Z1">Prime</w>
 <w lemma="Minister" function="Z1mf,Z3c" ana="sem:Z1">Minister</w>
</phr>
The first thing to note is that all the tokens inside an MWE receive the same semantic markup as the encompassing MWE <phr> element. Second, the function attribute gives the USAS tags exactly as output by the tool used for semantic tagging, which includes not only the tag computed to be the most appropriate for the given context, but also all the other lexically possible tags, separated by commas. The first tag is then used to compute the value of the ana attribute: a conjunctive USAS tag (whose parts are delimited by a slash) is transformed into a series of references to the USAS taxonomy (cf. the Section on Linguistic taxonomies), and the tag qualifiers are removed. For example, the tag G1.1/S2mf is transformed into sem:G1.1 sem:S2, i.e. the mf (for male and female) qualifiers are removed from S2mf. Note also that the prefix sem (cf. the Section on Prefix definitions) is used for pointing into the USAS taxonomy. With this set-up it is possible to encode the exact USAS tags as well as their ParlaMint categories, which give not only the tag but also its gloss.
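The derivation of ana from function described above can be sketched in Python as follows. This is not the ParlaMint conversion script: the regular expression for a USAS code, skipping parts that do not look like a USAS code, and keeping only the first of repeated +/- signs (in line with the statement in the Section on Linguistic taxonomies that a category carries at most one such modifier) are our assumptions.

```python
import re

# Assumption: a USAS code is an upper-case letter followed by dot-separated
# numbers, optionally followed by '+'/'-' and other qualifiers (m, f, c, ...).
USAS_CODE = re.compile(r"([A-Z]\d+(?:\.\d+)*)([+-]?)")


def usas_to_ana(function: str) -> str:
    """Derive an ana value (sem: pointers) from a USAS function value."""
    first = function.split(",")[0]  # the contextually preferred tag comes first
    refs = []
    for part in first.split("/"):  # '/' joins the parts of a conjunctive tag
        m = USAS_CODE.match(part)
        if m is None:
            continue  # not a USAS code; skipped in this sketch
        code, sign = m.groups()
        # Keep one +/- as the taxonomy suffix p/n; all other qualifiers are dropped.
        refs.append("sem:" + code + {"+": "p", "-": "n", "": ""}[sign])
    return " ".join(refs)


if __name__ == "__main__":
    print(usas_to_ana("G1.1/S2mf,S9/S2mf"))
```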

7.2. Metadata for linguistic annotation

What kind of metadata a plain-text ParlaMint corpus should contain was explained in the Section on Corpus metadata; in this section we detail what additions must be made to the metadata for the linguistically annotated version. Note that the changes for this version have already been explained at the start of this Chapter. In short, three additional parts should be added to the <teiHeader> of the corpus root, namely a description of the tool(s) used to linguistically annotate the corpus, two additional taxonomies (one for named entities and one for UD syntactic relations), and the definition of the prefix expansions for UD syntactic relations. These descriptions should also serve as the point of departure for those who want to introduce their own prefixes and taxonomies to define additional, corpus-specific part-of-speech tagging schemes or named entity classes.

7.2.1. Application information for linguistic processing

As the linguistic analysis of a ParlaMint corpus is performed by a tool, the information on which tool (or tools) has been used should be documented in the corpus root TEI header. This information is encoded in the <appInfo> element of the <encodingDesc>, as shown in the example below:
<appInfo>
 <application version="1.0" ident="classla">
  <label>CLASSLA</label>
  <desc xml:lang="en">Linguistic processing performed with CLASSLA trained for Slovene, available from <ref target="https://github.com/clarinsi/classla">https://github.com/clarinsi/classla</ref>.</desc>
 </application>
</appInfo>
The <appInfo> element contains, in general, a series of <application> elements, each one giving information on one tool. The <application> element gives the version number of the tool in its version attribute and an identifying code in its ident attribute. It has two subordinate elements: <label>, giving the name of the tool, and <desc>, giving a short description of it, preferably with a pointer to the URL where the tool can be found or is at least documented.

7.2.2. Linguistic taxonomies

Some linguistic annotations have fixed vocabularies and these should be encoded as taxonomies in the TEI header of the linguistically analysed corpus root, similarly to other taxonomies, as discussed in the Section on the Class declaration.

The first taxonomy gives the Named Entity types and has, apart from the translation of the categories into the local language, a fixed structure, as follows:
<taxonomy xml:id="ParlaMint-taxonomy-NER.ana">
 <desc xml:lang="en">
  <term>Named entities</term>
 </desc>
 <category xml:id="PER">
  <catDesc xml:lang="sl"><term>oseba</term></catDesc>
  <catDesc xml:lang="en"><term>person</term></catDesc>
 </category>
 <category xml:id="LOC">
  <catDesc xml:lang="sl"><term>lokacija</term></catDesc>
  <catDesc xml:lang="en"><term>location</term></catDesc>
 </category>
 <category xml:id="ORG">
  <catDesc xml:lang="sl"><term>organizacija</term></catDesc>
  <catDesc xml:lang="en"><term>organisation</term></catDesc>
 </category>
 <category xml:id="MISC">
  <catDesc xml:lang="sl"><term>drugo</term></catDesc>
  <catDesc xml:lang="en"><term>miscellaneous</term></catDesc>
 </category>
</taxonomy>
The second taxonomy to be inserted is the one for Universal Dependency relations. Even though different languages use different subsets of the UD syntactic relations, we currently do not use corpus-specific taxonomies, but rather a common taxonomy giving all the UD relations. This taxonomy has also not yet been localised, i.e. it is available in English only. Below we illustrate it by giving a few relation definitions:
<taxonomy xml:id="ParlaMint-taxonomy-UD-SYN.ana">
 <desc xml:lang="en">
  <term>UD syntactic relations</term>
 </desc>
 <category xml:id="acl">
  <catDesc xml:lang="en"><term>acl</term>: Clausal modifier of noun (adjectival clause)</catDesc>
 </category>
 <category xml:id="cc_preconj">
  <catDesc xml:lang="en"><term>cc:preconj</term>: Preconjunct</catDesc>
 </category>
 <category xml:id="dep">
  <catDesc xml:lang="en"><term>dep</term>: Unspecified dependency</catDesc>
 </category>
 <category xml:id="punct">
  <catDesc xml:lang="en"><term>punct</term>: Punctuation</catDesc>
 </category>
 <category xml:id="root">
  <catDesc xml:lang="en"><term>root</term>: Root</catDesc>
 </category>
</taxonomy>
The ID and the description (<desc>) of the <taxonomy> are fixed, and the <category> elements have the usual structure. Note that the ID of a category is identical to the relation name given in <term>, except that the colon, :, in the official name of the relation must be replaced by the underscore, _, to enable correct referencing of these IDs, as discussed in the Section on Syntactic parses.
The third taxonomy is only relevant for the machine translated corpora (cf. the Section on Translations of corpora) and gives the categories of the USAS semantic tags (for their encoding in the corpora cf. the Section on Semantic annotation). This taxonomy is only available in the English language and is identical for all the machine translated corpora. Below we illustrate its structure by giving the start of the taxonomy:
<taxonomy xml:id="ParlaMint-taxonomy-USAS.ana" xml:lang="en">
 <desc xml:lang="en">
  <term>USAS categories</term>: Semantic categories following the USAS Semantic tagset, ...</desc>
 <category xml:id="A1">
  <catDesc><term>A1</term>: General And Abstract Terms</catDesc>
  <category xml:id="A1.1.1">
   <catDesc><term>A1.1.1</term>: General actions / making</catDesc>
   <category xml:id="A1.1.1n">
    <catDesc><term>A1.1.1-</term>: Inaction</catDesc>
   </category>
  </category>
  <category xml:id="A1.1.2">
   <catDesc><term>A1.1.2</term>: Damaging and destroying</catDesc>
   <category xml:id="A1.1.2n">
    <catDesc><term>A1.1.2-</term>: Fixing and mending</catDesc>
   </category>
  </category>
  ...
 </category>
 ...
</taxonomy>
The current USAS taxonomy covers a subset of all USAS semantic tags and was derived from the official list of USAS semantic subcategories. A category in the taxonomy can carry at most one modifier, either positive (USAS '+', taxonomy 'p') or negative (USAS '-', taxonomy 'n'). Other USAS modifiers (i.e. those matching the regular expression [mfnci%@]) are not retained. The taxonomy includes 455 categories, each with its USAS code and gloss.

7.2.3. Prefix definitions

Pointing attributes, such as ana, take as their value a series of references to the values of xml:id attributes in an XML document. If the target is in the same document, the reference consists of the hash character, #, prefixed to the particular ID, e.g. #parla.uni; if it is in another XML document, the hash is additionally prefixed with the URL of that document, e.g. https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p.

Because the complete URL tends to be long, which is especially inconvenient when such references are given for every token in a corpus, TEI introduces the so-called extended pointer syntax, whereby the reference to an ID can be given in the form of a prefix, separated by a colon from the local part of the ID reference; the expansion of this prefix is defined via the <prefixDef> element in the <encodingDesc> of the TEI header.

ParlaMint uses this mechanism for all linguistic annotations with a closed vocabulary, in particular for the Universal Dependencies syntactic relations, for the optional and corpus-specific analytic part-of-speech tags (cf. the Sections on Syntactic parses and Word-level annotation), and for the semantic annotation in the machine translated corpora (cf. the Section on Semantic annotation). The example below illustrates the prefix definitions for the obligatory UD syntactic relations and for the optional MULTEXT-East tags:
<listPrefixDef>
 <prefixDef ident="ud-syn" matchPattern="(.+)" replacementPattern="#$1">
  <p xml:lang="en">Private URIs with this prefix point to elements giving their name. In this document they are simply local references into the UD-SYN taxonomy categories in the corpus root TEI header.</p>
 </prefixDef>
 <prefixDef ident="mte" matchPattern="(.+)" replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1">
  <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Slovenian MULTEXT-East Version 6 MSDs.</p>
 </prefixDef>
</listPrefixDef>
The specialised element for listing prefix definitions, <listPrefixDef>, contains a series of prefix definitions, i.e. <prefixDef> elements. Each prefix definition gives its prefix as the value of the ident attribute, a regular expression matching the part of the ID reference after the prefix as the value of the matchPattern attribute, and the substitution as the value of the replacementPattern attribute. The first prefix definition thus defines the ud-syn prefix: for any ID reference with this prefix, e.g. ud-syn:acl_relcl, the part after the prefix (acl_relcl) is matched against (.+), and the matched part (here the entire relation, acl_relcl) is substituted into #$1, i.e. the hash character followed by the original value, so that ud-syn:acl_relcl gives #acl_relcl. This substitution is of course trivial and hardly necessary, but was implemented so that all fixed-vocabulary linguistic analyses receive the same treatment.

More to the point is the second example, where very short ID references, such as mte:Vmpr1p are transformed to https://nl.ijs.si/ME/V6/msd/tables/msd-fslib2-sl.xml#Vmpr1p, as already explained in the Section on Word-level annotation.

Finally, each prefix definition also contains a (possibly bilingual) paragraph explaining the definition.
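The expansion mechanism can be sketched in a few lines of Python. The dictionary PREFIX_DEFS is a hypothetical in-memory form of the <listPrefixDef> above (ident mapped to matchPattern and replacementPattern); we assume that matchPattern has to match the whole local part, which makes no difference for (.+), and that $1, $2, ... in replacementPattern refer to the captured groups.

```python
from __future__ import annotations

import re

# Hypothetical in-memory form of <listPrefixDef>: ident -> (matchPattern, replacementPattern)
PREFIX_DEFS = {
    "ud-syn": ("(.+)", "#$1"),
    "mte": ("(.+)", "http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1"),
}


def expand_pointer(pointer: str, prefix_defs: dict[str, tuple[str, str]]) -> str:
    """Expand a 'prefix:local' reference; other references are returned unchanged."""
    prefix, sep, local_part = pointer.partition(":")
    if not sep or prefix not in prefix_defs:
        return pointer  # e.g. '#parla.uni' or a full http(s) URL
    match_pattern, replacement = prefix_defs[prefix]
    m = re.fullmatch(match_pattern, local_part)
    if m is None:
        raise ValueError(f"{pointer!r} does not match {match_pattern!r}")
    # Replace each $n in replacementPattern with the n-th captured group.
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)


if __name__ == "__main__":
    for ref in ("ud-syn:acl_relcl", "mte:Vmpr1p", "#parla.uni"):
        print(ref, "->", expand_pointer(ref, PREFIX_DEFS))
```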

8. Translations of corpora

The ParlaMint machine translated corpora are encoded similarly to the corpora in their source language, i.e. they have an identically structured corpus root and components. The most obvious differences are the following:

The structure of the <text>, including the transcriber comments and the linguistic analysis, is encoded in the same way as for the corpora in the source language. There are two differences: the aligned elements are linked to their corresponding elements in the source corpus, and, to simplify processing, the transcriber comments are moved outside sentences in the (rare) cases where they appear inside them in the original-language corpus. In the current machine translated corpora, alignments are given for utterances, segments, sentences and transcriber comments, which, furthermore, always have a 1-1 mapping to the corresponding source element. The alignment is therefore trivial: it simply points to the element with the same xml:id value in the source corpus, as illustrated in the following example:
<div type="debateSection" xml:lang="en">
 <note type="speaker" xml:id="ParlaMint-LV_2019-01-31-PT13-516.ana.note1" corresp="mt-src:ParlaMint-LV_2019-01-31-PT13-516.ana.note1">Head of the sitting.</note>
 <u who="#ĀboltiņaSolvita" xml:id="ParlaMint-LV_2014-11.u1" ana="#chair" xml:lang="en" corresp="mt-src:ParlaMint-LV_2014-11.u1">
  <seg xml:id="ParlaMint-LV_2014-11-04.seg1" xml:lang="en" corresp="mt-src:ParlaMint-LV_2014-11-04.seg1">
   <s xml:id="ParlaMint-LV_2014-11-04.s1" corresp="mt-src:ParlaMint-LV_2014-11-04.s1">
    <w xml:id="ParlaMint-LV_2014-11-04.s1.t1" msd="UPosTag=PROPN|Number=Sing" lemma="Mr.">Mr.</w>
    <w xml:id="ParlaMint-LV_2014-11-04.s1.t2" msd="UPosTag=PROPN|Number=Sing" lemma="President" join="right">President</w>
    ...
   </s>
   ...
  </seg>
  ...
 </u>
 ...
</div>
As can be seen above, the alignment is specified in the corresp attribute. The alignment reference makes use of the TEI extended pointer syntax (cf. also the Section on Prefix definitions) via the mt-src prefix, which must resolve to the correct component file of the corpus in the source language.
In contrast to other linguistic annotations, which specify one prefix definition for the complete corpus (i.e. inside the corpus root <teiHeader>), the translated corpora should specify a prefix definition inside each corpus component, because its expansion depends on the component file. For example, the prefix definition of the corpus component file ParlaMint-LV-en_2014-11-04.ana.xml should be as in the following example:
<listPrefixDef>
 <prefixDef ident="mt-src" matchPattern="(.+)" replacementPattern="../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml#$1">
  <p>Private URIs with this prefix point to aligned source elements of the MTed corpus.</p>
 </prefixDef>
</listPrefixDef>
The assumption above is that the two corpora are available in the same directory (i.e. ./ParlaMint-LV.TEI.ana/ and ./ParlaMint-LV-en.TEI.ana/), so that corresp values with the mt-src prefix in the file ParlaMint-LV-en.TEI.ana/2014/ParlaMint-LV-en_2014-11-04.ana.xml will point to ParlaMint-LV-en.TEI.ana/2014/../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml, i.e. to ./ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml.
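The relative-path arithmetic above can be checked with a short Python sketch using only posixpath. It assumes a (.+)-style pattern, so that a literal replacement of $1 suffices, and that the relative target is resolved against the directory of the component file, as for ordinary relative URIs; resolve_mt_src is a hypothetical helper, not part of the ParlaMint scripts.

```python
import posixpath


def resolve_mt_src(component_path: str, replacement_pattern: str, local_id: str):
    """Return (source_file, fragment) for an mt-src:local_id reference that
    occurs in component_path, given that component's replacementPattern."""
    uri = replacement_pattern.replace("$1", local_id)
    target, _, fragment = uri.partition("#")
    # Resolve the relative target against the directory of the referencing file.
    source_file = posixpath.normpath(
        posixpath.join(posixpath.dirname(component_path), target)
    )
    return source_file, fragment


if __name__ == "__main__":
    print(resolve_mt_src(
        "ParlaMint-LV-en.TEI.ana/2014/ParlaMint-LV-en_2014-11-04.ana.xml",
        "../../ParlaMint-LV.TEI.ana/2014/ParlaMint-LV_2014-11-04.ana.xml#$1",
        "ParlaMint-LV_2014-11-04.s1",
    ))
```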
The final additional element of the translated corpora is the information on the application (<application>, cf. also the Section on Application information for linguistic processing), i.e. on the program that was used to translate the corpora, as illustrated by the following example:
<application ident="EasyNMT" version="2.0">
 <label>EasyNMT (OPUS-MT model)</label>
 <desc>Translation to English done with EasyNMT (<ref target="https://github.com/UKPLab/EasyNMT">https://github.com/UKPLab/EasyNMT</ref>) with OPUS-MT model bat (<ref target="https://github.com/Helsinki-NLP/Opus-MT">https://github.com/Helsinki-NLP/Opus-MT</ref>)</desc>
</application>
This element should be given in the corpus root, together with all the other information on applications inside the application information (<appInfo>) element.

9. Validation and conversion

This chapter explains how to validate and finalise a ParlaMint corpus, and introduces scripts for converting a ParlaMint corpus to other, derived formats.

9.1. Validating ParlaMint corpora

The XML structure of ParlaMint corpora can be validated via RelaxNG schemas, which exist in two versions, one that was produced as a customisation of the TEI Guidelines, and a set of schemas that were made from scratch for ParlaMint.

The TEI customisation is written as a TEI ODD document, which is, in fact, the XML version of this document, and is available in the TEI/ directory of the ParlaMint GitHub repository. The XML contains not only the prose guidelines but also the formal specification of the TEI schema; in the on-line version, this specification is converted into a reference to all the elements, attributes and classes used in ParlaMint corpora, given in Appendix A. The ODD document is not immediately usable for XML validation, but first has to be converted with the TEI XSLT stylesheets to obtain a RelaxNG schema. This schema is also available in the same directory, as ParlaMint.rng (in RelaxNG XML syntax) and ParlaMint.rnc (in RelaxNG compact syntax), and should be used to check that ParlaMint component files validate against TEI.

However, it is difficult to constrain a TEI ODD-derived XML schema to allow only the kinds of nesting and attributes that should appear in a ParlaMint corpus, so this schema allows (and Appendix A lists) nestings of elements, as well as attributes, that are in fact forbidden in ParlaMint corpora.

For this reason, we have also developed a set of RelaxNG schemas from scratch, which allow only those elements, attributes and content models that are in fact valid for a ParlaMint corpus. There are four such schemas altogether: one for a "plain-text" corpus root, one for its corpus components, one for the linguistically annotated corpus root, and one for its components. These schemas can be found in the Schema/ directory of the ParlaMint GitHub repository, with the README file giving instructions on how to use them.

Validating with XML schemas checks the formal structure of XML files but is less successful in validating other aspects of conformance, such as the textual content or linking of pointer attributes. For this reason, we have also developed an XSLT script that assumes a schema-validated ParlaMint file on its input, and checks various other aspects of conformance. These validation scripts can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them.

It should be noted that it is not necessary to run the validation scripts directly, as the validation can be performed by the main Makefile of the project. The Makefile is self-documenting, i.e. to see how to use it, please run make help in the top level directory of the ParlaMint project.

While each contributor of a corpus should validate their files with the ParlaMint schemas and validation script, there also exist further stages of validation, which are also applied to ParlaMint corpora:

  • The corpora are converted to derived formats, in particular, the linguistically annotated version of the corpus to CoNLL-U and to the so called vertical format for CQP-type concordancers. The Universal Dependencies project provides a program for validating the formatting and linguistic analyses in CoNLL-U files, and this validation is used on the CoNLL-U files derived from their XML source, up to level 2 conformance. The vertical files, on the other hand, are first compiled with manatee (the back end of (no)Sketch Engine) and this compilation can also expose various errors.
  • The last stage in validation is ‘human validation’ where e.g. simply looking at various produced metadata files or at the concordances of a corpus exposes errors.

9.2. Finalisation of corpora

While the vast majority of converting source encodings into the ParlaMint corpus format is left to the compilers of a corpus, there are a few metadata elements that can be produced by a common script on the basis of nearly finished corpora, which then results in the final version of the corpus for a particular release. This includes setting the date, edition and handle under which the corpus will be distributed, and also calculating the size of the corpus (cf. the Sections on Extents and on Tags declaration). The script for finalisation can be found in the Scripts/ directory of the ParlaMint GitHub repository and the README file briefly explains its function; more comments can be found in the script itself.
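To give an idea of what the size calculation involves, the Python sketch below counts element occurrences and renders them as TEI <tagUsage> entries with the gi and occurs attributes. It is not the ParlaMint finalisation script: which subtree is counted (here, whatever element is passed in, e.g. a component's <text>) and the output layout are assumptions, and the function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def tag_usage(root: ET.Element) -> Counter:
    """Count the elements in root (root included) by their local name."""
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


def tag_usage_xml(counts: Counter) -> str:
    """Render the counts as <tagUsage> lines, sorted by element name."""
    return "\n".join(
        f'<tagUsage gi="{gi}" occurs="{n}"/>' for gi, n in sorted(counts.items())
    )


if __name__ == "__main__":
    text = ET.fromstring(
        "<text><body><u><seg><s><w>Hello</w><pc>.</pc></s></seg></u></body></text>"
    )
    print(tag_usage_xml(tag_usage(text)))
```

Counts for the whole corpus would then be the sum of the per-component Counter objects, which support addition directly.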

9.3. Conversions

A TEI encoded document is, in general, not meant to be used directly by software programs; rather, it serves as an interchange and storage format. The ParlaMint project has produced various scripts to down-convert the XML encoded corpora to other formats; they can be found in the Scripts/ directory of the ParlaMint GitHub repository, with the README file listing them and explaining their function. In short, the scripts convert the ParlaMint XML to plain text, to CoNLL-U, and to the vertical format. There is also a script that takes a ParlaMint corpus and makes a sample of it for inclusion in the ParlaMint GitHub repository.
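As an illustration of the vertical format, the Python sketch below writes one sentence as an <s> structure with one token per line. The choice and order of the columns (form, lemma, msd), the use of the form as the lemma for punctuation, and '_' for missing values are assumptions of this sketch; the ParlaMint conversion scripts may well produce more columns and structures.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def to_vertical(s: ET.Element) -> str:
    """Render a ParlaMint <s> as vertical text: one tab-separated token per line."""
    lines = [f'<s id="{s.get(XML_ID)}">']
    for tok in s.iter():
        name = tok.tag.rsplit("}", 1)[-1]
        form = (tok.text or "").strip()
        if name in ("w", "pc") and form:
            lemma = tok.get("lemma", form)  # punctuation may have no lemma
            lines.append("\t".join([form, lemma, tok.get("msd", "_")]))
    lines.append("</s>")
    return "\n".join(lines)


if __name__ == "__main__":
    s = ET.fromstring(
        '<s xml:id="s1"><w lemma="I" msd="UPosTag=PRON">I</w>'
        '<pc msd="UPosTag=PUNCT">.</pc></s>'
    )
    print(to_vertical(s))
```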

10. Contributing to ParlaMint

The ParlaMint GitHub repository contains these guidelines, the ParlaMint XML schemas, the scripts used to validate, finalise and convert the ParlaMint TEI XML corpora to derived formats, and samples of the ParlaMint corpora. There are four main branches in the repository:

The validation procedure for corpora is explained in the Section on Validating ParlaMint corpora, while the technical aspects of contributing corpora are further explained in the CONTRIBUTING file of the repository.

11. Acknowledgements

The work on these recommendations was funded by the CLARIN Research Infrastructure for Language Resources and Tools.

Appendix A Formal specification

Appendix A.1 Elements

Appendix A.1.1 <TEI>

<TEI> (TEI document) contains a single TEI-conformant document, combining a single TEI header with one or more members of the model.resource class. Multiple <TEI> elements may be combined within a <TEI> (or <teiCorpus>) element. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module: textstructure
Attributes: att.global.linking (synch, next, prev, @corresp)
xml:id
Status: Required
Datatype: ID
xml:lang
Status: Required
Datatype: teidata.language
ana
Status: Required
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Contained by
core: teiCorpus
May contain
header: teiHeader
textstructure: text
Note

This element is required. It is customary to specify the TEI namespace http://www.tei-c.org/ns/1.0 on it, for example: <TEI version="4.4.0" xml:lang="it" xmlns="http://www.tei-c.org/ns/1.0">.

Example: ParlaMint corpus component
<TEI xml:id="ParlaMint-GB_2015-01-06-commons" xml:lang="en" ana="#parla.sitting #reference" xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>...</teiHeader>
 <text ana="#reference">
  <body>...</body>
 </text>
</TEI>
Schematron
<sch:ns prefix="tei"  uri="http://www.tei-c.org/ns/1.0"/> <sch:ns prefix="xs"  uri="http://www.w3.org/2001/XMLSchema"/>
Schematron
<sch:ns prefix="rng"  uri="http://relaxng.org/ns/structure/1.0"/>
Content model
<content>
 <elementRef key="teiHeader"/>
 <elementRef key="text"/>
</content>
    
Schema Declaration
element TEI
{
   tei_att.global.linking.attribute.corresp,
   attribute xml:id { text },
   attribute xml:lang { text },
   attribute ana { list { + } },
   tei_teiHeader,
   tei_text
}

Appendix A.1.2 <addName>

<addName> (additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name. [13.2.1. Personal Names]
Module: namesdates
Member of
Contained by
namesdates: persName
May contain: Character data only
Example
<persName>  <surname>Möderndorfer</surname>  <forename>Jani</forename>  <addName>Janko</addName> </persName>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element addName { text }

Appendix A.1.3 <affiliation>

<affiliation> (affiliation) contains an informal description of a person's present or past affiliation with some organisation, for example a political party or ministry. [15.2.2. The Participant Description]
Module: namesdates
Attributes: att.global.analytic (@ana), att.datable.w3c (notBefore, notAfter, @when, @from, @to), att.canonical (key, @ref)
role
Status: Required
Legal values are:
academician
alternateOfDelegation
associateMember
candidateChairman
constitutionalJudge
deputyHead
deputyMinister
head
member
minister
ministerDelegate
nonAttachedMember
observer
ombudsman
prosecutorGeneral
publicDefenderOfRights
replacement
representative
secretary
secretaryGeneral
secretaryOfState
verifier
vicePublicDefenderOfRights
Member of
Contained by
namesdates: person
May contain
namesdates: orgName roleName
Note

If included, the name of an organisation may be tagged with the <orgName> element.

Example
<person xml:id="AdamKalous.1979">
 <persName>
  <surname>Kalous</surname>
  <forename>Adam</forename>
 </persName>
 <sex value="M"/>
 <birth when="1979-10-06"/>
 <idno type="URI">https://www.psp.cz/sqw/detail.sqw?id=6497</idno>
 <affiliation ref="#subcommittee.PEFPS.1414" role="head" from="2018-03-14T00:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">Chair Person</roleName>
 </affiliation>
 <affiliation ref="#subcommittee.PEFPS.1414" role="member" from="2018-03-14T00:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">Member</roleName>
 </affiliation>
 <affiliation ref="#committee.VSR.1315" role="deputyHead" from="2017-12-06T16:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">Vice Chairman</roleName>
 </affiliation>
 <affiliation ref="#committee.VSR.1315" role="member" from="2017-11-28T16:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">Member</roleName>
 </affiliation>
 <affiliation ref="#parliamentaryGroup.ANO.1292" role="member" from="2017-10-24T00:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">Member</roleName>
 </affiliation>
 <affiliation ref="#politicalParty.ANO2011.1104" role="representative" from="2017-10-21" to="2021-10-21">
  <roleName xml:lang="en">Candidate MP</roleName>
 </affiliation>
 <affiliation ref="#parliament" ana="#parliament.PSP8" role="member" from="2017-10-21T14:00:00" to="2021-10-21T00:00:00">
  <roleName xml:lang="en">MP</roleName>
 </affiliation>
</person>
Example
The <affiliation> element can also include an @ana attribute, which points to the appropriate legislative period when the person was affiliated with the specified organisation:
<person xml:id="BahŽibertAnja">  <persName>   <surname>Bah</surname>   <surname>Žibert</surname>   <forename>Anja</forename>  </persName>  <sex value="F"/>  <affiliation role="member" ref="#DZ"   from="2014-08-01" to="2018-06-21" ana="#DZ.7">   <roleName xml:lang="en">MP</roleName>  </affiliation>  <affiliation role="member"   ref="#party.SDS.2" from="2014-08-01" to="2018-06-21"   ana="#DZ.7">   <roleName xml:lang="en">Member</roleName>  </affiliation>  <affiliation role="member" ref="#DZ"   from="2018-06-22" ana="#DZ.8">   <roleName xml:lang="en">MP</roleName>  </affiliation> </person>
Content model
<content>
 <elementRef key="roleName" minOccurs="0"
  maxOccurs="unbounded"/>
 <elementRef key="orgName" minOccurs="0"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element affiliation
{
   tei_att.global.analytic.attribute.ana,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.canonical.attribute.ref,
   attribute role
   {
      "academician"
    | "alternateOfDelegation"
    | "associateMember"
    | "candidateChairman"
    | "constitutionalJudge"
    | "deputyHead"
    | "deputyMinister"
    | "head"
    | "member"
    | "minister"
    | "ministerDelegate"
    | "nonAttachedMember"
    | "observer"
    | "ombudsman"
    | "prosecutorGeneral"
    | "publicDefenderOfRights"
    | "replacement"
    | "representative"
    | "secretary"
    | "secretaryGeneral"
    | "secretaryOfState"
    | "verifier"
    | "vicePublicDefenderOfRights"
   },
   tei_roleName*,
   tei_orgName*
}
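The constraints on <affiliation> above (a closed list of @role values and a @from/@to interval) lend themselves to a quick consistency check. The following is a minimal sketch, not part of the ParlaMint tooling: it assumes a <person> element parsed with Python's standard xml.etree.ElementTree, the function name check_affiliations is hypothetical, and comparing only the date part of @from and @to is a simplification.

```python
import xml.etree.ElementTree as ET
from datetime import date

TEI = "{http://www.tei-c.org/ns/1.0}"

# Legal values of affiliation/@role, copied from the schema declaration above.
LEGAL_ROLES = {
    "academician", "alternateOfDelegation", "associateMember",
    "candidateChairman", "constitutionalJudge", "deputyHead",
    "deputyMinister", "head", "member", "minister", "ministerDelegate",
    "nonAttachedMember", "observer", "ombudsman", "prosecutorGeneral",
    "publicDefenderOfRights", "replacement", "representative", "secretary",
    "secretaryGeneral", "secretaryOfState", "verifier",
    "vicePublicDefenderOfRights",
}


def _day(value):
    # Compare on the date part only: "2017-10-21T14:00:00" -> 2017-10-21.
    return date.fromisoformat(value[:10])


def check_affiliations(person):
    """Return a list of problems found in the <affiliation>s of a <person>."""
    problems = []
    for aff in person.iter(TEI + "affiliation"):
        role, ref = aff.get("role"), aff.get("ref")
        if role not in LEGAL_ROLES:
            problems.append(f"illegal role {role!r} for {ref}")
        start, end = aff.get("from"), aff.get("to")
        if start and end and _day(end) < _day(start):
            problems.append(f"'to' precedes 'from' for {ref}")
    return problems
```

Applied to each <person> of a parsed person list, an empty result means that no affiliation of that person violates these two rules.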

Appendix A.1.4 <appInfo>

<appInfo> (application information) records information about an application which has edited the TEI file. [2.3.11. The Application Information Element]
Module
header — Formal specification
Contained by
header: encodingDesc
May contain
header: application
Example
<appInfo>  <application version="4.0"   ident="stanford-corenlp">   <label>Stanford CoreNLP</label>   <desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc>  </application> </appInfo>
Example
<appInfo>  <application version="1.0"   ident="reldi-tokeniser">   <label>ReLDI tokeniser</label>  </application>  <application version="1.0"   ident="classla-stanfordnlp">   <label>CLASSLA-StanfordNLP</label>  </application>  <application version="1.0"   ident="janes-ner">   <label>NER system for South Slavic languages</label>  </application> </appInfo>
Content model
<content>
 <elementRef key="application"
  minOccurs="1" maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element appInfo { tei_application+ }

Appendix A.1.5 <application>

<application> provides information about an application which has acted upon the document. [2.3.11. The Application Information Element]
Module
header — Formal specification
Attributes
ident: supplies an identifier for the application, independent of its version number or display name.
Status: Required
Datatype: teidata.name
version: supplies a version number for the application, independent of its identifier or display name.
Status: Required
Datatype: teidata.versionNumber
Contained by
header: appInfo
May contain
core: desc label
Example
<appInfo>  <application version="1"   ident="app-stanza">   <label>Stanza</label>   <desc xml:lang="en">    <ref target="https://stanfordnlp.github.io/stanza/index.html">Stanza</ref>: a jointly trained neural tagger, lemmatizer and dependency parser. Pretrained model based on the italian-isdt-ud-2.5 treebank</desc>  </application>  <application version="1" ident="app-t2k">   <label>T2K</label>   <desc xml:lang="en">    <ref target="http://www.italianlp.it/demo/t2k-text-to-knowledge/">T2K</ref>: contains a named entity recognition module for Italian.</desc>  </application>  <application version="1"   ident="conll-U2TEIXML">   <label>CoNLL-U 2 TEI XML</label>   <desc xml:lang="en">    <ref target="http://conllu2teixml">CoNLL-U 2 TEI XML</ref>: converter from CoNLL-U format to (ParlaClarin/ParlaMint) Tei XML Format</desc>  </application> </appInfo>
Example
<appInfo>  <application version="4.0"   ident="stanford-corenlp">   <label>Stanford CoreNLP</label>   <desc>Tokenisation, POS tagging, NER and dependency parsed using Stanford CoreNLP <ref target="https://stanfordnlp.github.io/CoreNLP/">https://stanfordnlp.github.io/CoreNLP/</ref>.</desc>  </application> </appInfo>
Content model
<content>
 <elementRef key="label"/>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element application
{
   attribute ident { text },
   attribute version { text },
   tei_label,
   tei_desc+
}
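Because @ident and @version are both required on <application>, the tool inventory of a corpus can be read back mechanically from <appInfo>. A minimal sketch, assuming the <appInfo> element has already been parsed with Python's standard xml.etree.ElementTree; the function name list_applications is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def list_applications(app_info):
    """Return (ident, version, label) for each <application> in <appInfo>."""
    apps = []
    for app in app_info.findall(TEI + "application"):
        ident, version = app.get("ident"), app.get("version")
        # Both attributes are required by the ParlaMint schema.
        if ident is None or version is None:
            raise ValueError("<application> lacks @ident or @version")
        label = (app.findtext(TEI + "label") or "").strip()
        apps.append((ident, version, label))
    return apps
```

For the ReLDI example above, this would yield triples such as ("reldi-tokeniser", "1.0", "ReLDI tokeniser").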

Appendix A.1.6 <availability>

<availability> (availability) supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc. [2.2.4. Publication, Distribution, Licensing, etc.]
Module
header — Formal specification
Attributes
status
Status: Required
Legal values are:
free
Contained by
May contain
core: p
header: licence
Note

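A consistent format should be adopted: in ParlaMint, <availability> contains a <licence> element giving the licence URL, followed by one or more <p> elements describing the licence, typically one per language.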

Example
<availability status="free">  <licence>http://creativecommons.org/licenses/by/4.0/</licence>  <p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref>  </p>  <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>  </p> </availability>
Content model
<content>
 <elementRef key="licence"/>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element availability { attribute status { "free" }, tei_licence, tei_p+ }
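The schema declaration above fixes both the attribute value and the order of children, so an <availability> element can be checked in a few lines. A minimal sketch using Python's standard xml.etree.ElementTree; the function name check_availability and its return value (licence URL plus the languages of the prose statements) are illustrative choices, not part of ParlaMint.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_availability(avail):
    """Validate <availability>; return the licence URL and the prose languages."""
    if avail.get("status") != "free":
        raise ValueError("@status must be 'free'")
    tags = [child.tag for child in avail]
    # Content model: one <licence>, followed by one or more <p>.
    if not (len(tags) >= 2 and tags[0] == TEI + "licence"
            and all(tag == TEI + "p" for tag in tags[1:])):
        raise ValueError("expected <licence> followed by one or more <p>")
    licence = (avail[0].text or "").strip()
    langs = [p.get(XML_LANG) for p in avail.findall(TEI + "p")]
    return licence, langs
```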

Appendix A.1.7 <bibl>

<bibl> (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged. [3.12.1. Methods of Encoding Bibliographic References and Lists of References 2.2.7. The Source Description 15.3.2. Declarable Elements]
Module
core — Formal specification
Member of
Contained by
header: sourceDesc
May contain
Note

In ParlaMint, <bibl> contains one or more <title> elements, followed by any combination of <edition>, <publisher>, <idno> and <date> elements (see the content model below).

Example
<bibl>  <title type="main">Minutes of the National Assembly of the Republic of Bulgaria</title>  <date when="2020-03-11">2020-03-11</date> </bibl>
Example
<bibl>  <title type="main" xml:lang="en">https://www.tbmm.gov.tr/tutanak/donem24/yil2/bas/b013m.htm</title>  <edition xml:lang="en">Official session record</edition>  <publisher xml:lang="en">The Turkish Parliament</publisher>  <idno type="URI">https://www.tbmm.gov.tr/</idno>  <date when="2011-10-27">2011-10-27</date> </bibl>
Content model
<content>
 <elementRef key="title" minOccurs="1"
  maxOccurs="unbounded"/>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="edition" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="publisher" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="idno" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="date" minOccurs="1"
   maxOccurs="1"/>
 </alternate>
</content>
    
Schema Declaration
element bibl
{
   tei_title+,
   ( tei_edition? | tei_publisher? | tei_idno* | tei_date )+
}

Appendix A.1.8 <birth>

<birth> (birth) contains information about a person's birth, obligatorily its date and optionally its place. Note that there can in principle be several <placeName> elements, all referring to the same place but written in different languages or scripts, although the content model below allows at most one. [15.2.2. The Participant Description]
Module
namesdates — Formal specification
Attributes
when: supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
Derived from: att.datable.w3c
Status: Required
Datatype: teidata.temporal.w3c
Contained by
namesdates: person
May contain
namesdates: placeName
Example
<person xml:id="ReinerŽeljko" n="1291"> ... <birth when="1953-05-28"/> </person>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="placeName" minOccurs="0"/>
 </alternate>
</content>
    
Schema Declaration
element birth { attribute when { text }, ( tei_placeName? ) }
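The @when value on <birth> (and <death>) is of type teidata.temporal.w3c. The sketch below checks only a simplified subset of that type, namely a year, a year and month, or a full calendar date; times, time zones and negative years are deliberately not handled, so this is an assumption-laden approximation rather than a full W3C datatype validator. The function name is_valid_when is hypothetical.

```python
import re
from datetime import date

# Simplified subset of teidata.temporal.w3c: YYYY, YYYY-MM or YYYY-MM-DD.
_W3C_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_when(value):
    """Return True if value is an acceptable @when on <birth> or <death>."""
    match = _W3C_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    if day is not None:
        try:
            date(int(year), int(month), int(day))  # rejects e.g. 1953-02-30
        except ValueError:
            return False
        return True
    if month is not None:
        return 1 <= int(month) <= 12
    return True
```

A value such as "28.5.1953" is rejected: local date conventions belong in the textual content of <date>, never in @when.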

Appendix A.1.9 <body>

<body> (text body) contains the whole body of a single unitary text, excluding any front or back matter. [4. Default Text Structure]
Module
textstructure — Formal specification
Contained by
textstructure: text
May contain
textstructure: div
Example
<body>  <div type="debateSection">...</div>  <div type="debateSection">...</div> ... </body>
Content model
<content>
 <elementRef key="div" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element body { tei_div+ }

Appendix A.1.10 <catDesc>

<catDesc> (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal <textDesc>. [2.3.7. The Classification Declaration]
Module
header — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
header: category
May contain
core: ref term
character data
Example
<category xml:id="parla.organisation">  <catDesc xml:lang="en">   <term>Organisation</term>  </catDesc>  <catDesc xml:lang="bg">   <term>Организация</term>  </catDesc> </category>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="term"/>
  <alternate minOccurs="1"
   maxOccurs="unbounded">
   <textNode/>
   <elementRef key="ref"/>
  </alternate>
 </sequence>
</content>
    
Schema Declaration
element catDesc
{
   tei_att.global.attribute.xmllang,
   ( tei_term, ( text | tei_ref )+ )
}

Appendix A.1.11 <catRef>

<catRef> (category reference) specifies one or more defined categories within some taxonomy or text typology. [2.4.3. The Text Classification]
Module
header — Formal specification
Attributes
target: specifies the destination of the reference by supplying one or more URI references.
Derived from: att.pointing
Status: Required
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
scheme: identifies the classification scheme within which the set of categories concerned is defined, for example by a <taxonomy> element, or by some other resource.
Status: Required
Datatype: teidata.pointer
Contained by
header: textClass
May contain
Empty element
Note

In TEI in general, the scheme attribute needs to be supplied only if more than one taxonomy has been declared; in ParlaMint, however, it is always required.

Example
<textClass>  <catRef scheme="#parla.legislature"   target="#parla.uni"/> </textClass> ... elsewhere ... <taxonomy xml:id="parla.legislature"> ... <category xml:id="parla.uni">   <catDesc xml:lang="lt">    <term>Vienų rūmų parlamentas</term>   </catDesc>   <catDesc xml:lang="en">    <term>Unicameralism</term>   </catDesc>  </category> </taxonomy>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element catRef
{
   attribute target { list { + } },
   attribute scheme { text },
   empty
}
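Resolving a <catRef> means following @scheme to the <taxonomy> and each pointer in @target to a <category>, then picking the <catDesc> in the desired language. A minimal sketch with Python's standard xml.etree.ElementTree; it assumes the taxonomy is already parsed (e.g. after XInclude expansion) and that all pointers are local "#id" references. The function name resolve_cat_ref is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"


def resolve_cat_ref(cat_ref, taxonomy, lang="en"):
    """Return the <term> in `lang` for each category that catRef/@target names."""
    if cat_ref.get("scheme") != "#" + taxonomy.get(XML + "id"):
        raise ValueError("catRef/@scheme does not point to this taxonomy")
    # Index all categories, including nested ones, by their xml:id.
    by_id = {c.get(XML + "id"): c for c in taxonomy.iter(TEI + "category")}
    terms = []
    for pointer in cat_ref.get("target").split():  # whitespace-separated list
        category = by_id[pointer.lstrip("#")]
        for cat_desc in category.findall(TEI + "catDesc"):
            if cat_desc.get(XML + "lang") == lang:
                terms.append(cat_desc.findtext(TEI + "term"))
                break
    return terms
```

With the Lithuanian example above, the English term for #parla.uni would be "Unicameralism".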

Appendix A.1.12 <category>

<category> (category) contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy. [2.3.7. The Classification Declaration]
Module
header — Formal specification
Attributes
att.global (xml:id, xml:lang, xml:base, xml:space, @n)
xml:id (identifier): provides a unique identifier for the element bearing the attribute.
Derived from: att.global
Status: Required
Datatype: ID
ana (analysis): indicates one or more elements containing interpretations of the element on which the ana attribute appears.
Derived from: att.global.analytic
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Contained by
May contain
Example
<category xml:id="parla.session">  <catDesc xml:lang="en">   <term>Session</term>: A parliamentary year, which always begins on the first Tuesday in October at 12.00 o’clock noon and ends on the same date at the same time the following year. However, parliamentary work at Christiansborg is organised in such a way that it primarily takes place from October to June.</catDesc> </category>
Example
<category xml:id="parla.term">  <catDesc xml:lang="nl">   <term>Zittingsperiode</term>  </catDesc>  <catDesc xml:lang="en">   <term>Legislative period</term>  </catDesc> </category>
Content model
<content>
 <elementRef key="catDesc" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="category" minOccurs="0"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element category
{
   tei_att.global.attribute.n,
   attribute xml:id { text },
   attribute ana { list { + } }?,
   tei_catDesc+,
   tei_category*
}

Appendix A.1.13 <change>

<change> (change) documents a change or set of changes made during the production of a source document, or during the revision of an electronic file. [2.6. The Revision Description 2.4.1. Creation 11.7. Identifying Changes and Revisions]
Module
header — Formal specification
Attributes
att.datable.w3c (notBefore, notAfter, from, to, @when)
Contained by
header: revisionDesc
May contain
core: name
character data
Note

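In TEI in general, the who attribute may be used to point to any other element, but will typically specify a <respStmt> or <person> element elsewhere in the header, identifying the person responsible for the change and their role in making it. The ParlaMint schema, however, allows only the when attribute on <change>, so those responsible are named with <name> elements in its content instead.

TEI recommends that changes be recorded with the most recent first. The status attribute, which in TEI may be used to indicate the status of a document following the change documented, is likewise not part of the ParlaMint schema.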

Example
<revisionDesc>  <change when="2021-01-28">   <name>Tommaso Agnoloni</name>: Generated corpus in ParlaMint.</change>  <change when="2021-02-26">   <name>Tommaso Agnoloni</name>, <name>Francesca Frontini</name>: Corpus revision, fixing</change> </revisionDesc>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="name"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element change { tei_att.datable.w3c.attribute.when, ( tei_name | text )+ }

Appendix A.1.14 <classDecl>

<classDecl> (classification declarations) contains taxonomies defining classificatory codes used elsewhere in the text. Note that in ParlaMint the taxonomies are typically stored in separate files and included via XInclude. [2.3.7. The Classification Declaration 2.3. The Encoding Description]
Module
header — Formal specification
Contained by
header: encodingDesc
May contain
derived-module-parlamint: include
header: taxonomy
Example
<classDecl> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-taxonomy-parla.legislature.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-taxonomy-speaker_types.xml"/> ... </classDecl>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="taxonomy"/>
  <elementRef key="include"/>
 </alternate>
</content>
    
Schema Declaration
element classDecl { ( tei_taxonomy | tei_include )+ }
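Since <classDecl> usually holds <xi:include> elements rather than the taxonomies themselves, processing scripts need to expand the includes first. A minimal sketch using the standard-library module xml.etree.ElementInclude; the in-memory `files` mapping is a hypothetical stand-in for the taxonomy files on disk (without a custom loader, ElementInclude's default loader opens each href as a file path), and expand_includes is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def expand_includes(root, files):
    """Replace every <xi:include> under root by the document it references.

    `files` maps each href to XML text and stands in for the files on disk.
    """
    def loader(href, parse, encoding=None):
        # For parse="xml" (the default) the loader must return an Element.
        return ET.fromstring(files[href])

    ElementInclude.include(root, loader)
    return root
```

After expansion, each <xi:include> child of <classDecl> has been replaced by the root <taxonomy> element of the referenced file.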

Appendix A.1.15 <correction>

<correction> (correction principles) states how and under what circumstances corrections have been made in the text. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements]
Module
header — Formal specification
Contained by
May contain
core: p
Note

May be used to note the results of proofreading the text against its original, indicating (for example) whether discrepancies have been silently rectified, or recorded using the editorial tags described in section 3.5. Simple Editorial Changes.

Example
<editorialDecl>  <correction>   <p>No correction of source texts was performed.</p>  </correction> </editorialDecl>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element correction { tei_p+ }

Appendix A.1.16 <date>

<date> (date) contains a date in any format. [3.6.4. Dates and Times 2.2.4. Publication, Distribution, Licensing, etc. 2.6. The Revision Description 3.12.2.4. Imprint, Size of a Document, and Reprint Information 15.2.3. The Setting Description 13.4. Dates]
Module
core — Formal specification
Attributes
att.typed (@type, @subtype) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
Member of
Contained by
analysis: s
corpus: setting
May contain
analysis: pc w
core: date
character data
Example
The <date> element gives the date in the when attribute in ISO 8601 format, while its textual content is not constrained:
<date when="2021-06-08">2021-06-08</date>
Example
The textual content can be given according to the conventions used in the local language:
<date when="2018-04-13" xml:lang="sl">13.4.2018</date>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="date"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element date
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.typed.attributes,
   ( tei_w | tei_pc | tei_date | text )+
}
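The division of labour in <date> (ISO 8601 in @when, source spelling in the content) can be illustrated by building the Slovenian example from its textual form. A minimal sketch with Python's standard library; make_date is a hypothetical helper, and the default format "%d.%m.%Y" is an assumption matching "13.4.2018", not a ParlaMint convention.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def make_date(local_text, local_format="%d.%m.%Y", lang="sl"):
    """Build a <date> with an ISO 8601 @when and the source text as content."""
    iso = datetime.strptime(local_text, local_format).date().isoformat()
    elem = ET.Element(TEI + "date", {"when": iso, XML_LANG: lang})
    elem.text = local_text  # keep the source spelling unchanged
    return elem
```

For example, make_date("13.4.2018") produces a <date> whose @when is "2018-04-13" and whose content is still "13.4.2018".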

Appendix A.1.17 <death>

<death> (death) contains information about a person's death, obligatorily its date and optionally its place. Note that there can in principle be several <placeName> elements, all referring to the same place but written in different languages or scripts, although the content model below allows at most one. [15.2.2. The Participant Description]
Module
namesdates — Formal specification
Attributes
when: supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
Derived from: att.datable.w3c
Status: Required
Datatype: teidata.temporal.w3c
Contained by
namesdates: person
May contain
namesdates: placeName
Example
<death when="2020-12-29"/>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="placeName" minOccurs="0"/>
 </alternate>
</content>
    
Schema Declaration
element death { attribute when { text }, ( tei_placeName? ) }
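When both <birth> and <death> carry full dates in @when, derived values such as age at death can be computed directly. A minimal sketch, assuming both values are complete YYYY-MM-DD dates (year-only or year-month values would need separate handling); the function name age_at_death is hypothetical.

```python
from datetime import date


def age_at_death(birth_when, death_when):
    """Age in completed years from two full ISO dates (YYYY-MM-DD)."""
    born = date.fromisoformat(birth_when)
    died = date.fromisoformat(death_when)
    # One year less if the birthday had not yet come round in the year of death.
    had_birthday = (died.month, died.day) >= (born.month, born.day)
    return died.year - born.year - (0 if had_birthday else 1)
```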

Appendix A.1.18 <desc>

<desc> (description) contains a short description of the purpose, function, or use of its parent element, or when the parent is a documentation element, describes or defines the object being documented. [22.4.1. Description of Components]
Module
core — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Member of
Contained by
core: gap
namesdates: org
May contain
core: ref term
character data
Note

When used in a specification element such as <elementSpec>, TEI convention requires that this be expressed as a finite clause, beginning with an active verb.

Example
Example of <desc> elements for transcriber comments:
<gap reason="inaudible">  <desc>speaker spoke too quietly, not understood</desc> </gap> <kinesic type="applause">  <desc xml:lang="sl">ploskanje</desc> </kinesic> <vocal type="interruption">  <desc>sounds from the chamber</desc> </vocal> ... <kinesic type="signal">  <desc>signal for end of debate</desc> </kinesic> ... <incident type="action">  <desc>minute of silence</desc> </incident>
Example
Example of <desc> elements used as part of a taxonomy:
<taxonomy xml:id="parla.legislature">  <desc xml:lang="sl">   <term>Zakonodajna oblast</term>  </desc>  <desc>   <term>Legislature</term>  </desc> ... </taxonomy>
Example
The <desc> element can also be used to describe the tool(s) used to linguistically annotate the corpus:
<application version="1.0"  ident="reldi-tokeniser">  <label>ReLDI tokeniser</label>  <desc xml:lang="en">Tokenisation and sentence segmentation with ReLDI tokeniser, available from <ref target="https://github.com/clarinsi/reldi-tokeniser">https://github.com/clarinsi/reldi-tokeniser</ref>.</desc> </application>
Schematron
A <desc> with a type of deprecationInfo should only occur when its parent element is being deprecated. Furthermore, it should always occur in an element that is being deprecated when <desc> is a valid child of that element.
<sch:rule context="tei:desc[ @type eq 'deprecationInfo']"> <sch:assert test="../@validUntil">Information about a deprecation should only be present in a specification element that is being deprecated: that is, only an element that has a @validUntil attribute should have a child <desc type="deprecationInfo">.</sch:assert> </sch:rule>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef minOccurs="0" key="term"/>
  <alternate minOccurs="1"
   maxOccurs="unbounded">
   <textNode/>
   <elementRef key="ref"/>
  </alternate>
 </sequence>
</content>
    
Schema Declaration
element desc
{
   tei_att.global.attribute.xmllang,
   ( tei_term?, ( text | tei_ref )+ )
}

Appendix A.1.19 <div>

<div> (text division) contains a division of the body of a corpus component. [4.1. Divisions of the Body]
Module
textstructure — Formal specification
Attributes
att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.typed (type, @subtype)
type
StatusRequired
Legal values are:
debateSection
General purpose text division for all parts of parliamentary proceedings. It should include at least one utterance. If needed, the @subtype attribute can be used for additional content classification.
commentSection
A special purpose text division used as a container for transcriber comments. Should not contain any utterances. If needed, the @subtype attribute can be used for additional content classification.
Contained by
textstructure: body
May contain
Example
<div type="debateSection">  <head>Devolution of Power (Cities)</head>  <u xml:id="ParlaMint-GB_2015-01-06-commons.u1">...</u>  <u xml:id="ParlaMint-GB_2015-01-06-commons.u2">...</u> ... <note>House adjourned.</note> </div>
Schematron
<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not(ancestor::tei:floatingText)"> Abstract model violation: Lines may not contain higher-level structural elements such as div, unless div is a descendant of floatingText. </sch:report>
Schematron
<sch:report test="(ancestor::tei:p or ancestor::tei:ab) and not(ancestor::tei:floatingText)"> Abstract model violation: p and ab may not contain higher-level structural elements such as div, unless div is a descendant of floatingText. </sch:report>
Content model
<content>
 <elementRef key="head" minOccurs="0"
  maxOccurs="unbounded"/>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
  <elementRef key="u"/>
 </alternate>
</content>
    
Schema Declaration
element div
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.typed.attribute.subtype,
   attribute type { "debateSection" | "commentSection" },
   tei_head*,
   (
      tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
    | tei_u
   )+
}
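The two legal values of div/@type come with usage rules that the content model alone does not enforce: a debateSection should include at least one utterance, while a commentSection should contain none. A minimal sketch of a check for these rules with Python's standard xml.etree.ElementTree; the function name check_div and its message strings are hypothetical, and only direct <u> children are examined, as in the content model.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def check_div(div):
    """Return a message if a <div> breaks the rules for its @type, else None."""
    div_type = div.get("type")
    utterances = div.findall(TEI + "u")  # <u> elements are direct children
    if div_type == "debateSection":
        if not utterances:
            return "debateSection should include at least one utterance"
    elif div_type == "commentSection":
        if utterances:
            return "commentSection should not contain any utterances"
    else:
        return f"illegal div type {div_type!r}"
    return None
```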

Appendix A.1.20 <edition>

<edition> (edition) describes the particularities of one edition of a text. [2.2.2. The Edition Statement]
Module
header — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
core: bibl
header: editionStmt
May contain
Character data only
Example
<edition>2.1</edition>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element edition { tei_att.global.attribute.xmllang, text }

Appendix A.1.21 <editionStmt>

<editionStmt> (edition statement) groups information relating to one edition of a text. [2.2.2. The Edition Statement 2.2. The File Description]
Module
header — Formal specification
Contained by
header: fileDesc
May contain
header: edition
Example
<editionStmt>  <edition>2.1</edition> </editionStmt>
Content model
<content>
 <elementRef key="edition" minOccurs="1"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element editionStmt { tei_edition }

Appendix A.1.22 <editorialDecl>

<editorialDecl> (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text. [2.3.3. The Editorial Practices Declaration 2.3. The Encoding Description 15.3.2. Declarable Elements]
Module
header — Formal specification
Contained by
header: encodingDesc
May contain
Example
<editorialDecl>  <correction>   <p>No correction of source texts was performed.</p>  </correction>  <normalization>   <p>Text has not been normalised, except for spacing.</p>  </normalization>  <hyphenation>   <p>Hyphenation has not been altered with respect to the source files.</p>  </hyphenation>  <quotation>   <p>Quotation marks have been left in the text and are not explicitly marked up.</p>  </quotation>  <segmentation>   <p>The texts are segmented into utterances (contributions) and segments (corresponding to paragraphs in the source transcription).</p>  </segmentation> </editorialDecl>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="correction"/>
  <elementRef key="normalization"/>
  <elementRef key="hyphenation"/>
  <elementRef key="quotation"/>
  <elementRef key="segmentation"/>
 </alternate>
</content>
    
Schema Declaration
element editorialDecl
{
   (
      tei_correction
    | tei_normalization
    | tei_hyphenation
    | tei_quotation
    | tei_segmentation
   )+
}

Appendix A.1.23 <education>

<education> (education) contains a description of the educational experience of a person. [15.2.2. The Participant Description]
Module
namesdates — Formal specification
Attributes
att.global (xml:id, xml:base, xml:space, @n, @xml:lang) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
Contained by
namesdates: person
May contain
Character data only
Example
<education>Bachelor of Science, Electrical and Information Technology Engineer</education>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element education
{
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   text
}

Appendix A.1.24 <email>

<email> (electronic mail address) contains an email address identifying a location to which email messages can be delivered. [3.6.2. Addresses]
Module
core — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana)
Member of
Contained by
core: unit
May contain
analysis: pc w
character data
Note

The format of a modern Internet email address is defined in RFC 2822 (since obsoleted by RFC 5322).

Example
The element can be used to mark fine-grained named entities that are email addresses:
<email ana="ne:me"  xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.ne87">  <w xml:id="ParlaMint-CZ_2014-12-09-ps2013-023-05-003-133.u4.p9.s3.w13"   lemma="namraza@cd.cz"   msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Plur|Polarity=Pos">namraza@cd.cz</w> </email>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element email
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   ( tei_w | tei_pc | text )+
}

Appendix A.1.25 <encodingDesc>

<encodingDesc> (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived. [2.3. The Encoding Description 2.1.1. The TEI Header and Its Components]
Module
header — Formal specification
Contained by
header: teiHeader
May contain
Example
General structure of an encoding description:
<encodingDesc>  <projectDesc>...</projectDesc>  <editorialDecl>...</editorialDecl>  <tagsDecl>...</tagsDecl>  <classDecl>...</classDecl> </encodingDesc>
Example
Structure of the encoding description of an unannotated corpus root:
<encodingDesc>  <projectDesc>   <p xml:lang="sl">    <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>   </p>   <p xml:lang="en">    <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a project that aims to (1) create a multilingual set of comparable corpora of parliamentary proceedings uniformly encoded...</p>  </projectDesc>  <editorialDecl>   <correction>...</correction>   <normalization>...</normalization>   <hyphenation>...</hyphenation>   <quotation>...</quotation>   <segmentation>...</segmentation>  </editorialDecl>  <tagsDecl>   <namespace name="http://www.tei-c.org/ns/1.0">    <tagUsage gi="body" occurs="414"/>    <tagUsage gi="desc" occurs="10234"/>    <tagUsage gi="div" occurs="414"/>   </namespace>  </tagsDecl>  <classDecl>...</classDecl> </encodingDesc>
Example
Encoding description of an annotated corpus root. Its structure includes two additional elements, <listPrefixDef> and <appInfo>.
<encodingDesc>  <projectDesc>... </projectDesc>  <editorialDecl>...</editorialDecl>  <tagsDecl>...</tagsDecl>  <classDecl>...</classDecl>  <listPrefixDef>   <prefixDef ident="mte"    matchPattern="(.+)"    replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-sl.xml#$1">    <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Slovenian MULTEXT-East Version 6 MSDs.</p>   </prefixDef>  </listPrefixDef>  <appInfo>   <application>...</application>  </appInfo> </encodingDesc>
Example
Encoding description of a corpus component (annotated or unannotated). In contrast to the corpus root, the encoding description of a corpus component contains only two elements, <projectDesc> and <tagsDecl>.
<encodingDesc>  <projectDesc>...</projectDesc>  <tagsDecl>...</tagsDecl> </encodingDesc>
Content model
<content>
 <elementRef key="projectDesc"/>
 <elementRef key="editorialDecl"
  minOccurs="0" maxOccurs="1"/>
 <elementRef key="tagsDecl"/>
 <elementRef key="classDecl" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="listPrefixDef"
  minOccurs="0" maxOccurs="1"/>
 <elementRef key="appInfo" minOccurs="0"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element encodingDesc
{
   tei_projectDesc,
   tei_editorialDecl?,
   tei_tagsDecl,
   tei_classDecl?,
   tei_listPrefixDef?,
   tei_appInfo?
}
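The content model of <encodingDesc> is a plain sequence with optional members, so it can be mirrored by a regular expression over the local names of the children. A minimal sketch with Python's standard re and xml.etree.ElementTree modules; encoding_desc_ok is a hypothetical name, and a real project would normally validate against the ParlaMint RELAX NG schema instead.

```python
import re
import xml.etree.ElementTree as ET

# The content model above as a regular expression over the children's
# local names, each followed by a single space.
_MODEL = re.compile(
    r"projectDesc (editorialDecl )?tagsDecl "
    r"(classDecl )?(listPrefixDef )?(appInfo )?"
)


def encoding_desc_ok(encoding_desc):
    """Return True if the children of <encodingDesc> follow the content model."""
    names = "".join(child.tag.rpartition("}")[2] + " " for child in encoding_desc)
    return _MODEL.fullmatch(names) is not None
```

Both the component form (projectDesc, tagsDecl) and the full root form of an annotated corpus are accepted, while a misordered or incomplete sequence is not.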

Appendix A.1.26 <equipment>

<equipment> (equipment) provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text. [8.2. Documenting the Source of Transcribed Speech 15.3.2. Declarable Elements]
Module
spoken — Formal specification
Attributes
att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @next, @prev)) (att.global.analytic (@ana)) (att.global.responsibility (@resp)) (att.global.source (@source)) att.declarable (@default)
Contained by
May contain
core: p
Example
<equipment>  <p>"Hi-8" 8 mm NTSC camcorder with integral directional    microphone and windshield and stereo digital sound    recording channel.  </p> </equipment>
Example
<equipment>  <p>8-track analogue transfer mixed down to 19 cm/sec audio    tape for cassette mastering</p> </equipment>
Content model
<content>
 <classRef key="model.pLike" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element equipment
{
   tei_att.global.attributes,
   tei_att.declarable.attributes,
   tei_model.pLike+
}

Appendix A.1.27 <equipment>

<equipment> (equipment) provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text.
Module
spoken — Formal specification
Attributes
att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @next, @prev)) (att.global.analytic (@ana)) (att.global.responsibility (@resp)) (att.global.source (@source)) att.declarable (@default)
Contained by
May contain
core: p
Example
<equipment>  <p>"Hi-8" 8 mm NTSC camcorder with integral directional    microphone and windshield and stereo digital sound    recording channel.  </p> </equipment>
Content model
<content>
 <classRef key="model.pLike" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element equipment
{
   tei_att.global.attributes,
   tei_att.declarable.attributes,
   tei_model.pLike+
}

Appendix A.1.28 <event>

<event> (event) contains data relating to any kind of significant event associated with a person, place, or organisation. [13.3.1. Basic Principles]
Module: namesdates
Attributes: att.global (n, xml:lang, xml:base, xml:space, @xml:id) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
Contained by
namesdates: listEvent org
May contain
core: label
Example
<event xml:id="PoGB.55" from="2010-05-18" to="2015-03-30">  <label>Fifty-fifth Parliament of the United Kingdom</label> </event>
Example
<org xml:id="government.HR" role="government">  <orgName xml:lang="hr" full="yes">Vlada Republike Hrvatske</orgName>  <orgName xml:lang="en" full="yes">Government of the Republic of Croatia</orgName>  <event from="1990-05-30">   <label xml:lang="en">existence</label>  </event> </org>
Content model
<content>
 <elementRef key="label" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element event
{
   tei_att.global.attribute.xmlid,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_label+
}

Appendix A.1.29 <extent>

<extent> (extent) describes the approximate size of a text stored on some carrier medium or of some other object, digital or non-digital, specified in any convenient units. [2.2.3. Type and Extent of File; 2.2. The File Description; 3.12.2.4. Imprint, Size of a Document, and Reprint Information; 10.7.1. Object Description]
Module: header
Contained by
header: fileDesc
May contain
core: measure
Example
<extent>  <measure unit="speeches" quantity="75122" xml:lang="sl">75.122 govorov</measure>  <measure unit="speeches" quantity="75122" xml:lang="en">75,122 speeches</measure>  <measure unit="words" quantity="20190034" xml:lang="sl">20.190.034 besed</measure>  <measure unit="words" quantity="20190034" xml:lang="en">20,190,034 words</measure> </extent>
Content model
<content>
 <elementRef key="measure" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element extent { tei_measure+ }
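In the examples above, each <measure> is repeated once per language, with the number formatted differently in the text but the same value in @quantity. A minimal Python sketch of a consistency check for this, assuming the corpus uses the TEI namespace and every <measure> carries an integer @quantity; `extent_quantities` is a hypothetical helper, not part of the ParlaMint tooling:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"


def extent_quantities(extent):
    """Map each measure @unit to its @quantity.

    Raises ValueError if the language variants of one unit
    (e.g. 'words' in Slovene and in English) report different quantities.
    """
    by_unit = defaultdict(set)
    for measure in extent.iter(TEI + "measure"):
        by_unit[measure.get("unit")].add(int(measure.get("quantity")))
    result = {}
    for unit, quantities in by_unit.items():
        if len(quantities) != 1:
            raise ValueError(f"inconsistent quantities for unit {unit!r}: {sorted(quantities)}")
        result[unit] = quantities.pop()
    return result
```

Applied to the Slovenian <extent> example, this returns one number per unit (speeches and words), ignoring the language-specific text.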

Appendix A.1.30 <figure>

<figure> (figure) groups elements representing or containing graphic information such as an illustration, formula, or figure. [14.4. Specific Elements for Graphic Images]
Module: figures
Member of
Contained by
namesdates: person
May contain
Example
<figure>  <graphic url="https://www.psp.cz/eknih/cdrom/2017ps/eknih/2017ps/poslanci/i6497.jpg"/> </figure>
Content model
<content>
 <elementRef key="head" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="graphic" minOccurs="1"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element figure { tei_head?, tei_graphic }

Appendix A.1.31 <fileDesc>

<fileDesc> (file description) contains a full bibliographic description of an electronic file. [2.2. The File Description; 2.1.1. The TEI Header and Its Components]
Module: header
Contained by
header: teiHeader
May contain
Note

The major source of information for those seeking to create a catalogue entry or bibliographic citation for an electronic file. As such, it provides a title and statements of responsibility together with details of the publication or distribution of the file, of any series to which it belongs, and detailed bibliographic notes for matters not addressed elsewhere in the header. It also contains a full bibliographic description for the source or sources from which the electronic text was derived.

Example: basic structure of the <fileDesc> element:
<fileDesc>  <titleStmt>...</titleStmt>  <editionStmt>...</editionStmt>  <extent>...</extent>  <publicationStmt>...</publicationStmt>  <sourceDesc>...</sourceDesc> </fileDesc>
Example: the <fileDesc> element in a corpus root:
<fileDesc>  <titleStmt>   <title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL [ParlaMint]</title>   <title type="main" xml:lang="nl">Corpus van het Nederlandse Parlement ParlaMint-NL [ParlaMint]</title>   <title type="sub" xml:lang="en">Minutes of the Eerste Kamer and Tweede Kamer of The Netherlands (2015-2020)</title>   <title type="sub" xml:lang="nl">Minuten van de Eerste en Tweede Kamer van Nederland (2015-2020)</title>   <meeting n="28-lower" ana="#parla.lower #parla.term">28ste Tweede Kamer</meeting>   <meeting n="29-lower" ana="#parla.lower #parla.term">29ste Tweede Kamer</meeting>   <meeting n="34-upper" ana="#parla.upper #parla.term">34ste Eerste Kamer</meeting>   <meeting n="35-upper" ana="#parla.upper #parla.term">35ste Eerste Kamer</meeting>   <meeting n="36-upper" ana="#parla.upper #parla.term">36ste Eerste Kamer</meeting>   <respStmt>    <persName xml:id="RubenvanHeusden" xml:lang="nl">Ruben van Heusden</persName>    <resp xml:lang="en">Downloading and converting the corpus to TEI format</resp>   </respStmt>   <funder>    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>   </funder>  </titleStmt>  <editionStmt>   <edition>2.1</edition>  </editionStmt>  <extent>   <measure unit="speeches" xml:lang="nl" quantity="474964">474,964 toespraken</measure>   <measure unit="speeches" xml:lang="en" quantity="474964">474,964 speeches</measure>   <measure unit="words" xml:lang="nl" quantity="51451191">51,451,191 woorden</measure>   <measure unit="words" xml:lang="en" quantity="51451191">51,451,191 words</measure>  </extent>  <publicationStmt>   <publisher>    <orgName xml:lang="en">CLARIN research infrastructure</orgName>    <ref target="https://www.clarin.eu/">www.clarin.eu</ref>   </publisher>   <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>   <availability status="free">    <licence>http://creativecommons.org/licenses/by/4.0/</licence>    <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>    </p>   </availability>   <date when="2021-06-10">June 10, 2021</date>  </publicationStmt>  <sourceDesc>   <bibl>    <title type="main">Minutes of the Eerste Kamer of The Netherlands</title>    <idno type="URI">https://www.eerstekamer.nl/</idno>    <date from="2014-12-15" to="2020-11-03">2014-12-15 - 2020-11-03</date>   </bibl>   <bibl>    <title type="main">Minutes of the Tweede Kamer of The Netherlands</title>    <idno type="URI">https://www.tweedekamer.nl/</idno>    <date from="2014-04-16" to="2020-10-14">2014-04-16 - 2020-10-14</date>   </bibl>  </sourceDesc> </fileDesc>
Example: the <fileDesc> element in a corpus component:
<fileDesc>  <titleStmt>   <title type="main" xml:lang="en">Dutch parliamentary corpus ParlaMint-NL, Lower House 2014-04-16 [ParlaMint]</title>   <title type="main" xml:lang="nl">Corpus van het Nederlandse parlement ParlaMint-NL, Tweede Kamer 2014-04-16 [ParlaMint]</title>   <title type="sub" xml:lang="en">Report of the meeting of the Dutch Lower House, Meeting 76, Session 2 (2014-04-16)</title>   <title type="sub" xml:lang="nl">Verslag van de vergadering van de Tweede Kamer, Meeting 76, Session 2 (2014-04-16)</title>   <meeting ana="#parla.lower #parla.meeting.regular" corresp="#TK" n="76">Meeting 76</meeting>   <meeting ana="#parla.lower #parla.session" corresp="#TK" n="2">Session 2</meeting>   <meeting ana="#parla.lower #parla.term #TK.28" corresp="#TK" n="28-lower">Meeting of the 28th Tweede Kamer</meeting>   <respStmt>    <persName xml:id="RubenvanHeusden" xml:lang="nl">Ruben van Heusden</persName>    <resp xml:lang="en">Downloading and converting the corpus to TEI format</resp>   </respStmt>   <funder>    <orgName xml:lang="en">The CLARIN research infrastructure</orgName>   </funder>  </titleStmt>  <editionStmt>   <edition>2.1</edition>  </editionStmt>  <extent>   <measure unit="speeches" xml:lang="nl" quantity="18">18 toespraken</measure>   <measure unit="speeches" xml:lang="en" quantity="18">18 speeches</measure>   <measure unit="words" xml:lang="nl" quantity="1094">1,094 woorden</measure>   <measure unit="words" xml:lang="en" quantity="1094">1,094 words</measure>  </extent>  <publicationStmt>   <publisher>    <orgName xml:lang="en">CLARIN research infrastructure</orgName>    <ref target="https://www.clarin.eu/">www.clarin.eu</ref>   </publisher>   <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>   <availability status="free">    <licence>http://creativecommons.org/licenses/by/4.0/</licence>    <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>   </availability>   <date when="2021-06-10">June 10, 2021</date>  </publicationStmt>  <sourceDesc>   <bibl>    <title type="main">Minutes of the Tweede Kamer of The Netherlands</title>    <idno type="URI">https://www.tweedekamer.nl/</idno>    <date when="2014-04-16">2014-04-16</date>   </bibl>  </sourceDesc> </fileDesc>
Content model
<content>
 <elementRef key="titleStmt"/>
 <elementRef key="editionStmt"/>
 <elementRef key="extent"/>
 <elementRef key="publicationStmt"/>
 <elementRef key="sourceDesc"/>
</content>
    
Schema Declaration
element fileDesc
{
   tei_titleStmt,
   tei_editionStmt,
   tei_extent,
   tei_publicationStmt,
   tei_sourceDesc
}
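Because all five children of <fileDesc> are obligatory and must appear in a fixed order, a quick pre-check outside a RELAX NG validator reduces to comparing the sequence of child element names. A minimal Python sketch, assuming the TEI namespace; `check_filedesc` is a hypothetical helper and does not replace validation against the ParlaMint schema:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Child order required by the content model of <fileDesc> above.
FILEDESC_ORDER = ["titleStmt", "editionStmt", "extent", "publicationStmt", "sourceDesc"]


def check_filedesc(file_desc):
    """Return a list of problems; an empty list means the children match the model."""
    # ElementTree drops comments by default, so only element children are seen here.
    found = [child.tag.replace(TEI, "") for child in file_desc]
    if found == FILEDESC_ORDER:
        return []
    return [f"expected {FILEDESC_ORDER}, found {found}"]
```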

Appendix A.1.32 <forename>

<forename> (forename) contains a forename, given or baptismal name. [13.2.1. Personal Names]
Module: namesdates
Attributes: att.global (@xml:id, @n, @xml:lang, @xml:base, @xml:space) (att.global.rendition (@rend, @style, @rendition)) (att.global.linking (@corresp, @synch, @next, @prev)) (att.global.analytic (@ana)) (att.global.responsibility (@resp)) (att.global.source (@source)) att.personal (@full) (att.naming (@role) (att.canonical (@key, @ref)) ) att.typed (@type, @subtype)
Member of
Contained by
namesdates: persName
May contain: character data only
Example
<persName>  <surname>Bongiorno</surname>  <forename>Giulia</forename> </persName>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element forename
{
   tei_att.global.attributes,
   tei_att.personal.attributes,
   tei_att.typed.attributes,
   text
}

Appendix A.1.33 <funder>

<funder> (funding body) specifies the name of an individual, institution, or organisation responsible for the funding of a project or text. [2.2.1. The Title Statement]
Module: header
Contained by
header: titleStmt
May contain
core: ref
namesdates: orgName
Note

Funders provide financial support for a project; they are distinct from sponsors (see element <sponsor>), who provide intellectual support and authority.

Example
<funder>  <orgName xml:lang="es">CLARIN infraestructura de investigación científica</orgName>  <orgName xml:lang="en">The CLARIN research infrastructure</orgName> </funder>
Content model
<content>
 <elementRef key="orgName" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="ref" minOccurs="0"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element funder { tei_orgName+, tei_ref? }

Appendix A.1.34 <gap>

<gap> (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. [3.5.3. Additions, Deletions, and Omissions]
Module: core
Attributes: att.global (xml:base, xml:space, @xml:id, @n, @xml:lang)
reason
Status: Recommended
Legal values are:
inaudible
editorial
foreign
Member of
Contained by
analysis: s
core: unit
linking: seg
spoken: u
textstructure: div
May contain
core: desc
Note

The <gap>, <unclear>, and <del> core tag elements may be closely allied in use with the <damage> and <supplied> elements, available when using the additional tagset for transcription of primary sources. See section 11.3.3.2. Use of the gap, del, damage, unclear, and supplied Elements in Combination for discussion of which element is appropriate for which circumstance.

The <gap> tag simply signals the editor's decision to omit, or inability to transcribe, a span of text. Other information, such as the interpretation that text was deliberately erased or covered, should be indicated using the relevant tags, such as <del> in the case of deliberate deletion.

Example
<gap reason="inaudible">  <desc>microphone muted</desc> </gap>
Example
<gap reason="editorial">  <desc xml:lang="de">Zitierte Druckfassung entfernt</desc>  <desc xml:lang="en">Quoted printed matter omitted</desc> </gap>
Example
<gap reason="foreign">  <desc xml:lang="und">Huliniahuanngittunga</desc> </gap>
Content model
<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element gap
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   attribute reason { "inaudible" | "editorial" | "foreign" }?,
   tei_desc+
}

Appendix A.1.35 <graphic>

<graphic> (graphic) indicates the location of a graphic or illustration, either forming part of a text, or providing an image of it. [3.10. Graphics and Other Non-textual Components; 11.1. Digital Facsimiles]
Module: core
Attributes: att.resourced (@url) att.media (width, height, @scale)
Member of
Contained by
figures: figure
May contain: nothing (empty element)
Note

The mimeType attribute should be used to supply the MIME media type of the image specified by the url attribute.

Within the body of a text, a <graphic> element indicates the presence of a graphic component in the source itself. Within the context of a <facsimile> or <sourceDoc> element, however, a <graphic> element provides an additional digital representation of some part of the source being encoded.

Example
<figure>  <graphic url="https://www.dekamer.be//site/wwwroot/images/cv/06595.gif"/> </figure>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element graphic
{
   tei_att.media.attribute.scale,
   tei_att.resourced.attributes,
   empty
}

Appendix A.1.36 <head>

<head> (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. [4.2.1. Headings and Trailers]
Module: core
Attributes: att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.typed (subtype, @type)
Contained by
figures: figure
textstructure: div
May contain: character data only
Note

The <head> element is used for headings at all levels; software which treats (e.g.) chapter headings, section headings, and list titles differently must determine the proper processing of a <head> element based on its structural position. A <head> occurring as the first element of a list is the title of that list; one occurring as the first element of a <div1> is the title of that chapter or section.

Example: the most common use for the <head> element is to mark the headings of sections:
<div type="debateSection">  <head>Regulation of Health and Social Care Professions Etc. Bill [HL]</head> ... </div>
Example: the <head> element may also be used to give a title to specialised lists:
<listEvent>  <head xml:lang="nl">Zittingsperiode</head>  <head xml:lang="en">Legislative period</head>  <event to="2007-05-02" from="2003-06-05" xml:id="period_51">   <label xml:lang="nl">Zittingsperiode 51</label>   <label xml:lang="en">Legislative period 51</label>  </event> ... </listEvent>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element head
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.typed.attribute.type,
   text
}

Appendix A.1.37 <hyphenation>

<hyphenation> (hyphenation) summarizes the way in which hyphenation in a source text has been treated in an encoded version of it. [2.3.3. The Editorial Practices Declaration; 15.3.2. Declarable Elements]
Module: header
Contained by
May contain
core: p
Example
<editorialDecl> ... <hyphenation>   <p xml:lang="en">No end-of-line hyphens were present in the source.</p>  </hyphenation> ... </editorialDecl>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element hyphenation { tei_p+ }

Appendix A.1.38 <idno>

<idno> (identifier) supplies an identifier used to identify some object, such as a person or organisation. If it is a URL, it should have @type="URI". [13.3.1. Basic Principles; 2.2.4. Publication, Distribution, Licensing, etc.; 2.2.5. The Series Statement; 3.12.2.4. Imprint, Size of a Document, and Reprint Information]
Module: header
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
type: categorizes the identifier.
Status: Required
Legal values are:
URI
Uniform Resource Identifier. In ParlaMint, it should be a resolvable URL, with @subtype classifying the type of web site.
VIAF
The URL of the entity's record in the Virtual International Authority File (VIAF), which links the different names used for the same entity in catalogues around the world.
subtype
Status: Optional
Legal values are:
handle
The permanent identifier of type handle.
government
A governmental web site.
politicalParty
The web site of a political party.
parliament
A web site of the parliament.
ministry
The web site of a ministry.
personal
The personal web site of a person.
business
A web site belonging to a business.
publicService
The web site of a public service.
wikimedia
A web site of Wikimedia, e.g. Wikipedia.
facebook
A Facebook web site.
twitter
A Twitter web site.
tiktok
A TikTok web site.
instagram
An Instagram web site.
Note

This attribute should always be used with type="URI".

Member of
Contained by
core: bibl
namesdates: org person
May contain: character data only
Note

<idno> should be used for labels which identify an object or concept in a formal cataloguing system such as a database or an RDF store, or in a distributed system such as the World Wide Web. Some suggested values for type on <idno> are ISBN, ISSN, DOI, and URI.

Example
<publicationStmt> ... <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno> ... </publicationStmt>
Example
<sourceDesc>  <bibl>   <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title>    ...  <idno type="URI">https://www.dz-rs.si</idno>    ...  </bibl> </sourceDesc>
Example
<idno type="URI" subtype="wikimedia" xml:lang="sl">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno> <idno type="URI" subtype="wikimedia" xml:lang="en">https://en.wikipedia.org/wiki/Positive_Slovenia</idno>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element idno
{
   tei_att.global.attribute.xmllang,
   attribute type { "URI" | "VIAF" },
   attribute subtype
   {
      "handle"
    | "government"
    | "politicalParty"
    | "parliament"
    | "ministry"
    | "personal"
    | "business"
    | "publicService"
    | "wikimedia"
    | "facebook"
    | "twitter"
    | "tiktok"
    | "instagram"
   }?,
   text
}
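The schema declaration above accepts @subtype on any <idno>, while the note requires it only with type="URI", so this rule has to be checked separately. A minimal Python sketch of such a check; `idno_problems` is hypothetical, and the test that a URI value starts with http:// or https:// is an assumed heuristic for "resolvable URL", not an actual resolution check:

```python
import xml.etree.ElementTree as ET

IDNO_TYPES = {"URI", "VIAF"}
IDNO_SUBTYPES = {
    "handle", "government", "politicalParty", "parliament", "ministry",
    "personal", "business", "publicService", "wikimedia", "facebook",
    "twitter", "tiktok", "instagram",
}


def idno_problems(idno):
    """List the ways an <idno> departs from the rules stated above."""
    problems = []
    id_type = idno.get("type")
    subtype = idno.get("subtype")
    if id_type not in IDNO_TYPES:
        problems.append(f"type must be URI or VIAF, not {id_type!r}")
    if subtype is not None:
        if subtype not in IDNO_SUBTYPES:
            problems.append(f"unknown subtype {subtype!r}")
        if id_type != "URI":
            problems.append("subtype should only be used with type='URI'")
    # Heuristic (assumption): a URI identifier should look like a web address.
    if id_type == "URI" and not (idno.text or "").strip().startswith(("http://", "https://")):
        problems.append("URI value does not look like an http(s) URL")
    return problems
```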

Appendix A.1.39 <incident>

<incident> (incident) marks any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication. [8.3.3. Vocal, Kinesic, Incident]
Module: spoken
Attributes: att.ascribed (@who) att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.typed (type, @subtype)
type
Status: Recommended
Legal values are:
action
incident
leaving
entering
break
pause
sound
editorial
Member of
Contained by
analysis: s
core: unit
linking: seg
spoken: u
textstructure: div
May contain
core: desc
Example
<incident type="action">  <desc>He stands and with him the whole Assembly</desc> </incident>
Example
<incident type="sound">  <desc>The Assembly observed a minute of silence. Applause.</desc> </incident>
Example
<incident type="entering">  <desc>Arrival of the President of the Republic of Poland</desc> </incident>
Content model
<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element incident
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.typed.attribute.subtype,
   tei_att.ascribed.attributes,
   attribute type
   {
      "action"
    | "incident"
    | "leaving"
    | "entering"
    | "break"
    | "pause"
    | "sound"
    | "editorial"
   }?,
   tei_desc+
}

Appendix A.1.40 <include>

<include> is an element from the XML namespace of the W3C XML Inclusions (XInclude) recommendation. In ParlaMint, it is used to include into the <teiCorpus> root file those parts of the corpus that are stored as separate files: the <TEI> corpus components, and parts of the corpus root <teiHeader>, namely <listPerson> and <listOrg> inside <particDesc>, and <taxonomy> inside <classDecl>.
Namespacehttp://www.w3.org/2001/XInclude
Module: derived-module-parlamint
Attributes
href
Status: Optional
Datatype: teidata.pointer
Contained by
core: teiCorpus
corpus: particDesc
header: classDecl
May contain: nothing (empty element)
Example: using XInclude in ParlaMint to include corpus components into the corpus root:
<teiCorpus xml:lang="en"  xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0">  <teiHeader> ...TEI header of the corpus...  </teiHeader> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-05-commons.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-06-commons.xml"/> ... </teiCorpus>
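To process the corpus as a single tree, the <xi:include> elements have to be replaced by the elements they point to. A minimal sketch using Python's standard `xml.etree.ElementInclude` module (the `base_url` argument needs Python 3.9 or later); it assumes POSIX-style file paths, since hrefs are resolved with URL-joining semantics. `load_corpus` is a hypothetical helper, and loading a complete ParlaMint corpus this way keeps every component in memory at once:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def load_corpus(root_path):
    """Parse a corpus root file and expand every <xi:include> in place.

    Nested includes (e.g. <listPerson> inside <particDesc>) are expanded too,
    because ElementInclude descends into all child elements.
    """
    root = ET.parse(root_path).getroot()
    # Relative href values such as "2015/ParlaMint-GB_....xml" are resolved
    # against the location of the root file.
    ElementInclude.include(root, base_url=root_path)
    return root
```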

Appendix A.1.41 <kinesic>

<kinesic> (kinesic) marks any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc. [8.3.3. Vocal, Kinesic, Incident]
Module: spoken
Attributes: att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.typed (type, @subtype) att.ascribed (@who)
type
Status: Recommended
Legal values are:
kinesic
applause
ringing
signal
playback
gesture
smiling
laughter
snapping
noise
Member of
Contained by
analysis: s
core: unit
linking: seg
spoken: u
textstructure: div
May contain
core: desc
Example
<kinesic type="signal">  <desc>sign for the end of discussion</desc> </kinesic>
Example
<kinesic type="laughter">  <desc xml:lang="hr">smijeh.</desc> </kinesic>
Example
<kinesic type="applause">  <desc xml:lang="sl">ploskanje</desc> </kinesic>
Content model
<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element kinesic
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.typed.attribute.subtype,
   tei_att.ascribed.attribute.who,
   attribute type
   {
      "kinesic"
    | "applause"
    | "ringing"
    | "signal"
    | "playback"
    | "gesture"
    | "smiling"
    | "laughter"
    | "snapping"
    | "noise"
   }?,
   tei_desc+
}
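<gap>, <incident> and <kinesic> share one shape: an optional classifying attribute with a closed list of values, and one or more <desc> children. A minimal Python sketch that checks all three against the value lists copied from their schema declarations, assuming the TEI namespace; `check_transcriber_notes` is a hypothetical name:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Classifying attribute and its legal values, per the schema declarations above.
LEGAL_VALUES = {
    "gap": ("reason", {"inaudible", "editorial", "foreign"}),
    "incident": ("type", {"action", "incident", "leaving", "entering",
                          "break", "pause", "sound", "editorial"}),
    "kinesic": ("type", {"kinesic", "applause", "ringing", "signal", "playback",
                         "gesture", "smiling", "laughter", "snapping", "noise"}),
}


def check_transcriber_notes(element):
    """Yield (element name, problem) pairs for gap/incident/kinesic below `element`."""
    for name, (attr, legal) in LEGAL_VALUES.items():
        for note in element.iter(TEI + name):
            value = note.get(attr)
            # The attribute is optional ("?"), but when present it must be legal.
            if value is not None and value not in legal:
                yield name, f"illegal {attr}={value!r}"
            # Each of these elements requires at least one <desc> (tei_desc+).
            if note.find(TEI + "desc") is None:
                yield name, "missing <desc>"
```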

Appendix A.1.42 <label>

<label> (label) contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary. [3.8. Lists]
Module: core
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Member of
Contained by
header: application
namesdates: event state
May contain
namesdates: orgName
character data
Example: labels describe the existence of organisations and related events:
<org xml:id="DZ" role="parliament" ana="#parla.national #parla.lower">  <orgName xml:lang="sl" full="yes">Državni zbor Republike Slovenije</orgName>  <orgName xml:lang="en" full="yes">National Assembly of the Republic of Slovenia</orgName>  <event from="1992-12-23">   <label xml:lang="en">existence</label>  </event> ... <listEvent>   <head xml:lang="sl">Mandatno obdobje</head>   <head xml:lang="en">Legislative period</head>   <event xml:id="DZ.7" from="2014-08-01" to="2018-06-21">    <label xml:lang="sl">7. mandat</label>    <label xml:lang="en">Term 7</label>   </event>   <event xml:id="DZ.8" from="2018-06-22">    <label xml:lang="sl">8. mandat</label>    <label xml:lang="en">Term 8</label>   </event>  </listEvent> </org>
Example: labels may also be used to name the tools used in compiling the corpus:
<application ident="int-tagger"  version="1.0">  <label>INT Tagger, lemmatizer and Tokenizer</label>  <desc xml:lang="en">INT Tagger, lemmatizer and Tokenizer for modern Dutch, based on old-school machine learning (SVM). It provides the legacy PoS tags (encoded in w/@ana) and the lemmata for Dutch. Not publicly available.</desc> </application>
Example: labels may also be used for other structured list items:
<listEvent>  <head xml:lang="lv">Saeimas sasaukumi</head>  <head xml:lang="en">Legislative period</head>  <event xml:id="PT.12" from="2014-11-04" to="2018-11-05">   <label xml:lang="lv">12. Saeima</label>   <label xml:lang="en">Term 12</label>  </event>  <event xml:id="PT.13" from="2018-11-06">   <label xml:lang="lv">13. Saeima</label>   <label xml:lang="en">Term 13</label>  </event> </listEvent>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <textNode/>
  <elementRef key="orgName"/>
 </alternate>
</content>
    
Schema Declaration
element label { tei_att.global.attribute.xmllang, ( text | tei_orgName ) }
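Since labels are typically given once per language, a processing script usually needs to pick the label in one preferred language. A minimal Python sketch, assuming the TEI namespace and that each <label> carries its own @xml:lang (inherited language values are not considered); `label_text` and its English fallback are illustrative choices, not ParlaMint conventions:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def label_text(element, lang, fallback="en"):
    """Return the text of the <label> child in `lang`, else in `fallback`, else None."""
    # itertext() also collects text inside a nested <orgName>.
    labels = {lab.get(XML_LANG): "".join(lab.itertext()).strip()
              for lab in element.findall(TEI + "label")}
    return labels.get(lang) or labels.get(fallback)
```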

Appendix A.1.43 <langUsage>

<langUsage> (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text. [2.4.2. Language Usage; 2.4. The Profile Description; 15.3.2. Declarable Elements]
Module: header
Contained by
header: profileDesc
May contain
header: language
Example
<langUsage>  <language ident="sl" xml:lang="sl">slovenski</language>  <language ident="en" xml:lang="sl">angleški</language>  <language ident="sl" xml:lang="en">Slovenian</language>  <language ident="en" xml:lang="en">English</language> </langUsage>
Content model
<content>
 <elementRef key="language" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element langUsage { tei_language+ }

Appendix A.1.44 <language>

<language> (language) characterizes a single language or sublanguage used within a text. [2.4.2. Language Usage]
Module: header
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
ident (identifier): supplies a language code constructed as defined in BCP 47, which is used to identify the language documented by this element and which is referenced by the global xml:lang attribute.
Status: Required
Datatype: teidata.language
usage: specifies the approximate percentage (by volume) of the text which uses this language.
Status: Optional
Datatype: nonNegativeInteger
Contained by
header: langUsage
May contain: character data only
Note

Particularly for sublanguages, an informal prose characterization should be supplied as content for the element.

Example
<langUsage>  <language ident="es" xml:lang="es">Español</language>  <language ident="es" xml:lang="en">Spanish</language> </langUsage>
Example
<langUsage>  <language ident="bg-Latn" xml:lang="en">Bulgarian in Latin script</language>  <language ident="bg" xml:lang="bg">български</language>  <language ident="bg" xml:lang="en">Bulgarian</language>  <language ident="en" xml:lang="bg">английски</language>  <language ident="en" xml:lang="en">English</language>  <language ident="fr" xml:lang="bg">френски</language>  <language ident="fr" xml:lang="en">French</language> </langUsage>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element language
{
   tei_att.global.attribute.xmllang,
   attribute ident { text },
   attribute usage { text }?,
   text
}

Appendix A.1.45 <licence>

<licence> contains information about a licence or other legal agreement applicable to the text. [2.2.4. Publication, Distribution, Licensing, etc.]
Module: header
Contained by
header: availability
May contain
XSD anyURI
Note

A <licence> element should be supplied for each licence agreement applicable to the text in question. The target attribute may be used to reference a full version of the licence. The when, notBefore, notAfter, from or to attributes may be used in combination to indicate the date or dates of applicability of the licence.

Example: the <licence> element gives the fixed-value CC BY 4.0 URL, and the following paragraph gives a prose description of the licence:
<licence>http://creativecommons.org/licenses/by/4.0/</licence> <p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p>
Example: the textual information on the licence can be given in more than one language:
<licence>http://creativecommons.org/licenses/by/4.0/</licence> <p xml:lang="hr">Ovaj rad je dostupan pod <ref target="http://creativecommons.org/licenses/by/4.0/">međunarodnom licencom Creative Commons Imenovanje 4.0</ref> </p> <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref> </p>
Content model
<content>
 <dataRef name="anyURI"/>
</content>
    
Schema Declaration
element licence { xsd:anyURI }

Appendix A.1.47 <linkGrp>

<linkGrp> (link group) defines a collection of associations or hypertextual links. [16.1. Links]
Module: linking
Attributes
targFunc
Status: Required
Legal values are:
head argument
type
Status: Required
Legal values are:
UD-SYN
Member of
Contained by
analysis: s
May contain
linking: link
Note

May contain one or more <link> or <ptr> elements.

A web or link group is an administrative convenience, which should be used to collect a set of links together for any purpose, not simply to supply a default value for the type attribute.

Example: syntactic analysis is stored in a link group (the <linkGrp> element), which is composed of <link> elements. For readability, the example below omits the word-level linguistic attributes and uses shortened IDs:
<s xml:id="ParlaMint-GB_2021-01-06.seg393.8">  <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.1">I</w>  <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.2">support</w>  <w xml:id="ParlaMint-GB_2021-01-06.seg393.8.3">the</w>  <w join="right"   xml:id="ParlaMint-GB_2021-01-06.seg393.8.4">amendment</w>  <pc xml:id="ParlaMint-GB_2021-01-06.seg393.8.5">.</pc>  <linkGrp targFunc="head argument"   type="UD-SYN">   <link ana="ud-syn:nsubj"    target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.1"/>   <link ana="ud-syn:root"    target="#ParlaMint-GB_2021-01-06.seg393.8 #ParlaMint-GB_2021-01-06.seg393.8.2"/>   <link ana="ud-syn:det"    target="#ParlaMint-GB_2021-01-06.seg393.8.4 #ParlaMint-GB_2021-01-06.seg393.8.3"/>   <link ana="ud-syn:obj"    target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.4"/>   <link ana="ud-syn:punct"    target="#ParlaMint-GB_2021-01-06.seg393.8.2 #ParlaMint-GB_2021-01-06.seg393.8.5"/>  </linkGrp> </s>
Content model
<content>
 <elementRef maxOccurs="unbounded"
  key="link"/>
</content>
    
Schema Declaration
element linkGrp
{
   attribute targFunc { "head argument" },
   attribute type { "UD-SYN" },
   tei_link+
}
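With targFunc="head argument", the first pointer in each @target is the head and the second is the dependent, and the head of the root relation is the <s> element itself. A minimal Python sketch that turns a sentence into (relation, head, dependent) triples, assuming the TEI namespace and that @ana always has the form "ud-syn:RELATION"; `dependencies` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dependencies(sentence):
    """Return (relation, head form, dependent form) triples for one <s> element."""
    # Map token IDs to surface forms; <w> and <pc> are both tokens.
    forms = {tok.get(XML_ID): tok.text
             for tok in sentence.iter()
             if tok.tag in (TEI + "w", TEI + "pc")}
    # The root relation points at the sentence itself.
    forms[sentence.get(XML_ID)] = "ROOT"
    triples = []
    for group in sentence.iter(TEI + "linkGrp"):
        if group.get("type") != "UD-SYN":
            continue
        for link in group.iter(TEI + "link"):
            head_ref, dep_ref = link.get("target").split()
            relation = link.get("ana").split(":", 1)[1]  # "ud-syn:nsubj" -> "nsubj"
            triples.append((relation, forms[head_ref.lstrip("#")], forms[dep_ref.lstrip("#")]))
    return triples
```

For the sentence above, the first triple is ("nsubj", "support", "I").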

Appendix A.1.48 <listEvent>

<listEvent> (list of events) contains a list of descriptions, each of which provides information about an identifiable event. [13.3.1. Basic Principles]
Module: namesdates
Member of
Contained by
namesdates: org
May contain
core: head
namesdates: event
Example
<listEvent>  <event xml:id="GOV.11" from="2013-03-20" to="2014-09-18">   <label xml:lang="sl">11. vlada Republike Slovenije (20. marec 2013 - 18. september 2014)</label>   <label xml:lang="en">11th Government of the Republic of Slovenia (20 March 2013 - 18 September 2014)</label>  </event> ... <event xml:id="GOV.14" from="2020-03-13">   <label xml:lang="sl">14. vlada Republike Slovenije (13. marec 2020 - danes)</label>   <label xml:lang="en">14th Government of the Republic of Slovenia (March 13, 2020 - today)</label>  </event> </listEvent>
Example
<org ana="#parla.national #parla.upper"  role="parliament" xml:id="LEG">  <orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName>  <orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName> ... <listEvent>   <event from="2013-03-15" to="2018-03-22"    xml:id="LEG.17">    <label xml:lang="it">XVII Legislatura</label>    <label xml:lang="en">XVII Legislative Term</label>   </event>   <event from="2018-03-23" xml:id="LEG.18">    <label xml:lang="it">XVIII Legislatura</label>    <label xml:lang="en">XVIII Legislative Term</label>   </event>  </listEvent> </org>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="head" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="event" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element listEvent { tei_head*, tei_event* }
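Since each <event> carries from and, unless it is still ongoing, to, a common task is finding the legislative term or government in force on a given day. A minimal Python sketch, assuming full yyyy-mm-dd dates and TEI-namespaced input. The helper name event_on is hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def event_on(list_event: ET.Element, day: date) -> Optional[str]:
    """Return the xml:id of the first <event> whose period covers `day`.

    A missing @to is read as "still ongoing" and a missing @from as
    "since the beginning". Assumes full yyyy-mm-dd values; partial W3C
    dates such as "2013" would need extra handling.
    """
    for event in list_event.iter(TEI + "event"):
        start = date.fromisoformat(event.get("from")) if event.get("from") else date.min
        end = date.fromisoformat(event.get("to")) if event.get("to") else date.max
        if start <= day <= end:
            return event.get(XML_ID)
    return None
```

Both ends of the interval are treated as inclusive, matching the way the example gives the last and first day of consecutive terms.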

Appendix A.1.49 <listOrg>

<listOrg> (list of organizations) contains a list of elements, each of which provides information about an identifiable organisation. [13.2.2. Organizational Names]
Module
namesdates — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang)
Member of
Contained by
corpus: particDesc
May contain
core: head
namesdates: listRelation org
Note

The type attribute may be used to distinguish lists of organizations of a particular type if convenient.

Example
<listOrg>  <org xml:id="government.GB"   role="government"> ...  </org>  <org xml:id="PoGB" role="parliament"> ...  </org>  <org role="parliamentaryGroup"   xml:id="party.LI"> ...  </org> ... <listRelation> ...  </listRelation> </listOrg>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="head" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="org" minOccurs="1"
   maxOccurs="unbounded"/>
  <elementRef key="listRelation"
   minOccurs="0" maxOccurs="1"/>
 </sequence>
</content>
    
Schema Declaration
element listOrg
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   ( tei_head*, tei_org+, tei_listRelation? )
}

Appendix A.1.50 <listPerson>

<listPerson> (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source. [13.3.2. The Person Element 15.2. Contextual Information 2.4. The Profile Description 15.3.2. Declarable Elements]
Module
namesdates — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang)
Member of
Contained by
corpus: particDesc
May contain
core: head
namesdates: person
Note

The type attribute may be used to distinguish lists of people of a particular type if convenient.

Example
<listPerson>  <head>List of speakers</head>  <person xml:id="SayeedaWarsi"> ...  </person>  <person xml:id="DavidHamilton"> ...  </person> ... </listPerson>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="head" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="person" minOccurs="1"
   maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element listPerson
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   ( tei_head*, tei_person+ )
}

Appendix A.1.51 <listPrefixDef>

<listPrefixDef> (list of prefix definitions) contains a list of definitions of prefixing schemes used in teidata.pointer values, showing how abbreviated URIs using each scheme may be expanded into full URIs. [16.2.3. Using Abbreviated Pointers]
Module
header — Formal specification
Contained by
header: encodingDesc
May contain
header: prefixDef
Example
In this example, two private URI scheme prefixes are defined and patterns are provided for dereferencing them. Each prefix is also supplied with a human-readable explanation in a <p> element.
<listPrefixDef>  <prefixDef ident="ud-syn"   matchPattern="(.+)" replacementPattern="#$1">   <p>Private URIs with this prefix point to elements giving their name. In this document they are simply local references into the UD-SYN taxonomy categories in the corpus root TEI header.</p>  </prefixDef>  <prefixDef ident="ne" matchPattern="(.+)"   replacementPattern="#NER.cnec2.0.$1">   <p>Taxonomy for named entities (cnec2.0)</p>  </prefixDef> </listPrefixDef>
Content model
<content>
 <elementRef key="prefixDef" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element listPrefixDef { tei_prefixDef+ }
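The expansion rule is mechanical: the part of the pointer after the prefix is matched against matchPattern and rewritten with replacementPattern, where $1 refers to the first capture group. A minimal Python sketch, assuming TEI-namespaced input. Python's re module writes back-references as \g<1> rather than $1, so the replacement pattern is translated first. The helper names are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def load_prefixes(list_prefix_def: ET.Element) -> dict[str, tuple[str, str]]:
    """Map each prefixDef/@ident to its (matchPattern, replacementPattern)."""
    return {p.get("ident"): (p.get("matchPattern"), p.get("replacementPattern"))
            for p in list_prefix_def.iter(TEI + "prefixDef")}


def expand(pointer: str, prefixes: dict[str, tuple[str, str]]) -> str:
    """Expand an abbreviated pointer such as "ud-syn:nsubj".

    Pointers without a declared prefix, or whose remainder does not match
    the declared pattern, are returned unchanged.
    """
    prefix, sep, rest = pointer.partition(":")
    if not sep or prefix not in prefixes:
        return pointer
    match_pattern, replacement = prefixes[prefix]
    m = re.fullmatch(match_pattern, rest)
    if m is None:
        return pointer
    # Translate single-digit "$1"-style references into Python's "\g<1>".
    python_repl = re.sub(r"\$(\d)", r"\\g<\1>", replacement)
    return m.expand(python_repl)
```

With the declarations above, ud-syn:nsubj expands to the local reference #nsubj, and ne:P to #NER.cnec2.0.P.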

Appendix A.1.52 <listRelation>

<listRelation> provides information about relationships identified amongst people, places, and organisations, either informally as prose or as formally expressed relation links. [13.3.2.3. Personal Relationships]
Module
namesdates — Formal specification
Member of
Contained by
namesdates: listOrg
May contain
namesdates: relation
Note

May contain a prose description organized as paragraphs, or a sequence of <relation> elements.

Example
<listOrg>  <org role="parliamentaryGroup"   xml:id="party.LD">   <orgName full="yes">Liberal Democrat</orgName>   <orgName full="abb">LD</orgName>  </org>  <org role="parliamentaryGroup"   xml:id="party.I">   <orgName full="yes">Independent</orgName>   <orgName full="abb">I</orgName>  </org>  <org role="parliamentaryGroup"   xml:id="party.0UBS">   <orgName full="yes">Independent Conservative</orgName>   <orgName full="abb">0UBS</orgName>  </org>  <org>... </org>  <listRelation>   <relation name="coalition"    mutual="#party.CON #party.LD" from="2010-05-06" to="2015-05-07"/>   <relation name="opposition"    active="#party.LAB #party.SO0T #party.64RT #party.SDLP #party.L1QU #party.0UBS #party.BI #party.LI #party.LB #party.LJ95 #party.IGC #party.NPBE #party.CB #party.QMZZ #party.IL #party.UUP #party.FZPG #party.A #party.GP #party.SNP #party.I #party.L8TA #party.CON #party.NA #party.DUP #party.UUSL #party.ZKPW #party.UKIP #party.PC" passive="#government.GB"    from="2010-05-06" to="2015-05-07"/>   <relation>...</relation>    ...  </listRelation> </listOrg>
Content model
<content>
 <elementRef key="relation" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element listRelation { tei_relation+ }
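The two kinds of relation in the example differ in direction: mutual lists peers (the coalition partners), while active and passive distinguish the two sides (the opposition parties and the government they oppose). A minimal Python sketch that flattens both into edges, assuming TEI-namespaced input. The function name relation_edges is hypothetical, and the from/to dates are deliberately ignored:

```python
import itertools
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def relation_edges(list_relation: ET.Element) -> list[tuple[str, str, str]]:
    """Flatten <relation> elements into (name, source, target) edges.

    @mutual yields one edge per unordered pair of members; @active/@passive
    yields a directed edge from every active to every passive member.
    """
    edges = []
    for rel in list_relation.iter(TEI + "relation"):
        name = rel.get("name")
        if rel.get("mutual"):
            members = [ref.lstrip("#") for ref in rel.get("mutual").split()]
            for a, b in itertools.combinations(members, 2):
                edges.append((name, a, b))
        else:
            active = [ref.lstrip("#") for ref in rel.get("active", "").split()]
            passive = [ref.lstrip("#") for ref in rel.get("passive", "").split()]
            for a, p in itertools.product(active, passive):
                edges.append((name, a, p))
    return edges
```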

Appendix A.1.53 <measure>

<measure> (measure) contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name. [3.6.3. Numbers and Measures]
Module
core — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang)
unit
Status
Required
Legal values are:
speeches
words
tokens
optional value
quantity
(quantity) specifies the number of the specified units that comprise the measurement
Derived from
att.measurement
Status
Required
Datatype
teidata.numeric
Member of
Contained by
header: extent
May contain
Character data only
Example
<measure unit="speeches" quantity="75122"  xml:lang="sl">75.122 govorov</measure> <measure unit="speeches" quantity="75122"  xml:lang="en">75,122 speeches</measure> <measure unit="words" quantity="20190034"  xml:lang="sl">20.190.034 besed</measure> <measure unit="words" quantity="20190034"  xml:lang="en">20,190,034 words</measure>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element measure
{
   tei_att.global.attribute.xmllang,
   attribute unit { "speeches" | "words" | "tokens" },
   attribute quantity { text },
   text
}
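The element text is a human-readable rendering of quantity in the language given by xml:lang, so the two must agree. A minimal Python sketch that generates such elements, assuming the thousands separators shown in the example (a comma for English, a full stop for Slovene). The noun is passed in by the caller because its form depends on the language and the number, and the function name measure is hypothetical:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed thousands separators; only the two languages of the example are listed.
THOUSANDS_SEP = {"en": ",", "sl": "."}


def measure(unit: str, quantity: int, lang: str, label: str) -> ET.Element:
    """Build a <measure> whose text repeats @quantity in localised form."""
    if unit not in ("speeches", "words", "tokens"):
        raise ValueError(f"illegal unit: {unit}")
    # Format with "," as the grouping character, then swap in the local one.
    text = f"{quantity:,}".replace(",", THOUSANDS_SEP[lang])
    elem = ET.Element(f"{{{TEI_NS}}}measure",
                      {"unit": unit, "quantity": str(quantity), XML_LANG: lang})
    elem.text = f"{text} {label}"
    return elem
```

For example, measure("speeches", 75122, "sl", "govorov") produces the first <measure> of the example above.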

Appendix A.1.54 <media>

<media> indicates the location of any form of external media such as an audio or video clip etc. [3.10. Graphics and Other Non-textual Components]
Module
core — Formal specification
Attributes
att.resourced (@url) att.global (n, xml:lang, xml:base, xml:space, @xml:id) att.global.source (@source)
mimeType
(MIME media type) specifies the applicable multimedia internet mail extension (MIME) media type
Derived from
att.internetMedia
Status
Required
Datatype
1–∞ occurrences of teidata.word separated by whitespace
Member of
Contained by
spoken: recording
May contain
Empty element
Note

The attributes available for this element are not appropriate in all cases. For example, it makes no sense to specify the temporal duration of a graphic. Such errors are not currently detected.

The mimeType attribute must be used to specify the MIME media type of the resource specified by the url attribute.

Example
<recording type="audio">  <media xml:id="ps2013-009-01-001-001.audio1"   mimeType="audio/mp3"   source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050713581412.mp3"   url="2013ps/audio/2014/05/07/2014050713581412.mp3"/>  <media xml:id="ps2013-009-01-001-001.audio2"   mimeType="audio/mp3"   source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050714081422.mp3"   url="2013ps/audio/2014/05/07/2014050714081422.mp3"/>  <media xml:id="ps2013-009-01-001-001.audio3"   mimeType="audio/mp3"   source="https://www.psp.cz/eknih/2013ps/audio/2014/05/07/2014050714181432.mp3"   url="2013ps/audio/2014/05/07/2014050714181432.mp3"/> ... </recording>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element media
{
   tei_att.global.attribute.xmlid,
   tei_att.global.source.attribute.source,
   tei_att.resourced.attributes,
   attribute mimeType { list { xsd:token { pattern = "[^\p{C}\p{Z}]+" }+ } },
   empty
}

Appendix A.1.55 <meeting>

<meeting> contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it. [3.12.2.2. Titles, Authors, and Editors]
Module
core — Formal specification
Attributes
att.global (xml:id, xml:base, xml:space, @n, @xml:lang) att.global.linking (synch, next, prev, @corresp) att.global.analytic (@ana)
Contained by
header: titleStmt
May contain
Character data only
Example
The particular sessions that the corpus or corpus component contains are specified with <meeting>:
<meeting n="7" corresp="#DZ"  ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting> <meeting n="8" corresp="#DZ"  ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element meeting
{
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.global.analytic.attribute.ana,
   text
}

Appendix A.1.56 <name>

<name> (name, proper noun) contains a proper noun or noun phrase. [3.6.1. Referring Strings]
Module
core — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.personal (@full) att.canonical (@key, @ref) att.typed (type, @subtype)
type
Status
Optional
Legal values are:
PER
LOC
ORG
MISC
city
country
address
org
place
Member of
Contained by
analysis: s
core: name unit
corpus: setting
header: change
namesdates: placeName
May contain
analysis: pc w
core: date name num pb
character data
Note

Proper nouns referring to people, places, and organizations may be tagged instead with <persName>, <placeName>, or <orgName>, when the TEI module for names and dates is included.

Example
The element is used to mark up Named Entities in the linguistically analysed corpus, in which case it should have the type attribute with one of the allowed values. It can also have a ref attribute linking it to a definition:
... <w lemma="and" msd="UPosTag=CCONJ">and</w> <name type="ORG"  ref="https://en.wikipedia.org/wiki/Westminster">  <w join="right" lemma="Westminster"   msd="UPosTag=PROPN|Number=Sing">Westminster</w> </name> <w lemma="," msd="UPosTag=PUNCT">,</w> ...
Example
Element <name> is used in the TEI header to specify the location of the parliament:
<name type="place">Westminster</name> <name type="city">London</name> <name type="country" key="GB">U.K.</name>
Example
The element is used in the TEI header to name the person responsible for a change:
<revisionDesc>  <change when="2021-06-11">   <name>Tomaž Erjavec</name>: Finalized encoding.</change>  <change when="2021-05-28">   <name>Tomaž Erjavec</name>: Built corpus.</change> </revisionDesc>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="name"/>
  <elementRef key="date"/>
  <elementRef key="num"/>
  <elementRef key="pb"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element name
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.personal.attribute.full,
   tei_att.canonical.attribute.key,
   tei_att.canonical.attribute.ref,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "PER"
    | "LOC"
    | "ORG"
    | "MISC"
    | "city"
    | "country"
    | "address"
    | "org"
    | "place"
   }?,
   ( tei_w | tei_pc | tei_name | tei_date | tei_num | tei_pb | text )+
}
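In the linguistically annotated corpus, <name> elements with one of the four NE types (PER, LOC, ORG, MISC) wrap the tokens of the entity, while header uses of <name> carry types such as city or country. A minimal Python sketch that collects only the former, assuming TEI-namespaced input. The function name named_entities is hypothetical:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
NE_TYPES = {"PER", "LOC", "ORG", "MISC"}


def named_entities(root: ET.Element) -> list[tuple[str, list[str], object]]:
    """Collect (type, token forms, ref) for every NE-typed <name>.

    Header uses of <name> (type="city", "country", ...) are skipped.
    The ref value is None when the attribute is absent.
    """
    result = []
    for name in root.iter(TEI + "name"):
        if name.get("type") not in NE_TYPES:
            continue
        # iter() also reaches tokens nested inside inner <name>/<num> elements.
        forms = [tok.text for tok in name.iter()
                 if tok.tag in (TEI + "w", TEI + "pc")]
        result.append((name.get("type"), forms, name.get("ref")))
    return result
```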

Appendix A.1.58 <namespace>

<namespace> (namespace) supplies the formal name of the namespace to which the elements documented by its children belong. [2.3.4. The Tagging Declaration]
Module
header — Formal specification
Attributes
name
Status
Required
Legal values are:
http://www.tei-c.org/ns/1.0
Contained by
header: tagsDecl
May contain
header: tagUsage
Example
To distinguish TEI elements from any elements belonging to other namespaces, a <namespace> element giving the TEI namespace is introduced first:
<tagsDecl>  <namespace name="http://www.tei-c.org/ns/1.0">   <tagUsage gi="text" occurs="414"/>   <tagUsage gi="body" occurs="414"/>   <tagUsage gi="div" occurs="414"/>   <tagUsage gi="head" occurs="826"/>   <tagUsage gi="u" occurs="75122"/>   <tagUsage gi="seg" occurs="280971"/>   <tagUsage gi="note" occurs="85525"/>   <tagUsage gi="gap" occurs="7897"/>   <tagUsage gi="vocal" occurs="1740"/>   <tagUsage gi="incident" occurs="37"/>   <tagUsage gi="kinesic" occurs="560"/>   <tagUsage gi="desc" occurs="10234"/>  </namespace> </tagsDecl>
Content model
<content>
 <elementRef key="tagUsage" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element namespace
{
   attribute name { "http://www.tei-c.org/ns/1.0" },
   tei_tagUsage+
}
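The occurs values are element counts over the corpus components. A minimal Python sketch that derives them, assuming the caller has already parsed the <text> element of every component. The function name tags_decl is hypothetical; because Counter keeps first-seen order, the output follows document order, as in the example:

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tags_decl(text_elements: list[ET.Element]) -> ET.Element:
    """Build <tagsDecl><namespace> with one <tagUsage> per TEI element name.

    Counts are summed over all given <text> elements; elements outside the
    TEI namespace are ignored.
    """
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for text in text_elements:
        for elem in text.iter():
            # Comments and processing instructions have non-string tags.
            if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
                counts[elem.tag[len(prefix):]] += 1
    decl = ET.Element(prefix + "tagsDecl")
    namespace = ET.SubElement(decl, prefix + "namespace", {"name": TEI_NS})
    for gi, n in counts.items():
        ET.SubElement(namespace, prefix + "tagUsage", {"gi": gi, "occurs": str(n)})
    return decl
```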

Appendix A.1.59 <normalization>

<normalization> (normalization) indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements]
Module
header — Formal specification
Contained by
May contain
core: p
Example
<editorialDecl> ... <normalization>   <p xml:lang="en">Text has not been normalised, except for spacing.</p>  </normalization> ... </editorialDecl>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element normalization { tei_p+ }

Appendix A.1.60 <note>

<note> (note) contains a note or annotation. [3.9.1. Notes and Simple Annotation 2.2.6. The Notes Statement 3.12.2.8. Notes and Statement of Language 9.3.5.4. Notes within Entries]
Module
core — Formal specification
Attributes
att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.typed (type, @subtype)
type
Status
Recommended
Sample values include:
narrative
Description in the third person of events taking place in the meeting, e.g. "Mr X. takes the Chair".
summary
Summaries of speeches that are individually not interesting, e.g. "Question put and agreed to".
speaker
Name, role and possibly a description of the person giving the speech
vote
Outcome of a vote
location
The location of a speaker who was not at the podium
date
Date of the session
president
Chairman of a meeting
comment
Comment by the parliamentary reporter
time
Date and time of the beginning and end of the debate
quorum
The attendance of members of parliament
debate
Comments on the conduct of debates
Member of
Contained by
analysis: s
core: unit
linking: seg
namesdates: state
spoken: u
textstructure: div
May contain
core: pb time
character data
Example
The <note> element is used to encode transcriber comments, such as who spoke, the time, interruptions, notes on what is happening in the chamber, voting results, etc.:
<note type="speaker">The president, Dr. Milan Brglez:</note> ... <note type="time">The session began at 10 o'clock.</note> ... <note type="vote-ayes">84 voted for the adoption of the measure.</note> ... <note type="vote-noes">2 voted against the adoption of the measure.</note> ...
Example
The <note> element can contain a <time> element to specify the date and time recorded in the note, and it can also contain a page break, <pb>:
<note type="time">The session began <pb/> at <time when="2016-04-13T10:00:00">10 o'clock</time>.</note>
Example
The <note> element may also be used to mark any additional information on debate sections:
<div type="debateSection">  <head>Business Before Questions</head>  <note>Death of a Member</note>  <u xml:id="ParlaMint-GB_2019-02-18-commons.u1">...</u> ... <note>End of debateSection.</note> </div>
Content model
<content>
 <alternate minOccurs="0"
  maxOccurs="unbounded">
  <textNode/>
  <elementRef key="pb"/>
  <elementRef key="time"/>
 </alternate>
</content>
    
Schema Declaration
element note
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.typed.attribute.subtype,
   attribute type { text }?,
   ( text | tei_pb | tei_time )*
}
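Because the text of a <note> may be interrupted by <pb> and <time>, its content is best read with itertext(), and the machine-readable time taken from time/@when. A minimal Python sketch, assuming TEI-namespaced input. The function names are hypothetical, and datetime.fromisoformat is only a lightweight stand-in for proper xsd:dateTime validation:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

TEI = "{http://www.tei-c.org/ns/1.0}"


def notes_by_type(root: ET.Element) -> dict[str, list[str]]:
    """Group the whitespace-normalised text of every <note> by its @type.

    Notes without @type are grouped under "untyped".
    """
    groups = defaultdict(list)
    for note in root.iter(TEI + "note"):
        # itertext() skips the empty <pb/> but keeps the text around it.
        text = " ".join("".join(note.itertext()).split())
        groups[note.get("type", "untyped")].append(text)
    return dict(groups)


def note_times(root: ET.Element) -> list[datetime]:
    """Parse time/@when inside notes; fromisoformat raises ValueError on values it cannot parse."""
    return [datetime.fromisoformat(t.get("when"))
            for note in root.iter(TEI + "note")
            for t in note.iter(TEI + "time")
            if t.get("when")]
```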

Appendix A.1.61 <num>

<num> (number) contains a number, written in any form. [3.6.3. Numbers and Measures]
Module
core — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.typed (type, @subtype)
type
indicates the type of numeric value.
Derived from
att.typed
Status
Optional
Datatype
teidata.enumerated
Suggested values include:
cardinal
absolute number, e.g. 21, 21.5
ordinal
ordinal number, e.g. 21st
fraction
fraction, e.g. one half or three-quarters
percentage
a percentage
Note

If a different typology is desired, other values can be used for this attribute.

Member of
Contained by
analysis: s
core: name unit
May contain
analysis: pc w
character data
Note

Detailed analyses of quantities and units of measure in historical documents may also use the feature structure mechanism described in chapter 18. Feature Structures. The <num> element is intended for use in simple applications.

Example
The element can be used for fine-grained Named Entities which include numbers:
<num ana="ne:n_"  xml:id="ParlaMint-CZ_2018-11-13-ps2017-020-09-004-010.ne138">  <w xml:id="ParlaMint-CZ_2018-11-13-ps2017-020-09-004-010.u6.p17.s3.w12"   lemma="428"   msd="UPosTag=NUM|NumForm=Digit|NumType=Card" join="right">428</w> </num>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element num
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.typed.attribute.subtype,
   attribute type { "cardinal" | "ordinal" | "fraction" | "percentage" }?,
   ( tei_w | tei_pc | text )+
}

Appendix A.1.62 <occupation>

<occupation> (occupation) contains an informal description of a person's trade, profession or occupation. [15.2.2. The Participant Description]
Module
namesdates — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
Contained by
namesdates: person
May contain
Character data only
Note

The content of this element may be used as an alternative to the more formal specification made possible by its attributes; it may also be used to supplement the formal specification with commentary or clarification.

Example
<person n="2678" xml:id="SimeonovValeri">  <persName xml:lang="bg">   <forename>Валери</forename>   <surname>Симеонов</surname>  </persName>  <sex value="M"/>  <birth when="1955-03-14">   <placeName>Долни Чифлик, България</placeName>  </birth>  <education>инженер</education>  <occupation>политик</occupation> ... </person>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element occupation
{
   tei_att.global.attribute.xmllang,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   text
}

Appendix A.1.63 <org>

<org> (organization) provides information about an identifiable organisation such as the government, political party, ministry etc. [13.3.3. Organizational Data]
Module
namesdates — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.global.analytic (@ana)
xml:id
(identifier) provides a unique identifier for the element bearing the attribute.
Derived from
att.global
Status
Required
Datatype
ID
role
Status
Required
Legal values are:
country
federatedState
republic
government
ministry
parliament
politicalParty
parliamentaryGroup
conferenceOfChairs
boardOfParliament
ngo
institution
senate
committee
subcommittee
commission
delegation
supervisoryBoard
workingGroup
interparliamentaryFriendshipGroup
nationalCouncil
chamberOfThePeople
chamberOfTheNations
europeanCommission
europeanParliament
europeanInstitution
internationalOrganisation
boardOfDirectors
ethnicCommunity
Contained by
namesdates: listOrg
May contain
core: desc head
header: idno
Example
<org xml:id="government.BE"  role="government">  <orgName xml:lang="en" full="yes">Federal Government of Belgium</orgName>  <orgName xml:lang="nl" full="yes">Federale regering</orgName>  <orgName xml:lang="fr" full="yes">Gouvernement fédéral</orgName> </org> <org ana="#parla.federal #parla.lower"  role="parliament" xml:id="be_federal_parliament">  <orgName full="yes" xml:lang="nl">Federaal Parlement van België</orgName>  <orgName full="yes" xml:lang="en">Belgian Federal Parliament</orgName>  <event from="1831-02-07">   <label xml:lang="en">existence</label>  </event> ... </org>
Example
<org xml:id="party.PS2"  role="parliamentaryGroup">  <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName>  <orgName full="yes" xml:lang="en">Positive Slovenia</orgName>  <orgName full="abb">PS</orgName>  <event from="2011-10-22">   <label xml:lang="en">existence</label>  </event>  <idno type="URI" xml:lang="sl"   subtype="wikimedia">https://sl.wikipedia.org/wiki/Pozitivna_Slovenija</idno>  <idno type="URI" xml:lang="en"   subtype="wikimedia">https://en.wikipedia.org/wiki/Positive_Slovenia</idno> </org>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="head" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="orgName" minOccurs="1"
   maxOccurs="unbounded"/>
  <elementRef key="event" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="idno" minOccurs="0"
   maxOccurs="unbounded"/>
  <elementRef key="desc" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="listEvent" minOccurs="0"
   maxOccurs="1"/>
  <elementRef key="state" minOccurs="0"
   maxOccurs="unbounded"/>
 </sequence>
</content>
    
Schema Declaration
element org
{
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   attribute xml:id { text },
   attribute role
   {
      "country"
    | "federatedState"
    | "republic"
    | "government"
    | "ministry"
    | "parliament"
    | "politicalParty"
    | "parliamentaryGroup"
    | "conferenceOfChairs"
    | "boardOfParliament"
    | "ngo"
    | "institution"
    | "senate"
    | "committee"
    | "subcommittee"
    | "commission"
    | "delegation"
    | "supervisoryBoard"
    | "workingGroup"
    | "interparliamentaryFriendshipGroup"
    | "nationalCouncil"
    | "chamberOfThePeople"
    | "chamberOfTheNations"
    | "europeanCommission"
    | "europeanParliament"
    | "europeanInstitution"
    | "internationalOrganisation"
    | "boardOfDirectors"
    | "ethnicCommunity"
   },
   (
      tei_head*,
      tei_orgName+,
      tei_event*,
      tei_idno*,
      tei_desc?,
      tei_listEvent?,
      tei_state*
   )
}

Appendix A.1.64 <orgName>

<orgName> (organization name) contains an organisational name. [13.2.2. Organizational Names]
Module
namesdates — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.canonical (key, @ref)
from
indicates the starting point of the period in standard form, e.g. yyyy-mm-dd.
Derived from
att.datable.w3c
Status
Optional
Datatype
teidata.temporal.w3c
Note

Used when "the same" party changes its name

to
indicates the ending point of the period in standard form, e.g. yyyy-mm-dd.
Derived from
att.datable.w3c
Status
Optional
Datatype
teidata.temporal.w3c
Note

Used when "the same" party changes its name

full
Status
Optional
Legal values are:
yes
abb
Member of
Contained by
header: funder
namesdates: affiliation org
May contain
Character data only
Example
<funder>  <orgName xml:lang="en">The CLARIN research infrastructure</orgName>  <orgName xml:lang="sl">Raziskovalna infrastruktura CLARIN</orgName> </funder>
Example
<org xml:id="party.PS1"  role="parliamentaryGroup">  <orgName full="yes" xml:lang="en">Positive Slovenia</orgName>  <orgName full="yes" xml:lang="sl">Pozitivna Slovenija</orgName>  <orgName full="abb" xml:lang="sl">PS</orgName> </org>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element orgName
{
   tei_att.global.attribute.xmllang,
   tei_att.canonical.attribute.ref,
   attribute from { text }?,
   attribute to { text }?,
   attribute full { "yes" | "abb" }?,
   text
}
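The from and to attributes let one <org> keep its identity while its name changes. A minimal Python sketch of picking the name valid on a given day, assuming full yyyy-mm-dd dates and TEI-namespaced input. The helper org_name is hypothetical; treating a missing full as "yes" follows the TEI default for that attribute, and treating a name without xml:lang as matching any language is an assumption of this sketch:

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def org_name(org: ET.Element, day: date, lang: str = "en",
             full: str = "yes") -> Optional[str]:
    """Return the first <orgName> in `lang` with the given @full value valid on `day`.

    Missing @from/@to are read as open-ended.
    """
    for name in org.iter(TEI + "orgName"):
        if name.get("full", "yes") != full:
            continue
        if name.get(XML_LANG) not in (None, lang):
            continue
        start = date.fromisoformat(name.get("from")) if name.get("from") else date.min
        end = date.fromisoformat(name.get("to")) if name.get("to") else date.max
        if start <= day <= end:
            return name.text
    return None
```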

Appendix A.1.65 <p>

<p> (paragraph) marks paragraphs in prose. [3.1. Paragraphs 7.2.5. Speech Contents]
Module
core — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Member of
Contained by
May contain
core: ref
character data
Example
<projectDesc>  <p>   <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>  </p> </projectDesc>
Example
<availability status="free">  <licence>http://creativecommons.org/licenses/by/4.0/</licence>  <p>This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>  <p>This work is also licensed under the <ref target="https://www.parliament.uk/site-information/copyright-parliament/open-parliament-licence/">Open Parliament Licence v3.0</ref>.</p> </availability>
Schematron
<sch:report test="(ancestor::tei:ab or ancestor::tei:p) and not( ancestor::tei:floatingText |parent::tei:exemplum |parent::tei:item |parent::tei:note |parent::tei:q |parent::tei:quote |parent::tei:remarks |parent::tei:said |parent::tei:sp |parent::tei:stage |parent::tei:cell |parent::tei:figure )"> Abstract model violation: Paragraphs may not occur inside other paragraphs or ab elements. </sch:report>
Schematron
<sch:report test="(ancestor::tei:l or ancestor::tei:lg) and not( ancestor::tei:floatingText |parent::tei:figure |parent::tei:note )"> Abstract model violation: Lines may not contain higher-level structural elements such as div, p, or ab, unless p is a child of figure or note, or is a descendant of floatingText. </sch:report>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="ref"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element p { tei_att.global.attribute.xmllang, ( tei_ref | text )+ }

Appendix A.1.66 <particDesc>

<particDesc> (participation description) describes the identifiable speakers and organisations in a ParlaMint corpus. This information is given in the corpus root teiHeader. Note that the listPerson and listOrg elements are typically stored in separate files. [15.2. Contextual Information]
Module
corpus — Formal specification
Contained by
header: profileDesc
May contain
derived-module-parlamint: include
namesdates: listOrg listPerson
Note

May contain a prose description organized as paragraphs, or a structured list of persons and person groups, with an optional formal specification of any relationships amongst them.

Example
<particDesc> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listOrg.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ParlaMint-SI-listPerson.xml"/> </particDesc>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <alternate minOccurs="1" maxOccurs="1">
   <elementRef key="listOrg"/>
   <elementRef key="include"/>
  </alternate>
  <alternate minOccurs="1" maxOccurs="1">
   <elementRef key="listPerson"/>
   <elementRef key="include"/>
  </alternate>
 </sequence>
</content>
    
Schema Declaration
element particDesc
{
   ( tei_listOrg | tei_include ), ( tei_listPerson | tei_include )
}
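Since listOrg and listPerson usually live in their own files, a processing tool has to resolve the XInclude references before it can look up speakers or parties. A minimal Python sketch using the standard library's xml.etree.ElementInclude with a custom loader, assuming the included files are already available as strings in a dictionary keyed by href (in practice they would be read from the directory of the root file). The function name resolve_partic_desc is hypothetical:

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "{http://www.tei-c.org/ns/1.0}"


def resolve_partic_desc(partic_desc: ET.Element, files: dict[str, str]) -> ET.Element:
    """Replace the xi:include children of <particDesc> with the referenced XML.

    `files` maps each @href to the XML text of that file.
    """
    def loader(href, parse, encoding=None):
        if parse != "xml":
            raise ValueError(f"unsupported parse mode: {parse}")
        return ET.fromstring(files[href])

    ElementInclude.include(partic_desc, loader=loader)
    # The content model requires exactly listOrg followed by listPerson.
    children = [child.tag for child in partic_desc]
    if children != [TEI + "listOrg", TEI + "listPerson"]:
        raise ValueError(f"unexpected particDesc content: {children}")
    return partic_desc
```

Checking the resolved children against the content model catches a misspelt or missing href early, before any lookups fail further down the pipeline.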

Appendix A.1.67 <pb>

<pb> (page beginning) marks the beginning of a new page in a paginated document. [3.11.3. Milestone Elements]
Module
core — Formal specification
Attributes
att.global (xml:lang, xml:base, xml:space, @xml:id, @n) att.global.linking (synch, next, prev, @corresp) att.global.source (@source)
Member of
Contained by
analysis: phr s
core: name note
linking: seg
spoken: u
textstructure: div
May contain
Empty element
Note

A <pb> element should appear at the start of the page which it identifies. The global n attribute indicates the number or other value associated with this page. This will normally be the page number or signature printed on it, since the physical sequence number is implicit in the presence of the <pb> element itself.

The type attribute may be used to characterize the page break in any respect. The more specialized attributes break, ed, or edRef should be preferred when the intent is to indicate whether or not the page break is word-breaking, or to note the source from which it derives.

Example
<body>  <div type="debateSection">   <pb source="https://www.psp.cz/eknih/2013ps/stenprot/017schuz/s017357.htm"    n="1"    xml:id="ParlaMint-CZ_2014-10-01-ps2013-017-09-003-036.pb1" corresp="#ps2013-017-09-003-036.audio1"/>    ...  </div> </body>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element pb
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.linking.attribute.corresp,
   tei_att.global.source.attribute.source,
   empty
}

Appendix A.1.68 <pc>

<pc> (punctuation character) contains a character or string of characters regarded as constituting a single punctuation mark. [17.1.2. Below the Word Level 17.4.2. Lightweight Linguistic Annotation]
Module
analysis — Formal specification
Attributes
att.global (xml:id, n, xml:base, xml:space, @xml:lang) att.global.analytic (@ana) att.linguistic (lemma, msd, @pos, @join) att.lexicographic.normalized (@norm)
xml:id
Status
Required
Datatype
ID
msd
Status
Required
Datatype
teidata.text
Member of
Contained by
analysis: phr s
May contain
Character data only
Example
<s>  <w lemma="I"   msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>  <w lemma="support"   msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>  <w lemma="the"   msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>  <w lemma="amendment"   msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>  <pc msd="UPosTag=PUNCT" pos=".">.</pc> </s>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element pc
{
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.linguistic.attribute.pos,
   tei_att.linguistic.attribute.join,
   tei_att.lexicographic.normalized.attribute.norm,
   attribute xml:id { text },
   attribute msd { text },
   text
}
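The join attribute is what allows running text to be recovered from the tokenised corpus: a token marked join="right" is not followed by a space. A minimal Python sketch that rebuilds the text of a sentence and reads the UPOS tag out of msd, assuming TEI-namespaced input and a single space wherever join is absent. The function names are hypothetical:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def sentence_text(s: ET.Element) -> str:
    """Rebuild the running text of an <s> from its <w> and <pc> tokens.

    A token with join="right" is followed directly by the next token; all
    other tokens are followed by one space. Tokens nested inside <name> or
    <num> are included, since iter() walks all descendants in document order.
    """
    parts = []
    for tok in s.iter():
        if tok.tag in (TEI + "w", TEI + "pc"):
            parts.append(tok.text or "")
            if tok.get("join") != "right":
                parts.append(" ")
    return "".join(parts).rstrip()


def upos(tok: ET.Element) -> str:
    """Extract UPosTag from @msd, e.g. "UPosTag=NOUN|Number=Sing" -> "NOUN"."""
    feats = dict(f.split("=", 1) for f in tok.get("msd", "").split("|") if "=" in f)
    return feats.get("UPosTag", "")
```

Applied to the example sentence above, sentence_text yields "I support the amendment." with no space before the full stop.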

Appendix A.1.69 <persName>

<persName> (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person's forenames, surnames, honorifics, added names, etc. [13.2.1. Personal Names]
Module
namesdates — Formal specification
Attributes
att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.datable.w3c (when, notBefore, notAfter, @from, @to) att.canonical (key, @ref)
Member of
Contained by
core: respStmt
namesdates: person
May contain
core: term
character data
Note

Special persons (like 'anonymous', 'group' etc.) have their name in <term>.

Example
<persName>  <surname>Broekers-Knol</surname>  <forename>Ankie</forename> </persName>
Example
<respStmt>  <persName>Matthew Coole</persName>  <resp>TEI corpus encoding</resp> </respStmt>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <alternate minOccurs="1"
   maxOccurs="unbounded">
   <elementRef key="forename" minOccurs="1"
    maxOccurs="unbounded"/>
   <elementRef key="addName" minOccurs="0"
    maxOccurs="unbounded"/>
   <elementRef key="nameLink" minOccurs="0"
    maxOccurs="1"/>
   <elementRef key="roleName" minOccurs="0"
    maxOccurs="unbounded"/>
   <elementRef key="surname" minOccurs="1"
    maxOccurs="unbounded"/>
  </alternate>
  <alternate minOccurs="1"
   maxOccurs="unbounded">
   <elementRef key="term"/>
  </alternate>
  <alternate minOccurs="1" maxOccurs="1">
   <textNode/>
  </alternate>
 </alternate>
</content>
    
Schema Declaration
element persName
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.canonical.attribute.ref,
   (
      (
         tei_forename+
       | tei_addName*
       | tei_nameLink?
       | tei_roleName*
       | tei_surname+
      )+
    | tei_term+
    | ( text )
   )
}
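Because <persName> may hold name parts, one or more <term> elements, or plain text, code that displays speaker names has to handle all three alternatives of the content model. A minimal Python sketch follows. display_name and local are hypothetical helpers, and putting forenames before surnames is an assumption made for display, not a ParlaMint rule.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so TEI-namespaced and bare tags both work."""
    return tag.rsplit("}", 1)[-1]


def display_name(pers: ET.Element) -> str:
    """Return a display string for a <persName> element.

    Covers the three alternatives of the content model: name parts,
    <term> elements, or plain text. nameLink, addName and roleName are
    ignored in this sketch.
    """
    children = list(pers)
    if not children:  # plain-text alternative
        return " ".join((pers.text or "").split())
    parts: dict[str, list[str]] = {}
    for child in children:
        parts.setdefault(local(child.tag), []).append((child.text or "").strip())
    if "term" in parts:  # special persons such as 'anonymous'
        return " ".join(parts["term"])
    return " ".join(parts.get("forename", []) + parts.get("surname", []))


if __name__ == "__main__":
    xml = ("<persName><surname>Broekers-Knol</surname>"
           "<forename>Ankie</forename></persName>")
    print(display_name(ET.fromstring(xml)))  # Ankie Broekers-Knol
```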

Appendix A.1.70 <person>

<person> (person) provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source. [13.3.2. The Person Element 15.2.2. The Participant Description]
Module: namesdates — Formal specification
Attributes: att.global (xml:base, xml:space, @xml:id, @n, @xml:lang)
Contained by
namesdates: listPerson
May contain
Note

May contain either a prose description organized as paragraphs, or a sequence of more specific demographic elements drawn from the model.personPart class.

Example
<person xml:id="AliciaKearns">  <persName>   <forename>Alicia</forename>   <forename>Alexandra Martha</forename>   <surname>Kearns</surname>  </persName>  <sex value="F"/>  <affiliation from="2019-12-12"   ref="#parla.lower" role="member"/>  <affiliation from="2019-12-12"   ref="#party.CON" role="member"/>  <idno subtype="contact" type="URI">https://members.parliament.uk/member/4805/contact</idno> </person>
Example
<person xml:id="AdamowiczPiotr">  <persName>   <forename>Piotr</forename>   <surname>Adamowicz</surname>  </persName>  <birth when="1961-06-26">26.06.1961</birth>  <sex value="M"/>  <affiliation role="member" ref="#party.KO"/> </person>
Content model
<content>
 <sequence minOccurs="1" maxOccurs="1">
  <elementRef key="persName" minOccurs="1"
   maxOccurs="unbounded"/>
  <alternate minOccurs="0"
   maxOccurs="unbounded">
   <elementRef key="sex" minOccurs="0"
    maxOccurs="1"/>
   <elementRef key="birth" minOccurs="0"
    maxOccurs="1"/>
   <elementRef key="death" minOccurs="0"
    maxOccurs="1"/>
   <elementRef key="affiliation"
    minOccurs="0" maxOccurs="unbounded"/>
   <elementRef key="occupation"
    minOccurs="0" maxOccurs="unbounded"/>
   <elementRef key="education"
    minOccurs="0" maxOccurs="unbounded"/>
   <elementRef key="idno" minOccurs="0"
    maxOccurs="unbounded"/>
   <elementRef key="figure" minOccurs="0"
    maxOccurs="unbounded"/>
  </alternate>
 </sequence>
</content>
    
Schema Declaration
element person
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   (
      tei_persName+,
      (
         tei_sex?
       | tei_birth?
       | tei_death?
       | tei_affiliation*
       | tei_occupation*
       | tei_education*
       | tei_idno*
       | tei_figure*
      )*
   )
}
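The <affiliation> elements of a <person> carry from and to dates, so a common task is finding which affiliations held on a given day. The sketch below assumes these dates are complete ISO dates (YYYY-MM-DD), which lets plain string comparison stand in for date comparison. SAMPLE is a shortened version of the AliciaKearns example, and affiliations_on is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened from the AliciaKearns example above.
SAMPLE = """<person xml:id="AliciaKearns">
  <persName><forename>Alicia</forename><surname>Kearns</surname></persName>
  <sex value="F"/>
  <affiliation from="2019-12-12" ref="#parla.lower" role="member"/>
  <affiliation from="2019-12-12" ref="#party.CON" role="member"/>
</person>"""


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def affiliations_on(person: ET.Element, day: str) -> list[tuple[str, str]]:
    """Return (role, ref) for every <affiliation> of `person` valid on `day`.

    Assumes @from, @to and `day` are full ISO dates (YYYY-MM-DD), so string
    comparison matches date order; a missing bound is treated as open-ended.
    """
    result = []
    for aff in person:
        if local(aff.tag) != "affiliation":
            continue
        start, end = aff.get("from"), aff.get("to")
        if (start is None or start <= day) and (end is None or day <= end):
            result.append((aff.get("role"), aff.get("ref")))
    return result


if __name__ == "__main__":
    person = ET.fromstring(SAMPLE)
    print(affiliations_on(person, "2020-01-01"))
    # [('member', '#parla.lower'), ('member', '#party.CON')]
```

Partial dates, such as a bare year, would first need to be normalised, which this sketch does not attempt.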

Appendix A.1.71 <phr>

<phr> (phrase) contains a semantic multi-word unit. [17.1. Linguistic Segment Categories]
Module: analysis — Formal specification
Attributes: att.global (n, xml:base, xml:space, @xml:id, @xml:lang)
ana (analysis): indicates one or more elements containing interpretations of the element on which the ana attribute appears.
Derived from: att.global.analytic
Status: Required
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
function (function): characterizes the function of the segment.
Derived from: att.segLike
Status: Required
Datatype: teidata.enumerated
type
Status: Optional
Legal values are:
sem
Member of
Contained by
analysis: s
May contain
analysis: pc w
core: pb
character data
Note

In the TEI Guidelines, the type attribute may indicate the type of phrase (noun, verb, preposition, etc.); in ParlaMint its only legal value is sem.

Example: The element is used to mark multi-word expressions (MWEs) that have a semantic interpretation. The type attribute should be set to sem, and the MWE should carry the function (all semantic tags) and ana (semantic categories) attributes:
... ... <phr type="sem" function="Z4" ana="sem:Z4">  <w pos="IN" msd="UPosTag=ADP" lemma="on"   function="Z4" ana="sem:Z4">On</w>  <w pos="DT"   msd="UPosTag=DET|Definite=Def|PronType=Art" lemma="the" function="Z4" ana="sem:Z4">the</w>  <w pos="JJ" msd="UPosTag=ADJ|Degree=Pos"   lemma="other" function="Z4" ana="sem:Z4">other</w>  <w pos="NN" msd="UPosTag=NOUN|Number=Sing"   lemma="hand" function="Z4" ana="sem:Z4" join="right">hand</w> </phr> ...
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="pb"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element phr
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   attribute ana { list { + } },
   attribute function { text },
   attribute type { "sem" }?,
   ( tei_w | tei_pc | tei_pb | text )+
}
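As the example shows, a semantic multi-word expression is a <phr type="sem"> whose function attribute holds the semantic tag and whose children are the tokens. The following Python sketch lists each such phrase with its tag and surface text. semantic_mwes is a hypothetical helper, and for simplicity it joins tokens with single spaces rather than honouring join.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def semantic_mwes(sentence: ET.Element) -> list[tuple[str, str]]:
    """List (function, surface text) for each <phr type="sem"> under `sentence`."""
    found = []
    for phr in sentence.iter():
        if local(phr.tag) == "phr" and phr.get("type") == "sem":
            words = [tok.text or "" for tok in phr if local(tok.tag) in ("w", "pc")]
            found.append((phr.get("function"), " ".join(words)))
    return found


if __name__ == "__main__":
    s = ET.fromstring(
        '<s><phr type="sem" function="Z4" ana="sem:Z4">'
        '<w>On</w><w>the</w><w>other</w><w join="right">hand</w>'
        "</phr><pc>,</pc></s>"
    )
    print(semantic_mwes(s))  # [('Z4', 'On the other hand')]
```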

Appendix A.1.72 <placeName>

<placeName> (place name) contains a place name. [13.2.3. Place Names]
Module: namesdates — Formal specification
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang); att.canonical (key, @ref)
Member of
Contained by
namesdates: birth death
May contain
core: name
character data
Example
<placeName ref="https://www.geonames.org/2523918">Palermo</placeName>
Example
<placeName>Tours-Saint-Symphorien, Indre-et-Loire</placeName>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="name" minOccurs="0"
   maxOccurs="1"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element placeName
{
   tei_att.global.attribute.xmllang,
   tei_att.canonical.attribute.ref,
   ( tei_name? | text )
}

Appendix A.1.73 <prefixDef>

<prefixDef> (prefix definition) defines a prefixing scheme used in teidata.pointer values, showing how abbreviated URIs using the scheme may be expanded into full URIs. [16.2.3. Using Abbreviated Pointers]
Module: header — Formal specification
Attributes
matchPattern: specifies a regular expression against which the values of other attributes can be matched.
Derived from: att.patternReplacement
Status: Required
Datatype: teidata.pattern
replacementPattern: specifies a ‘replacement pattern’, that is, the skeleton of a relative or absolute URI containing references to groups in the matchPattern which, once subpattern substitution has been performed, complete the URI.
Derived from: att.patternReplacement
Status: Required
Datatype: teidata.replacement
Note

Using TEI-defined XPointer schemes is not allowed.

ident: supplies a name which functions as the prefix for an abbreviated pointing scheme such as a private URI scheme. The prefix constitutes the text preceding the first colon.
Status: Required
Datatype: teidata.prefix
Note

The value is limited to teidata.prefix so that it may be mapped directly to a URI prefix.

Contained by
May contain
core: p
Note

The abbreviated pointer may be dereferenced to produce either an absolute or a relative URI reference. In the latter case it is combined with the value of xml:base in force at the place where the pointing attribute occurs to form an absolute URI in the usual manner as prescribed by XML Base.

Example
<prefixDef ident="mte" matchPattern="(.+)"  replacementPattern="http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1">  <p xml:lang="en">Private URIs with this prefix point to feature-structure elements defining the Serbocroatian MULTEXT-East Version 6 MSDs.</p> </prefixDef>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element prefixDef
{
   attribute matchPattern { text },
   attribute replacementPattern { text },
   attribute ident { text },
   tei_p+
}
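The expansion of abbreviated pointers can be sketched as follows: split the pointer at its first colon, look up the prefix among the <prefixDef> elements, match the remainder against matchPattern, and substitute the captured groups for $1, $2, … in replacementPattern. This Python sketch assumes that Python's re module behaves like XML Schema regular expressions for simple patterns such as (.+), and it requires the pattern to match the whole remainder. expand_pointer and the MSD value Ncmsn are hypothetical.

```python
import re


def expand_pointer(pointer: str, prefix_defs: dict[str, tuple[str, str]]) -> str:
    """Expand an abbreviated pointer such as 'mte:XYZ' into a full URI.

    prefix_defs maps a prefixDef @ident to its (@matchPattern,
    @replacementPattern). Pointers with an unknown or missing prefix,
    or whose remainder does not match, are returned unchanged.
    """
    ident, colon, rest = pointer.partition(":")
    if not colon or ident not in prefix_defs:
        return pointer
    match_pattern, replacement = prefix_defs[ident]
    m = re.fullmatch(match_pattern, rest)
    if m is None:
        return pointer
    # $1, $2, ... in the replacement pattern refer to the matched groups.
    return re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)


if __name__ == "__main__":
    defs = {"mte": ("(.+)", "http://nl.ijs.si/ME/V6/msd/tables/msd-fslib-hbs.xml#$1")}
    print(expand_pointer("mte:Ncmsn", defs))
```

Ordinary URIs such as https://… pass through unchanged as long as no prefixDef declares the ident https.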

Appendix A.1.74 <profileDesc>

<profileDesc> (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting. [2.4. The Profile Description 2.1.1. The TEI Header and Its Components]
Module: header — Formal specification
Contained by
header: teiHeader
May contain
Note

Although the content model permits it, it is rarely meaningful to supply multiple occurrences for any of the child elements of <profileDesc> unless these are documenting multiple texts.

Example: General structure of the <profileDesc> element:
<profileDesc>  <settingDesc>...</settingDesc>  <textClass>...</textClass>  <particDesc>...</particDesc>  <langUsage>...</langUsage> </profileDesc>
Example: Profile description of a corpus root:
<profileDesc>  <settingDesc>   <setting>    <name type="address">Šubičeva ulica 4</name>    <name type="city">Ljubljana</name>    <name type="country" key="SI">Slovenia</name>    <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date>   </setting>  </settingDesc>  <textClass>   <catRef scheme="#parla.legislature"     target="#parla.bi #parla.lower"/>  </textClass>  <particDesc>    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"    href="ParlaMint-SI-listOrg.xml"/>    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"    href="ParlaMint-SI-listPerson.xml"/>  </particDesc>  <langUsage>   <language ident="sl" xml:lang="sl">slovenski</language>   <language ident="en" xml:lang="sl">angleški</language>   <language ident="sl" xml:lang="en">Slovenian</language>   <language ident="en" xml:lang="en">English</language>  </langUsage> </profileDesc>
Example: Profile description of a corpus component. Unlike the corpus root, a corpus component uses only the first child element, <settingDesc>:
<profileDesc>  <settingDesc>   <setting>    <name type="city">Ljubljana</name>    <name type="country" key="SI">Slovenija</name>    <date when="2014-08-28"     ana="#parla.sitting">28.8.2014</date>   </setting>  </settingDesc> </profileDesc>
Content model
<content>
 <elementRef key="settingDesc"/>
 <elementRef key="textClass" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="particDesc" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="langUsage" minOccurs="0"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element profileDesc
{
   tei_settingDesc,
   tei_textClass?,
   tei_particDesc?,
   tei_langUsage?
}
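The content model of <profileDesc> fixes the order of its children: a required <settingDesc>, then optional <textClass>, <particDesc> and <langUsage>, each at most once. A RELAX NG validator already enforces this; the Python sketch below only restates the rule procedurally, for instance for a quick check inside a conversion script. profile_desc_ok is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Child order taken from the ParlaMint content model of <profileDesc>.
ORDER = ["settingDesc", "textClass", "particDesc", "langUsage"]


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def profile_desc_ok(profile_desc: ET.Element) -> bool:
    """Check that settingDesc comes first and the optional children follow in order, once each."""
    names = [local(child.tag) for child in profile_desc]
    if not names or names[0] != "settingDesc":
        return False
    if any(name not in ORDER for name in names):
        return False
    positions = [ORDER.index(name) for name in names]
    # Strictly increasing positions mean correct order with no repeats.
    return all(a < b for a, b in zip(positions, positions[1:]))
```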

Appendix A.1.75 <projectDesc>

<projectDesc> (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected. [2.3.1. The Project Description 2.3. The Encoding Description 15.3.2. Declarable Elements]
Module: header — Formal specification
Contained by
header: encodingDesc
May contain
core: p
Example
<projectDesc>  <p xml:lang="sl">Glavni cilji projekta <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> so    (1) izdelati večjezično množico na enak način kodiranih korpusov    zapiskov parlamentarnih sej, ...</p>  <p xml:lang="en">The <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>    project aims to (1) create a multilingual set of uniformly encoded    comparable corpora of parliamentary proceedings, ...</p> </projectDesc>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element projectDesc { tei_p+ }

Appendix A.1.76 <pubPlace>

<pubPlace> (publication place) contains the name of the place where a bibliographic item was published. [3.12.2.4. Imprint, Size of a Document, and Reprint Information]
Module: core — Formal specification
Contained by
May contain
core: ref
character data
Example
<pubPlace>  <ref target="https://github.com/clarin-eric/ParlaMint">https://github.com/clarin-eric/ParlaMint</ref> </pubPlace>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="ref"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element pubPlace { tei_ref | text }

Appendix A.1.77 <publicationStmt>

<publicationStmt> (publication statement) groups information concerning the publication or distribution of an electronic or other text. [2.2.4. Publication, Distribution, Licensing, etc. 2.2. The File Description]
Module: header — Formal specification
Contained by
header: fileDesc
May contain
Note

Where a publication statement contains several members of the model.publicationStmtPart.agency or model.publicationStmtPart.detail classes rather than one or more paragraphs or anonymous blocks, care should be taken to ensure that the repeated elements are presented in a meaningful order. It is a conformance requirement that elements supplying information about publication place, address, identifier, availability, and date be given following the name of the publisher, distributor, or authority concerned, and preferably in that order.

Example
<publicationStmt>  <publisher>   <orgName xml:lang="sl">Raziskovalna infrastrukutra CLARIN</orgName>   <orgName xml:lang="en">CLARIN research infrastructure</orgName>   <ref target="https://www.clarin.eu/">www.clarin.eu</ref>  </publisher>  <idno type="URI" subtype="handle">http://hdl.handle.net/11356/1432</idno>  <availability status="free">   <licence>http://creativecommons.org/licenses/by/4.0/</licence>   <p xml:lang="sl">To delo je ponujeno pod <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Priznanje avtorstva 4.0 mednarodna licenca</ref>.</p>   <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>  </availability>  <date when="2021-06-11">11. 6. 2021</date> </publicationStmt>
Content model
<content>
 <elementRef key="publisher"/>
 <elementRef key="idno"/>
 <elementRef key="pubPlace" minOccurs="0"
  maxOccurs="1"/>
 <elementRef key="availability"/>
 <elementRef key="date"/>
</content>
    
Schema Declaration
element publicationStmt
{
   tei_publisher,
   tei_idno,
   tei_pubPlace?,
   tei_availability,
   tei_date
}

Appendix A.1.78 <publisher>

<publisher> (publisher) provides the name of the organisation responsible for the publication or distribution of a bibliographic item. [3.12.2.4. Imprint, Size of a Document, and Reprint Information 2.2.4. Publication, Distribution, Licensing, etc.]
Module: core — Formal specification
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
core: bibl
May contain
core: ref
namesdates: orgName
character data
Note

Use the full form of the name by which a company is usually referred to, rather than any abbreviation of it which may appear on a title page.

Example
<publisher>  <orgName>CLARIN research infrastructure</orgName>  <ref target="https://www.clarin.eu/">www.clarin.eu</ref> </publisher>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <sequence minOccurs="1" maxOccurs="1">
   <elementRef key="orgName" minOccurs="1"
    maxOccurs="unbounded"/>
   <elementRef key="ref" minOccurs="0"
    maxOccurs="1"/>
  </sequence>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element publisher
{
   tei_att.global.attribute.xmllang,
   ( ( tei_orgName+, tei_ref? ) | text )
}

Appendix A.1.79 <quotation>

<quotation> (quotation) specifies editorial practice adopted with respect to quotation marks in the original. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements]
Module: header — Formal specification
Contained by
May contain
core: p
Example
<editorialDecl> ... <quotation>   <p xml:lang="en">Quotation marks have been left in the text and are not explicitly marked up.</p>  </quotation> </editorialDecl>
Schematron
<sch:report test="not(@marks) and not (tei:p)">On <sch:name/>, either the @marks attribute should be used, or a paragraph of description provided</sch:report>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element quotation { tei_p+ }

Appendix A.1.80 <recording>

<recording> (recording event) provides details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast. [8.2. Documenting the Source of Transcribed Speech 15.3.2. Declarable Elements]
Module: spoken — Formal specification
Attributes
type: the kind of recording.
Derived from: att.typed
Status: Optional
Datatype: teidata.enumerated
Legal values are:
audio
audio recording [Default]
video
audio and video recording
Contained by
May contain
core: media
Note

The dur attribute is used to indicate the original duration of the recording.

Example
<recording type="audio">  <media xml:id="ps2013-044-02-000-000.audio1"   mimeType="audio/mp3"   source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3"   url="2013ps/audio/2016/04/13/2016041308580912.mp3"/> </recording>
Content model
<content>
 <elementRef key="media" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element recording { attribute type { "audio" | "video" }?, tei_media+ }

Appendix A.1.81 <recordingStmt>

<recordingStmt> (recording statement) describes a set of recordings used as the basis for transcription of a spoken text. [8.2. Documenting the Source of Transcribed Speech 2.2.7. The Source Description]
Module: spoken — Formal specification
Contained by
header: sourceDesc
May contain
spoken: recording
Example
<recordingStmt>  <recording type="audio">   <media xml:id="ps2017-020-09-004-010.audio1"    mimeType="audio/mp3"    source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318081822.mp3"    url="2017ps/audio/2018/11/13/2018111318081822.mp3"/>   <media xml:id="ps2017-020-09-004-010.audio2"    mimeType="audio/mp3"    source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318181832.mp3"    url="2017ps/audio/2018/11/13/2018111318181832.mp3"/>   <media xml:id="ps2017-020-09-004-010.audio3"    mimeType="audio/mp3"    source="https://www.psp.cz/eknih/2017ps/audio/2018/11/13/2018111318281842.mp3"    url="2017ps/audio/2018/11/13/2018111318281842.mp3"/>    ...  </recording> </recordingStmt>
Content model
<content>
 <elementRef key="recording" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element recordingStmt { tei_recording+ }

Appendix A.1.82 <ref>

<ref> (reference) defines a reference to another location, possibly modified by additional text or comment. [3.7. Simple Links and Cross-References 16.1. Links]
Module: core — Formal specification
Attributes
target: specifies the destination of the reference by supplying one or more URI References.
Derived from: att.pointing
Status: Recommended
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Member of
Contained by
May contain: Character data only
Note

The target and cRef attributes are mutually exclusive.

Example
<projectDesc>  <p>   <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref> is a    project that aims to create a multilingual set of comparable corpora of    parliamentary proceedings uniformly encoded according to the <ref target="https://github.com/clarin-eric/parla-clarin">Parla-CLARIN      recommendations</ref> and ...</p> </projectDesc>
Schematron
<sch:report test="@target and @cRef">Only one of the attributes @target and @cRef may be supplied on <sch:name/> </sch:report>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element ref { attribute target { list { + } }?, text }

Appendix A.1.83 <relation>

<relation> (relationship) describes a relationship between organisations. [13.3.2.3. Personal Relationships]
Module: namesdates — Formal specification
Attributes: att.global.analytic (@ana); att.datable.w3c (notBefore, notAfter, @when, @from, @to)
name
Status: Required
Legal values are:
coalition
opposition
renaming
successor
representing
active: identifies the ‘active’ participants in a non-mutual relationship, or all the participants in a mutual one.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
mutual: supplies a list of participants amongst all of whom the relationship holds equally.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
passive: identifies the ‘passive’ participants in a non-mutual relationship.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Contained by
namesdates: listRelation
May contain: Empty element
Note

Only one of the attributes active and mutual may be supplied; the attribute passive may be supplied only if the attribute active is supplied. Not all of these constraints can be enforced in all schema languages.

Example: Specification of coalition and opposition political parties (or parliamentary groups) in a given time period and legislative period:
<relation name="coalition"  mutual="#MR #OpenVld #N-VA #CD_en_V" from="2014-10-11" to="2018-12-09"  ana="#period_54"/> <relation name="opposition"  active="#Ecolo #cdH #DéFi #Vuye_Wouters #sp.a #PP #PS #PTB #FDF" passive="#government.BE"  from="2014-10-11" to="2018-12-09" ana="#period_54"/>
Example: Specification of a parliamentary group representing political parties in the parliament:
<relation name="representing"  active="#parliamentaryGroup.CSSD.1107"  passive="#politicalParty.CSSD.153 #politicalParty.ENO.1" from="2013-10-29" to="2017-10-26"/>
Schematron
<sch:assert test="@ref or @key or @name">One of the attributes 'name', 'ref' or 'key' must be supplied</sch:assert>
Schematron
<sch:report test="@active and @mutual">Only one of the attributes @active and @mutual may be supplied</sch:report>
Schematron
<sch:report test="@passive and not(@active)">the attribute 'passive' may be supplied only if the attribute 'active' is supplied</sch:report>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element relation
{
   tei_att.global.analytic.attribute.ana,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   attribute name
   {
      "coalition" | "opposition" | "renaming" | "successor" | "representing"
   },
   ( attribute active { list { + } }? | attribute mutual { list { + } }? ),
   attribute passive { list { + } }?,
   empty
}
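The three Schematron rules and the enumeration of name values for <relation> can be restated as a small Python check, for example to report problems while converting source metadata. This sketch mirrors only those constraints. relation_problems is a hypothetical helper, and the member lists in the usage are shortened from the examples.

```python
import xml.etree.ElementTree as ET

LEGAL_NAMES = {"coalition", "opposition", "renaming", "successor", "representing"}


def relation_problems(rel: ET.Element) -> list[str]:
    """Return messages for violations of the <relation> constraints."""
    problems = []
    if rel.get("name") not in LEGAL_NAMES:
        problems.append(f"illegal @name: {rel.get('name')!r}")
    if rel.get("active") is not None and rel.get("mutual") is not None:
        problems.append("only one of @active and @mutual may be supplied")
    if rel.get("passive") is not None and rel.get("active") is None:
        problems.append("@passive may be supplied only if @active is supplied")
    return problems


if __name__ == "__main__":
    ok = ET.fromstring('<relation name="coalition" mutual="#MR #OpenVld" '
                       'from="2014-10-11" to="2018-12-09"/>')
    bad = ET.fromstring('<relation name="coalition" mutual="#A" passive="#B"/>')
    print(relation_problems(ok))   # []
    print(relation_problems(bad))  # ['@passive may be supplied only if @active is supplied']
```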

Appendix A.1.84 <resp>

<resp> (responsibility) contains a phrase describing the nature of a person's intellectual responsibility, or an organisation's role in the production or distribution of a work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module: core — Formal specification
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
core: respStmt
May contain: Character data only
Note

The attribute ref, inherited from the class att.canonical, may be used to indicate the kind of responsibility in a normalized form by referring directly to a standardized list of responsibility types, such as that maintained by a naming authority, for example the list maintained at http://www.loc.gov/marc/relators/relacode.html for bibliographic usage.

Example
<respStmt>  <persName>Andrej Pančur</persName>  <resp>Kodiranje TEI</resp>  <resp xml:lang="en">TEI corpus encoding</resp> </respStmt>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element resp { tei_att.global.attribute.xmllang, text }

Appendix A.1.85 <respStmt>

<respStmt> (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organisations which have played a role in the production or distribution of a bibliographic work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.2. The Edition Statement 2.2.5. The Series Statement]
Module: core — Formal specification
Contained by
header: titleStmt
May contain
core: resp
namesdates: persName
Example
<respStmt>  <persName>Matthew Coole</persName>  <resp>Data retrieval, Parla-CLARIN TEI XML corpus encoding and linguistic annotation.</resp> </respStmt>
Example
<respStmt>  <persName ref="https://orcid.org/0000-0003-3063-2239">Tommaso Agnoloni</persName>  <persName ref="https://orcid.org/0000-0002-8126-6294">Francesca Frontini</persName>  <persName ref="https://orcid.org/0000-0002-2953-8619">Simonetta Montemagni</persName>  <persName ref="https://orcid.org/0000-0002-1321-5444">Valeria Quochi</persName>  <persName ref="https://orcid.org/0000-0001-5849-0979">Giulia Venturi</persName>  <resp xml:lang="it">Definizione del progetto e metodologia</resp>  <resp xml:lang="en">Project set-up and methodology</resp> </respStmt> <respStmt>  <persName>Manuela Ruisi</persName>  <persName>Carlo Marchetti</persName>  <persName>Roberto Battistoni</persName>  <resp xml:lang="it">Recupero dei dati</resp>  <resp xml:lang="en">Data retrieval</resp> </respStmt> <respStmt>  <persName>Tommaso Agnoloni</persName>  <resp xml:lang="it">Codifica corpus in ParlaMint TEI XML</resp>  <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>  <resp xml:lang="it">Pulizia, normalizzazione e conversione in ParlaMint TEI XML</resp>  <resp xml:lang="en">Cleaning, normalisation and conversion to ParlaMint TEI XML</resp> </respStmt> ...
Content model
<content>
 <elementRef key="persName" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="resp" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element respStmt { tei_persName+, tei_resp+ }

Appendix A.1.86 <revisionDesc>

<revisionDesc> (revision description) summarizes the revision history for a file [2.6. The Revision Description 2.1.1. The TEI Header and Its Components]
Module: header — Formal specification
Attributes: att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
header: teiHeader
May contain
header: change
Note

If present on this element, the status attribute should indicate the current status of the document. The same attribute may appear on any <change> to record the status at the time of that change. Conventionally <change> elements should be given in reverse date order, with the most recent change at the start of the list.

Example
<revisionDesc>  <change when="2021-06-11">   <name>Tomaž Erjavec</name>: Finalized encoding.</change>  <change when="2021-05-28">   <name>Tomaž Erjavec</name>: Built corpus.</change> </revisionDesc>
Content model
<content>
 <elementRef key="change" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element revisionDesc { tei_att.global.attribute.xmllang, tei_change+ }
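The Note on <revisionDesc> recommends listing <change> elements in reverse date order. A minimal Python check of that convention follows. It assumes each <change> has a when attribute holding a complete ISO date, so that string comparison matches chronological order, and in_reverse_date_order is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def in_reverse_date_order(revision_desc: ET.Element) -> bool:
    """True if the <change> elements run from the most recent to the oldest.

    Assumes every <change> has @when as a full ISO date (YYYY-MM-DD).
    """
    whens = [c.get("when") for c in revision_desc if local(c.tag) == "change"]
    return all(newer >= older for newer, older in zip(whens, whens[1:]))
```

Equal dates are accepted, since several changes may be recorded on the same day.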

Appendix A.1.87 <roleName>

<roleName> (role name) contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank. [13.2.1. Personal Names]
Module: namesdates — Formal specification
Attributes
xml:lang
Status: Optional
Datatype: teidata.language
Member of
Contained by
namesdates: affiliation persName
May contain: Character data only
Note

A <roleName> may be distinguished from an <addName> by virtue of the fact that, like a title, it typically exists independently of its holder.

Example
<persName>  <surname>Murgel</surname>  <forename>Jasna</forename>  <roleName>dr.</roleName> </persName>
Example
<affiliation role="minister" ref="#GOV"  from="2020-08-01">  <roleName xml:lang="sl">Minister za obrambo</roleName>  <roleName xml:lang="en">Minister of Defence</roleName> </affiliation>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element roleName { attribute xml:lang { text }?, text }

Appendix A.1.88 <s>

<s> (s-unit) contains a sentence-like division of a text. [17.1. Linguistic Segment Categories 8.4.1. Segmentation]
Module: analysis — Formal specification
Attributes: att.global (xml:base, xml:space, @xml:id, @n, @xml:lang); att.global.linking (synch, next, prev, @corresp)
Member of
Contained by
linking: seg
May contain
analysis: pc phr w
linking: linkGrp
Note

The <s> element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting. For segmentation which is partial or recursive, the <seg> should be used instead.

The type attribute may be used to indicate the type of segmentation intended, according to any convenient typology.

Example
<s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">  <w lemma="I"   msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs" pos="PRP">I</w>  <w lemma="support"   msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin" pos="VBP">support</w>  <w lemma="the"   msd="UPosTag=DET|Definite=Def|PronType=Art" pos="DT">the</w>  <w lemma="amendment"   msd="UPosTag=NOUN|Number=Sing" pos="NN" join="right">amendment</w>  <pc msd="UPosTag=PUNCT" pos=".">.</pc> </s>
Schematron
<sch:report test="tei:s">You may not nest one s element within another: use seg instead</sch:report>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="name"/>
  <elementRef key="phr"/>
  <elementRef key="num"/>
  <elementRef key="date"/>
  <elementRef key="time"/>
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
 </alternate>
 <elementRef key="linkGrp" minOccurs="0"
  maxOccurs="1"/>
</content>
    
Schema Declaration
element s
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   (
      tei_w
    | tei_pc
    | tei_name
    | tei_phr
    | tei_num
    | tei_date
    | tei_time
    | tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
   )+,
   tei_linkGrp?
}
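Since the tokens of an <s> carry no whitespace of their own, the running text has to be rebuilt from them: each <w> or <pc> is followed by a space unless it has join="right". The Python sketch below does this. sentence_text is a hypothetical helper, and it assumes <w> elements are not nested inside one another.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def sentence_text(s: ET.Element) -> str:
    """Rebuild the running text of an <s> from its <w> and <pc> tokens.

    A token with join="right" is not followed by a space. Tokens inside
    <phr> or <name> are reached too, because iter() walks all descendants.
    """
    pieces = []
    for tok in s.iter():
        if local(tok.tag) in ("w", "pc"):
            pieces.append(tok.text or "")
            if tok.get("join") != "right":
                pieces.append(" ")
    return "".join(pieces).rstrip()


if __name__ == "__main__":
    s = ET.fromstring('<s><w>I</w><w>support</w><w>the</w>'
                      '<w join="right">amendment</w><pc>.</pc></s>')
    print(sentence_text(s))  # I support the amendment.
```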

Appendix A.1.89 <seg>

<seg> (arbitrary segment) represents any segmentation of text below the ‘chunk’ level. [16.3. Blocks, Segments, and Anchors 6.2. Components of the Verse Line 7.2.5. Speech Contents]
Module: linking — Formal specification
Attributes: att.global (xml:base, xml:space, @xml:id, @n, @xml:lang); att.global.linking (synch, next, prev, @corresp)
Member of
Contained by
spoken: u
May contain
analysis: s
core: gap note pb
character data
Note

The <seg> element may be used at the encoder's discretion to mark any segments of the text of interest for processing. One use of the element is to mark text features for which no appropriate markup is otherwise defined. Another use is to provide an identifier for some segment which is to be pointed at by some other element—i.e. to provide a target, or a part of a target, for a <ptr> or other similar element.

Example
<u who="#DavidPrior" ana="#regular">  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg> </u>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
  <alternate minOccurs="0"
   maxOccurs="unbounded">
   <textNode/>
   <elementRef key="s"/>
  </alternate>
 </alternate>
</content>
    
Schema Declaration
element seg
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   (
      tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
    | ( text | tei_s )*
   )+
}

Appendix A.1.90 <segmentation>

<segmentation> (segmentation) describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc. [2.3.3. The Editorial Practices Declaration 15.3.2. Declarable Elements]
Module header – Formal specification
Contained by
May contain
core: p
Example
<editorialDecl>  <segmentation>   <p xml:lang="en">The texts are segmented into utterances (speeches) and segments (corresponding to paragraphs in the source transcription).</p>  </segmentation> </editorialDecl>
Content model
<content>
 <elementRef key="p" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element segmentation { tei_p+ }

Appendix A.1.91 <setting>

<setting> describes one particular setting in which a language interaction takes place. [15.2.3. The Setting Description]
Module corpus – Formal specification
Contained by
corpus: settingDesc
May contain
core: date name
Note

If the who attribute is not supplied, the setting is assumed to be that of all participants in the language interaction.

Example
<setting>  <name type="place">Commons Chamber</name>  <name type="place">Westminster</name>  <name type="city">London</name>  <name type="country" key="GB">U.K.</name>  <date when="2019-02-18">February 18th, 2019</date> </setting>
Content model
<content>
 <elementRef key="name" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="date"/>
</content>
    
Schema Declaration
element setting { tei_name+, tei_date }

Appendix A.1.92 <settingDesc>

<settingDesc> (setting description) describes the setting or settings within which a language interaction takes place, or other places otherwise referred to in a text, edition, or metadata. [15.2. Contextual Information 2.4. The Profile Description]
Module corpus – Formal specification
Contained by
header: profileDesc
May contain
corpus: setting
Note

May contain a prose description organized as paragraphs, or a series of <setting> elements. If it is used to record not the settings of language interactions but other places mentioned in the text, then <place> elements, optionally grouped by <listPlace> inside <standOff>, should be preferred.

Example
<settingDesc>  <setting>   <name type="address">Trg sv. Marka 6</name>   <name type="city">Zagreb</name>   <name type="country" key="HR">Croatia</name>   <date from="2016-11-15" to="2020-05-18">15.11.2016 - 18.5.2020</date>  </setting> </settingDesc>
Content model
<content>
 <elementRef key="setting" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element settingDesc { tei_setting+ }
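When processing a corpus, the period covered by a <setting> can be read from its <date>, which carries either a single @when or a @from/@to range. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree. The helper name setting_period is hypothetical, and the sketch assumes full YYYY-MM-DD values, although W3C dates may also be truncated to a year or a month.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Tuple

TEI = "{http://www.tei-c.org/ns/1.0}"

def setting_period(setting: ET.Element) -> Tuple[date, date]:
    """Return (start, end) of a <setting>, from <date @when> or <date @from @to>.

    Hypothetical helper; assumes full YYYY-MM-DD attribute values.
    """
    d = setting.find(f"{TEI}date")
    if d is None:
        raise ValueError("<setting> has no <date> child")
    when = d.get("when")
    if when is not None:
        day = date.fromisoformat(when)  # a single day: start == end
        return day, day
    return date.fromisoformat(d.get("from")), date.fromisoformat(d.get("to"))


# Usage with the <settingDesc> example above
xml = """<settingDesc xmlns="http://www.tei-c.org/ns/1.0"><setting>
  <name type="city">Zagreb</name>
  <date from="2016-11-15" to="2020-05-18">15.11.2016 - 18.5.2020</date>
</setting></settingDesc>"""
for setting in ET.fromstring(xml).iter(f"{TEI}setting"):
    print(setting_period(setting))
```

Returning a pair even for @when keeps callers uniform: a single sitting is simply a period whose start and end coincide.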

Appendix A.1.93 <sex>

<sex> (sex) specifies the sex of a person. [13.3.2.1. Personal Characteristics]
Module namesdates – Formal specification
Attributes
value
Status Required
Legal values are:
M
F
U
O
N
Contained by
namesdates: person
May contain Empty element
Note

As with other culturally-constructed traits such as age and gender, the way in which this concept is described in different cultural contexts varies. The normalizing attributes are provided only as an optional means of simplifying that variety for purposes of interoperability or project-internal taxonomies for consistency, and should not be used where that is inappropriate or unhelpful. The content of the element may be used to describe the intended concept in more detail.

Example
<sex value="M"/>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element sex { attribute value { "M" | "F" | "U" | "O" | "N" }, empty }

Appendix A.1.94 <sourceDesc>

<sourceDesc> (source description) describes the source(s) from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence. [2.2.7. The Source Description]
Module header – Formal specification
Contained by
header: fileDesc
May contain
core: bibl
Example
The source description <sourceDesc> of the corpus root encodes the original digital source of the ParlaMint corpus:
<sourceDesc>  <bibl>   <title type="main" xml:lang="sl">Zapisi sej Državnega zbora Republike Slovenije</title>   <title type="main" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia</title>   <idno type="URI">https://www.dz-rs.si</idno>   <date from="2014-08-01" to="2020-07-16">1.8.2014 - 16.7.2020</date>  </bibl> </sourceDesc>
Example
For corpus components, the source description is very similar to that of the corpus root, except that it describes the specific meeting. If an audio or video recording of the meeting is available, it can also be given:
<sourceDesc>  <bibl>   <title type="main" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title>   <title type="main" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title>   <idno type="URI">https://www.psp.cz/eknih/2013ps/stenprot/044schuz/s044033.htm</idno>   <date when="2016-04-13">13.04.2016</date>  </bibl>  <recordingStmt>   <recording type="audio">    <media xml:id="ps2013-044-02-000-000.audio1"     mimeType="audio/mp3"     source="https://www.psp.cz/eknih/2013ps/audio/2016/04/13/2016041308580912.mp3"     url="2013ps/audio/2016/04/13/2016041308580912.mp3"/>   </recording>  </recordingStmt> </sourceDesc>
Content model
<content>
 <elementRef key="bibl" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="recordingStmt"
  minOccurs="0" maxOccurs="1"/>
</content>
    
Schema Declaration
element sourceDesc { tei_bibl+, tei_recordingStmt? }

Appendix A.1.95 <state>

<state> (state) gives additional metadata on a political party or parliamentary group, e.g. its political orientation. [13.3.1. Basic Principles 13.3.2.1. Personal Characteristics]
Module namesdates – Formal specification
Attributes att.global (xml:id, xml:lang, xml:base, xml:space, @n) att.global.source (@source) att.datable.w3c (when, notBefore, notAfter, @from, @to)
ana (analysis) indicates one or more elements containing interpretations of the element on which the ana attribute appears.
Derived from att.global.analytic
Status Optional
Datatype teidata.pointer
type
Status Optional
Legal values are:
politicalOrientation
encoder
Wikipedia
CHES
variable
value
Member of
Contained by
namesdates: org
May contain
core: label note
Note

Where there is confusion between <trait> and <state> the more general purpose element <state> should be used even for unchanging characteristics. If you wish to distinguish between characteristics that are generally perceived to be time-bound states and those assumed to be fixed traits, then <trait> is available for the more static of these. The <state> element encodes characteristics which are sometimes assumed to change, often at specific times or over a date range, whereas the <trait> elements are used to record characteristics, such as eye-colour, which are less subject to change. Traits are typically, but not necessarily, independent of the volition or action of the holder.

Example
<state type="politicalOrientation"  subtype="unknown" ana="#orientation.L"/>
Example
<state type="politicalOrientation"  subtype="Wikipedia"  source="https://en.wikipedia.org/wiki/Christian_Democratic_and_Flemish" ana="#orientation.CCR">  <note xml:lang="en">CD&amp;V, CDV, CVP (until 2001)</note> </state>
Content model
<content>
 <alternate minOccurs="1" maxOccurs="1">
  <elementRef key="label"/>
  <elementRef key="note" minOccurs="0"
   maxOccurs="unbounded"/>
 </alternate>
</content>
    
Schema Declaration
element state
{
   tei_att.global.attribute.n,
   tei_att.global.source.attribute.source,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   attribute ana { text }?,
   attribute type
   {
      "politicalOrientation"
    | "encoder"
    | "Wikipedia"
    | "CHES"
    | "variable"
    | "value"
   }?,
   ( tei_label | tei_note* )
}

Appendix A.1.96 <surname>

<surname> (surname) contains a family (inherited) name, as opposed to a given, baptismal, or nick name. [13.2.1. Personal Names]
Module namesdates – Formal specification
Member of
Contained by
namesdates: persName
May contain Character data only
Example
<persName>  <surname>Accetto</surname>  <forename>Matej</forename> </persName>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element surname { text }

Appendix A.1.97 <tagUsage>

<tagUsage> (element usage) documents the usage of a specific element within a specified document. [2.3.4. The Tagging Declaration]
Module header – Formal specification
Attributes
gi (generic identifier) specifies the name (generic identifier) of the element indicated by the tag, within the namespace indicated by the parent <namespace> element. All descendants of the <text> element, as well as the <text> element itself, must be counted.
Status Required
Datatype teidata.name
occurs specifies the number of occurrences of this element within the text.
Status Required
Datatype teidata.count
Contained by
header: namespace
May contain Empty element
Example
<tagsDecl>  <namespace name="http://www.tei-c.org/ns/1.0">   <tagUsage gi="text" occurs="414"/>   <tagUsage gi="body" occurs="414"/>   <tagUsage gi="div" occurs="414"/>   <tagUsage gi="head" occurs="826"/>   <tagUsage gi="u" occurs="75122"/>   <tagUsage gi="seg" occurs="280971"/>   <tagUsage gi="note" occurs="85525"/>   <tagUsage gi="gap" occurs="7897"/>   <tagUsage gi="vocal" occurs="1740"/>   <tagUsage gi="incident" occurs="37"/>   <tagUsage gi="kinesic" occurs="560"/>   <tagUsage gi="desc" occurs="10234"/>  </namespace> </tagsDecl>
Content model
<content>
 <empty/>
</content>
    
Schema Declaration
element tagUsage { attribute gi { text }, attribute occurs { text }, empty }
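Since occurs must equal the number of occurrences of an element in <text> and its descendants, the values can be computed rather than maintained by hand. The following is a minimal Python sketch using xml.etree.ElementTree. It counts only elements in the TEI namespace and lists them in order of first occurrence, which matches the example above; both choices are assumptions. The function names tag_usage and tags_decl are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

def tag_usage(text_elem: ET.Element) -> Counter:
    """Count <text> itself and all its descendant TEI elements by local name.

    Counter keeps insertion order, so names appear in order of first use.
    """
    counts: Counter = Counter()
    for elem in text_elem.iter():  # iter() yields text_elem first, then descendants
        if isinstance(elem.tag, str) and elem.tag.startswith("{" + TEI_NS + "}"):
            counts[elem.tag.split("}", 1)[1]] += 1
    return counts

def tags_decl(counts: Counter) -> ET.Element:
    """Build <tagsDecl><namespace><tagUsage gi=".." occurs=".."/>...</namespace></tagsDecl>."""
    decl = ET.Element(f"{{{TEI_NS}}}tagsDecl")
    ns = ET.SubElement(decl, f"{{{TEI_NS}}}namespace", name=TEI_NS)
    for gi, n in counts.items():
        ET.SubElement(ns, f"{{{TEI_NS}}}tagUsage", gi=gi, occurs=str(n))
    return decl
```

For the corpus root, one plausible approach (again an assumption) is to add up the counters of all components, e.g. `sum((tag_usage(t) for t in texts), Counter())`, and pass the total to tags_decl.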

Appendix A.1.98 <tagsDecl>

<tagsDecl> (tagging declaration) provides detailed information about the tagging applied to a document. [2.3.4. The Tagging Declaration 2.3. The Encoding Description]
Module header – Formal specification
Contained by
header: encodingDesc
May contain
header: namespace
Example
The tagging declaration <tagsDecl> gives the number of occurrences of each XML element used in the data part (i.e. not in the TEI header): for the whole corpus in the corpus root, or for a single corpus component in that component's header.
<encodingDesc> ... <tagsDecl>   <namespace name="http://www.tei-c.org/ns/1.0">    <tagUsage gi="text" occurs="414"/>    <tagUsage gi="body" occurs="414"/>    <tagUsage gi="div" occurs="414"/>      ...   </namespace>  </tagsDecl> </encodingDesc>
Content model
<content>
 <elementRef key="namespace"/>
</content>
    
Schema Declaration
element tagsDecl { tei_namespace }

Appendix A.1.99 <taxonomy>

<taxonomy> (taxonomy) defines a typology explicitly by a structured taxonomy. [2.3.7. The Classification Declaration]
Module header – Formal specification
Attributes att.global (xml:id, n, xml:base, xml:space, @xml:lang)
xml:id (identifier) provides a unique identifier for the element bearing the attribute.
Derived from att.global
Status Required
Datatype ID
Contained by
header: classDecl
May contain
core: desc
header: category
Note

Nested taxonomies are common in many fields, so the <taxonomy> element can be nested.

Example
<taxonomy xml:id="subcorpus">  <desc xml:lang="sl">   <term>Podkorpusi</term>  </desc>  <desc xml:lang="en">   <term>Subcorpora</term>  </desc>  <category xml:id="reference">   <catDesc xml:lang="sl">    <term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc>   <catDesc xml:lang="en">    <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc>  </category>  <category xml:id="covid">   <catDesc xml:lang="sl">    <term>COVID</term>: COVID podkorpus, od 2020-01-31 dalje</catDesc>   <catDesc xml:lang="en">    <term>COVID</term>: COVID subcorpus, from 2020-01-31 onwards</catDesc>  </category> </taxonomy>
Example
<taxonomy xml:id="parla.legislature">  <desc xml:lang="it">   <term>Legislatura</term>  </desc>  <desc xml:lang="en">   <term>Legislature</term>  </desc>  <category xml:id="parla.geo-political">   <catDesc xml:lang="it">    <term>Unità geo-politica o amministrativa</term>   </catDesc>   <catDesc xml:lang="en">    <term>Geo-political or administrative units</term>   </catDesc>   <category xml:id="parla.supranational">    <catDesc xml:lang="it">     <term>Legislatura sovranazionale</term>    </catDesc>    <catDesc xml:lang="en">     <term>Supranational legislature</term>    </catDesc>   </category>   <category xml:id="parla.national">    <catDesc xml:lang="it">     <term>Legislatura nazionale</term>    </catDesc>    <catDesc xml:lang="en">     <term>National legislature</term>    </catDesc>   </category>    ...  </category> </taxonomy> ... <org ana="#parla.national #parla.upper"  role="parliament" xml:id="LEG">  <orgName full="yes" xml:lang="it">Senato della Repubblica Italiana</orgName>  <orgName full="yes" xml:lang="en">Senate of the Republic of Italy</orgName> </org>
Content model
<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="category" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element taxonomy
{
   tei_att.global.attribute.xmllang,
   attribute xml:id { text },
   tei_desc+,
   tei_category+
}
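Categories are referenced from elsewhere by pointers, as in the @ana of the <org> in the example above. The following is a minimal Python sketch that maps the xml:id of each category to its term in one language and then resolves a space-separated @ana value. It assumes that the pointers are local "#id" references whose targets are in the loaded taxonomy; pointers it cannot resolve are returned unchanged. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def category_terms(taxonomy: ET.Element, lang: str = "en") -> Dict[str, str]:
    """Map the xml:id of every <category>, including nested ones, to its <term> in `lang`."""
    terms: Dict[str, str] = {}
    for cat in taxonomy.iter(f"{TEI}category"):        # walks nested categories too
        for cat_desc in cat.findall(f"{TEI}catDesc"):  # direct children only
            term = cat_desc.find(f"{TEI}term")
            if cat_desc.get(XML_LANG) == lang and term is not None:
                terms[cat.get(XML_ID)] = term.text
    return terms

def resolve_ana(ana: str, terms: Dict[str, str]) -> List[str]:
    """Resolve pointers such as '#parla.national #parla.upper'; unknown ones are kept as-is."""
    return [terms.get(ptr.lstrip("#"), ptr) for ptr in ana.split()]
```

Applied to the example, `resolve_ana("#parla.national #parla.upper", category_terms(taxonomy))` would give "National legislature" for the first pointer. The second pointer stays unresolved because its category is elided from the snippet.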

Appendix A.1.100 <teiCorpus>

<teiCorpus> (TEI corpus) contains one whole corpus, stored in the corpus root file comprising the corpus header and XInclude references to corpus component files, each containing a <TEI> element. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module core – Formal specification
Attributes att.global.linking (synch, next, prev, @corresp)
xml:id
Status Required
Datatype ID
xml:lang
Status Required
Datatype teidata.language
Contained by
May contain
derived-module-parlamint: include
header: teiHeader
textstructure: TEI
Note

Should contain one TEI header for the corpus, and a series of <TEI> elements, one for each text.

Example
General structure of a ParlaMint corpus root:
<teiCorpus xml:lang="en"  xml:id="ParlaMint-GB" xmlns="http://www.tei-c.org/ns/1.0">  <teiHeader> ...TEI header of the corpus...  </teiHeader> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-05-commons.xml"/> <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="2015/ParlaMint-GB_2015-01-06-commons.xml"/> ... </teiCorpus>
Content model
<content>
 <elementRef key="teiHeader"/>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="TEI"/>
  <elementRef key="include"/>
 </alternate>
</content>
    
Schema Declaration
element teiCorpus
{
   tei_att.global.linking.attribute.corresp,
   attribute xml:id { text },
   attribute xml:lang { text },
   tei_teiHeader,
   ( tei_TEI | tei_include )+
}
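Corpus components are referenced from the corpus root with XInclude, so a script can list them by parsing only the root file. The following is a minimal Python sketch. It assumes that each href is relative to the directory of the corpus root file, as in the example, where components sit in year folders. It ignores any <TEI> elements included inline, which the content model also permits. The function names are hypothetical. The standard library's xml.etree.ElementInclude module could be used instead to expand the includes in place.

```python
import os
import xml.etree.ElementTree as ET
from typing import List

TEI = "{http://www.tei-c.org/ns/1.0}"
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

def component_hrefs(corpus_root: ET.Element) -> List[str]:
    """Return the href of every <xi:include> that is a direct child of <teiCorpus>."""
    if corpus_root.tag != f"{TEI}teiCorpus":
        raise ValueError(f"expected teiCorpus, got {corpus_root.tag}")
    return [inc.get("href") for inc in corpus_root.findall(XI_INCLUDE)]

def component_paths(root_file: str) -> List[str]:
    """Resolve component hrefs against the directory of the corpus root file."""
    root = ET.parse(root_file).getroot()
    base = os.path.dirname(root_file)
    return [os.path.join(base, href) for href in component_hrefs(root)]

# Usage, with a hypothetical path to a corpus root file:
# for path in component_paths("ParlaMint-GB/ParlaMint-GB.xml"):
#     print(path)
```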

Appendix A.1.101 <teiHeader>

<teiHeader> (TEI header) supplies descriptive and declarative metadata associated with a digital resource or set of resources. [2.1.1. The TEI Header and Its Components 15.1. Varieties of Composite Text]
Module header – Formal specification
Attributes att.global (xml:id, n, xml:base, xml:space, @xml:lang)
Contained by
core: teiCorpus
textstructure: TEI
May contain
header: encodingDesc fileDesc profileDesc revisionDesc
Note

One of the few elements unconditionally required in any TEI document.

Example
Basic structure of the <teiHeader>:
<teiHeader>  <fileDesc>...</fileDesc>  <encodingDesc>...</encodingDesc>  <profileDesc>...</profileDesc>  <revisionDesc>...</revisionDesc> </teiHeader>
Example
A ParlaMint corpus component <teiHeader>:
<teiHeader>  <fileDesc>   <titleStmt>    <title type="main" xml:lang="lv">Latvijas parlamenta corpus ParlaMint-LV, 12. Saeima, 2014-11-04 [ParlaMint]</title>    <title type="main" xml:lang="en">Latvian parliamentary corpus ParlaMint-LV, 12th Term, 2014-11-04 [ParlaMint]</title>    <meeting corresp="#PT"     ana="#parla.meeting.regular">Regulārā</meeting>    <meeting n="13" corresp="#PT"     ana="#parla.term #PT.13">13. sasaukums</meeting>   </titleStmt>   <editionStmt>    <edition>2.1</edition>   </editionStmt>   <extent>    <measure unit="speeches" quantity="257"     xml:lang="en">257 speeches</measure>    <measure unit="words" quantity="11847"     xml:lang="en">11,847 words</measure>    <measure unit="tokens" quantity="14628"     xml:lang="en">14628 tokens</measure>   </extent>   <publicationStmt>    <publisher>     <orgName xml:lang="en">CLARIN research infrastructure</orgName>     <ref target="https://www.clarin.eu/">www.clarin.eu</ref>    </publisher>    <idno subtype="handle" type="URI">http://hdl.handle.net/11356/1432</idno>    <availability status="free">     <licence>http://creativecommons.org/licenses/by/4.0/</licence>     <p xml:lang="en">This work is licensed under the <ref target="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ref>.</p>    </availability>    <date when="2021-06-10">June 10, 2021</date>   </publicationStmt>   <sourceDesc>    <bibl>     <title type="main" xml:lang="lv">Saeimas sēžu stenogrammas</title>     <idno type="URI">https://www.saeima.lv/lv/transcripts/view/264</idno>    </bibl>   </sourceDesc>  </fileDesc>  <encodingDesc>   <projectDesc>    <p xml:lang="en">     <ref target="https://www.clarin.eu/content/parlamint">ParlaMint</ref>    </p>   </projectDesc>   <tagsDecl>    <namespace name="http://www.tei-c.org/ns/1.0">     <tagUsage gi="text" occurs="1"/>     <tagUsage gi="body" occurs="1"/>     <tagUsage gi="div" occurs="1"/>     <tagUsage gi="head" occurs="2"/>     <tagUsage gi="note" occurs="257"/>     <tagUsage gi="u" occurs="257"/>     <tagUsage gi="seg" occurs="647"/>    </namespace>   </tagsDecl>  </encodingDesc>  <profileDesc>   <settingDesc>    <setting>     <name type="city">Rīga</name>     <name type="country" key="LV">Latvija</name>     <date when="2014-11-04"      ana="#parla.sitting">2014-11-04</date>    </setting>   </settingDesc>  </profileDesc> </teiHeader>
Content model
<content>
 <elementRef key="fileDesc"/>
 <elementRef key="encodingDesc"/>
 <elementRef key="profileDesc"/>
 <elementRef key="revisionDesc"
  minOccurs="0" maxOccurs="1"/>
</content>
    
Schema Declaration
element teiHeader
{
   tei_att.global.attribute.xmllang,
   tei_fileDesc,
   tei_encodingDesc,
   tei_profileDesc,
   tei_revisionDesc?
}

Appendix A.1.102 <term>

<term> (term) contains a single-word, multi-word, or symbolic designation which is regarded as a technical term. [3.4.1. Terms and Glosses]
Module core – Formal specification
Member of
Contained by
core: desc
header: catDesc
namesdates: persName
May contain Character data only
Note

When this element appears within an <index> element, it is understood to supply the form under which an index entry is to be made for that location. Elsewhere, it is understood simply to indicate that its content is to be regarded as a technical or specialised term. It may be associated with a <gloss> element by means of its ref attribute; alternatively a <gloss> element may point to a <term> element by means of its target attribute.

In formal terminological work, there is frequently discussion over whether terms must be atomic or may include multi-word lexical items, symbolic designations, or phraseological units. The <term> element may be used to mark any of these. No position is taken on the philosophical issue of what a term can be; the looser definition simply allows the <term> element to be used by practitioners of any persuasion.

As with other members of the att.canonical class, instances of this element occurring in a text may be associated with a canonical definition, either by means of a URI (using the ref attribute), or by means of some system-specific code value (using the key attribute). Because the mutually exclusive target and cRef attributes overlap with the function of the ref attribute, they are deprecated and may be removed at a subsequent release.

Example
<term> is used inside taxonomies to name the taxonomy and its categories:
<taxonomy xml:id="subcorpus">  <desc xml:lang="sl">   <term>Podkorpusi</term>  </desc>  <desc xml:lang="en">   <term>Subcorpora</term>  </desc>  <category xml:id="reference">   <catDesc xml:lang="sl">    <term>Referenca</term>: referenčni podkorpus, do 2020-01-30</catDesc>   <catDesc xml:lang="en">    <term>Reference</term>: reference subcorpus, until 2020-01-30</catDesc>  </category> ... </taxonomy>
Example
<catDesc xml:lang="en">  <term>acl</term>: Clausal modifier of noun (adjectival clause) </catDesc> <catDesc xml:lang="en">  <term>dep</term>: Unspecified dependency </catDesc> <catDesc xml:lang="en">  <term>punct</term>: Punctuation </catDesc>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element term { text }

Appendix A.1.103 <text>

<text> (text) contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample. [4. Default Text Structure 15.1. Varieties of Composite Text]
Module textstructure – Formal specification
Attributes att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.analytic (@ana) att.global.source (@source)
Contained by
textstructure: TEI
May contain
textstructure: body
Note

This element should not be used to represent a text which is inserted at an arbitrary point within the structure of another, for example as in an embedded or quoted narrative; the <floatingText> is provided for this purpose.

Example
<text ana="#reference">  <body>   <div type="debateSection">...</div>   <div type="debateSection">...</div>    ...  </body> </text>
Content model
<content>
 <elementRef key="body"/>
</content>
    
Schema Declaration
element text
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.global.source.attribute.source,
   tei_body
}

Appendix A.1.104 <textClass>

<textClass> (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc. [2.4.3. The Text Classification]
Module header – Formal specification
Contained by
header: profileDesc
May contain
header: catRef
Example
<textClass>  <catRef scheme="#parla.legislature"   target="#parla.bi #parla.lower #parla.upper"/> </textClass>
Content model
<content>
 <elementRef key="catRef"/>
</content>
    
Schema Declaration
element textClass { tei_catRef }

Appendix A.1.105 <time>

<time> (time) contains a phrase defining a time of day in any format. [3.6.4. Dates and Times]
Module core – Formal specification
Attributes att.typed (@type, @subtype) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana) att.datable.w3c (notBefore, notAfter, @when, @from, @to)
Member of
Contained by
analysis: s
core: note unit
May contain
analysis: pc w
character data
Example
A note giving the time when, e.g., the session started:
<note type="time">  <time when="2016-04-13T09:10:00">(9.10 hodin)</time> </note>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <textNode/>
 </alternate>
</content>
    
Schema Declaration
element time
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.datable.w3c.attribute.when,
   tei_att.datable.w3c.attribute.from,
   tei_att.datable.w3c.attribute.to,
   tei_att.typed.attributes,
   ( tei_w | tei_pc | text )+
}

Appendix A.1.106 <title>

<title> (title) contains a title for any kind of work. [3.12.2.2. Titles, Authors, and Editors 2.2.1. The Title Statement 2.2.5. The Series Statement]
Module core – Formal specification
Attributes att.global (xml:id, n, xml:base, xml:space, @xml:lang)
type
Status Recommended
Legal values are:
main
sub
Note

Attribute is required in <titleStmt> context.

Member of
Contained by
core: bibl
header: titleStmt
May contain Character data only
Note

The attributes key and ref, inherited from the class att.canonical may be used to indicate the canonical form for the title; the former, by supplying (for example) the identifier of a record in some external library system; the latter by pointing to an XML element somewhere containing the canonical form of the title.

Example
The <title> element as used in the <titleStmt> of the corpus root <teiHeader>:
<title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ [ParlaMint]</title> <title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ [ParlaMint]</title> <title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna</title> <title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies</title>
Example
The <title> element as used in the <titleStmt> of the corpus component <teiHeader>:
<title type="main" xml:lang="cs">Český parlamentní korpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title> <title type="main" xml:lang="en">Czech parliamentary corpus ParlaMint-CZ, 2013-11-25 ps2013-001-01-000-000 [ParlaMint]</title> <title type="sub" xml:lang="cs">Parlament České republiky, Poslanecká sněmovna, 2013-11-25, Začátek schůze Poslanecké sněmovny 25. listopadu 2013 ve 14.05 hodin Přítomno: 199 poslanců</title> <title type="sub" xml:lang="en">Parliament of the Czech Republic, Chamber of Deputies, 2013-11-25</title>
Content model
<content>
 <textNode/>
</content>
    
Schema Declaration
element title
{
   tei_att.global.attribute.xmllang,
   attribute type { "main" | "sub" }?,
   text
}
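Scripts often need one specific title, e.g. the English main title. The following is a minimal Python sketch (the function name is hypothetical) that selects a <title> by @type and xml:lang. A <title> without xml:lang inherits the language of its ancestors, like the first title in the <titleStmt> example of the next entry. ElementTree keeps no parent links, so the caller must supply that inherited language as default_lang.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def get_title(title_stmt: ET.Element, title_type: str = "main",
              lang: str = "en", default_lang: Optional[str] = None) -> Optional[str]:
    """Return the text of the first <title> with the given @type and language.

    Titles without xml:lang are treated as having `default_lang`
    (the language inherited from an ancestor, supplied by the caller).
    """
    for title in title_stmt.findall(f"{TEI}title"):
        if title.get("type") == title_type and title.get(XML_LANG, default_lang) == lang:
            return title.text
    return None
```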

Appendix A.1.107 <titleStmt>

<titleStmt> (title statement) groups information about the title of a work and those responsible for its content. [2.2.1. The Title Statement 2.2. The File Description]
Module header – Formal specification
Contained by
header: fileDesc
May contain
Example
The <titleStmt> element gives the title of the corpus root or component, the meeting(s) of the parliament it contains (e.g. the parliamentary terms), the persons responsible for compiling the corpus, and the funder(s) of the project:
<titleStmt>  <title type="main">Slovenski parlamentarni korpus ParlaMint-SI [ParlaMint]</title>  <title type="main" xml:lang="en">Slovenian parliamentary corpus ParlaMint-SI [ParlaMint]</title>  <title type="sub">Zapisi sej Državnega zbora Republike Slovenije, 7. in 8. mandat (2014 - 2020)</title>  <title type="sub" xml:lang="en">Minutes of the National Assembly of the Republic of Slovenia, Term 7 and 8 (2014 - 2020)</title>  <meeting n="7" corresp="#DZ"   ana="#parla.lower #parla.term #DZ.7">7. mandat</meeting>  <meeting n="8" corresp="#DZ"   ana="#parla.lower #parla.term #DZ.8">8. mandat</meeting>  <respStmt>   <persName ref="https://orcid.org/0000-0001-6143-6877">Andrej Pančur</persName>   <persName ref="https://orcid.org/0000-0002-1560-4099">Tomaž Erjavec</persName>   <resp>Kodiranje ParlaMint TEI XML</resp>   <resp xml:lang="en">ParlaMint TEI XML corpus encoding</resp>  </respStmt>  <funder>   <orgName>Raziskovalna infrastruktura CLARIN</orgName>   <orgName xml:lang="en">The CLARIN research infrastructure</orgName>  </funder>  <funder>   <orgName>Slovenska raziskovalna infrastruktura CLARIN.SI</orgName>   <orgName xml:lang="en">The Slovenian research infrastructure CLARIN.SI</orgName>  </funder> </titleStmt>
Content model
<content>
 <elementRef key="title" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="meeting" minOccurs="1"
  maxOccurs="unbounded"/>
 <elementRef key="respStmt" minOccurs="0"
  maxOccurs="unbounded"/>
 <elementRef key="funder" minOccurs="0"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element titleStmt { tei_title+, tei_meeting+, tei_respStmt*, tei_funder* }

Appendix A.1.108 <u>

<u> (utterance) contains a stretch of speech usually preceded and followed by silence or by a change of speaker. [8.3.1. Utterances]
Module spoken – Formal specification
Attributes att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.linking (synch, @corresp, @next, @prev) att.global.analytic (@ana) att.global.source (@source) att.ascribed (@who)
Member of
Contained by
textstructure: div
May contain
core: gap note pb
linking: seg
Note

Prose and a mixture of speech elements

Although individual transcriptions may consistently use <u> elements for turns or other units, and although in most cases a <u> will be delimited by pause or change of speaker, <u> is not required to represent a turn or any communicative event, nor to be bounded by pauses or change of speaker. At a minimum, a <u> is some phonetic production by a given speaker.

Example
The element <u> marks up a speech, as illustrated below:
<u who="#DavidPrior" ana="#regular">  <seg>I ask that the draft Regulations laid before the House on 5 December be approved.</seg>  <seg>The relevant document is the 20th Report from the Legislation Committee.</seg> </u>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="note"/>
  <elementRef key="vocal"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="gap"/>
  <elementRef key="pb"/>
  <elementRef key="seg"/>
 </alternate>
</content>
    
Schema Declaration
element u
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.linking.attribute.corresp,
   tei_att.global.linking.attribute.next,
   tei_att.global.linking.attribute.prev,
   tei_att.global.analytic.attribute.ana,
   tei_att.global.source.attribute.source,
   tei_att.ascribed.attribute.who,
   (
      tei_note
    | tei_vocal
    | tei_kinesic
    | tei_incident
    | tei_gap
    | tei_pb
    | tei_seg
   )+
}
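Each <u> records its speaker in @who and its text in <seg> children, so the speeches of a component can be extracted with a short script. The following is a minimal Python sketch. It assumes the plain-text (not linguistically annotated) version, in which a <seg> contains character data possibly interrupted by transcriber comments. In the annotated version, the text would have to be rebuilt from <w> and <pc> tokens instead. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple

TEI = "{http://www.tei-c.org/ns/1.0}"
# Transcriber comments and page breaks that may interrupt the text of a <seg>
SKIP = {f"{TEI}{name}" for name in ("note", "vocal", "kinesic", "incident", "gap", "pb")}

def seg_text(seg: ET.Element) -> str:
    """Spoken text of a <seg>, without transcriber comments, whitespace-normalised."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag not in SKIP:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text after a comment still belongs to the speech
    return " ".join("".join(parts).split())

def speeches(root: ET.Element) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (@who, text) for every <u>, joining the text of its <seg> children."""
    for u in root.iter(f"{TEI}u"):
        segs = (seg_text(seg) for seg in u.findall(f"{TEI}seg"))
        yield u.get("who"), " ".join(s for s in segs if s)
```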

Appendix A.1.109 <unit>

<unit> contains a symbol, a word or a phrase referring to a unit of measurement in any kind of formal or informal system. [3.6.3. Numbers and Measures]
Module core – Formal specification
Attributes att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.global.analytic (@ana)
Member of
Contained by
core: unit
May contain
Example
The element can be used for fine-grained named entities that include units:
<num ana="ne:nc"  xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne53">  <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w9"   lemma="3"   msd="UPosTag=NUM|NumForm=Digit|NumType=Card">3</w>  <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w10"   lemma="miliarda"   msd="UPosTag=NOUN|Case=Gen|Gender=Fem|Number=Sing|Polarity=Pos">miliardy</w> </num> <unit ana="ne:om"  xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.ne54">  <w xml:id="ParlaMint-CZ_2013-12-06-ps2013-003-01-001-001.u2.p10.s1.w11"   lemma=""   msd="UPosTag=NOUN|Gender=Fem|Polarity=Pos" join="right"></w> </unit>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <elementRef key="w"/>
  <elementRef key="pc"/>
  <elementRef key="name"/>
  <elementRef key="date"/>
  <elementRef key="time"/>
  <elementRef key="num"/>
  <elementRef key="unit"/>
  <elementRef key="email"/>
  <elementRef key="ref"/>
  <elementRef key="note"/>
  <elementRef key="gap"/>
  <elementRef key="kinesic"/>
  <elementRef key="incident"/>
  <elementRef key="vocal"/>
 </alternate>
</content>
    
Schema Declaration
element unit
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   (
      tei_w
    | tei_pc
    | tei_name
    | tei_date
    | tei_time
    | tei_num
    | tei_unit
    | tei_email
    | tei_ref
    | tei_note
    | tei_gap
    | tei_kinesic
    | tei_incident
    | tei_vocal
   )+
}
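Named entities are marked by wrapper elements such as <num> and <unit>, whose @ana carries a ne: category. The following is a minimal Python sketch that collects the category and surface text of each entity. It treats ne: as a literal string prefix and does not expand any prefix definition. It uses only join="right" to decide spacing between tokens. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

TEI = "{http://www.tei-c.org/ns/1.0}"
TOKENS = {f"{TEI}w", f"{TEI}pc"}

def _tokens(elem: ET.Element) -> Iterator[ET.Element]:
    """Yield top-level <w>/<pc> tokens under elem, without entering nested <w>."""
    for child in elem:
        if child.tag in TOKENS:
            yield child
        else:
            yield from _tokens(child)

def _surface(elem: ET.Element) -> str:
    """Join token texts, omitting the space after a token with join="right"."""
    parts = []
    for tok in _tokens(elem):
        parts.append(tok.text or "")
        if tok.get("join") != "right":
            parts.append(" ")
    return "".join(parts).strip()

def named_entities(root: ET.Element) -> Iterator[Tuple[str, str]]:
    """Yield (category, surface text) for each element whose @ana contains a ne: value."""
    for elem in root.iter():
        labels = [a[len("ne:"):] for a in elem.get("ana", "").split() if a.startswith("ne:")]
        if labels:
            yield labels[0], _surface(elem)
```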

Appendix A.1.110 <vocal>

<vocal> (vocal) marks any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc. [8.3.3. Vocal, Kinesic, Incident]
Module spoken – Formal specification
Attributes att.global (xml:base, xml:space, @xml:id, @n, @xml:lang) att.ascribed (@who) att.typed (type, @subtype)
type
Status Recommended
Legal values are:
greeting
question
clarification
speaking
interruption
exclamat
laughter
shouting
murmuring
noise
signal
Member of
Contained by
analysis: s
core: unit
linking: seg
spoken: u
textstructure: div
May contain
core: desc
Example
<vocal type="interruption">  <desc>Interruption from the chair: Your time is up.</desc> </vocal>
Content model
<content>
 <elementRef key="desc" minOccurs="1"
  maxOccurs="unbounded"/>
</content>
    
Schema Declaration
element vocal
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.n,
   tei_att.global.attribute.xmllang,
   tei_att.ascribed.attribute.who,
   tei_att.typed.attribute.subtype,
   attribute type
   {
      "greeting"
    | "question"
    | "clarification"
    | "speaking"
    | "interruption"
    | "exclamat"
    | "laughter"
    | "shouting"
    | "murmuring"
    | "noise"
    | "signal"
   }?,
   tei_desc+
}
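The following Python sketch is a simple, non-normative check of <vocal> elements against the specification above. It hard-codes the legal values of type listed above and the rule that the content consists of one or more <desc> elements. The function name check_vocal is hypothetical. In practice, validation would be done with the ParlaMint schema itself.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Legal values of vocal/@type, copied from the specification above.
VOCAL_TYPES = {
    "greeting", "question", "clarification", "speaking", "interruption",
    "exclamat", "laughter", "shouting", "murmuring", "noise", "signal",
}

def check_vocal(el):
    """Return a list of problems found in one <vocal> element."""
    problems = []
    vtype = el.get("type")
    # @type is optional (note the '?' in the schema), but must be legal if present.
    if vtype is not None and vtype not in VOCAL_TYPES:
        problems.append(f"illegal type: {vtype!r}")
    # The content model is one or more <desc> elements and nothing else.
    children = list(el)
    if not children:
        problems.append("missing <desc>")
    if any(child.tag != TEI + "desc" for child in children):
        problems.append("only <desc> children are allowed")
    return problems

ok = ET.fromstring('<vocal xmlns="http://www.tei-c.org/ns/1.0" type="interruption">'
                   '<desc>Interruption from the chair: Your time is up.</desc></vocal>')
bad = ET.fromstring('<vocal xmlns="http://www.tei-c.org/ns/1.0" type="applause"/>')
print(check_vocal(ok))   # []
print(check_vocal(bad))  # ["illegal type: 'applause'", 'missing <desc>']
```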

Appendix A.1.111 <w>

<w> (word) represents a grammatical (not necessarily orthographic) word. [17.1. Linguistic Segment Categories 17.4.2. Lightweight Linguistic Annotation]
Module: analysis
Attributes: att.linguistic (@lemma, @pos, @msd, @join) (att.lexicographic.normalized (@norm)) att.global (n, xml:base, xml:space, @xml:id, @xml:lang) att.global.analytic (@ana)
Member of
Contained by
analysis: phr s w
May contain
analysis: w
character data
Example
<s xml:id="ParlaMint-GB_2017-10-30-lords.seg4.1">  <w lemma="I"   msd="UPosTag=PRON|Case=Nom|Number=Sing|Person=1|PronType=Prs"   pos="PRP">I</w>  <w lemma="support"   msd="UPosTag=VERB|Mood=Ind|Tense=Pres|VerbForm=Fin"   pos="VBP">support</w>  <w lemma="the"   msd="UPosTag=DET|Definite=Def|PronType=Art"   pos="DT">the</w>  <w lemma="amendment"   msd="UPosTag=NOUN|Number=Sing"   pos="NN"   join="right">amendment</w>  <pc msd="UPosTag=PUNCT"   pos=".">.</pc> </s>
Example
Certain frameworks, in particular Universal Dependencies (UD), allow tokens to be decomposed into several words, and it is these syntactic words, not the tokens, that are further annotated. For example, UD decomposes the Czech word ‘abyste’ into two syntactic words, ‘aby’ and ‘byste’. These can be encoded as <w> elements nested inside the <w> of the token:
<w>abyste <w norm="aby"   lemma="aby"   msd="UPosTag=SCONJ"/>  <w norm="byste"   lemma="být"   msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/> </w>
Content model
<content>
 <alternate minOccurs="1"
  maxOccurs="unbounded">
  <textNode/>
  <elementRef key="w"/>
 </alternate>
</content>
    
Schema Declaration
element w
{
   tei_att.global.attribute.xmlid,
   tei_att.global.attribute.xmllang,
   tei_att.global.analytic.attribute.ana,
   tei_att.linguistic.attributes,
   ( text | tei_w )+
}
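The examples above show two patterns: a plain token, and a token split into syntactic words that are carried by nested <w> elements with norm. The following Python sketch reads both patterns. It assumes, as in the examples, that msd holds |-separated Feature=Value pairs. The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def parse_msd(msd):
    """Split an msd value such as 'UPosTag=NOUN|Number=Sing' into a dict."""
    return dict(pair.split("=", 1) for pair in msd.split("|") if pair)

def syntactic_words(w):
    """Return (form, lemma, features) for each syntactic word of a token.

    If the <w> has nested <w> children, the token has been decomposed and
    each nested word carries its form in norm; otherwise the token itself
    is the syntactic word and its form is its text content."""
    nested = w.findall(TEI + "w")
    if not nested:
        return [((w.text or "").strip(), w.get("lemma"), parse_msd(w.get("msd", "")))]
    return [(sub.get("norm"), sub.get("lemma"), parse_msd(sub.get("msd", "")))
            for sub in nested]

token = ET.fromstring(
    '<w xmlns="http://www.tei-c.org/ns/1.0">abyste'
    '<w norm="aby" lemma="aby" msd="UPosTag=SCONJ"/>'
    '<w norm="byste" lemma="být" '
    'msd="UPosTag=AUX|Mood=Cnd|Number=Plur|Person=2|VerbForm=Fin"/>'
    '</w>')

for form, lemma, feats in syntactic_words(token):
    print(form, lemma, feats["UPosTag"])
# aby aby SCONJ
# byste být AUX
```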

Appendix A.2 Model classes

Appendix A.2.1 model.addressLike

model.addressLike groups elements used to represent a postal or email address. [1. The TEI Infrastructure]
Module: tei
Used by
Members: affiliation email

Appendix A.2.2 model.attributable

model.attributable groups elements that contain a word or phrase that can be attributed to a source. [3.3.3. Quotation 4.3.2. Floating Texts]
Module: tei
Used by
Members: model.quoteLike

Appendix A.2.3 model.biblLike

model.biblLike groups elements containing a bibliographic description. [3.12. Bibliographic Citations and References]
Module: tei
Used by
Members: bibl

Appendix A.2.4 model.dateLike

model.dateLike groups elements containing temporal expressions. [3.6.4. Dates and Times 13.4. Dates]
Module: tei
Used by
Members: date time

Appendix A.2.5 model.divPart

model.divPart groups paragraph-level elements appearing directly within divisions. [1.3. The TEI Class System]
Module: tei
Used by
Members: model.divPart.spoken[u] model.lLike model.pLike[p]
Note

Note that this element class does not include members of the model.inter class, which can appear either within or between paragraph-level items.

Appendix A.2.6 model.divPart.spoken

model.divPart.spoken groups elements structurally analogous to paragraphs within spoken texts. [8.1. General Considerations and Overview]
Module: spoken
Used by
Members: u
Note

Spoken texts may be structured in many ways; elements in this class are typically larger units such as turns or utterances.

Appendix A.2.7 model.emphLike

model.emphLike groups phrase-level elements which are typographically distinct and to which a specific function can be attributed. [3.3. Highlighting and Quotation]
Module: tei
Used by
Members: term title

Appendix A.2.8 model.global

Appendix A.2.9 model.global.edit

model.global.edit groups globally available elements which perform a specifically editorial function. [1.3. The TEI Class System]
Module: tei
Used by
Members: gap

Appendix A.2.10 model.global.meta

model.global.meta groups globally available elements which describe the status of other elements. [1.3. The TEI Class System]
Module: tei
Used by
Members: link linkGrp
Note

Elements in this class are typically used to hold groups of links or of abstract interpretations, or to provide indications of certainty, etc. It may be convenient to localize all metadata elements, for example to contain them within the same division as the elements that they relate to, or to locate them all in a division of their own. They may, however, appear at any point in a TEI text.

Appendix A.2.11 model.global.spoken

model.global.spoken groups elements which may appear globally within spoken texts. [8.1. General Considerations and Overview]
Module: spoken
Used by
Members: incident kinesic vocal
Note

This class groups elements which can appear anywhere within transcribed speech.

Appendix A.2.12 model.graphicLike

model.graphicLike groups elements containing images, formulae, and similar objects. [3.10. Graphics and Other Non-textual Components]
Module: tei
Used by
Members: graphic media

Appendix A.2.13 model.highlighted

model.highlighted groups phrase-level elements which are typographically distinct. [3.3. Highlighting and Quotation]
Module: tei
Used by
Members: model.emphLike[term title] model.hiLike

Appendix A.2.14 model.inter

model.inter groups elements which can appear either within or between paragraph-like elements. [1.3. The TEI Class System]
Module: tei
Used by
Members: model.attributable[model.quoteLike] model.biblLike[bibl] model.egLike model.labelLike[desc label] model.listLike[listEvent listOrg listPerson listRelation] model.oddDecl model.stageLike

Appendix A.2.15 model.labelLike

model.labelLike groups elements used to gloss or explain other parts of a document.
Module: tei
Used by
Members: desc label

Appendix A.2.16 model.limitedPhrase

model.limitedPhrase groups phrase-level elements excluding those elements primarily intended for transcription of existing sources. [1.3. The TEI Class System]
Module: tei
Used by
Members: model.emphLike[term title] model.hiLike model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.editorial model.pPart.msdesc model.phrase.xml model.ptrLike[ref]

Appendix A.2.17 model.listLike

model.listLike groups list-like elements. [3.8. Lists]
Module: tei
Used by
Members: listEvent listOrg listPerson listRelation

Appendix A.2.18 model.measureLike

model.measureLike groups elements which denote a number, a quantity, a measurement, or similar piece of text that conveys some numerical meaning. [3.6.3. Numbers and Measures]
Module: tei
Used by
Members: measure num unit

Appendix A.2.19 model.milestoneLike

model.milestoneLike groups milestone-style elements used to represent reference systems. [1.3. The TEI Class System 3.11.3. Milestone Elements]
Module: tei
Used by
Members: pb

Appendix A.2.20 model.nameLike

model.nameLike groups elements which name or refer to a person, place, or organization.
Module: tei
Used by
Members: model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno
Note

A superset of the naming elements that may appear in datelines, addresses, statements of responsibility, etc.

Appendix A.2.21 model.nameLike.agent

model.nameLike.agent groups elements which contain names of individuals or corporate bodies. [3.6. Names, Numbers, Dates, Abbreviations, and Addresses]
Module: tei
Used by
Members: name orgName persName
Note

This class is used in the content model of elements which reference names of people or organizations.

Appendix A.2.22 model.noteLike

model.noteLike groups globally-available note-like elements. [3.9. Notes, Annotation, and Indexing]
Module: tei
Used by
Members: note

Appendix A.2.23 model.pLike

model.pLike groups paragraph-like elements.
Module: tei
Used by
Members: p

Appendix A.2.25 model.pPart.edit

model.pPart.edit groups phrase-level elements for simple editorial correction and transcription. [3.5. Simple Editorial Changes]
Module: tei
Used by
Members: model.pPart.editorial model.pPart.transcriptional

Appendix A.2.27 model.persNamePart

model.persNamePart groups elements which form part of a personal name. [13.2.1. Personal Names]
Module: namesdates
Used by
Members: addName forename nameLink roleName surname

Appendix A.2.28 model.phrase

model.phrase groups elements which can occur at the level of individual words or phrases. [1.3. The TEI Class System]
Module: tei
Used by
Members: model.graphicLike[graphic media] model.highlighted[model.emphLike[term title] model.hiLike] model.lPart model.pPart.data[model.addressLike[affiliation email] model.dateLike[date time] model.measureLike[measure num unit] model.nameLike[model.nameLike.agent[name orgName persName] model.offsetLike model.persNamePart[addName forename nameLink roleName surname] model.placeStateLike[model.placeNamePart[placeName] state] idno]] model.pPart.edit[model.pPart.editorial model.pPart.transcriptional] model.pPart.msdesc model.phrase.xml model.ptrLike[ref] model.segLike[pc phr s seg w] model.specDescLike
Note

This class of elements can occur within paragraphs, list items, lines of verse, etc.
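The Members lines in this appendix use a bracket notation in which a class name is followed by its own members in square brackets. The following Python sketch assumes that reading. It lists the element names in such a line and, alternatively, parses the line into a nested tree. Both functions are hypothetical helpers and are not part of any TEI tool.

```python
import re

def element_members(members):
    """Return the element names in a Members listing, in order.

    Class names (tokens starting with 'model.' or 'att.') are dropped;
    the brackets only show which class each element belongs to."""
    tokens = re.findall(r"[^\s\[\]]+", members)
    return [t for t in tokens if not t.startswith(("model.", "att."))]

def class_tree(members):
    """Parse the bracket notation into nested (name, children) tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", members)
    pos = 0

    def parse_list():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos] != "]":
            name = tokens[pos]
            pos += 1
            children = []
            if pos < len(tokens) and tokens[pos] == "[":
                pos += 1              # skip '['
                children = parse_list()
                pos += 1              # skip the matching ']'
            items.append((name, children))
        return items

    return parse_list()

spec = "model.placeStateLike[model.placeNamePart[placeName] state] idno"
print(element_members(spec))  # ['placeName', 'state', 'idno']
```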

Appendix A.2.29 model.placeNamePart

model.placeNamePart groups elements which form part of a place name. [13.2.3. Place Names]
Module: tei
Used by
Members: placeName

Appendix A.2.30 model.placeStateLike

model.placeStateLike groups elements which describe changing states of a place.
Module: tei
Used by
Members: model.placeNamePart[placeName] state

Appendix A.2.31 model.ptrLike

model.ptrLike groups elements used for purposes of location and reference. [3.7. Simple Links and Cross-References]
Module: tei
Used by
Members: ref

Appendix A.2.32 model.segLike

model.segLike groups elements used for arbitrary segmentation. [16.3. Blocks, Segments, and Anchors 17.1. Linguistic Segment Categories]
Module: tei
Used by
Members: pc phr s seg w
Note

The principles on which segmentation is carried out, and any special codes or attribute values used, should be defined explicitly in the <segmentation> element of the <encodingDesc> within the associated TEI header.

Appendix A.3 Attribute classes

Appendix A.3.1 att.ascribed

att.ascribed provides attributes for elements representing speech or action that can be ascribed to a specific individual. [3.3.3. Quotation 8.3. Elements Unique to Spoken Texts]
Module: tei
Members: att.ascribed.directed[kinesic u vocal] change incident setting
Attributes
who: indicates the person, or group of people, to whom the element content is ascribed.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
In the following example from Hamlet, speeches (<sp>) in the body of the play are linked to <castItem> elements in the <castList> using the who attribute.
<castItem type="role">  <role xml:id="Barnardo">Bernardo</role> </castItem> <castItem type="role">  <role xml:id="Francisco">Francisco</role>  <roleDesc>a soldier</roleDesc> </castItem> <!-- ... --> <sp who="#Barnardo">  <speaker>Bernardo</speaker>  <l n="1">Who's there?</l> </sp> <sp who="#Francisco">  <speaker>Francisco</speaker>  <l n="2">Nay, answer me: stand, and unfold yourself.</l> </sp>
Note

For transcribed speech, this will typically identify a participant or participant group; in other contexts, it will point to any identified <person> element.
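Because who holds one or more whitespace-separated pointers, resolving it amounts to splitting the value and looking up each identifier. The following Python sketch does this for same-document pointers of the form #id, using the Hamlet example. Pointers to other documents are deliberately not handled, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def collect_ids(root):
    """Return the set of xml:id values declared anywhere under root."""
    return {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}

def resolve_who(who, known_ids):
    """Split a who value into its pointers and check each one.

    Only same-document pointers of the form '#id' are handled here."""
    resolved = []
    for pointer in who.split():
        if not pointer.startswith("#"):
            raise ValueError(f"not a local pointer: {pointer}")
        ident = pointer[1:]
        if ident not in known_ids:
            raise KeyError(f"no element with xml:id {ident!r}")
        resolved.append(ident)
    return resolved

CAST = """<castList xmlns="http://www.tei-c.org/ns/1.0">
  <castItem type="role"><role xml:id="Barnardo">Bernardo</role></castItem>
  <castItem type="role"><role xml:id="Francisco">Francisco</role></castItem>
</castList>"""

known = collect_ids(ET.fromstring(CAST))
print(resolve_who("#Barnardo #Francisco", known))  # ['Barnardo', 'Francisco']
```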

Appendix A.3.2 att.canonical

att.canonical provides attributes that can be used to associate a representation such as a name or title with canonical information about the object being named or referenced. [13.1.1. Linking Names and Their Referents]
Module: tei
Members: att.naming[att.personal[addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state] catDesc date funder meeting publisher relation resp respStmt term time title
Attributes
key: provides an externally-defined means of identifying the entity (or entities) being named, using a coded value of some kind.
Status: Optional
Datatype: teidata.text
<author>  <name key="name 427308"   type="organisation">[New Zealand Parliament, Legislative Council]</name> </author>
<author>  <name key="Hugo, Victor (1802-1885)"   ref="http://www.idref.fr/026927608">Victor Hugo</name> </author>
Note

The value may be a unique identifier from a database, or any other externally-defined string identifying the referent.

No particular syntax is proposed for the values of the key attribute, since its form will depend entirely on practice within a given project. For the same reason, this attribute is not recommended in data interchange, since there is no way of ensuring that the values used by one project are distinct from those used by another. In such a situation, a preferable approach for magic tokens which follows standard practice on the Web is to use a ref attribute whose value is a tag URI as defined in RFC 4151.

ref (reference): provides an explicit means of locating a full definition or identity for the entity being named by means of one or more URIs.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
<name ref="http://viaf.org/viaf/109557338"  type="person">Seamus Heaney</name>
Note

The value must point directly to one or more XML elements or other resources by means of one or more URIs, separated by whitespace. If more than one is supplied the implication is that the name identifies several distinct entities.

Appendix A.3.3 att.datable.custom

att.datable.custom provides attributes for normalization of elements that contain datable events to a custom dating system (i.e. other than the Gregorian calendar used by W3C and ISO). [13.4. Dates]
Module: namesdates
Members: att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]
Attributes
when-custom: supplies the value of a date or time in some custom standard form.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
The following are examples of custom date or time formats that are not valid ISO or W3C format normalizations, normalized to a different dating system:
<p>Alhazen died in Cairo on the <date when="1040-03-06"   when-custom="431-06-12"> 12th day of Jumada t-Tania, 430 AH  </date>.</p> <p>The current world will end at the <date when="2012-12-21"   when-custom="13.0.0.0.0">end of B'ak'tun 13</date>.</p> <p>The Battle of Meggidu (<date when-custom="Thutmose_III:23">23rd year of reign of Thutmose III</date>).</p> <p>Esidorus bixit in pace annos LXX plus minus sub <date when-custom="Ind:4-10-11">die XI mensis Octobris indictione IIII</date> </p>
Not all custom date formulations will have Gregorian equivalents. The when-custom attribute and the other custom dating attributes are not constrained to a datatype by the TEI, but individual projects are recommended to regularize and document their dating formats.
notBefore-custom: specifies the earliest possible date for the event in some custom standard form.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
notAfter-custom: specifies the latest possible date for the event in some custom standard form.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
from-custom: indicates the starting point of the period in some custom standard form.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
<event xml:id="FIRE1"  datingMethod="#julian"  from-custom="1666-09-02"  to-custom="1666-09-05">  <head>The Great Fire of London</head>  <p>The Great Fire of London burned through a large part    of the city of London.</p> </event>
to-custom: indicates the ending point of the period in some custom standard form.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
datingPoint: supplies a pointer to some location defining a named point in time with reference to which the datable item is understood to have occurred.
Status: Optional
Datatype: teidata.pointer
datingMethod: supplies a pointer to a <calendar> element or other means of interpreting the values of the custom dating attributes.
Status: Optional
Datatype: teidata.pointer
Contayning the Originall, Antiquity, Increaſe, Moderne eſtate, and deſcription of that Citie, written in the yeare <date when-custom="1598"  calendar="#julian"  datingMethod="#julian">1598</date>. by Iohn Stow Citizen of London.
In this example, the calendar attribute points to a <calendar> element for the Julian calendar, specifying that the text content of the <date> element is a Julian date, and the datingMethod attribute also points to the Julian calendar to indicate that the content of the when-custom attribute value is Julian too.
<date when="1382-06-28"  when-custom="6890-06-20"  datingMethod="#creationOfWorld"> μηνὶ Ἰουνίου εἰς <num>κ</num> ἔτους <num>ςωϞ</num> </date>
In this example, a date is given in a Mediaeval text measured ‘from the creation of the world’, which is normalized (in when) to the Gregorian date, but is also normalized (in when-custom) to a machine-actionable, numeric version of the date from the Creation.
Note

Note that the datingMethod attribute (unlike calendar defined in att.datable) defines the calendar or dating system to which the date described by the parent element is normalized (i.e. in the when-custom or other X-custom attributes), not the calendar of the original date in the element.

Appendix A.3.4 att.datable.iso

att.datable.iso provides attributes for normalization of elements that contain datable events using the ISO 8601:2004 standard. [3.6.4. Dates and Times 13.4. Dates]
Module: namesdates
Members: att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]
Attributes
when-iso: supplies the value of a date or time in a standard form.
Status: Optional
Datatype: teidata.temporal.iso
The following are examples of ISO date, time, and date & time formats that are not valid W3C format normalizations.
<date when-iso="1996-09-24T07:25+00">Sept. 24th, 1996 at 3:25 in the morning</date> <date when-iso="1996-09-24T03:25-04">Sept. 24th, 1996 at 3:25 in the morning</date> <time when-iso="1999-01-04T20:42-05">4 Jan 1999 at 8:42 pm</time> <time when-iso="1999-W01-1T20,70-05">4 Jan 1999 at 8:42 pm</time> <date when-iso="2006-05-18T10:03">a few minutes after ten in the morning on Thu 18 May</date> <time when-iso="03:00">3 A.M.</time> <time when-iso="14">around two</time> <time when-iso="15,5">half past three</time>
All of the examples of the when attribute in the att.datable.w3c class are also valid with respect to this attribute.
He likes to be punctual. I said <q>  <time when-iso="12">around noon</time> </q>, and he showed up at <time when-iso="12:00:00">12 O'clock</time> on the dot.
The second occurrence of <time> could have been encoded with the when attribute, as 12:00:00 is a valid time with respect to the W3C XML Schema Part 2: Datatypes Second Edition specification. The first occurrence could not.
notBefore-iso: specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.iso
notAfter-iso: specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.iso
from-iso: indicates the starting point of the period in standard form.
Status: Optional
Datatype: teidata.temporal.iso
to-iso: indicates the ending point of the period in standard form.
Status: Optional
Datatype: teidata.temporal.iso
Note

The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by ISO 8601:2004, using the Gregorian calendar.

If both when-iso and dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. That is,
<date when-iso="2007-06-01"  dur-iso="P8D"/>
indicates the same time period as
<date when-iso="2007-06-01/P8D"/>
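The following Python sketch checks that the two encodings above are equivalent. It computes the span from a day-precision when-iso value and a duration. It handles only durations of the form PnD (a whole number of days) and treats the computed end as exclusive, that is, the first day after the period. Full ISO 8601 durations would need a dedicated parser.

```python
import re
from datetime import date, timedelta

def iso_span(when_iso, dur_iso):
    """Return (start, exclusive end) for a yyyy-mm-dd start and a 'PnD' duration."""
    m = re.fullmatch(r"P(\d+)D", dur_iso)
    if m is None:
        raise ValueError(f"unsupported duration in this sketch: {dur_iso}")
    start = date.fromisoformat(when_iso)
    return start, start + timedelta(days=int(m.group(1)))

def parse_interval(value):
    """Parse the single-attribute form 'start/PnD' used in when-iso."""
    when_iso, dur_iso = value.split("/")
    return iso_span(when_iso, dur_iso)

print(iso_span("2007-06-01", "P8D"))
# (datetime.date(2007, 6, 1), datetime.date(2007, 6, 9))
print(iso_span("2007-06-01", "P8D") == parse_interval("2007-06-01/P8D"))  # True
```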

In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.

Appendix A.3.5 att.datable.w3c

att.datable.w3c provides attributes for normalization of elements that contain datable events conforming to the W3C XML Schema Part 2: Datatypes Second Edition. [3.6.4. Dates and Times 13.4. Dates]
Module: tei
Members: att.datable[affiliation application birth change date death education event funder idno licence meeting name occupation orgName persName placeName relation resp sex state time title]
Attributes
when: supplies the value of the date or time in a standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.w3c
Examples of W3C date, time, and date & time formats.
<p>  <date when="1945-10-24">24 Oct 45</date>  <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>  <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>  <time when="14:12:38">fourteen twelve and 38 seconds</time>  <date when="1962-10">October of 1962</date>  <date when="--06-12">June 12th</date>  <date when="---01">the first of the month</date>  <date when="--08">August</date>  <date when="2006">MMVI</date>  <date when="0056">AD 56</date>  <date when="-0056">56 BC</date> </p>
This list begins in the year 1632, more precisely on Trinity Sunday, i.e. the Sunday after Pentecost, in that year the <date calendar="#julian"  when="1632-06-06">27th of May (old style)</date>.
<opener>  <dateline>   <placeName>Dorchester, Village,</placeName>   <date when="1828-03-02">March 2d. 1828.</date>  </dateline>  <salute>To    Mrs. Cornell,</salute> Sunday <time when="12:00:00">noon.</time> </opener>
notBefore: specifies the earliest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.w3c
notAfter: specifies the latest possible date for the event in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.w3c
from: indicates the starting point of the period in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.w3c
to: indicates the ending point of the period in standard form, e.g. yyyy-mm-dd.
Status: Optional
Datatype: teidata.temporal.w3c
Schematron
<sch:rule context="tei:*[@when]"> <sch:report test="@notBefore|@notAfter|@from|@to"  role="nonfatal">The @when attribute cannot be used with any other att.datable.w3c attributes.</sch:report> </sch:rule>
Schematron
<sch:rule context="tei:*[@from]"> <sch:report test="@notBefore"  role="nonfatal">The @from and @notBefore attributes cannot be used together.</sch:report> </sch:rule>
Schematron
<sch:rule context="tei:*[@to]"> <sch:report test="@notAfter"  role="nonfatal">The @to and @notAfter attributes cannot be used together.</sch:report> </sch:rule>
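The three Schematron rules above only constrain which attributes may co-occur, so they are easy to restate in code. The following Python sketch reports the same messages for a single element. Unlike the Schematron context tei:*[@when], it does not check that the element is in the TEI namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def datable_w3c_warnings(el):
    """Return the messages the three Schematron rules above would report."""
    a = el.attrib
    messages = []
    if "when" in a and any(k in a for k in ("notBefore", "notAfter", "from", "to")):
        messages.append("The @when attribute cannot be used with any other att.datable.w3c attributes.")
    if "from" in a and "notBefore" in a:
        messages.append("The @from and @notBefore attributes cannot be used together.")
    if "to" in a and "notAfter" in a:
        messages.append("The @to and @notAfter attributes cannot be used together.")
    return messages

ok = ET.fromstring('<date from="1863-05-28" to="1863-06-01">28 May through 1 June 1863</date>')
clash = ET.fromstring('<date when="1863-05-28" to="1863-06-01"/>')
print(datable_w3c_warnings(ok))     # []
print(datable_w3c_warnings(clash))
```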
Example
<date from="1863-05-28"  to="1863-06-01">28 May through 1 June 1863</date>
Note

The value of these attributes should be a normalized representation of the date, time, or combined date & time intended, in any of the standard formats specified by XML Schema Part 2: Datatypes Second Edition, using the Gregorian calendar.

The most commonly-encountered format for the date portion of a temporal attribute is yyyy-mm-dd, but yyyy, --mm, ---dd, yyyy-mm, or --mm-dd may also be used. For the time part, the form hh:mm:ss is used.

Note that this format does not currently permit use of the value 0000 to represent the year 1 BCE; instead the value -0001 should be used.
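The following Python function is a rough sketch of the formats listed in the notes above. It checks the shape of a value with regular expressions and is deliberately simplified. It assumes four-digit years only and an optional Z or ±hh:mm time zone, and it does not check that a day actually exists in its month. As the note requires, it rejects the year 0000. A production check would rely on an XML Schema validator instead.

```python
import re

YEAR = r"-?\d{4}"                 # four-digit years only, optional minus sign
TZ = r"(?:Z|[+-]\d{2}:\d{2})?"    # optional time zone
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d+)?"

W3C_PATTERNS = [
    rf"{YEAR}-\d{{2}}-\d{{2}}{TZ}",          # yyyy-mm-dd
    rf"{YEAR}-\d{{2}}-\d{{2}}T{TIME}{TZ}",   # date and time
    rf"{YEAR}-\d{{2}}{TZ}",                  # yyyy-mm
    rf"{YEAR}{TZ}",                          # yyyy
    rf"--\d{{2}}-\d{{2}}{TZ}",               # --mm-dd
    rf"--\d{{2}}{TZ}",                       # --mm
    rf"---\d{{2}}{TZ}",                      # ---dd
    rf"{TIME}{TZ}",                          # hh:mm:ss
]

def is_w3c_temporal(value):
    """Shape check only: for example, 2023-02-30 is accepted."""
    if re.match(r"-?0000", value):           # the year 0000 is not permitted
        return False
    return any(re.fullmatch(p, value) for p in W3C_PATTERNS)

print(is_w3c_temporal("--06-12"), is_w3c_temporal("0000"))  # True False
```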

Appendix A.3.6 att.datcat

att.datcat provides attributes that are used to align XML elements or attributes with the appropriate Data Categories (DCs) defined by an external taxonomy, in this way establishing the identity of information containers and values, and providing means of interpreting them. [9.5.2. Lexical View 18.3. Other Atomic Feature Values]
Module: tei
Members: att.segLike[pc phr s seg w] tagUsage
Attributes
datcat: provides a pointer to a definition of, and/or general information about, (a) an information container (element or attribute) or (b) a value of an information container (element content or attribute value), by referencing an external taxonomy or ontology. If valueDatcat is present in the immediate context, this attribute takes on role (a), while valueDatcat performs role (b).
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
valueDatcat: provides a definition of, and/or general information about a value of an information container (element content or attribute value), by reference to an external taxonomy or ontology. Used especially where a contrast with datcat is needed.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
targetDatcat: provides a definition of, and/or general information about, the information structure of an object referenced or modeled by the containing element, by reference to an external taxonomy or ontology. This attribute has the characteristics of the datcat attribute, except that it addresses not its containing element, but an object that is being referenced or modeled by its containing element.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Example
The example below presents the TEI encoding of the name-value pair <part of speech, common noun>, where the name (key) ‘part of speech’ is abbreviated as ‘POS’, and the value ‘common noun’ is symbolized by ‘NN’. The entire name-value pair is encoded by means of the element <f>. In TEI XML, that element acts as the container, labeled with the name attribute. Its contents may be complex or simple; in the case at hand, the content is the symbol ‘NN’.
The datcat attribute relates the feature name (i.e., the key) to the data category ‘part of speech’, while the valueDatcat attribute relates the feature value to the data category ‘common noun’. Both these data categories should be defined in an external and preferably open reference taxonomy or ontology.
<fs>  <f name="POS"   datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">   <symbol valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"    value="NN"/>  </f> <!-- ... --> </fs>
‘NN’ is the symbol for common noun used, e.g., in the CLAWS-7 tagset defined by the University Centre for Computer Corpus Research on Language at the University of Lancaster. The same data category, as used in the BNC Basic (C5) tagset for tagging an early version of the British National Corpus, is represented by the symbol ‘NN0’ (rather than ‘NN’). Making these values semantically interoperable would be extremely difficult without a human expert if they were not anchored in a single point of an established reference taxonomy of morphosyntactic data categories. In the case at hand, the string http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545 is both a persistent identifier of the data category in question and a pointer to a shared definition of common noun.
While the symbols ‘NN’, ‘NN0’, and many others (often coming from languages other than English) are implicitly members of the container category ‘part of speech’, it is sometimes useful not to rely on such an implicit relationship but rather to use an explicit identifier for that data category, to distinguish it from other morphosyntactic data categories such as gender, tense, etc. For that purpose, the above example uses the datcat attribute to reference a definition of part of speech. The reference taxonomy in this example is the CLARIN Concept Registry.
If the feature structure markup exemplified above is to be repeated many times in a single document, it is much more efficient to gather the persistent identifiers in a single place and to only reference them, implicitly or directly, from the feature structure markup. The following example is much more concise than the one above and relies on the concepts of feature structure declaration and feature value library, discussed in chapter 18, Feature Structures.
<fs>  <f name="POS"   fVal="#commonNoun"/> <!-- ... --> </fs>
The assumption here is that the relevant feature values are collected in a place that the annotation document in question has access to: preferably a single document per linguistic resource, for example an <fsdDecl> that is XIncluded as a sibling of <text> or a child of <encodingDesc>. A <taxonomy> available resource-wide (e.g., in a shared header) is also an option.
The example below presents an <fvLib> element that collects the relevant feature values (most of them omitted). At the same time, this example shows one way of encoding a tagset, i.e., an established inventory of values of (in the case at hand) morphosyntactic categories.
<fvLib n="POS values">  <symbol xml:id="commonNoun"   value="NN"   datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>  <symbol xml:id="properNoun"   value="NP"   datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/> <!-- ... --> </fvLib>
Note that these Guidelines do not prescribe a specific choice between datcat and valueDatcat in such cases. The former is the generic way of referencing a data category, whereas the latter is more specific, in that it references a data category that represents a value. The choice between them comes into play where a single element (or a tight element complex, such as the <f>/<symbol> complex illustrated above) makes it necessary or useful to distinguish between the container data category and its value.
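The following Python sketch makes the indirection through fVal concrete. It indexes the values of an <fvLib> by xml:id and resolves each <f>/@fVal pointer of a feature structure to the symbol value and its datcat. It handles only a single same-document #id pointer per feature. The datcat URI is the one given above for common noun, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LIBRARY = """<fvLib xmlns="http://www.tei-c.org/ns/1.0" n="POS values">
  <symbol xml:id="commonNoun" value="NN"
    datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
</fvLib>"""

FEATURES = """<fs xmlns="http://www.tei-c.org/ns/1.0">
  <f name="POS" fVal="#commonNoun"/>
</fs>"""

def index_library(fvlib):
    """Map xml:id to element for every value stored directly in an <fvLib>."""
    return {el.get(XML_ID): el for el in fvlib if el.get(XML_ID) is not None}

def resolve_features(fs, library):
    """Return {feature name: (symbol value, datcat)} for the <f> children
    of an <fs> that point into the library via fVal."""
    resolved = {}
    for f in fs.findall(TEI + "f"):
        pointer = f.get("fVal")
        if pointer is None or not pointer.startswith("#"):
            continue  # inline values and external pointers are out of scope here
        target = library[pointer[1:]]
        resolved[f.get("name")] = (target.get("value"), target.get("datcat"))
    return resolved

library = index_library(ET.fromstring(LIBRARY))
print(resolve_features(ET.fromstring(FEATURES), library)["POS"][0])  # NN
```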
Example
In the context of dictionaries designed with semantic interoperability in mind, the following example ensures that the <pos> element is interpreted as the same information container as the <f name="POS"> element in the example above.
<gramGrp>  <pos datcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"   valueDatcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545">NN</pos> </gramGrp>
For this type of interoperable markup to be efficient, the references to the particular data categories are best provided in a single place within the dictionary (or within the project), rather than being repeated inside every entry. For the container elements, this can be achieved at the level of <tagUsage>, although here the targetDatcat attribute should be used. This is because it is not the <tagUsage> element itself that is associated with the relevant data category, but rather the element <pos> (or <case>, etc.) that <tagUsage> describes:
<tagsDecl partial="true"> <!-- ... -->  <namespace name="http://www.tei-c.org/ns/1.0">   <tagUsage gi="pos"    targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">Contains the part of speech.</tagUsage>   <tagUsage gi="case"    targetDatcat="http://hdl.handle.net/11459/CCR_C-1840_9f4e319c-f233-6c90-9117-7270e215f039">Contains information about the grammatical case that the described form is inflected for.</tagUsage> <!-- ... -->  </namespace> </tagsDecl>
Another possibility is to shorten the URIs by means of the <prefixDef> mechanism, as illustrated below:
<listPrefixDef>  <prefixDef ident="ccr" matchPattern="pos"   replacementPattern="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"/>  <prefixDef ident="ccr" matchPattern="adj"   replacementPattern="http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"/> </listPrefixDef> <!-- ... --> <entry> <!--...-->  <form>   <orth>isotope</orth>  </form>  <gramGrp>   <pos datcat="ccr:pos"    valueDatcat="ccr:adj">adj</pos>  </gramGrp> <!--...--> </entry>
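To make the shorthand concrete, here is a minimal Python sketch of how a processing script might expand values such as ccr:pos back into full CCR handles. It assumes that matchPattern must match the whole part after the colon and that $1, $2, … in replacementPattern refer to its capture groups; the function name expand_private_uri and the PREFIX_DEFS table are illustrative and not part of any ParlaMint or TEI tooling.

```python
import re

# (ident, matchPattern, replacementPattern), copied from the <listPrefixDef> above.
PREFIX_DEFS = [
    ("ccr", "pos",
     "http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3"),
    ("ccr", "adj",
     "http://hdl.handle.net/11459/CCR_C-1230_23653c21-fca1-edf8-fd7c-3df2d6499157"),
]

def expand_private_uri(value, prefix_defs=PREFIX_DEFS):
    """Expand a private-scheme URI such as 'ccr:pos' into an absolute URI.

    Each <prefixDef> whose ident equals the prefix is tried in document order;
    the first whose matchPattern matches the whole local part wins. Values
    with an unknown prefix (or no prefix at all) are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if not sep:
        return value
    for ident, match_pattern, replacement in prefix_defs:
        if ident != prefix:
            continue
        m = re.fullmatch(match_pattern, local)
        if m:
            # Replace $N backreferences with the corresponding capture group.
            return re.sub(r"\$(\d+)",
                          lambda g: m.group(int(g.group(1))) or "",
                          replacement)
    return value
```

Since both definitions share the ident ccr, the matchPattern is what selects the replacement; a value such as ccr:noun, for which no pattern matches, is left as it is.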
This mechanism carries implications that are not always wanted. In the case at hand, for instance, it suggests that the identifiers ‘pos’ and ‘adj’ belong to a namespace associated with the CLARIN Concept Repository (CCR), whereas the prefix is merely a shorthand whose scope is the current resource. Documenting this clearly in the header of the dictionary is therefore advised.
Yet another possibility is to associate the information about the relationship between a TEI markup element and the data category that it is intended to model already at the level of modeling the dictionary resource, that is, at the level of the ODD, in an <equiv> element that is a child of <elementSpec> or <attDef>.
Example
The targetDatcat attribute is designed to be used in, e.g., feature structure declarations, and is analogous to the targetLang attribute of the att.pointing class, in that it describes the object that is being referenced, rather than the referencing object.
<fDecl name="POS"  targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">  <fDescr>part of speech (morphosyntactic category)</fDescr>  <vRange>   <vAlt>    <symbol value="NN"     datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>    <symbol value="NP"     datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/> <!-- ... -->   </vAlt>  </vRange> </fDecl>
Above, the <fDecl> uses targetDatcat because, if it used datcat, it would be asserting that it is itself an instance of the container data category part of speech, which it is not: it models a container (<f>) that encodes a part of speech. Note also that it is the <f> that is modeled above, not its values; the values refer directly to data categories, hence the use of datcat on the <symbol> elements.
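A processing script can exploit this division of labour directly. The following Python sketch, using only the standard xml.etree.ElementTree module, reads an <fDecl> like the one above (abbreviated, with the TEI namespace declared) and returns the container data category from targetDatcat together with a mapping from each symbol value to its datcat. The function name read_fdecl is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

FDECL = """<fDecl xmlns="http://www.tei-c.org/ns/1.0" name="POS"
  targetDatcat="http://hdl.handle.net/11459/CCR_C-396_5a972b93-2294-ab5c-a541-7c344c5f26c3">
  <fDescr>part of speech (morphosyntactic category)</fDescr>
  <vRange><vAlt>
    <symbol value="NN" datcat="http://hdl.handle.net/11459/CCR_C-1256_7ec6083c-23d4-224d-6f94-eecbe6861545"/>
    <symbol value="NP" datcat="http://hdl.handle.net/11459/CCR_C-1371_fbebd9ec-a7f4-9a36-d6e9-88ee16b944ae"/>
  </vAlt></vRange>
</fDecl>"""

def read_fdecl(xml_text):
    """Return (feature name, container data category, {value: value data category})."""
    fdecl = ET.fromstring(xml_text)
    # Values point straight at data categories, hence datcat on <symbol>.
    values = {
        sym.get("value"): sym.get("datcat")
        for sym in fdecl.iter(TEI + "symbol")
        if sym.get("datcat") is not None
    }
    # targetDatcat describes the <f> being declared, not the <fDecl> itself.
    return fdecl.get("name"), fdecl.get("targetDatcat"), values
```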
Note

The TEI Abstract Model can be expressed as a hierarchy of attribute-value matrices (AVMs) of various types and of various levels of complexity, nested or grouped in various ways. At the most abstract level, an AVM consists of an information container and the value (contents) of that container.

A simple example of an XML serialization of such structures is, on the one hand, the opening and closing tags that delimit and name the container, and, on the other, the content enclosed by the two tags that constitutes the value. An analogous example is an attribute name and the value of that attribute.

In a TEI XML example of two equivalent serializations expressing the name-value pair <part-of-speech,common-noun>, namely <pos>commonNoun</pos> and pos="common-noun", one would classify the element <pos> and the attribute pos as containers (mapping onto the first member of the relevant name-value pair), while the character data content of <pos> or the value of pos would be seen as mapping onto the second member of the pair.

The att.datcat class provides means of addressing the containers and their values, while at the same time providing a way to interpret them in the context of external taxonomies or ontologies. Aligning e.g. both the <pos> element and the pos attribute with the same value of an external reference point (i.e., an entry in an agreed taxonomy) affirms the identity of the concept serialised by both the element container and the attribute container, and optionally provides a definition of that concept (in the case at hand, the concept part of speech).

The value of the att.datcat attributes should be a PID (persistent identifier) that points to a specific — and, ideally, shared — taxonomy or ontology. Among the resources that can, to a lesser or greater extent, be used as inventories of (more or less) standardized linguistic categories are the GOLD ontology, CLARIN CCR, OLiA, or TermWeb's DatCatInfo, and also the Universal Dependencies inventory, on the assumption that its URIs are going to persist. It is imaginable that a project may choose to address a local taxonomy store instead, but this risks losing the advantage of interchangeability with other projects.

Historically, datcat and valueDatcat originate from the (now obsolete) ISO 12620:2009 standard, which describes the data model and procedures for a Data Category Registry (DCR). The current version of that standard, ISO 12620-1, does not standardize the serialization of pointers, merely mentioning the TEI att.datcat as an example.

Note that no constraint prevents the occurrence of a combination of att.datcat attributes: the <fDecl> element, which is a natural bearer of the targetDatcat attribute, is an instance of a specific modeling element, and, in principle, could be semantically fixed by an appropriate reference taxonomy of modeling devices.

Appendix A.3.7 att.declarable

att.declarable provides attributes for those elements in the TEI header which may be independently selected by means of the special purpose decls attribute. [15.3. Associating Contextual Information with a Text]
Module: tei (Formal specification)
Members: availability bibl correction editorialDecl equipment equipment hyphenation langUsage listEvent listOrg listPerson normalization particDesc projectDesc quotation recording segmentation settingDesc sourceDesc textClass
Attributes
default: indicates whether or not this element is selected by default when its parent is selected.
Status: Optional
Datatype: teidata.truthValue
Legal values are:
true
This element is selected if its parent is selected
false
This element can only be selected explicitly, unless it is the only one of its kind, in which case it is selected if its parent is selected. [Default]
Note

The rules governing the association of declarable elements with individual parts of a TEI text are fully defined in chapter 15.3. Associating Contextual Information with a Text. Only one element of a particular type may have a default attribute with a value of true.
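As a rough illustration of the default rule only, the following Python sketch picks which of several declarable elements of the same kind applies to a text. It deliberately ignores the inheritance of decls through the document hierarchy, which chapter 15.3 defines in full; the function name select_declaration and the (xml_id, default) input format are assumptions of this sketch.

```python
def select_declaration(candidates, explicit_ids=()):
    """Pick the declarable element of one kind that applies to a text.

    candidates: (xml_id, default) pairs for one kind of declarable element,
    where default is the boolean value of @default (False when absent).
    explicit_ids: ids named in a @decls attribute on the text.
    Returns the selected xml_id, or None if no element is selected.
    """
    # 1. An explicit @decls reference always wins.
    for xml_id, _ in candidates:
        if xml_id in explicit_ids:
            return xml_id
    # 2. Otherwise the element marked default="true" is selected
    #    (only one element of a kind may carry that value).
    defaults = [xml_id for xml_id, is_default in candidates if is_default]
    if defaults:
        return defaults[0]
    # 3. A default="false" element is selected implicitly only when it is
    #    the only one of its kind.
    if len(candidates) == 1:
        return candidates[0][0]
    return None
```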

Appendix A.3.8 att.duration

att.duration provides attributes for normalization of elements that contain datable events.
Module: spoken (Formal specification)
Members: att.timed[gap incident kinesic media u vocal] date recording time
Attributes: att.duration.w3c (@dur) att.duration.iso (@dur-iso)
Note

This ‘superclass’ provides attributes that can be used to provide normalized values of temporal information. By default, the attributes from the att.duration.w3c class are provided. If the module for names & dates is loaded, this class also provides attributes from the att.duration.iso class. In general, the possible values of attributes restricted to the W3C datatypes form a subset of those values available via the ISO 8601 standard. However, the greater expressiveness of the ISO datatypes is rarely needed, and there exists much greater software support for the W3C datatypes.

Appendix A.3.9 att.duration.iso

att.duration.iso provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 13.4. Dates]
Module: tei (Formal specification)
Members: att.duration[att.timed[gap incident kinesic media u vocal] date recording time]
Attributes
dur-iso (duration): indicates the length of this element in time.
Status: Optional
Datatype: teidata.duration.iso
Note

If both when and dur or dur-iso are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time, the when-iso attribute must be used.

In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.
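The start-plus-duration reading of when and dur-iso described above can be illustrated with a small Python sketch. It assumes durations restricted to days, hours, minutes and whole seconds (for example P8D or PT1H30M), because years and months have no fixed length; parse_duration and span_end are hypothetical helpers, not part of any TEI library.

```python
import re
from datetime import datetime, timedelta

# Day and time designators only. Years and months (PnY, PnM) have no fixed
# length, so this sketch rejects them rather than guessing.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

def parse_duration(value):
    """Convert a duration such as 'P8D' or 'PT1H30M' into a timedelta."""
    m = _DURATION.fullmatch(value)
    # 'P', 'PT' and 'P1DT' match the regex but are not valid durations.
    if m is None or value.endswith(("P", "T")):
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{k: int(v) for k, v in m.groupdict().items() if v})

def span_end(when, dur):
    """Exclusive end of the span that starts at `when` and lasts `dur`."""
    return datetime.fromisoformat(when) + parse_duration(dur)
```

For instance, a hypothetical <date when="2007-06-01" dur-iso="P8D"> gives span_end("2007-06-01", "P8D"), i.e. the start of 9 June 2007, as the exclusive end of the span.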

Appendix A.3.10 att.duration.w3c

att.duration.w3c provides attributes for recording normalized temporal durations. [3.6.4. Dates and Times 13.4. Dates]
Module: tei (Formal specification)
Members: att.duration[att.timed[gap incident kinesic media u vocal] date recording time]
Attributes
dur (duration): indicates the length of this element in time.
Status: Optional
Datatype: teidata.duration.w3c
Note

If both when and dur are specified, the values should be interpreted as indicating a span of time by its starting time (or date) and duration. In order to represent a time range by a duration and its ending time, the when-iso attribute must be used.

In providing a ‘regularized’ form, no claim is made that the form in the source text is incorrect; the regularized form is simply that chosen as the main form for purposes of unifying variant forms under a single heading.

Appendix A.3.11 att.fragmentable

att.fragmentable provides attributes for representing fragmentation of a structural element, typically as a consequence of some overlapping hierarchy.
Module: tei (Formal specification)
Members: att.divLike[div] att.segLike[pc phr s seg w] p
Attributes
part: specifies whether or not its parent element is fragmented in some way, typically by some other overlapping structure: for example a speech which is divided between two or more verse stanzas, a paragraph which is split across a page division, a verse line which is divided between two speakers.
Status: Optional
Datatype: teidata.enumerated
Legal values are:
Y
(yes) the element is fragmented in some (unspecified) respect
N
(no) the element is not fragmented, or no claim is made as to its completeness [Default]
I
(initial) this is the initial part of a fragmented element
M
(medial) this is a medial part of a fragmented element
F
(final) this is the final part of a fragmented element
Note

The values I, M, or F should be used only where it is clear how the element may be reconstituted.
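The following Python sketch shows what reconstituting an element from I, M and F fragments can amount to for a single element type, such as a <p> split across a page break. It assumes that fragments arrive in document order, that a new I fragment never starts before the previous F, and that joining texts with a space is acceptable; reconstitute is a hypothetical name.

```python
def reconstitute(fragments):
    """Rejoin elements split with part="I", "M" and "F".

    fragments: (part, text) pairs in document order, where part is the value
    of @part (None when the attribute is absent). Elements with part N, Y or
    no @part are passed through unchanged; a reconstituted element is emitted
    when its F fragment is reached. Raises ValueError when the sequence is not
    well formed, since I/M/F should only be used when reconstitution is clear.
    """
    result, pending = [], None
    for part, text in fragments:
        if part == "I":
            if pending is not None:
                raise ValueError("I fragment before the previous element was closed")
            pending = [text]
        elif part in ("M", "F"):
            if pending is None:
                raise ValueError(f"{part} fragment without a preceding I fragment")
            pending.append(text)
            if part == "F":
                # Joining with a space is an assumption of this sketch.
                result.append(" ".join(pending))
                pending = None
        else:
            result.append(text)
    if pending is not None:
        raise ValueError("fragmented element lacks an F fragment")
    return result
```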

Appendix A.3.12 att.global

att.global provides attributes common to all elements in the TEI encoding scheme. [1.3.1.1. Global Attributes]
Module: tei (Formal specification)
Members: TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w
Attributes: att.global.rendition (@rend, @style, @rendition) att.global.linking (@corresp, @synch, @next, @prev) att.global.analytic (@ana) att.global.responsibility (@resp) att.global.source (@source)
xml:id (identifier): provides a unique identifier for the element bearing the attribute.
Status: Optional
Datatype: ID
Note

The xml:id attribute may be used to specify a canonical reference for an element; see section 3.11. Reference Systems.

n (number): gives a number (or other label) for an element, which is not necessarily unique within the document.
Status: Optional
Datatype: teidata.text
Note

The value of this attribute is always understood to be a single token, even if it contains space or other punctuation characters, and need not be composed of numbers only. It is typically used to specify the numbering of chapters, sections, list items, etc.; it may also be used in the specification of a standard reference system for the text.

xml:lang (language): indicates the language of the element content using a ‘tag’ generated according to BCP 47.
Status: Optional
Datatype: teidata.language
<p> … The consequences of this rapid depopulation were the loss of the last <foreign xml:lang="rap">ariki</foreign> or chief (Routledge 1920:205,210) and their connections to ancestral territorial organization.</p>
Note

The xml:lang value will be inherited from the immediately enclosing element, or from its parent, and so on up the document hierarchy. It is generally good practice to specify xml:lang at the highest appropriate level, noting that a different default may be needed for the <teiHeader> from that needed for the associated resource element or elements, and that a single TEI document may contain texts in many languages.
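Inheritance of xml:lang amounts to a top-down walk in which each element either sets its own value or takes over the one in scope on its parent. A minimal Python sketch with xml.etree.ElementTree, which exposes xml:lang under the key {http://www.w3.org/XML/1998/namespace}lang; effective_languages is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def effective_languages(root):
    """Map every element under root to its in-scope xml:lang (None if unset)."""
    result = {}

    def walk(elem, inherited):
        # An element's own xml:lang overrides the value inherited from above.
        lang = elem.get(XML_LANG, inherited)
        result[elem] = lang
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result
```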

Only attributes with free text values (rare in these guidelines) will be in the scope of xml:lang.

The authoritative list of registered language subtags is maintained by IANA and is available at http://www.iana.org/assignments/language-subtag-registry. For a good general overview of the construction of language tags, see https://www.w3.org/International/articles/language-tags/, and for a practical step-by-step guide, see https://www.w3.org/International/questions/qa-choosing-language-tags.en.php.

The value used must conform with BCP 47. If the value is a private use code (i.e., starts with x- or contains -x-), a <language> element with a matching value for its ident attribute should be supplied in the TEI header to document this value. Such documentation may also optionally be supplied for non-private-use codes, though these must remain consistent with their IETF (Internet Engineering Task Force) definitions.
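A corpus-level check for the private-use rule might look like the Python sketch below, which lists private-use xml:lang values that have no <language> element with a matching ident. It compares values case-sensitively and searches the whole document for <language> elements; both are simplifying assumptions, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def is_private_use(tag):
    """BCP 47 private use: the tag starts with 'x-' or contains '-x-'."""
    tag = tag.lower()
    return tag.startswith("x-") or "-x-" in tag

def undocumented_private_codes(root):
    """Private-use xml:lang values without a <language> whose ident matches."""
    documented = {lang.get("ident") for lang in root.iter(TEI_NS + "language")}
    used = {e.get(XML_LANG) for e in root.iter() if e.get(XML_LANG)}
    return sorted(code for code in used
                  if is_private_use(code) and code not in documented)
```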

xml:base: provides a base URI reference with which applications can resolve relative URI references into absolute URI references.
Status: Optional
Datatype: teidata.pointer
<div type="bibl">  <head>Selections from <title level="m">The Collected Letters of Robert Southey. Part 1: 1791-1797</title>  </head>  <listBibl xml:base="https://romantic-circles.org/sites/default/files/imported/editions/southey_letters/XML/">   <bibl>    <ref target="letterEEd.26.3.xml">     <title>Robert Southey to Grosvenor Charles Bedford</title>, <date when="1792-04-03">3 April 1792</date>.    </ref>   </bibl>   <bibl>    <ref target="letterEEd.26.57.xml">     <title>Robert Southey to Anna Seward</title>, <date when="1793-09-18">18 September 1793</date>.    </ref>   </bibl>   <bibl>    <ref target="letterEEd.26.85.xml">     <title>Robert Southey to Robert Lovell</title>, <date from="1794-04-05"      to="1794-04-06">5-6 April, 1794</date>.    </ref>   </bibl>  </listBibl> </div>
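Resolution against xml:base follows the W3C XML Base rules, under which a relative xml:base is itself resolved against the base in scope on its parent. A Python sketch using urllib.parse.urljoin for RFC 3986 reference resolution; resolve_targets is a hypothetical helper that only looks at target attributes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def resolve_targets(root, document_uri=""):
    """Yield (relative target, absolute URI) for every @target under root."""
    def walk(elem, base):
        if elem.get(XML_BASE) is not None:
            # A relative xml:base is resolved against the parent's base.
            base = urljoin(base, elem.get(XML_BASE))
        # @target may hold several whitespace-separated pointers.
        for target in elem.get("target", "").split():
            yield target, urljoin(base, target)
        for child in elem:
            yield from walk(child, base)

    yield from walk(root, document_uri)
```

Applied to the example above, letterEEd.26.3.xml resolves to https://romantic-circles.org/sites/default/files/imported/editions/southey_letters/XML/letterEEd.26.3.xml.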
xml:space: signals an intention about how white space should be managed by applications.
Status: Optional
Datatype: teidata.enumerated
Legal values are:
default
signals that the application's default white-space processing modes are acceptable
preserve
indicates the intent that applications preserve all white space
Note

The XML specification provides further guidance on the use of this attribute. Note that many parsers may not handle xml:space correctly.

Appendix A.3.13 att.global.analytic

att.global.analytic provides additional global attributes for associating specific analyses or interpretations with appropriate portions of a text. [17.2. Global Attributes for Simple Analyses 17.3. Spans and Interpretations]
Module: analysis (Formal specification)
Members: att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]
Attributes
ana (analysis): indicates one or more elements containing interpretations of the element on which the ana attribute appears.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

When multiple values are given, they may reflect either multiple divergent interpretations of an ambiguous text, or multiple mutually consistent interpretations of the same passage in different contexts.

Appendix A.3.14 att.global.linking

att.global.linking provides a set of attributes for hypertextual linking. [16. Linking, Segmentation, and Alignment]
Module: linking (Formal specification)
Members: att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]
Attributes
corresp (corresponds): points to elements that correspond to the current element in some way.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
<group>  <text xml:id="t1-g1-t1"   xml:lang="mi">   <body xml:id="t1-g1-t1-body1">    <div type="chapter">     <head>He Whakamaramatanga mo te Ture Hoko, Riihi hoki, i nga Whenua Maori, 1876.</head>     <p></p>    </div>   </body>  </text>  <text xml:id="t1-g1-t2"   xml:lang="en">   <body xml:id="t1-g1-t2-body1"    corresp="#t1-g1-t1-body1">    <div type="chapter">     <head>An Act to regulate the Sale, Letting, and Disposal of Native Lands, 1876.</head>     <p></p>    </div>   </body>  </text> </group>
In this example a <group> contains two <text>s, each containing the same document in a different language. The correspondence is indicated using corresp. The language is indicated using xml:lang, whose value is inherited; both the tag with the corresp and the tag pointed to by the corresp inherit the value from their immediate parent.
<!-- In a placeography called "places.xml" --><place xml:id="LOND1"  corresp="people.xml#LOND2 people.xml#GENI1">  <placeName>London</placeName>  <desc>The city of London...</desc> </place> <!-- In a literary personography called "people.xml" --> <person xml:id="LOND2"  corresp="places.xml#LOND1 #GENI1">  <persName type="lit">London</persName>  <note>   <p>Allegorical character representing the city of <placeName ref="places.xml#LOND1">London</placeName>.</p>  </note> </person> <person xml:id="GENI1"  corresp="places.xml#LOND1 #LOND2">  <persName type="lit">London’s Genius</persName>  <note>   <p>Personification of London’s genius. Appears as an      allegorical character in mayoral shows.   </p>  </note> </person>
In this example, a <place> element containing information about the city of London is linked with two <person> elements in a literary personography. This correspondence represents a slightly looser relationship than the one in the preceding example; there is no sense in which an allegorical character could be substituted for the physical city, or vice versa, but there is obviously a correspondence between them.
synch (synchronous): points to elements that are synchronous with the current element.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
next: points to the next element of a virtual aggregate of which the current element is part.
Status: Optional
Datatype: teidata.pointer
Note

It is recommended that the element indicated be of the same type as the element bearing this attribute.

prev (previous): points to the previous element of a virtual aggregate of which the current element is part.
Status: Optional
Datatype: teidata.pointer
Note

It is recommended that the element indicated be of the same type as the element bearing this attribute.
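Reassembling a virtual aggregate from next pointers is a matter of following the chain until it ends. The Python sketch below does this for same-document pointers of the form #id and raises an error on cycles; aggregate is a hypothetical helper, and the sketch does not check that the prev pointers mirror the next chain.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def aggregate(root, start_id):
    """Return the xml:ids of the virtual aggregate starting at start_id."""
    by_id = {e.get(XML_ID): e for e in root.iter() if e.get(XML_ID)}
    chain, seen = [], set()
    current = by_id.get(start_id)
    while current is not None:
        xml_id = current.get(XML_ID)
        if xml_id in seen:
            raise ValueError(f"cycle in @next chain at {xml_id!r}")
        seen.add(xml_id)
        chain.append(xml_id)
        nxt = current.get("next")
        # Only local pointers ("#id") are followed in this sketch.
        current = by_id.get(nxt[1:]) if nxt and nxt.startswith("#") else None
    return chain
```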

Appendix A.3.15 att.global.rendition

att.global.rendition provides rendering attributes common to all elements in the TEI encoding scheme. [1.3.1.1.3. Rendition Indicators]
Module: tei (Formal specification)
Members: att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]
Attributes
rend (rendition): indicates how the element in question was rendered or presented in the source text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
<head rend="align(center) case(allcaps)">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi rend="case(mixed)">New Blazing-World</hi>. </head>
Note

These Guidelines make no binding recommendations for the values of the rend attribute; the characteristics of visual presentation vary too much from text to text and the decision to record or ignore individual characteristics varies too much from project to project. Some potentially useful conventions are noted from time to time at appropriate points in the Guidelines. The values of the rend attribute are a set of sequence-indeterminate individual tokens separated by whitespace.

style: contains an expression in some formal style definition language which defines the rendering or presentation used for this element in the source text.
Status: Optional
Datatype: teidata.text
<head style="text-align: center; font-variant: small-caps">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi style="font-variant: normal">New Blazing-World</hi>. </head>
Note

Unlike the attribute values of rend, which uses whitespace as a separator, the style attribute may contain whitespace. This attribute is intended for recording inline stylistic information concerning the source, not any particular output.

The formal language in which values for this attribute are expressed may be specified using the <styleDefDecl> element in the TEI header.

If style and rendition are both present on an element, then style overrides or complements rendition. style should not be used in conjunction with rend, because the latter does not employ a formal style definition language.

rendition: points to a description of the rendering or presentation used for this element in the source text.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
<head rendition="#ac #sc">  <lb/>To The <lb/>Duchesse <lb/>of <lb/>Newcastle, <lb/>On Her <lb/>  <hi rendition="#normal">New Blazing-World</hi>. </head> <!-- elsewhere... --> <rendition xml:id="sc"  scheme="css">font-variant: small-caps</rendition> <rendition xml:id="normal"  scheme="css">font-variant: normal</rendition> <rendition xml:id="ac"  scheme="css">text-align: center</rendition>
Note

The rendition attribute is used in a very similar way to the class attribute defined for XHTML but with the important distinction that its function is to describe the appearance of the source text, not necessarily to determine how that text should be presented on screen or paper.

If rendition is used to refer to a style definition in a formal language like CSS, it is recommended that it not be used in conjunction with rend. Where both rendition and rend are supplied, the latter is understood to override or complement the former.

Each URI provided should indicate a <rendition> element defining the intended rendition in terms of some appropriate style language, as indicated by the scheme attribute.
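The interplay of rendition and style can be sketched in Python as follows: CSS declarations from the <rendition> elements referenced by rendition are collected in order, and declarations in style are applied last, so they win on conflicts. Treating "overrides or complements" as CSS-style last-one-wins is an assumption of this sketch, as are the simple declaration parser and the helper names.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def parse_css(text):
    """Split 'a: b; c: d' into a dict of CSS declarations (no quoting support)."""
    decls = {}
    for item in text.split(";"):
        prop, sep, value = item.partition(":")
        if sep:
            decls[prop.strip()] = value.strip()
    return decls

def effective_css(elem, root):
    """Combine @rendition (CSS <rendition>s) with @style; style wins on conflicts."""
    renditions = {r.get(XML_ID): r for r in root.iter(TEI_NS + "rendition")}
    decls = {}
    for pointer in elem.get("rendition", "").split():
        r = renditions.get(pointer.lstrip("#"))
        if r is not None and r.get("scheme") == "css":
            decls.update(parse_css(r.text or ""))
    decls.update(parse_css(elem.get("style", "")))
    return decls
```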

Appendix A.3.16 att.global.responsibility

att.global.responsibility provides attributes indicating the agent responsible for some aspect of the text, the markup or something asserted by the markup, and the degree of certainty associated with it. [1.3.1.1.4. Sources, certainty, and responsibility 3.5. Simple Editorial Changes 11.3.2.2. Hand, Responsibility, and Certainty Attributes 17.3. Spans and Interpretations 13.1.1. Linking Names and Their Referents]
Module: tei (Formal specification)
Members: att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]
Attributes
resp (responsible party): indicates the agency responsible for the intervention or interpretation, for example an editor or transcriber.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

To reduce the ambiguity of a resp pointing directly to a person or organization, we recommend that resp be used to point not to an agent (<person> or <org>) but to a <respStmt>, <author>, <editor> or similar element which clarifies the exact role played by the agent. Pointing to multiple <respStmt>s allows the encoder to specify clearly each of the roles played in part of a TEI file (creating, transcribing, encoding, editing, proofing etc.).

Example
Blessed are the <choice>  <sic>cheesemakers</sic>  <corr resp="#editor" cert="high">peacemakers</corr> </choice>: for they shall be called the children of God.
Example
<!-- in the <text> ... --><lg> <!-- ... -->  <l>Punkes, Panders, baſe extortionizing    sla<choice>    <sic>n</sic>    <corr resp="#JENS1_transcriber">u</corr>   </choice>es,</l> <!-- ... --> </lg> <!-- in the <teiHeader> ... --> <!-- ... --> <respStmt xml:id="JENS1_transcriber">  <resp when="2014">Transcriber</resp>  <name>Janelle Jenstad</name> </respStmt>

Appendix A.3.17 att.global.source

att.global.source provides attributes used by elements to point to an external source. [1.3.1.1.4. Sources, certainty, and responsibility 3.3.3. Quotation 8.3.4. Writing]
Module: tei (Formal specification)
Members: att.global[TEI addName affiliation appInfo application availability bibl birth body catDesc catRef category change classDecl correction date death desc div edition editionStmt editorialDecl education email encodingDesc equipment equipment event extent figure fileDesc forename funder gap graphic head hyphenation idno incident kinesic label langUsage language licence link linkGrp listEvent listOrg listPerson listPrefixDef listRelation measure media meeting name nameLink namespace normalization note num occupation org orgName p particDesc pb pc persName person phr placeName prefixDef profileDesc projectDesc pubPlace publicationStmt publisher quotation recording recordingStmt ref relation resp respStmt revisionDesc roleName s seg segmentation setting settingDesc sex sourceDesc state surname tagUsage tagsDecl taxonomy teiCorpus teiHeader term text textClass time title titleStmt u unit vocal w]
Attributes
source: specifies the source from which some aspect of this element is drawn.
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Schematron
<sch:rule context="tei:*[@source]"> <sch:let name="srcs"  value="tokenize( normalize-space(@source),' ')"/> <sch:report test="( self::tei:classRef | self::tei:dataRef | self::tei:elementRef | self::tei:macroRef | self::tei:moduleRef | self::tei:schemaSpec ) and $srcs[2]"> When used on a schema description element (like <sch:value-of select="name(.)"/>), the @source attribute should have only 1 value. (This one has <sch:value-of select="count($srcs)"/>.) </sch:report> </sch:rule>
Note

The source attribute points to an external source. When used on an element describing a schema component (<classRef>, <dataRef>, <elementRef>, <macroRef>, <moduleRef>, or <schemaSpec>), it identifies the source from which declarations for the components should be obtained.

On other elements it provides a pointer to the bibliographical source from which a quotation or citation is drawn.

In either case, the location may be provided using any form of URI, for example an absolute URI, a relative URI, a private scheme URI of the form tei:x.y.z, where x.y.z indicates the version number, e.g. tei:4.3.2 for TEI P5 release 4.3.2 or (as a special case) tei:current for whatever is the latest release, or a private scheme URI that is expanded to an absolute URI as documented in a <prefixDef>.

When used on elements describing schema components, source should have only one value; when used on other elements multiple values are permitted.
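The Schematron rule for this constraint can also be approximated outside a Schematron processor. This Python sketch, with the hypothetical name check_source_counts, reports TEI schema-description elements whose source holds more than one whitespace-separated value, while leaving multiple values on other elements (such as <quote>) alone.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
SCHEMA_ELEMENTS = {"classRef", "dataRef", "elementRef",
                   "macroRef", "moduleRef", "schemaSpec"}

def check_source_counts(root):
    """Report schema-description elements whose @source has several values."""
    problems = []
    for elem in root.iter():
        source = elem.get("source")
        if source is None or not elem.tag.startswith(TEI_NS):
            continue
        local = elem.tag[len(TEI_NS):]
        # Same test as tokenize(normalize-space(@source), ' ')[2] existing.
        values = source.split()
        if local in SCHEMA_ELEMENTS and len(values) > 1:
            problems.append(f"{local}: @source has {len(values)} values")
    return problems
```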

Example
<p> <!-- ... --> As Willard McCarty (<bibl xml:id="mcc_2012">2012, p.2</bibl>) tells us, <quote source="#mcc_2012">‘Collaboration’ is a problematic and should be a contested    term.</quote> <!-- ... --> </p>
Example
<p> <!-- ... -->  <quote source="#chicago_15_ed">Grammatical theories are in flux, and the more we learn, the    less we seem to know.</quote> <!-- ... --> </p> <!-- ... --> <bibl xml:id="chicago_15_ed">  <title level="m">The Chicago Manual of Style</title>, <edition>15th edition</edition>. <pubPlace>Chicago</pubPlace>: <publisher>University of    Chicago Press</publisher> (<date>2003</date>), <biblScope unit="page">p.147</biblScope>. </bibl>
Example
<elementRef key="p" source="tei:2.0.1"/>
Include in the schema an element named <p> available from the TEI P5 2.0.1 release.
Example
<schemaSpec ident="myODD"  source="mycompiledODD.xml"> <!-- further declarations specifying the components required --> </schemaSpec>
Create a schema using components taken from the file mycompiledODD.xml.

Appendix A.3.18 att.internetMedia

att.internetMedia provides attributes for specifying the type of a computer resource using a standard taxonomy.
Module: tei (Formal specification)
Members: att.media [graphic media] ref
Attributes
mimeType (MIME media type): specifies the applicable multimedia internet mail extension (MIME) media type
Status: Optional
Datatype: 1–∞ occurrences of teidata.word separated by whitespace
Example: In this example, mimeType is used to indicate that the URL points to a TEI XML file encoded in UTF-8.
<ref mimeType="application/tei+xml; charset=UTF-8"  target="https://raw.githubusercontent.com/TEIC/TEI/dev/P5/Source/guidelines-en.xml"/>
Note

This attribute class provides an attribute for describing a computer resource, typically available over the internet, using a value taken from a standard taxonomy. At present only a single taxonomy is supported, the Multipurpose Internet Mail Extensions (MIME) Media Type system. This typology of media types is defined by the Internet Engineering Task Force in RFC 2046. The list of types is maintained by the Internet Assigned Numbers Authority (IANA). The mimeType attribute must have a value taken from this list.
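A processing script may need to split a mimeType value into its type, subtype and parameters, for example to read the declared charset. The following minimal sketch does this. The function name `parse_mime_type` is hypothetical. It checks only the `type/subtype` shape. It does not consult the IANA registry and does not handle quoted parameter values.

```python
def parse_mime_type(value: str) -> tuple[str, str, dict[str, str]]:
    """Split a mimeType value such as 'application/tei+xml; charset=UTF-8'
    into (type, subtype, parameters)."""
    media, *params = [part.strip() for part in value.split(";")]
    type_, _, subtype = media.partition("/")
    if not type_ or not subtype:
        raise ValueError(f"expected type/subtype, got {media!r}")
    parameters = {}
    for param in params:
        name, _, val = param.partition("=")
        parameters[name.strip().lower()] = val.strip()  # parameter names are case-insensitive
    # Type and subtype are case-insensitive too, so normalise them to lower case.
    return type_.lower(), subtype.lower(), parameters

print(parse_mime_type("application/tei+xml; charset=UTF-8"))
```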

Appendix A.3.19 att.lexicographic.normalized

att.lexicographic.normalized provides attributes for usage within word-level elements in the analysis module and within lexicographic microstructure in the dictionaries module.
Module: analysis (Formal specification)
Members: att.linguistic [pc w]
Attributes
norm (normalized): provides the normalized/standardized form of information present in the source text in a non-normalized form
Status: Optional
Datatype: teidata.text
Normalization of part-of-speech information within a dictionary entry.
<gramGrp>  <pos norm="noun">n</pos> </gramGrp>
Normalization of a source form in a tokenized historical corpus.
<s>  <w>for</w>  <w norm="virtue's">vertues</w>  <w>sake</w> </s>
<s>  <w norm="persuasion">perswasion</w>  <w>of</w>  <w norm="Unity">Vnitie</w> </s>
Example of normalization from Aviso. Relation oder Zeitung. Wolfenbüttel, 1609. In: Deutsches Textarchiv.
<s>  <w norm="freiwillig">freywillig</w>  <pc norm=","   join="left">/</pc>  <w norm="unbedrängt">vnbedraͤngt</w>  <w norm="und">vnd</w>  <w norm="unverhindert">vnuerhindert</w> </s>
<w norm="Teil">Theyll</w>
<w norm="Freude">Frewde</w>
Note

It needs to be stressed that the two attributes in this class are meant for strictly lexicographic and linguistic uses, and not for editorial interventions. For the latter, the mechanism based on <choice>, <orig>, and <reg> needs to be employed.
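For linguistic uses such as searching or counting by normalized forms, a script can prefer @norm over the token text when it is present. This is a minimal sketch. It assumes that tokens are direct children of the sentence element, and the helper name `normalized_tokens` is hypothetical.

```python
import xml.etree.ElementTree as ET

def normalized_tokens(sentence_xml: str) -> list[str]:
    """Return, for each <w>/<pc> child, its @norm if present, else its text."""
    tokens = []
    for tok in ET.fromstring(sentence_xml):
        if tok.tag.split("}")[-1] in ("w", "pc"):  # ignore the TEI namespace, if any
            tokens.append(tok.get("norm", tok.text or ""))
    return tokens

print(normalized_tokens(
    '<s><w norm="persuasion">perswasion</w><w>of</w><w norm="Unity">Vnitie</w></s>'))
```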

Appendix A.3.20 att.linguistic

att.linguistic provides a set of attributes concerning linguistic features of tokens, for usage within token-level elements, specifically <w> and <pc> in the analysis module. [17.4.2. Lightweight Linguistic Annotation]
Module: analysis (Formal specification)
Members: pc w
Attributes: att.lexicographic.normalized (@norm)
lemma: provides a lemma (base form) for the word, typically uninflected and serving both as an identifier (e.g. in dictionary contexts, as a headword), and as a basis for potential inflections.
Status: Optional
Datatype: teidata.text
<w lemma="wife">wives</w>
<w lemma="Arznei">Artzeneyen</w>
pos (part of speech): indicates the part of speech assigned to a token (i.e. information on whether it is a noun, adjective, or verb), usually according to some official reference vocabulary (e.g. for German: STTS, for English: CLAWS, for Polish: NKJP, etc.).
Status: Optional
Datatype: teidata.text
The German sentence ‘Wir fahren in den Urlaub.’ tagged with the Stuttgart-Tuebingen-Tagset (STTS).
<s>  <w pos="PPER">Wir</w>  <w pos="VVFIN">fahren</w>  <w pos="APPR">in</w>  <w pos="ART">den</w>  <w pos="NN">Urlaub</w>  <w pos="$.">.</w> </s>
The English sentence ‘We're going to Brazil.’ tagged with the CLAWS-5 tagset, arranged inline (with significant whitespace).
<p><w pos="PNP">We</w><w pos="VBB">'re</w> <w pos="VVG">going</w> <w pos="PRP">to</w> <w pos="NP0">Brazil</w><pc pos="PUN">.</pc></p>         
The English sentence ‘We're going on vacation to Brazil for a month!’ tagged with the CLAWS-7 tagset and arranged sequentially.
<p>  <w pos="PPIS2">We</w>  <w pos="VBR">'re</w>  <w pos="VVG">going</w>  <w pos="II">on</w>  <w pos="NN1">vacation</w>  <w pos="II">to</w>  <w pos="NP1">Brazil</w>  <w pos="IF">for</w>  <w pos="AT1">a</w>  <w pos="NNT1">month</w>  <pc pos="!">!</pc> </p>
msd (morphosyntactic description): supplies morphosyntactic information for a token, usually according to some official reference vocabulary (e.g. for German: STTS-large tagset; for a feature description system designed as (pragmatically) universal, see Universal Features).
Status: Optional
Datatype: teidata.text
<ab>  <w pos="PPER"   msd="1.Pl.*.Nom">Wir</w>  <w pos="VVFIN"   msd="1.Pl.Pres.Ind">fahren</w>  <w pos="APPR"   msd="--">in</w>  <w pos="ART"   msd="Def.Masc.Akk.Sg">den</w>  <w pos="NN"   msd="Masc.Akk.Sg">Urlaub</w>  <pc pos="$."   msd="--">.</pc> </ab>
join: when present, provides information on whether the token in question is adjacent to another, and if so, on which side.
Status: Optional
Datatype: teidata.text
Legal values are:
no
(the token is not adjacent to another)
left
(there is no whitespace on the left side of the token)
right
(there is no whitespace on the right side of the token)
both
(there is no whitespace on either side of the token)
overlap
(the token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream)
The example below assumes that the lack of whitespace is marked redundantly, by using the appropriate values of join.
<s>  <pc join="right">"</pc>  <w join="left">Friends</w>  <w>will</w>  <w>be</w>  <w join="right">friends</w>  <pc join="both">.</pc>  <pc join="left">"</pc> </s>
Note that a project may make a decision to only indicate lack of whitespace in one direction, or do that non-redundantly. The existing proposal is the broadest possible, on the assumption that we adopt the "streamable view", where all the information on the current element needs to be represented locally.
The English sentence ‘We're going on vacation.’ tagged with the CLAWS-5 tagset, arranged sequentially, tagged on the assumption that only the lack of the preceding whitespace is indicated.
<p>  <w pos="PNP">We</w>  <w pos="VBB"   join="left">'re</w>  <w pos="VVG">going</w>  <w pos="PRP">on</w>  <w pos="NN1">vacation</w>  <pc pos="PUN"   join="left">.</pc> </p>
Note

The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.
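The join values make it possible to rebuild running text from a tokenized sentence. The following is a minimal sketch, and the function name `surface_text` is hypothetical. It inserts a space between two tokens unless the previous token declares no whitespace on its right or the current token declares none on its left. As a result, it works whether the lack of whitespace is marked redundantly or in one direction only. The value `overlap` is treated like an absent join, which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NO_SPACE_AFTER = {"right", "both"}   # token has no whitespace on its right
NO_SPACE_BEFORE = {"left", "both"}   # token has no whitespace on its left

def surface_text(container_xml: str) -> str:
    """Rebuild running text from <w>/<pc> children using their @join values."""
    pieces = []
    prev_join = None
    for tok in ET.fromstring(container_xml):
        if tok.tag.split("}")[-1] not in ("w", "pc"):
            continue
        join = tok.get("join")
        # Insert a space unless either neighbour says there is none between them.
        if pieces and prev_join not in NO_SPACE_AFTER and join not in NO_SPACE_BEFORE:
            pieces.append(" ")
        pieces.append(tok.text or "")
        prev_join = join
    return "".join(pieces)

print(surface_text("<p><w>We</w><w join='left'>'re</w><w>going</w>"
                   "<w>on</w><w>vacation</w><pc join='left'>.</pc></p>"))
```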

Note

These attributes make it possible to encode simple language corpora and to add a layer of linguistic information to any tokenized resource. See section 17.4.2. Lightweight Linguistic Annotation for discussion.

Appendix A.3.21 att.naming

att.naming provides attributes common to elements which refer to named persons, places, organizations etc. [3.6.1. Referring Strings 13.3.6. Names and Nyms]
Module: tei (Formal specification)
Members: att.personal [addName forename name orgName persName placeName roleName surname] affiliation birth death education event occupation pubPlace state
Attributes: att.canonical (@key, @ref)
role: may be used to specify further information about the entity referenced by this name in the form of a set of whitespace-separated values, for example the occupation of a person, or the status of a place.
Status: Optional
Datatype: 1–∞ occurrences of teidata.enumerated separated by whitespace

Appendix A.3.22 att.personal

att.personal (attributes for components of names usually, but not necessarily, personal names) common attributes for those elements which form part of a name usually, but not necessarily, a personal name. [13.2.1. Personal Names]
Module: tei (Formal specification)
Members: addName forename name orgName persName placeName roleName surname
Attributes: att.naming (@role) (att.canonical (@key, @ref))
full: indicates whether the name component is given in full, as an abbreviation or simply as an initial.
Status: Optional
Datatype: teidata.enumerated
Legal values are:
yes
(yes) the name component is spelled out in full. [Default]
abb
(abbreviated) the name component is given in an abbreviated form.
init
(initial letter) the name component is indicated only by one initial.

Appendix A.3.23 att.pointing

att.pointing provides a set of attributes used by all elements which point to other elements by means of one or more URI references. [1.3.1.1.2. Language Indicators 3.7. Simple Links and Cross-References]
Module: tei (Formal specification)
Members: att.pointing.group [linkGrp] catRef licence link note ref term
Attributes
target: specifies the destination of the reference by supplying one or more URI references
Status: Optional
Datatype: 1–∞ occurrences of teidata.pointer separated by whitespace
Note

One or more syntactically valid URI references, separated by whitespace. Because whitespace is used to separate URIs, no whitespace is permitted inside a single URI. If a whitespace character is required in a URI, it should be escaped with the normal mechanism, e.g. TEI%20Consortium.
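The following minimal sketch shows both sides of this rule: splitting a target value into its URI references, and percent-escaping whitespace inside a single URI before writing it into target. Both function names are hypothetical. The set of characters left unescaped is an assumption chosen so that URI delimiters and existing %-escapes are preserved.

```python
from urllib.parse import quote

def target_values(target: str) -> list[str]:
    """@target holds whitespace-separated URI references; split them apart."""
    return target.split()

def escape_uri(uri: str) -> str:
    """Percent-encode characters (such as spaces) that may not appear inside one URI.
    ':/#?&=%' are kept so URI delimiters and existing %-escapes survive."""
    return quote(uri, safe=":/#?&=%")

print(escape_uri("TEI Consortium"))  # TEI%20Consortium
print(target_values("#mcc_2012 #chicago_15_ed"))
```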

Appendix A.3.24 att.ranging

att.ranging provides attributes for describing numerical ranges.
Module: tei (Formal specification)
Members: att.dimensions [birth date death gap state time] measure num
Attributes
atLeast: gives a minimum estimated value for the approximate measurement.
Status: Optional
Datatype: teidata.numeric
atMost: gives a maximum estimated value for the approximate measurement.
Status: Optional
Datatype: teidata.numeric
min: where the measurement summarizes more than one observation or a range, supplies the minimum value observed.
Status: Optional
Datatype: teidata.numeric
max: where the measurement summarizes more than one observation or a range, supplies the maximum value observed.
Status: Optional
Datatype: teidata.numeric
confidence: specifies the degree of statistical confidence (between zero and one) that a value falls within the range specified by min and max, or the proportion of observed values that fall within that range.
Status: Optional
Datatype: teidata.probability
Example
The MS. was lost in transmission by mail from <del rend="overstrike"> <gap reason="illegible" extent="one or two letters" atLeast="1" atMost="2" unit="chars"/> </del> Philadelphia to the Graphic office, New York.
Example
Americares has been supporting the health sector in Eastern Europe since 1986, and since 1992 has provided <measure atLeast="120000000" unit="USD" commodity="currency">more than $120m</measure> in aid to Ukrainians.
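The schema does not relate these attributes to each other. A corpus check might therefore add simple plausibility tests, such as atLeast not exceeding atMost and confidence lying between zero and one as stated above. This is a minimal sketch. The function name is hypothetical, and it assumes plain decimal values. Ratios such as `1/2`, which teidata.numeric also permits, are not handled.

```python
RANGING = ("atLeast", "atMost", "min", "max", "confidence")

def check_ranging(attrs: dict[str, str]) -> list[str]:
    """Plausibility checks on the att.ranging attributes of one element.
    Assumes plain decimal values (ratios like '1/2' are not handled)."""
    num = {name: float(attrs[name]) for name in RANGING if name in attrs}
    problems = []
    if "atLeast" in num and "atMost" in num and num["atLeast"] > num["atMost"]:
        problems.append("atLeast is greater than atMost")
    if "min" in num and "max" in num and num["min"] > num["max"]:
        problems.append("min is greater than max")
    if "confidence" in num and not 0.0 <= num["confidence"] <= 1.0:
        problems.append("confidence is not between 0 and 1")
    return problems

# Attributes of the <gap> in the first example above: no problems expected.
print(check_ranging({"reason": "illegible", "atLeast": "1", "atMost": "2", "unit": "chars"}))
```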

Appendix A.3.25 att.resourced

att.resourced provides attributes by which a resource (such as an externally held media file) may be located.
Module: tei (Formal specification)
Members: graphic media
Attributes
url (uniform resource locator): specifies the URL from which the media concerned may be obtained.
Status: Required
Datatype: teidata.pointer

Appendix A.3.26 att.typed

att.typed provides attributes that can be used to classify or subclassify elements in any way. [1.3.1. Attribute Classes 17.1.1. Words and Above 3.6.1. Referring Strings 3.7. Simple Links and Cross-References 3.6.5. Abbreviations and Their Expansions 3.13.1. Core Tags for Verse 7.2.5. Speech Contents 4.1.1. Un-numbered Divisions 4.1.2. Numbered Divisions 4.2.1. Headings and Trailers 4.4. Virtual Divisions 13.3.2.3. Personal Relationships 11.3.1.1. Core Elements for Transcriptional Work 16.1.1. Pointers and Links 16.3. Blocks, Segments, and Anchors 12.2. Linking the Apparatus to the Text 22.5.1.2. Defining Content Models: RELAX NG 8.3. Elements Unique to Spoken Texts 23.3.1.3. Modification of Attribute and Attribute Value Lists]
Module: tei (Formal specification)
Members: att.pointing.group [linkGrp] TEI addName affiliation application bibl birth change date death desc div education event figure forename graphic head idno incident kinesic label link listEvent listOrg listPerson listRelation measure media name nameLink note num occupation org orgName pb pc persName phr placeName recording ref relation roleName s seg sex state surname teiCorpus term text time title unit vocal w
Attributes
type: characterizes the element in some sense, using any convenient classification scheme or typology.
Status: Optional
Datatype: teidata.enumerated
<div type="verse">  <head>Night in Tarras</head>  <lg type="stanza">   <l>At evening tramping on the hot white road</l>   <l></l>  </lg>  <lg type="stanza">   <l>A wind sprang up from nowhere as the sky</l>   <l></l>  </lg> </div>
Note

The type attribute is present on a number of elements, not all of which are members of att.typed, usually because these elements restrict the possible values for the attribute in a specific way.

subtype: provides a sub-categorization of the element, if needed
Status: Optional
Datatype: teidata.enumerated
Note

The subtype attribute may be used to provide any sub-classification for the element additional to that provided by its type attribute.

Schematron
<sch:rule context="tei:*[@subtype]"> <sch:assert test="@type">The <sch:name/> element should not be categorized in detail with @subtype unless also categorized in general with @type</sch:assert> </sch:rule>
Note

When appropriate, values from an established typology should be used. Alternatively, a typology may be defined in the associated TEI header. If values are to be taken from a project-specific list, this should be defined using the <valList> element in the project-specific schema description, as described in 23.3.1.3. Modification of Attribute and Attribute Value Lists.
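Outside a Schematron processor, the same subtype-requires-type condition can be found with a short tree walk. This is a minimal sketch with a hypothetical function name. The attribute values in the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

def subtype_without_type(root: ET.Element) -> list[str]:
    """Local names of elements that carry @subtype but no @type; these are
    what the Schematron assertion above reports."""
    return [el.tag.split("}")[-1] for el in root.iter()
            if el.get("subtype") is not None and el.get("type") is None]

doc = ET.fromstring('<div type="verse"><lg subtype="quatrain"/>'
                    '<lg type="stanza" subtype="quatrain"/></div>')
print(subtype_without_type(doc))  # ['lg']
```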

Appendix A.4 Datatypes

Appendix A.4.1 teidata.certainty

teidata.certainty defines the range of attribute values expressing a degree of certainty.
Module: tei (Formal specification)
Used by
Content model
<content>
 <valList type="closed">
  <valItem ident="high"/>
  <valItem ident="medium"/>
  <valItem ident="low"/>
  <valItem ident="unknown"/>
 </valList>
</content>
    
Declaration
tei_teidata.certainty = "high" | "medium" | "low" | "unknown"
Note

Certainty may be expressed by one of the predefined symbolic values high, medium, or low. The value unknown should be used in cases where the encoder does not wish to assert an opinion about the matter.

Appendix A.4.2 teidata.count

teidata.count defines the range of attribute values used for a non-negative integer value used as a count.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="nonNegativeInteger"/>
</content>
    
Declaration
tei_teidata.count = xsd:nonNegativeInteger
Note

Any positive integer value or zero is permitted.

Appendix A.4.3 teidata.duration.iso

teidata.duration.iso defines the range of attribute values available for representation of a duration in time using ISO 8601 standard formats
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/>
</content>
    
Declaration
tei_teidata.duration.iso = token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }
Example
<time dur-iso="PT0,75H">three-quarters of an hour</time>
Example
<date dur-iso="P1,5D">a day and a half</date>
Example
<date dur-iso="P14D">a fortnight</date>
Example
<time dur-iso="PT0.02S">20 ms</time>
Note

A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the last, which may have a decimal component (using either . or , as the decimal point; the latter is preferred). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair.

For complete details, see ISO 8601 Data elements and interchange formats — Information interchange — Representation of dates and times.
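As an illustration of these rules, the following minimal sketch converts a teidata.duration.iso value to seconds. It accepts either `,` or `.` as the decimal sign. It deliberately supports only days and clock components (D, H, M after T, S), because years and months have no fixed length in seconds. It is also more permissive than the note above, since it accepts a decimal part on any component. The function name is hypothetical.

```python
import re

# Day and clock components only: years and months (Y, and M before T) have no
# fixed length in seconds, so this sketch rejects them.
_NUMBER = r"\d+(?:[.,]\d+)?"
_ISO_DURATION = re.compile(
    rf"P(?:(?P<D>{_NUMBER})D)?"
    rf"(?:T(?:(?P<H>{_NUMBER})H)?(?:(?P<M>{_NUMBER})M)?(?:(?P<S>{_NUMBER})S)?)?"
)
_SECONDS_PER_UNIT = {"D": 86400, "H": 3600, "M": 60, "S": 1}

def iso_duration_seconds(value: str) -> float:
    """Convert a teidata.duration.iso value such as 'PT0,75H' to seconds."""
    match = _ISO_DURATION.fullmatch(value)
    if match is None or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported or invalid duration: {value!r}")
    total = 0.0
    for unit, factor in _SECONDS_PER_UNIT.items():
        number = match.group(unit)
        if number is not None:
            # ISO 8601 allows ',' or '.' as the decimal sign; float() needs '.'.
            total += float(number.replace(",", ".")) * factor
    return total

print(iso_duration_seconds("PT0,75H"))  # 2700.0
```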

Appendix A.4.4 teidata.duration.w3c

teidata.duration.w3c defines the range of attribute values available for representation of a duration in time using W3C datatypes.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="duration"/>
</content>
    
Declaration
tei_teidata.duration.w3c = xsd:duration
Example
<time dur="PT45M">forty-five minutes</time>
Example
<date dur="P1DT12H">a day and a half</date>
Example
<date dur="P7D">a week</date>
Example
<time dur="PT0.02S">20 ms</time>
Note

A duration is expressed as a sequence of number-letter pairs, preceded by the letter P; the letter gives the unit and may be Y (year), M (month), D (day), H (hour), M (minute), or S (second), in that order. The numbers are all unsigned integers, except for the S number, which may have a decimal component (using . as the decimal point). If any number is 0, then that number-letter pair may be omitted. If any of the H (hour), M (minute), or S (second) number-letter pairs are present, then the separator T must precede the first ‘time’ number-letter pair.

For complete details, see the W3C specification.
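The W3C form is stricter than the ISO one: the decimal sign must be `.`, and only the seconds component may have a fraction. The following minimal sketch encodes these rules as a regular expression. It also allows the optional leading minus sign that xsd:duration permits. The function name is hypothetical, and a schema validator remains the authoritative check.

```python
import re

# Optional sign, 'P', at least one component, date parts, then optional 'T' with
# at least one time part; only the seconds part may carry a '.' fraction.
_W3C_DURATION = re.compile(
    r"-?P(?=\d|T\d)"
    r"(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"
)

def is_w3c_duration(value: str) -> bool:
    return _W3C_DURATION.fullmatch(value) is not None

for value in ("PT45M", "P1DT12H", "P1,5D", "PT1.5H"):
    print(value, is_w3c_duration(value))
```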

Appendix A.4.5 teidata.enumerated

teidata.enumerated defines the range of attribute values expressed as a single XML name taken from a list of documented possibilities.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef key="teidata.word"/>
</content>
    
Declaration
tei_teidata.enumerated = teidata.word
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.

Typically, the list of documented possibilities will be provided (or exemplified) by a value list in the associated attribute specification, expressed with a <valList> element.

Appendix A.4.6 teidata.language

teidata.language defines the range of attribute values used to identify a particular combination of human language and writing system. [6.1. Language Identification]
Module: tei (Formal specification)
Used by
Content model
<content>
 <alternate>
  <dataRef name="language"/>
  <valList>
   <valItem ident=""/>
  </valList>
 </alternate>
</content>
    
Declaration
tei_teidata.language = xsd:language | ( "" )
Note

The values for this attribute are language ‘tags’ as defined in BCP 47. Currently BCP 47 comprises RFC 5646 and RFC 4647; over time, other IETF documents may succeed these as the best current practice.

A ‘language tag’, per BCP 47, is assembled from a sequence of components or subtags separated by the hyphen character (-, U+002D). The tag is made of the following subtags, in the following order. Every subtag except the first is optional. If present, each occurs only once, except the fourth and fifth components (variant and extension), which are repeatable.

language
The IANA-registered code for the language. This is almost always the same as the ISO 639 2-letter language code if there is one. The list of available registered language subtags can be found at http://www.iana.org/assignments/language-subtag-registry. It is recommended that this code be written in lower case.
script
The ISO 15924 code for the script. These codes consist of 4 letters, and it is recommended they be written with an initial capital, the other three letters in lower case. The canonical list of codes is maintained by the Unicode Consortium, and is available at http://unicode.org/iso15924/iso15924-codes.html. The IETF recommends this code be omitted unless it is necessary to make a distinction you need.
region
Either an ISO 3166 country code or a UN M.49 region code that is registered with IANA (not all such codes are registered, e.g. UN codes for economic groupings or codes for countries for which there is already an ISO 3166 2-letter code are not registered). The former consist of 2 letters, and it is recommended they be written in upper case; the list of codes can be searched or browsed at https://www.iso.org/obp/ui/#search/code/. The latter consist of 3 digits; the list of codes can be found at http://unstats.un.org/unsd/methods/m49/m49.htm.
variant
An IANA-registered variation. These codes ‘are used to indicate additional, well-recognized variations that define a language or its dialects that are not covered by other available subtags’.
extension
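An extension has the format of a single letter followed by a hyphen followed by additional subtags. Extensions exist to allow for future extension of BCP 47. When this text was first written, none were in use, but the extensions u (Unicode locale, RFC 6067) and t (transformed content, RFC 6497) have since been registered.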
private use
An extension that uses the initial subtag of the single letter x (i.e., starts with x-) has no meaning except as negotiated among the parties involved. These should be used with great care, since they interfere with the interoperability that use of BCP 47 is intended to promote. In order for a document that makes use of these subtags to be TEI-conformant, a corresponding <language> element must be present in the TEI header.

There are two exceptions to the above format. First, there are language tags in the IANA registry that do not match the above syntax, but are present because they have been ‘grandfathered’ from previous specifications.

Second, an entire language tag can consist of only a private use subtag. These tags start with x-, and do not need to follow any further rules established by the IETF and endorsed by these Guidelines. Like all language tags that make use of private use subtags, the language in question must be documented in a corresponding <language> element in the TEI header.

Examples include

sn
Shona
zh-TW
Chinese as used in Taiwan
zh-Hant-HK
Chinese written in traditional script as used in Hong Kong
en-SL
English as spoken in Sierra Leone
pl
Polish
es-MX
Spanish as spoken in Mexico
es-419
Spanish as spoken in Latin America

The W3C Internationalization Activity has published a useful introduction to BCP 47, Language tags in HTML and XML.
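The subtag order described above can be followed mechanically to split a tag into its parts. The following is a simplified sketch with a hypothetical function name. It does not handle extlang subtags or grandfathered tags, and it does not check the subtags against the IANA registry. For real validation, a dedicated BCP 47 library is preferable.

```python
import re

def split_language_tag(tag: str) -> dict:
    """Split a BCP 47 language tag into its subtags (simplified: no extlang,
    no grandfathered tags, and no lookup in the IANA registry)."""
    subtags = tag.split("-")
    result = {"language": None, "script": None, "region": None,
              "variants": [], "extensions": [], "private_use": []}
    if subtags[0].lower() == "x":  # the whole tag is a private use tag, e.g. 'x-klingon'
        result["private_use"] = subtags[1:]
        return result
    result["language"] = subtags[0]
    i = 1
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{4}", subtags[i]):  # script, e.g. Hant
        result["script"] = subtags[i]
        i += 1
    if i < len(subtags) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtags[i]):  # region, e.g. HK or 419
        result["region"] = subtags[i]
        i += 1
    while i < len(subtags) and re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", subtags[i]):
        result["variants"].append(subtags[i])
        i += 1
    current = None  # singleton of the extension being read, or 'x' once private use starts
    for sub in subtags[i:]:
        if current == "x":
            result["private_use"].append(sub)
        elif sub.lower() == "x":
            current = "x"
        elif len(sub) == 1:
            current = sub.lower()
            result["extensions"].append([current])
        elif current is not None:
            result["extensions"][-1].append(sub)
        else:
            raise ValueError(f"unexpected subtag {sub!r} in {tag!r}")
    return result

print(split_language_tag("zh-Hant-HK"))
```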

Appendix A.4.7 teidata.name

teidata.name defines the range of attribute values expressed as an XML Name.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="Name"/>
</content>
    
Declaration
tei_teidata.name = xsd:Name
Note

Attributes using this datatype must contain a single word which follows the rules defining a legal XML name (see https://www.w3.org/TR/REC-xml/#dt-name): for example they cannot include whitespace or begin with digits.

Appendix A.4.8 teidata.numeric

teidata.numeric defines the range of attribute values used for numeric values.
Module: tei (Formal specification)
Used by
Content model
<content>
 <alternate>
  <dataRef name="double"/>
  <dataRef name="token"
   restriction="(\-?[\d]+/\-?[\d]+)"/>
  <dataRef name="decimal"/>
 </alternate>
</content>
    
Declaration
tei_teidata.numeric =
   xsd:double | token { pattern = "(\-?[\d]+/\-?[\d]+)" } | xsd:decimal
Note

Any numeric value, represented as a decimal number, in floating point format, or as a ratio.

To represent a floating point number expressed in scientific notation, ‘E notation’, a variant of ‘exponential notation’, may be used. In this format, the value is expressed as two numbers separated by the letter E. The first number, the significand (sometimes called the mantissa), is given in decimal format, while the second is an integer. The value is obtained by multiplying the mantissa by 10 the number of times indicated by the integer. Thus the value represented in decimal notation as 1000.0 might be represented in scientific notation as 1E3 (10E3 would denote 10000).

A value expressed as a ratio is represented by two integer values separated by a solidus (/) character. Thus, the value represented in decimal notation as 0.5 might be represented as a ratio by the string 1/2.
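All three notations can be read into one exact numeric type. This minimal sketch uses Python's `fractions.Fraction`, which accepts decimal and E-notation strings directly. Ratios are split by hand because the declared pattern allows a sign on the denominator, which `Fraction`'s string parser does not accept. The function name is hypothetical. Infinities and NaN, which xsd:double also allows, are not supported.

```python
from fractions import Fraction

def parse_teidata_numeric(value: str) -> Fraction:
    """Parse a decimal ('0.5'), E-notation ('1E3') or ratio ('1/2') value exactly."""
    value = value.strip()
    if "/" in value:
        # Ratio: the declared pattern allows a sign on both integers, e.g. '1/-2'.
        numerator, denominator = value.split("/")
        return Fraction(int(numerator), int(denominator))
    return Fraction(value)  # accepts '1000.0', '1E3', '-.125', ...

print(parse_teidata_numeric("1E3"), parse_teidata_numeric("10E3"), parse_teidata_numeric("1/2"))
```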

Appendix A.4.9 teidata.outputMeasurement

teidata.outputMeasurement defines a range of values for use in specifying the size of an object that is intended for display.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"/>
</content>
    
Declaration
tei_teidata.outputMeasurement =
   token
   {
      pattern = "[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)"
   }
Example
<figure> <head>The TEI Logo</head> <figDesc>Stylized yellow angle brackets with the letters <mentioned>TEI</mentioned> in between and <mentioned>text encoding initiative</mentioned> underneath, all on a white background.</figDesc> <graphic height="600px" width="600px" url="http://www.tei-c.org/logos/TEI-600.jpg"/> </figure>
Note

These values map directly onto the values used by XSL-FO and CSS. For definitions of the units see those specifications; at the time of this writing the most complete list is in the CSS3 working draft.
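Because XSD patterns always match the whole attribute value, the declared pattern can be reused with `re.fullmatch` to validate a value and split it into number and unit. This is a minimal sketch with a hypothetical function name.

```python
import re

# The pattern from the declaration above (XSD patterns always match the whole value).
OUTPUT_MEASUREMENT = re.compile(
    r"[\-+]?\d+(\.\d+)?(%|cm|mm|in|pt|pc|px|em|ex|ch|rem|vw|vh|vmin|vmax)")

def split_measurement(value: str) -> tuple[float, str]:
    """Return (number, unit) for a valid value such as '600px', else raise ValueError."""
    match = OUTPUT_MEASUREMENT.fullmatch(value)
    if match is None:
        raise ValueError(f"not a teidata.outputMeasurement: {value!r}")
    unit = match.group(2)
    return float(value[: -len(unit)]), unit

print(split_measurement("600px"))  # (600.0, 'px')
```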

Appendix A.4.10 teidata.pattern

teidata.pattern defines attribute values which are expressed as a regular expression.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="token"/>
</content>
    
Declaration
tei_teidata.pattern = token
Note
A regular expression, often called a pattern, is an expression that describes a set of strings. They are usually used to give a concise description of a set, without having to list all elements. For example, the set containing the three strings Handel, Händel, and Haendel can be described by the pattern H(ä|ae?)ndel (or alternatively, it is said that the pattern H(ä|ae?)ndel matches each of the three strings)
Wikipedia

This TEI datatype is mapped to the XSD token datatype, and may therefore contain any string of characters. However, it is recommended that the value used conform to the particular flavour of regular expression syntax supported by XSD Schema.
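One practical difference when testing such patterns in a general-purpose language is anchoring. XSD regular expressions implicitly match the entire value, whereas a plain search also accepts partial matches. The following minimal sketch uses Python's `re.fullmatch` to mimic the XSD behaviour. Python's regular expression syntax is not identical to the XSD flavour; for example, it lacks `\p{...}` classes and character class subtraction.

```python
import re

def matches_whole(pattern: str, value: str) -> bool:
    """XSD regular expressions are implicitly anchored, so compare with fullmatch."""
    return re.fullmatch(pattern, value) is not None

for name in ["Handel", "Händel", "Haendel", "Haendell"]:
    print(name, matches_whole(r"H(ä|ae?)ndel", name))
```

With `re.search` instead, "Haendell" would also be accepted, because its prefix "Haendel" matches.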

Appendix A.4.11 teidata.pointer

teidata.pointer defines the range of attribute values used to provide a single URI, absolute or relative, pointing to some other resource, either within the current document or elsewhere.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef restriction="\S+" name="anyURI"/>
</content>
    
Declaration
tei_teidata.pointer = xsd:anyURI { pattern = "\S+" }
Note

The range of syntactically valid values is defined by RFC 3986 Uniform Resource Identifier (URI): Generic Syntax. Note that the values themselves are encoded using RFC 3987 Internationalized Resource Identifiers (IRIs) mapping to URIs. For example, https://secure.wikimedia.org/wikipedia/en/wiki/% is encoded as https://secure.wikimedia.org/wikipedia/en/wiki/%25 while http://موقع.وزارة-الاتصالات.مصر/ is encoded as http://xn--4gbrim.xn----rmckbbajlc6dj7bxne2c.xn--wgbh1c/
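The two conversions in these examples can be approximated with the Python standard library. The host name is IDNA-encoded, and the path is percent-encoded, so `%` itself becomes `%25`. This is a rough sketch with a hypothetical function name. Python's built-in `idna` codec implements IDNA 2003 rather than IDNA 2008. The sketch also drops user information and escapes existing %-escapes a second time, so it suits only simple http(s) IRIs like those above.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri: str) -> str:
    """Rough IRI-to-URI mapping: IDNA-encode the host name and percent-encode
    the path (query and fragment are passed through unchanged)."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host + (f":{parts.port}" if parts.port is not None else "")
    path = quote(parts.path, safe="/")  # '%' itself becomes '%25'
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))

print(iri_to_uri("https://secure.wikimedia.org/wikipedia/en/wiki/%"))
print(iri_to_uri("http://موقع.وزارة-الاتصالات.مصر/"))
```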

Appendix A.4.12 teidata.prefix

teidata.prefix defines a range of values that may function as a URI scheme name.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[a-z][a-z0-9\+\.\-]*"/>
</content>
    
Declaration
tei_teidata.prefix = token { pattern = "[a-z][a-z0-9\+\.\-]*" }
Note

This datatype is used to constrain a string of characters to one that can be used as a URI scheme name according to RFC 3986, section 3.1. Thus only the 26 lowercase letters a–z, the 10 digits 0–9, the plus sign, the period, and the hyphen are permitted, and the value must start with a letter.

Appendix A.4.13 teidata.probCert

teidata.probCert defines a range of attribute values which can be expressed either as a numeric probability or as a coded certainty value.
Module: tei (Formal specification)
Used by
Content model
<content>
 <alternate>
  <dataRef key="teidata.probability"/>
  <dataRef key="teidata.certainty"/>
 </alternate>
</content>
    
Declaration
tei_teidata.probCert = teidata.probability | teidata.certainty
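A consumer of teidata.probCert values has to accept either of the two alternatives. The following minimal sketch returns the symbolic teidata.certainty value unchanged, or a number. The number is checked against the 0 to 1 range given for teidata.probability, a range that xsd:double itself does not enforce. The function name is hypothetical.

```python
from typing import Union

CERTAINTY = {"high", "medium", "low", "unknown"}  # teidata.certainty

def parse_prob_cert(value: str) -> Union[float, str]:
    """Return a probability (float between 0 and 1) or a symbolic certainty value."""
    value = value.strip()
    if value in CERTAINTY:
        return value
    p = float(value)  # raises ValueError for anything that is not a number
    if not 0.0 <= p <= 1.0:  # also rejects NaN
        raise ValueError(f"probability out of range: {value!r}")
    return p

print(parse_prob_cert("0.75"), parse_prob_cert("medium"))
```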

Appendix A.4.14 teidata.probability

teidata.probability defines the range of attribute values expressing a probability.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="double"/>
</content>
    
Declaration
tei_teidata.probability = xsd:double
Note

Probability is expressed as a real number between 0 and 1; 0 representing certainly false and 1 representing certainly true.

Appendix A.4.15 teidata.replacement

teidata.replacement defines attribute values which contain a replacement template.
Module: tei (Formal specification)
Used by
Content model
<content>
 <textNode/>
</content>
    
Declaration
tei_teidata.replacement = text

Appendix A.4.16 teidata.temporal.iso

teidata.temporal.iso defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the international standard Data elements and interchange formats – Information interchange – Representation of dates and times.
Module: tei (Formal specification)
Used by
Content model
<content>
 <alternate>
  <dataRef name="date"/>
  <dataRef name="gYear"/>
  <dataRef name="gMonth"/>
  <dataRef name="gDay"/>
  <dataRef name="gYearMonth"/>
  <dataRef name="gMonthDay"/>
  <dataRef name="time"/>
  <dataRef name="dateTime"/>
  <dataRef name="token"
   restriction="[0-9.,DHMPRSTWYZ/:+\-]+"/>
 </alternate>
</content>
    
Declaration
tei_teidata.temporal.iso =
   xsd:date
 | xsd:gYear
 | xsd:gMonth
 | xsd:gDay
 | xsd:gYearMonth
 | xsd:gMonthDay
 | xsd:time
 | xsd:dateTime
 | token { pattern = "[0-9.,DHMPRSTWYZ/:+\-]+" }
Note

If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.

For all representations for which ISO 8601:2004 describes both a basic and an extended format, these Guidelines recommend use of the extended format.
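The advice to include a time zone indicator matters as soon as values are compared. Without one, the instant a value denotes is ambiguous. Python, for instance, refuses to order a datetime without an offset against one that has an offset. The following minimal sketch with a hypothetical function name makes this explicit. It uses numeric offsets such as `+01:00`, because `datetime.fromisoformat` accepts a trailing `Z` only from Python 3.11 onwards.

```python
from datetime import datetime

def compare_datetimes(a: str, b: str) -> int:
    """Return -1, 0 or 1 comparing two dateTime values that both carry a time zone offset."""
    first, second = datetime.fromisoformat(a), datetime.fromisoformat(b)
    if first.tzinfo is None or second.tzinfo is None:
        # Without an offset the instant is ambiguous (and Python refuses to order
        # naive against aware datetimes), hence the advice to always include one.
        raise ValueError("both values need a time zone indicator")
    return (first > second) - (first < second)

print(compare_datetimes("2023-11-06T10:00:00+01:00", "2023-11-06T09:30:00+00:00"))  # -1
```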

Appendix A.4.17 teidata.temporal.w3c

teidata.temporal.w3c defines the range of attribute values expressing a temporal expression such as a date, a time, or a combination of them, that conform to the W3C XML Schema Part 2: Datatypes Second Edition specification.
Module: tei (Formal specification)
Used by
Content model
<content>
 <alternate>
  <dataRef name="date"/>
  <dataRef name="gYear"/>
  <dataRef name="gMonth"/>
  <dataRef name="gDay"/>
  <dataRef name="gYearMonth"/>
  <dataRef name="gMonthDay"/>
  <dataRef name="time"/>
  <dataRef name="dateTime"/>
 </alternate>
</content>
    
Declaration
tei_teidata.temporal.w3c =
   xsd:date
 | xsd:gYear
 | xsd:gMonth
 | xsd:gDay
 | xsd:gYearMonth
 | xsd:gMonthDay
 | xsd:time
 | xsd:dateTime
Note

If it is likely that the value used is to be compared with another, then a time zone indicator should always be included, and only the dateTime representation should be used.

Appendix A.4.18 teidata.text

teidata.text defines the range of attribute values used to express some kind of identifying string as a single sequence of Unicode characters possibly including whitespace.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="string"/>
</content>
    
Declaration
tei_teidata.text = string
Note

Attributes using this datatype must contain a single ‘token’ in which whitespace and other punctuation characters are permitted.

Appendix A.4.19 teidata.truthValue

teidata.truthValue defines the range of attribute values used to express a truth value.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="boolean"/>
</content>
    
Declaration
tei_teidata.truthValue = xsd:boolean
Note

The possible values of this datatype are 1 or true, or 0 or false.

This datatype applies only for cases where uncertainty is inappropriate; if the attribute concerned may have a value other than true or false, e.g. unknown, or inapplicable, it should have the extended version of this datatype: teidata.xTruthValue.

Appendix A.4.20 teidata.versionNumber

teidata.versionNumber defines the range of attribute values used for version numbers.
Module: tei (Formal specification)
Used by
Content model
<content>
 <dataRef name="token"
  restriction="[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}"/>
</content>
    
Declaration
tei_teidata.versionNumber =
   token { pattern = "[\d]+[a-z]*[\d]*(\.[\d]+[a-z]*[\d]*){0,3}" }

Appendix A.4.21 teidata.word

teidata.word defines the range of attribute values expressed as a single word or token.
Module tei — Formal specification
Used by
teidata.enumerated
Element:
Content model
<content>
 <dataRef name="token"
  restriction="[^\p{C}\p{Z}]+"/>
</content>
    
Declaration
tei_teidata.word = token { pattern = "[^\p{C}\p{Z}]+" }
Note

Attributes using this datatype must contain a single ‘word’ which contains only letters, digits, punctuation characters, or symbols: thus it cannot include whitespace.
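Python's built-in re module does not support Unicode property escapes such as \p{C} and \p{Z}, so the following sketch checks the teidata.word pattern by looking at the Unicode general category of each character instead (the function name is_teidata_word is hypothetical). Categories starting with "C" (control, format, unassigned, etc.) or "Z" (separators, including ordinary and non-breaking spaces) are rejected, and the value must be non-empty because the pattern uses "+".

```python
import unicodedata


def is_teidata_word(value: str) -> bool:
    r"""Check value against the teidata.word pattern [^\p{C}\p{Z}]+."""
    if not value:
        return False  # the "+" in the pattern requires at least one character
    for ch in value:
        # unicodedata.category() returns codes such as "Lu", "Zs" or "Cc";
        # the first letter identifies the major category class.
        if unicodedata.category(ch)[0] in ("C", "Z"):
            return False
    return True


print(is_teidata_word("parliamentaryGroup"))  # True
print(is_teidata_word("two words"))  # False
```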

Appendix A.4.22 teidata.xTruthValue

teidata.xTruthValue (extended truth value) defines the range of attribute values used to express a truth value which may be unknown.
Module tei — Formal specification
Used by
Content model
<content>
 <alternate>
  <dataRef name="boolean"/>
  <valList>
   <valItem ident="unknown"/>
   <valItem ident="inapplicable"/>
  </valList>
 </alternate>
</content>
    
Declaration
tei_teidata.xTruthValue = xsd:boolean | ( "unknown" | "inapplicable" )
Note

In cases where uncertainty is inappropriate, use the datatype teidata.truthValue.
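A minimal Python sketch of reading a teidata.xTruthValue attribute follows (the function name parse_xtruth_value is hypothetical). It accepts the xsd:boolean forms and the two keywords from the declaration, and returns the keyword unchanged so that a caller can tell "unknown" apart from "inapplicable"; that return convention is a design choice of the sketch, not something the schema prescribes.

```python
from typing import Union

# Lexical forms of xsd:boolean, mapped to their values.
_XSD_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}
# Additional values permitted by teidata.xTruthValue.
_EXTRA_VALUES = {"unknown", "inapplicable"}


def parse_xtruth_value(value: str) -> Union[bool, str]:
    """Parse a teidata.xTruthValue: an xsd:boolean or one of two keywords."""
    key = value.strip(" \t\r\n")  # ignore surrounding XML whitespace
    if key in _XSD_BOOLEAN:
        return _XSD_BOOLEAN[key]
    if key in _EXTRA_VALUES:
        return key  # keep the keyword so the two cases stay distinguishable
    raise ValueError(f"not a valid teidata.xTruthValue: {value!r}")


print(parse_xtruth_value("unknown"))  # unknown
```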

Appendix A.4.23 teidata.xpath

teidata.xpath defines attribute values which contain an XPath expression.
Module tei — Formal specification
Used by
Content model
<content>
 <textNode/>
</content>
    
Declaration
tei_teidata.xpath = text
Note

Any XPath expression using the syntax defined in 6.2.

When writing programs that evaluate XPath expressions, programmers should be mindful of the possibility of malicious code injection attacks. For further information about XPath injection attacks, see the article at OWASP.
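One common mitigation, when an XPath expression must be built from untrusted input, is to turn that input into a proper XPath 1.0 string literal instead of pasting it between quotes. XPath 1.0 literals have no escape mechanism, so a value containing both kinds of quote has to be assembled with concat(). The sketch below assumes plain Python string handling; the function name xpath_literal is hypothetical. Where the XPath library supports variable binding (for example, lxml's xpath() method accepts variable values as keyword arguments), that is generally preferable to building expression strings at all.

```python
def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to the string value."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote characters occur: split on the apostrophe and rejoin the
    # pieces with concat(), inserting "'" wherever an apostrophe was.
    parts = value.split("'")
    pieces = []
    for i, part in enumerate(parts):
        if i > 0:
            pieces.append("\"'\"")
        if part:
            pieces.append(f"'{part}'")
    return "concat(" + ", ".join(pieces) + ")"


print(xpath_literal("O'Neil"))  # "O'Neil"
```

The result can then be embedded in a query such as f"//person[@xml:id={xpath_literal(user_input)}]", where user_input stands for any externally supplied value.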

Notes
1
Note that this is an illustrative example, i.e. a valid ParlaMint corpus would also need certain attributes to be defined on the illustrated elements. This holds for all the examples in this section.
2
Note that parliaments also have unaffiliated (or independent) MPs, who can either belong to a special ‘unaffiliated’ parliamentary group or not belong to any parliamentary group. For the former, an ‘unaffiliated’ parliamentaryGroup organisation must be created, and such MPs are affiliated with it as members. For the latter, they are simply not affiliated to any parliamentary group.
3
The typical situation is that the organisation somebody is affiliated with is specified as an organisation, using the <org> element (cf. the Section on Organisations), but if this is not the case, using <orgName> directly in the <affiliation> is an alternative encoding.
4
Note that, according to TEI, a label should not appear in a state element if this element contains subordinate state elements. In other words, this label makes the current encoding of ParlaMint corpora invalid according to TEI. This should be corrected in a future release.
5
Note that, in general, the utterance can also be split in the middle of a sentence, which causes problems for automatic linguistic processing: ideally, the parts should first be joined, and only then processed.
6
These are typically tagsets developed and used for specific languages and can be found in the XPOS column of CoNLL-U files, which is the native format for UD treebanks.
7
Note that the example is rendered in three lines; however, the correct encoding in the corpus is actually a single line, without any spaces between the elements, as otherwise the newline and indenting spaces would become part of the word ‘abyste’.
8
Because <name> and <phr> can give conflicting markup (i.e. crossing tags), the current script annotates phrases only where they are not related to names: not only conflicting markup but also nestings of phr/name and name/phr are forbidden, and such MWEs are not retained in the XML. Furthermore, due to a bug in the script, phrases adjacent to names are also not retained. We hope to introduce a better script and encoding in the future.
Tomaž Erjavec, tomaz.erjavec@ijs.si, Matyáš Kopp, kopp@ufal.mff.cuni.cz and Andrej Pančur, andrej.pancur@inz.si. Date: 2023-11-06