An example of a TEI header

<teiHeader id="ijs-elan.H" type="corpus"
  creator="ET" status="update" 
  date.created="1999-04-14" date.updated="2002-02-05">
  <title lang="sl">Slovenskoangleški vzporedni korpus IJS-ELAN</title>
  <title lang="en">The IJS-ELAN Slovene/English Parallel Corpus</title>
   <name id="ET">Tomaž Erjavec, IJS <xref></xref></name>
   <resp lang="sl">Urednik</resp>
   <resp lang="en">Editor</resp>
   <name>Peter Holozan, Amebis d.o.o. <xref></xref></name>
   <resp lang="sl">Leksikalne oznake</resp>
   <resp lang="en">Lexical annotation</resp>
   <name>Špela Vintar, FF <xref></xref></name>
   <resp lang="sl">Zagotovitev in poravnava: SPOR, ANX2, STRA, KMET, EKON, VADE, VINO</resp>
   <resp lang="en">Acquisition and alignment: SPOR, ANX2, STRA, KMET, EKON, VADE, VINO</resp>
   <name>Roman Maurer, FMF</name>
   <resp lang="sl">Prevod, zagotovitev in poravnava: LIGS, GNPO</resp>
   <resp lang="en">Translation, acquisition and alignment: LIGS, GNPO</resp>
   <name>Andrej Skubic, FF</name>
   <resp lang="sl">Zagotovitev in poravnava: KUCA, PARL, ECMR, EKOL</resp>
   <resp lang="en">Acquisition and alignment: KUCA, PARL, ECMR, EKOL</resp>
 <editionStmt><edition>Version 2.0</edition></editionStmt>
 <extent>59 MB; 1,092,012 words = 501,437 (sl) + 590,575 (en)</extent>
    <address lang="sl">
     <addrLine>Odsek za inteligentne sisteme</addrLine>
     <addrLine>Institut "Jožef Stefan"</addrLine>
     <addrLine>Jamova 39</addrLine>
     <addrLine>1000 Ljubljana</addrLine>
    </address>
    <address lang="en">
     <addrLine>Dept. of Intelligent Systems,</addrLine>
     <addrLine>Jozef Stefan Institute</addrLine>
     <addrLine>Jamova 39</addrLine>
     <addrLine>SI-1000 Ljubljana</addrLine>
    </address>
  <pubPlace><xref type="URL"></xref></pubPlace>
   <p lang="en">
    This parallel aligned corpus is freely available, provided that 
    the sources described in this Header or 
    in the Headers of its TEI.2 text elements 
    are acknowledged.
   </p>
   <p lang="sl">
    Ta vzporedni poravnani korpus je prosto dostopen, pod pogojem, 
    da se citira njegove vire, dokumentirane v tej glavi ali v 
    glavah njegovih TEI.2 besedil.
   </p>
    <p>This corpus is composed of 15 texts:</p>
       <bibl sameAs="usta.H" lang="sl-en">
         <title lang="en">Constitution of the Republic of Slovenia</title>
         <title lang="sl">Ustava Republike Slovenije</title>
         <extent>20 kW</extent>
       </bibl>
       <bibl sameAs="kuca.H" lang="sl-en">
         <title lang="en">Speeches by the President of Slovenia, M. Kučan</title>
         <title lang="sl">Govori predsednika RS, M. Kučana</title>
         <extent>69 kW</extent>
       </bibl>
       <bibl sameAs="parl.H" lang="sl-en">
         <title lang="en">Functioning of the National Assembly</title>
         <title lang="sl">Delovanje Državnega zbora</title>
         <extent>20 kW</extent>
       </bibl>
       <bibl sameAs="ecmr.H" lang="sl-en">
         <title lang="en">Slovenian Economic Mirror; 13 issues, 98/99</title>
         <title lang="sl">Ekonomsko ogledalo; 13 številk 98/99</title>
         <extent>239 kW</extent>
       </bibl>
       <bibl sameAs="ekol.H" lang="sl-en">
         <title lang="en">National Environmental Protection Programme</title>
         <title lang="sl">Nacionalni program varstva okolja</title>
         <extent>70 kW</extent>
       </bibl>
       <bibl sameAs="spor.H" lang="sl-en">
         <title lang="en">Europe Agreement</title>
         <title lang="sl">Evropski sporazum</title>
         <extent>34 kW</extent>
       </bibl>
       <bibl sameAs="anx2.H" lang="sl-en">
         <title lang="en">Europe Agreement - Annex II</title>
         <title lang="sl">Evropski sporazum - Priloga II</title>
         <extent>25 kW</extent>
       </bibl>
       <bibl sameAs="stra.H" lang="sl-en">
         <title lang="en">Slovenia's Strategy for Integration into EU</title>
         <title lang="sl">Strategija Slovenije za vključevanje v EU</title>
         <extent>89 kW</extent>
       </bibl>
       <bibl sameAs="kmet.H" lang="sl-en">
         <title lang="en">Slovenia's programme for accession to EU - agriculture</title>
         <title lang="sl">Državni program za prilagajanje zakonodaje - kmetijstvo</title>
         <extent>29 kW</extent>
       </bibl>
       <bibl sameAs="ekon.H" lang="sl-en">
         <title lang="en">Slovenia's programme for accession to EU - economy</title>
         <title lang="sl">Državni program za prilagajanje zakonodaje - gospodarstvo</title>
         <extent>23 kW</extent>
       </bibl>
       <bibl sameAs="vade.H" lang="sl-en">
         <title lang="en">Vademecum by Lek</title>
         <title lang="sl">Vademecum Lekove domače lekarne</title>
         <extent>24 kW</extent>
       </bibl>
       <bibl sameAs="vino.H" lang="en-sl">
         <title lang="en">EC Council Regulation No 3290/94 - agriculture</title>
         <title lang="sl">Uredba sveta ES št. 3290/94 - kmetijstvo</title>
         <extent>69 kW</extent>
       </bibl>
       <bibl sameAs="ligs.H" lang="en-sl">
         <title lang="en">Linux Installation and Getting Started</title>
         <title lang="sl">Namestitev in začetek dela z Linuxom</title>
         <extent>173 kW</extent>
       </bibl>
       <bibl sameAs="gnpo.H" lang="en-sl">
         <title lang="en">GNU PO localisation</title>
         <title lang="sl">GNU PO lokalizacije</title>
         <extent>13 kW</extent>
       </bibl>
       <bibl sameAs="orwl.H" lang="en-sl">
         <title lang="en">G. Orwell: Nineteen Eighty-Four</title>
         <title lang="sl">G. Orwell: 1984</title>
         <extent>195 kW</extent>
       </bibl>
   <p>This corpus is the (updated version of the) LJU1 site (IJS) 
    contribution to the EU MLIS project ELAN: 
    European Language Activity Network.
    For more information see the IJS-ELAN homepage
    <xref type="URL"></xref>
    and the ELAN project homepage
    <xref type="URL"></xref>.</p>
    <p>All formatting removed from originals.</p>
    <p>Only ASCII characters and SGML entities used: see the DTD for defined entities.</p>
    <p>List bullets normalised to -, or left as *.</p>
    <p>No line contains more than one start/end tag or more than one element;
       white space between tokens is preserved before RE.</p>
    <p>Quotation marks converted to " or '</p>
    <p>Start / end quote is indicated with the open / close values of the TYPE attribute of C.</p>
    <p>Segmentation into translation units and segments was semi-automatic, using various tools.</p>
    <p>Tokenisation with MULTEXT mtseg; results corrected with Perl &amp; Emacs.</p>
    <p>Words automatically marked with context-disambiguated lemma and
     MULTEXT-East morphosyntactic description. English words
     additionally tagged with BROWN-like tagset by two taggers (TnT, QTAG).</p>
   <tagUsage gi="group" occurs="15">Element 'Group'.
    Attributes are LANG and ID.</tagUsage>
   <tagUsage gi="text" occurs="30">Element 'Text'.
    Attributes are LANG and ID.</tagUsage>
   <tagUsage gi="body" occurs="30">Element 'Body'.</tagUsage>
   <tagUsage gi="p" occurs="30">Element 'Paragraph'.</tagUsage>
   <tagUsage gi="seg" occurs="63800">Element 'Translation segment'. 
    Attribute is ID.</tagUsage>
   <tagUsage gi="s" occurs="13386">Element 'Sentence'. Only in 'orwl' text!
    Attribute is ID (value identical to original MTE bundle).</tagUsage>
   <tagUsage gi="w" occurs="1092012">Element 'Word'.
    Attributes are TYPE (only "special" words), CTAG (English only), ANA, LEMMA (only known words).</tagUsage>
   <tagUsage gi="c" occurs="174040">Element 'Punctuation'.
    Attributes are TYPE (only "special" punctuation), CTAG.</tagUsage>
    <name>Tomaž Erjavec</name>
    <resp>IJS ELAN</resp>
   <item>Initial corpus and corpus Header</item>
    <name>Tomaž Erjavec</name>
    <resp>IJS ELAN</resp>
   <item>IJS-ELAN pre-release</item>
    <name>Tomaž Erjavec</name>
    <resp>IJS ELAN</resp>
   <item>V1.0 Header</item>
    <name>Tomaž Erjavec</name> <resp>IJS ELAN</resp>
    <name>Roman Maurer</name> <resp>TYPE C</resp>
   <item>Some more errors of tokenisation corrected; 
     Availability statement changed.
     Final ELAN release; V1.1 Header.</item>
    <name>Miro Romih, Amebis d.o.o.</name> <resp>Lemmatisation</resp>
   <item>Slovene part ambiguously lemmatised with BesAna lexicon; 
     unknown words left unlemmatised.</item>
    <name>Tomaž Erjavec</name> <resp>IJS ELAN</resp>
   <item>Some more tokenisation errors corrected. SGML structure
     radically changed: the language texts are now separate. 
     Introduced ANA attribute on words. Made the corpus 'nsgml':
     at least one tag and at most one element per line.</item>
    <name>Tomaž Erjavec</name> <resp>IJS ELAN</resp>
   <item>Slovene texts tagged with TnT trained on "1984" and
     improved with other resources. English texts also tagged with TnT 
     trained on "1984" and additionally with TnT trained on the Penn Treebank
     and the QTag email service.</item>
    <name>Tomaž Erjavec</name> <resp>IJS ELAN</resp>
   <item>Prepared corpus V2.0 for distribution.</item>
</teiHeader>

Tomaž Erjavec, 2002-03-08