TEI Header

§file description
§title statement
§title
id = mte-cesdoc.title
MULTEXT-East cesDoc multilingual corpus
§statement of responsibility
§name Tomaž Erjavec, JSI
§responsibility TEI encoding
§statement of responsibility
§name Jean Veronis, Nancy Ide; Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-en-Provence, France
§responsibility MULTEXT-East Project management
§statement of responsibility
§name Nancy Ide, Greg Priest-Dorman; Vassar
§responsibility CES encoding
§statement of responsibility
§name Dan Tufiş, RACAI
§responsibility Romanian data
§statement of responsibility
§name Tomaž Erjavec, JSI
§responsibility Slovene data
§statement of responsibility
§name Vladimír Petkevič, ITCL
§responsibility Czech data
§statement of responsibility
§name Ludmila Dimitrova, BAS
§responsibility Bulgarian data
§statement of responsibility
§name Heiki-Jaan Kaalep, TU
§responsibility Estonian data
§statement of responsibility
§name Csaba Oravecz, HAS
§responsibility Hungarian data
§statement of responsibility
§name Paul Sokolovsky, SIT
§responsibility Russian data
§statement of responsibility
§name Andrius Utka
§responsibility Lithuanian data
§statement of responsibility
§name Cvetana Krstev
§responsibility Serbian data
§funding body EU Copernicus Project COP106 "MULTEXT-East"
§funding body EU Copernicus Concerted Action "TELRI"
§funding body EU Copernicus Project PL96-1142 "Concede"
§funding body EU Capacities Project GA 211938 "MondiLex"
§funding body Individual partners' grants and contracts.
§edition statement
§edition MULTEXT-East, Version 4
§extent
§measure
type = words
2,029,874
§publication statement
§distributor MULTEXT-East Web site
§address http://nl.ijs.si/ME/V4/
§distributor Individual partners, cf. component headers
§availability

Available for research purposes.

§source description
§citation list
§bibliographic citation
§title Multext-East cesDoc: Nineteen Eighty-Four, English
§bibliographic citation
§title Multext-East cesDoc: Speech, English
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages.

§editorial practice declaration
§normalization

Encoded to Corpus Encoding Standard Level 1 (CES1).

§correction principles

See individual component headers.

§quotation
form = unknown

All quotation marks are converted to q, with the original rendering recorded in the rend attribute. Some q elements carry further attributes (who, type). q sometimes occurs within s; TEI has been extended to accommodate this. See also individual component headers.
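
As a rough illustration of the conversion described above, the following Python sketch wraps straight double-quoted spans in a q element and records the original rendering in rend. It is not the procedure the corpus was built with: the function name `quotes_to_q` and the rend value `dq` are hypothetical, and only straight double quotes are handled.

```python
import re
from xml.sax.saxutils import escape


def quotes_to_q(text, rend="dq"):
    """Wrap each "..." span in <q rend="...">...</q>.

    Assumption: the rend value "dq" is a made-up label for
    "straight double quotes"; the corpus may use other values.
    """
    # Escape &, < and > first so the result is safe to embed in XML.
    # escape() leaves double quotes untouched, so they can still be matched.
    safe = escape(text)
    # Replace each quoted span; the quotation marks themselves are dropped
    # because their rendering is now carried by the rend attribute.
    return re.sub(
        r'"([^"]*)"',
        lambda m: f'<q rend="{rend}">{m.group(1)}</q>',
        safe,
    )


print(quotes_to_q('He said "yes" & left.'))
```

A real conversion would also need to handle typographic and language-specific quotation marks, nesting, and quotations that span sentence boundaries.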

§segmentation

Marked up to the paragraph level (p, quote), together with the sub-paragraph element q. Some other sub-paragraph elements are also marked, e.g. name, date, abbr, mentioned, distinct, foreign. See also individual component headers.

§hyphenation

No end-of-line hyphenation present in texts.

§standard values

The two-letter language codes follow ISO 639.

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 6706
abbreviation
§tag usage
gi = author occurs = 168
author
§tag usage
gi = bibl occurs = 168
bibliographic citation
§tag usage
gi = body occurs = 33
text body
§tag usage
gi = byline occurs = 1240
byline
§tag usage
gi = cell occurs = 75
cell
§tag usage
gi = closer occurs = 2
closer
§tag usage
gi = corr occurs = 1
editorial correction
§tag usage
gi = date occurs = 1721
date
§tag usage
gi = dateline occurs = 220
dateline
§tag usage
gi = div occurs = 3433
text division
§tag usage
gi = figDesc occurs = 115
description of figure
§tag usage
gi = figure occurs = 115
figure
§tag usage
gi = foreign occurs = 1306
foreign
§tag usage
gi = group occurs = 1
group
§tag usage
gi = head occurs = 3414
heading
§tag usage
gi = hi occurs = 6705
highlighted
§tag usage
gi = item occurs = 167
item
§tag usage
gi = l occurs = 785
verse line
§tag usage
gi = label occurs = 2
label
§tag usage
gi = lg occurs = 140
line group
§tag usage
gi = list occurs = 33
list
§tag usage
gi = measure occurs = 18
measure
§tag usage
gi = mentioned occurs = 1635
mentioned
§tag usage
gi = name occurs = 44823
name
§tag usage
gi = note occurs = 56
note
§tag usage
gi = num occurs = 5261
number
§tag usage
gi = opener occurs = 290
opener
§tag usage
gi = p occurs = 34684
paragraph
§tag usage
gi = ptr occurs = 42
pointer
§tag usage
gi = q occurs = 30236
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 1200
quotation
§tag usage
gi = ref occurs = 27
reference
§tag usage
gi = row occurs = 15
row
§tag usage
gi = s occurs = 78905
s-unit
§tag usage
gi = sp occurs = 251
speech
§tag usage
gi = speaker occurs = 6
speaker
§tag usage
gi = table occurs = 3
table
§tag usage
gi = term occurs = 6
term
§tag usage
gi = text occurs = 34
text
§tag usage
gi = time occurs = 87
time
§tag usage
gi = title occurs = 975
title
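
The tag usage counts above could be recomputed from a TEI document along the following lines. This is a minimal sketch: the function name `count_tei_elements` and the sample text are illustrative assumptions, and only elements in the TEI namespace declared above are counted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace taken from the tagging declaration above.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_tei_elements(xml_text):
    """Return a Counter mapping TEI element names (gi) to occurrence counts."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    # iter() walks the root element and all of its descendants.
    for elem in root.iter():
        tag = elem.tag
        # Skip anything outside the TEI namespace.
        if isinstance(tag, str) and tag.startswith(prefix):
            counts[tag[len(prefix):]] += 1
    return counts


# Hypothetical sample, not taken from the corpus.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0"><body><div>
<p><s>It was a <hi>bright</hi> cold day in <name>April</name>.</s></p>
<p><s><q>Hello</q>, he said.</s></p>
</div></body></text>"""

counts = count_tei_elements(SAMPLE)
for gi in sorted(counts):
    print(f"gi = {gi} occurs = {counts[gi]}")
```

For the full corpus, the same function would be applied to each component file and the resulting counters summed.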
§classification declarations
§taxonomy
§category
id = orwl
§category description Nineteen Eighty-Four
§category
id = fict
§category description Fiction
§category
id = news
§category description Newspapers
§category
id = spch
§category description Speech
§category
id = oana
§category description Nineteen Eighty-Four, Morphosyntactically Annotated
§text-profile description
§language usage
§language
ident = sl-rozaj
Resian (dialect of Slovene)
§language
ident = be
Byelorussian
§language
ident = bg
Bulgarian
§language
ident = br
Breton
§language
ident = ca
Catalan
§language
ident = co
Corsican
§language
ident = cs
Czech
§language
ident = cy
Welsh
§language
ident = da
Danish
§language
ident = de
German
§language
ident = el
Greek/Latin
§language
ident = en
English
§language
ident = es
Spanish
§language
ident = et
Estonian
§language
ident = eu
Basque
§language
ident = fi
Finnish
§language
ident = fr
French
§language
ident = ga
Irish
§language
ident = gd
Scots Gaelic
§language
ident = gl
Galician
§language
ident = hr
Croatian
§language
ident = hu
Hungarian
§language
ident = hy
Armenian
§language
ident = ik
Inupiak
§language
ident = is
Icelandic
§language
ident = it
Italian
§language
ident = ji
Yiddish
§language
ident = ka
Georgian/Ibero
§language
ident = kl
Greenlandic
§language
ident = la
Latin/Latin
§language
ident = lt
Lithuanian
§language
ident = lv
Latvian; Lettish
§language
ident = mk
Macedonian
§language
ident = mo
Moldavian
§language
ident = nl
Dutch
§language
ident = no
Norwegian
§language
ident = oc
Occitan
§language
ident = pl
Polish
§language
ident = pt
Portuguese
§language
ident = rm
Rhaeto-Romance
§language
ident = ro
Romanian
§language
ident = ru
Russian
§language
ident = sh
Serbo-Croatian
§language
ident = sk
Slovak
§language
ident = sl
Slovene
§language
ident = sq
Albanian
§language
ident = sr
Serbian
§language
ident = sv
Swedish
§language
ident = tr
Turkish
§language
ident = tt
Tatar
§language
ident = uk
Ukrainian
§revision description
§change 2009-11-04, Tomaž Erjavec: Conversion to MULTEXT-East TEI P5.
§change 2004-05-10, Tomaž Erjavec: From BETA to FINAL V3.
§change 2004-04-09, Tomaž Erjavec: Updated Serbian Orwell.
§change 2004-03-12, Tomaž Erjavec: Some minor changes to Orwells.
§change 2004-02-27, Tomaž Erjavec: Harmonised with CONCEDE/cesAna Orwell corpus.
§change 2004-02-25, Tomaž Erjavec: Included 1984 -lt, -sr, -ru in the corpus, converted the TELRI edition (SGML, CES) to TEI P4 and prepared the data for MTE V3.
§change 1997-10-05, Tomaz Erjavec, IJS: Final MTE release.
§change 1996-11-02, Tomaz Erjavec, IJS: Internal Release for IM3.