TEI Header

^§file description

^§title statement

MULTEXT-East cesDoc multilingual corpus

^§statement of responsibility

^§name	Tomaž Erjavec, JSI
^§responsibility	TEI encoding

^§statement of responsibility

^§name	Jean Veronis, Nancy Ide Laboratoire Parole et Langage Centre National de la Recherche Scientifique Aix-en-Provence, France
^§responsibility	MULTEXT-East Project management

^§statement of responsibility

^§name	Nancy Ide, Greg Priest-Dorman; Vassar
^§responsibility	CES encoding

^§statement of responsibility

^§name	Dan Tufiş, RACAI
^§responsibility	Romanian data

^§statement of responsibility

^§name	Tomaž Erjavec, JSI
^§responsibility	Slovene data

^§statement of responsibility

^§name	Vladimír Petkevič, ITCL
^§responsibility	Czech data

^§statement of responsibility

^§name	Ludmila Dimitrova, BAS
^§responsibility	Bulgarian data

^§statement of responsibility

^§name	Heiki-Jaan Kaalep, TU
^§responsibility	Estonian data

^§statement of responsibility

^§name	Csaba Oravecz, HAS
^§responsibility	Hungarian data

^§statement of responsibility

^§name	Paul Sokolovsky, SIT
^§responsibility	Russian data

^§statement of responsibility

^§name	Andrius Utka
^§responsibility	Lithuanian data

^§statement of responsibility

^§name	Cvetana Krstev
^§responsibility	Serbian data

^§funding body

EU Copernicus Project COP106 "MULTEXT-East"

^§funding body

EU Copernicus Concerted Action "TELRI"

^§funding body

EU Copernicus Project PL96-1142 "Concede"

^§funding body

EU Capacities Project GA 211938 "MondiLex"

^§funding body

Individual partners' grants and contracts.

^§edition statement

^§edition

MULTEXT-East, Version 4

^§extent

^§measure
type = words

2,029,874

^§publication statement

^§distributor

MULTEXT-East Web site

^§address

http://nl.ijs.si/ME/V4/

^§distributor

Individual partners; cf. component headers

^§availability

Available for research purposes.

^§source description

^§citation list

^§bibliographic citation

^§title

Multext-East cesDoc: Nineteen Eighty-Four, English

^§bibliographic citation

^§title

Multext-East cesDoc: Speech, English

^§encoding description

^§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. _<pointer>

^§editorial practice declaration

^§normalization	Encoded to Corpus Encoding Standard Level 1 (CES1)
^§correction principles	See individual component headers.
^§quotation form = unknown	All quotation marks converted to q, with the original rendering recorded in the rend attribute. At times marked with other attributes (who, type). q sometimes occurs within s; TEI was extended to accommodate this. See also individual component headers.
^§segmentation	Marked up to the level of paragraph: p, quote plus marking of sub-paragraph element q. Some marking of particular sub-paragraph elements, e.g. name, date, abbr, mentioned, distinct, foreign. See also individual component headers.
^§hyphenation	No end-of-line hyphenation present in texts.
^§standard values	The two-letter language codes follow ISO 639.

^§tagging declaration

^§namespace
name = http://www.tei-c.org/ns/1.0

^§tag usage gi = abbr occurs = 6706	abbreviation
^§tag usage gi = author occurs = 168	author
^§tag usage gi = bibl occurs = 168	bibliographic citation
^§tag usage gi = body occurs = 33	text body
^§tag usage gi = byline occurs = 1240	byline
^§tag usage gi = cell occurs = 75	cell
^§tag usage gi = closer occurs = 2	closer
^§tag usage gi = corr occurs = 1	editorial correction
^§tag usage gi = date occurs = 1721	date
^§tag usage gi = dateline occurs = 220	dateline
^§tag usage gi = div occurs = 3433	text division
^§tag usage gi = figDesc occurs = 115	description of figure
^§tag usage gi = figure occurs = 115	figure
^§tag usage gi = foreign occurs = 1306	foreign
^§tag usage gi = group occurs = 1	group
^§tag usage gi = head occurs = 3414	heading
^§tag usage gi = hi occurs = 6705	highlighted
^§tag usage gi = item occurs = 167	item
^§tag usage gi = l occurs = 785	verse line
^§tag usage gi = label occurs = 2	label
^§tag usage gi = lg occurs = 140	line group
^§tag usage gi = list occurs = 33	list
^§tag usage gi = measure occurs = 18	measure
^§tag usage gi = mentioned occurs = 1635	mentioned
^§tag usage gi = name occurs = 44823	name
^§tag usage gi = note occurs = 56	note
^§tag usage gi = num occurs = 5261	number
^§tag usage gi = opener occurs = 290	opener
^§tag usage gi = p occurs = 34684	paragraph
^§tag usage gi = ptr occurs = 42	pointer
^§tag usage gi = q occurs = 30236	separated from the surrounding text with quotation marks
^§tag usage gi = quote occurs = 1200	quotation
^§tag usage gi = ref occurs = 27	reference
^§tag usage gi = row occurs = 15	row
^§tag usage gi = s occurs = 78905	s-unit
^§tag usage gi = sp occurs = 251	speech
^§tag usage gi = speaker occurs = 6	speaker
^§tag usage gi = table occurs = 3	table
^§tag usage gi = term occurs = 6	term
^§tag usage gi = text occurs = 34	text
^§tag usage gi = time occurs = 87	time
^§tag usage gi = title occurs = 975	title

^§classification declarations

^§taxonomy

^§category
id = orwl

^§category description

Nineteen Eighty-Four

^§category
id = fict

^§category description

Fiction

^§category
id = news

^§category description

Newspapers

^§category
id = spch

^§category description

Speech

^§category
id = oana

^§category description

Nineteen Eighty-Four, Morphosyntactically Annotated

^§text-profile description

^§language usage

^§language ident = sl-rozaj	Resian (dialect of Slovene)
^§language ident = be	Byelorussian
^§language ident = bg	Bulgarian
^§language ident = br	Breton
^§language ident = ca	Catalan
^§language ident = co	Corsican
^§language ident = cs	Czech
^§language ident = cy	Welsh
^§language ident = da	Danish
^§language ident = de	German
^§language ident = el	Greek/Latin
^§language ident = en	English
^§language ident = es	Spanish
^§language ident = et	Estonian
^§language ident = eu	Basque
^§language ident = fi	Finnish
^§language ident = fr	French
^§language ident = ga	Irish
^§language ident = gd	Scots Gaelic
^§language ident = gl	Galician
^§language ident = hr	Croatian
^§language ident = hu	Hungarian
^§language ident = hy	Armenian
^§language ident = ik	Inupiak
^§language ident = is	Icelandic
^§language ident = it	Italian
^§language ident = ji	Yiddish
^§language ident = ka	Georgian/Ibero
^§language ident = kl	Greenlandic
^§language ident = la	Latin/Latin
^§language ident = lt	Lithuanian
^§language ident = lv	Latvian; Lettish
^§language ident = mk	Macedonian
^§language ident = mo	Moldavian
^§language ident = nl	Dutch
^§language ident = no	Norwegian
^§language ident = oc	Occitan
^§language ident = pl	Polish
^§language ident = pt	Portuguese
^§language ident = rm	Rhaeto-Romance
^§language ident = ro	Romanian
^§language ident = ru	Russian
^§language ident = sh	Serbo-Croatian
^§language ident = sk	Slovak
^§language ident = sl	Slovene
^§language ident = sq	Albanian
^§language ident = sr	Serbian
^§language ident = sv	Swedish
^§language ident = tr	Turkish
^§language ident = tt	Tatar
^§language ident = uk	Ukrainian

^§revision description

^§change	2009-11-04_<date>Tomaž Erjavec_<name>Conversion to MULTEXT-East TEI P5.
^§change	2004-05-10_<date>Tomaž Erjavec_<name>From BETA to FINAL V3
^§change	2004-04-09_<date>Tomaž Erjavec_<name>Updated Serbian Orwell
^§change	2004-03-12_<date>Tomaž Erjavec_<name>Some minor changes to Orwells
^§change	2004-02-27_<date>Tomaž Erjavec_<name>Harmonised with CONCEDE/cesAna Orwell corpus.
^§change	2004-02-25_<date>Tomaž Erjavec_<name>Included 1984 -lt, -sr, -ru in the corpus, converted the TELRI edition (SGML, CES) to TEI P4 and prepared the data for MTE V3.
^§change	1997-10-05_<date>Tomaz Erjavec, IJS_<name>Final MTE release
^§change	1996-11-02_<date>Tomaz Erjavec, IJS_<name>Internal Release for IM3