TEI Header

§file description
§title statement
§title
id = mten-et.title
Multext-East cesDoc corpus: Newspapers, Estonian
§statement of responsibility
§name Urve Talvik
§responsibility entered the text
§statement of responsibility
§name Riina Mosna
§responsibility entered the text
§statement of responsibility
§name Heiki-Jaan Kaalep
§responsibility supervised the work
§statement of responsibility
§name Heiki-Jaan Kaalep
§responsibility modified the header for version 4
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent 112003 <measure> WordCount represents the number of words in this text, exclusive of tags and header information. ByteCount reflects the approximate size of the file containing the doctype and cesDoc element, including all text, tags, and header information.
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor TÜ arvutuslingvistika uurimisgrupp
§address Tiigi 78-232, Tartu, Estonia
§address eAddress: hkaalep@psych.ut.ee
§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Newspapers, Estonian
§statement of responsibility
name Urve Talvik
responsibility entered the text
§statement of responsibility
name Riina Mosna
responsibility entered the text
§statement of responsibility
name Heiki-Jaan Kaalep
responsibility supervised the work
§statement of responsibility
name Heiki-Jaan Kaalep
responsibility modified the header for version 4
§edition statement

MTE Final Release

§publication statement
§distributor TÜ arvutuslingvistika uurimisgrupp
§address Tiigi 78-232, Tartu, Estonia
§address eAddress: hkaalep@psych.ut.ee
§availability

Freely available

§date
when = 1997-10-01
October 1, 1997
§source description
§citation list
bibliographic citation
title Õhtuleht 25/04/1985
bibliographic citation
title Noorte Hääl 02/11/1985
bibliographic citation
title Noorte Hääl 26/12/1985
bibliographic citation
title Õhtuleht 26/12/1985
bibliographic citation
title Rahva Hääl 21/03/1985
bibliographic citation
title Rahva Hääl 15/05/1985
bibliographic citation
title Rahva Hääl 19/05/1985
bibliographic citation
title Noorte Hääl 28/05/1985
bibliographic citation
title Noorte Hääl 29/05/1985
bibliographic citation
title Punane Täht 11/06/1985
bibliographic citation
title Sirp ja Vasar 20/09/1985
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration
§normalization

Corpus Encoding Standard, Version 2.0 CES LEVEL: 1

§correction principles

§segmentation

Up to the level of sentences

§hyphenation

No end-of-line hyphenation

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 1864
abbreviation
§tag usage
gi = author occurs = 168
author
§tag usage
gi = bibl occurs = 168
bibliographic citation
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = byline occurs = 333
byline
§tag usage
gi = corr occurs = 1
editorial correction
§tag usage
gi = date occurs = 11
date
§tag usage
gi = distinct occurs = 131
distinct
§tag usage
gi = div occurs = 388
text division
§tag usage
gi = docAuthor occurs = 333
docAuthor
§tag usage
gi = foreign occurs = 4
foreign
§tag usage
gi = head occurs = 356
heading
§tag usage
gi = hi occurs = 1008
highlighted
§tag usage
gi = item occurs = 244
item
§tag usage
gi = list occurs = 39
list
§tag usage
gi = name occurs = 7629
name
§tag usage
gi = note occurs = 2
note
§tag usage
gi = num occurs = 344
number
§tag usage
gi = p occurs = 2423
paragraph
§tag usage
gi = q occurs = 385
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 89
quotation
§tag usage
gi = ref occurs = 3
reference
§tag usage
gi = s occurs = 7758
s-unit
§tag usage
gi = text occurs = 1
text
§tag usage
gi = title occurs = 333
title
§text-profile description
§text classification
§category reference
target = news
§revision description
§change 10/31/96 <date> Heiki-Jaan Kaalep, UT <name> Changed the header to conform to the new CES version
§change 1997-03-20 <date> Tomaz Erjavec, IJS <name> Normalisation of corpus component CESHEADER elements: CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
§change 1997-03-20 <date> Tomaz Erjavec, IJS <name> ISO LANGUAGEs implemented as marked section PUBLIC ent
§change 1997-03-20 <date> Tomaz Erjavec, IJS <name> Language (WSDs) implemented as PUBLIC entities
§change 1997-09-25 <date> Tomaž Erjavec <name> Changed editionStmt, byteCount, pubDate to final form
§change 2004-05-10 <date> Tomaž Erjavec <name> Converted to TEI P4, prepared for MTE V3
§change 2010-05-09 <date> Tomaž Erjavec <name> Conversion to MULTEXT-East TEI P5.