This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: HJK, Status: update, Created: 1997-11-28, Updated: 1997-12-21

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Estonian
Responsibility
Heiki-Jaan Kaalep (Overall responsibility)
Kadri Muischnek (Hand-tagging of part 1, chapters 1-4; part 2, chapter 9)
Andriela Rääbis (Hand-tagging of part 1, chapters 5-7; part 3, chapters 1, 3, 4)
Heili Orav (Hand-tagging of part 1, chapter 8; part 3, chapters 2, 5, 6)
Helen Potter (Hand-tagging of part 2, chapters 1-7)
Külli Habicht (Hand-tagging of part 2, chapter 8)
Vladimír Petkevic (Conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
75433 words, 18.7 MB
Note: wordCount is the number of TOK TYPE=WORD elements in the text; byteCount is in megabytes.
Publication Statement
Distributor:
TÜ arvutuslingvistika uurimisgrupp
Address:
Tiigi 78-232, Tartu, Estonia
Electronic address:
email: hkaalep@psych.ut.ee
Electronic address:
www: http://www.cl.ut.ee
Availability:
Freely available
Publication date:
January 1st, 1998
Source Description
Full Bibliography
Title Statement
Title:
Multext-East CES1: Nineteen Eighty-Four, Estonian
Publication Statement
Distributor:
TÜ arvutuslingvistika uurimisgrupp
Address:
Tiigi 78-232, Tartu, Estonia
Electronic address:
email: hkaalep@psych.ut.ee
Electronic address:
www: http://www.cl.ut.ee
Availability:
Freely available
Publication date:
October 1, 1997
Source Description
Structured Bibliography
Monograph
Title:
1984
Author:
George Orwell
Author:
Translator: Elias Treeman
Imprint
Publication date:
1990
Publisher:
Loomingu Raamatukogu nr. 48-51
Publisher:
Perioodika
Place:
Tallinn

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
In the cesDoc to cesAna conversion, DIV, QUOTE tags and HEAD, POEM, LIST elements have been omitted. cesDoc P elements are encoded as PAR, and S as S. Q tags have been encoded as punctuation symbols. cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
Quotation:
QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
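The transduction rules in the editorial declaration can be summarized as a small lookup table. The following is a minimal sketch, not the converter actually used: the function name `transduce_tag` and the action labels are hypothetical, and the distinction between dropping only a tag (DIV, QUOTE, sub-S tags) and dropping a whole element (HEAD, POEM, LIST) is an assumption based on the wording of the declaration. Only the sub-S tags named explicitly (DATE, NAME, ABBR) are listed; the "etc." in the declaration is left open.

```python
# Hypothetical summary of the cesDoc -> cesAna transduction rules.
# The sets below only contain tag names mentioned in the editorial declaration.
RENAMED = {"P": "PAR", "S": "S"}               # kept, possibly under a new name
TAGS_DROPPED = {"DIV", "QUOTE", "DATE", "NAME", "ABBR"}  # assumed: markup removed
ELEMENTS_DROPPED = {"HEAD", "POEM", "LIST"}    # assumed: element omitted entirely


def transduce_tag(tag):
    """Return (action, target_name) for a cesDoc tag name."""
    tag = tag.upper()
    if tag in RENAMED:
        return ("rename", RENAMED[tag])
    if tag == "Q":
        # Q tags are encoded as punctuation symbols in cesAna.
        return ("punctuation", None)
    if tag in ELEMENTS_DROPPED:
        return ("drop element", None)
    if tag in TAGS_DROPPED:
        return ("drop tag", None)
    # Anything not named in the declaration is left undecided here.
    return ("unspecified", None)


for name in ("p", "s", "q", "head", "date"):
    print(name, "->", transduce_tag(name))
```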
Tag declaration:
chunkList = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1266
Elements correspond to P elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6478
Elements correspond to S elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 94906
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token.
orth = 94906
Contains the orthography of the token, as found in the cesDoc source.
disamb = 75433
Contains disambiguated lexical information.
lex = 147542
Contains undisambiguated lexical information.
base = 222975
Base or lemma of a token.
msd = 222975
Morphosyntactic description of a token.
ctag = 94906
Corpus tag.
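
The counts in the tag declaration are mutually consistent, which can be checked with a few lines of arithmetic. The sketch below assumes (this is an interpretation, not something the header states) that each tok carries one orth and one ctag, that each word token carries one disamb, and that every disamb and lex element contains one base and one msd. The names `tag_counts` and `check_counts` are illustrative only.

```python
# Tag usage counts copied from the tag declaration above.
tag_counts = {
    "chunkList": 1, "chunk": 1, "par": 1266, "s": 6478,
    "tok": 94906, "orth": 94906, "disamb": 75433, "lex": 147542,
    "base": 222975, "msd": 222975, "ctag": 94906,
}
WORD_COUNT = 75433  # from the Extent field: number of TOK TYPE=WORD elements


def check_counts(c, word_count):
    """Return a dict of named consistency checks (True means the check holds)."""
    return {
        "orth == tok": c["orth"] == c["tok"],
        "ctag == tok": c["ctag"] == c["tok"],
        "disamb == word count": c["disamb"] == word_count,
        "base == disamb + lex": c["base"] == c["disamb"] + c["lex"],
        "msd == base": c["msd"] == c["base"],
    }


checks = check_counts(tag_counts, WORD_COUNT)
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")

# Tokens that are not words are, under the assumptions above, punctuation.
punct_tokens = tag_counts["tok"] - WORD_COUNT
print("punctuation tokens:", punct_tokens)
```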

Revision Description

Date: 1997-11-28 (Heiki-Jaan Kaalep)
Date: 1997-12-21 (Tomaz Erjavec, IJS)

