This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: HJK, Status: update, Created: 1997-11-28, Updated: 1997-12-21
File Description
- Title Statement
- Title:
- Multext-East cesAna: Nineteen Eighty-Four, Estonian
- Responsibility
- Heiki-Jaan Kaalep
(Overall Responsibility)
Kadri Muischnek
(Hand-tagging of part 1, chapter 1-4;
part 2 chapter 9)
Andriela Rääbis
(Hand-tagging of part 1, chapter 5-7;
part 3 chapter 1, 3, 4)
Heili Orav
(Hand-tagging of part 1, chapter 8;
part 3 chapter 2, 5, 6)
Helen Potter
(Hand-tagging of part 2, chapter 1-7)
Külli Habicht
(Hand-tagging of part 2, chapter 8)
Vladimír Petkevic
(Conversion to cesAna DTD)
- Edition:
- MTE Final Release
- Extent:
- 75433 words, 18.7 MB
Note: wordCount represents the number of TOK TYPE=WORD
elements in the text. byteCount is in megabytes.
- Publication Statement
- Distributor:
-
TÜ arvutuslingvistika uurimisgrupp
- Address:
- Tiigi 78-232, Tartu, Estonia
- Electronic address:
- email: hkaalep@psych.ut.ee
- Electronic address:
- www: http://www.cl.ut.ee
- Availability:
-
Freely available
- Publication date:
- January 1st, 1998
- Source Description
- Full Bibliography
- Title Statement
- Title:
- Multext-East CES1: Nineteen Eighty-Four, Estonian
- Publication Statement
- Distributor:
-
TÜ arvutuslingvistika uurimisgrupp
- Address:
- Tiigi 78-232, Tartu, Estonia
- Electronic address:
- email: hkaalep@psych.ut.ee
- Electronic address:
- www: http://www.cl.ut.ee
- Availability:
-
Freely available
- Publication date:
- October 1, 1997
- Source Description
- Structured Bibliography
- Monograph
- Title:
- 1984
- Author:
- George Orwell
- Author:
- Translator: Elias Treeman
- Imprint
- Publication date:
- 1990
- Publisher:
- Loomingu Raamatukogu nr. 48-51
- Publisher:
- Perioodika
- Place:
- Tallinn
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages.
EU Copernicus Project COP106
- Editorial declaration:
- Transduction:
-
In the cesDoc to cesAna conversion, DIV, QUOTE tags and
HEAD, POEM, LIST elements have been omitted. cesDoc P
elements are encoded as PAR, and S as S.
Q tags have been encoded as punctuation symbols.
cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
- Quotation:
-
QUOTE tags from the cesDoc source not retained.
- Segmentation:
-
S segmentation same as in cesDoc source (hand-validated).
TOK segmentation performed with mtseg and manually corrected.
- Tag declaration:
- chunkList = 1
-
Element corresponds to TEXT of the cesDoc source
- chunk = 1
-
Element corresponds to BODY of the cesDoc source
- par = 1266
-
Elements correspond to P elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc P element.
- s = 6478
-
Elements correspond to S elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc S element.
- tok = 94906
-
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
giving the mtseg class of the token.
- orth = 94906
-
Contains the orthography of the token, as found in the
cesDoc source.
- disamb = 75433
-
Contains disambiguated lexical information.
- lex = 147542
-
Contains undisambiguated lexical information.
- base = 222975
-
Base or lemma of a token.
- msd = 222975
-
Morphosyntactic description of a token.
- ctag = 94906
-
Corpus tag.
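The element counts in the tag declaration can be cross-checked against each other and against the word count given under Extent. The sketch below is only an illustration: the counts are copied from this header, but the relations it tests (for example, that every disamb and every lex element carries exactly one base and one msd) are assumptions inferred from the numbers, not statements made by the header. The names check_counts, counts and WORD_COUNT are hypothetical.

```python
# Element counts copied from the tag declaration of this header.
counts = {
    "chunkList": 1,
    "chunk": 1,
    "par": 1266,
    "s": 6478,
    "tok": 94906,
    "orth": 94906,
    "disamb": 75433,
    "lex": 147542,
    "base": 222975,
    "msd": 222975,
    "ctag": 94906,
}

# Word count as given in the Extent field (number of TOK TYPE=WORD elements).
WORD_COUNT = 75433


def check_counts(c, word_count):
    """Return a mapping from each assumed relation to whether it holds."""
    return {
        # Every token has one orthographic form.
        "orth == tok": c["orth"] == c["tok"],
        # The ctag count matches the token count.
        "ctag == tok": c["ctag"] == c["tok"],
        # Assumption: one disamb element per word token.
        "disamb == words": c["disamb"] == word_count,
        # Assumption: each disamb and each lex holds exactly one base.
        "base == disamb + lex": c["base"] == c["disamb"] + c["lex"],
        # One msd for every base.
        "msd == base": c["msd"] == c["base"],
    }


results = check_counts(counts, WORD_COUNT)
for relation, holds in results.items():
    print(f"{relation}: {'ok' if holds else 'MISMATCH'}")

# Tokens are WORD or PUNCT, so the remainder should be punctuation tokens.
punct_tokens = counts["tok"] - WORD_COUNT
print("punctuation tokens:", punct_tokens)
```

All five relations hold for the figures in this header, which suggests the counts are internally consistent under these assumptions.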
Revision Description
- Date: 1997-11-28 (Heiki-Jaan Kaalep)
-
- Date: 1997-12-21 (Tomaz Erjavec, IJS)
- Modified EDITIONSTMT and changed ... to ...