This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: VP, Status: update, Created: 1997-11-28, Updated: 1997-12-21

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Czech
Responsibility
Vladimír Petkevic (Overall Responsibility); Milena Hnátková (Hand-tagging of the first 3 chapters) (Revision of the tagger results); Vladimír Petkevic (Conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
79862 words, 24.4 MB
Note: wordCount represents the number of TOK TYPE=WORD elements in the text.
Publication Statement
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague
Address:
Celetná 13, 110 00 Praha 1, Czech Republic
Electronic address:
email: Vladimir.Petkevic@ff.cuni.cz
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1st, 1998
Source Description
Full Bibliography
Title Statement
Title:
Multext-East CES1: Nineteen Eighty-Four, Czech
Publication Statement
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague
Address:
Celetná 13, 110 00 Praha 1, Czech Republic
Electronic address:
email: Vladimir.Petkevic@ff.cuni.cz
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
November 1, 1997
Source Description
Full Bibliography
Title Statement
Title:
Electronic form of 1984 by George Orwell in Czech, obtained via OCR
Responsibility
Vladimír Petkevic, Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague, Czech Republic (ÚTKL FFUK) (OCR'ed the novel)
Publication Statement
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague, Czech Republic (ÚTKL FFUK)
Address:
Celetná 13, Praha 1, Czech Republic
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
1998
Source Description
Structured Bibliography
Monography
Title:
1984
Author:
George Orwell
Translator:
Eva Simecková
Imprint
Publication date:
1991
Publisher:
Nase vojsko
Place:
Prague, Czech Republic

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and HEAD, POEM, LIST elements have been omitted. cesDoc P elements are encoded as PAR, and S as S. cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
Quotation:
Q and QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
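
The transduction rules above can be summarised as a small lookup. The following is only an illustrative sketch, not the conversion program actually used: the names RENAMED, OMITTED and transduce_tag are hypothetical, and the list of omitted sub-S tags is incomplete because the declaration itself ends in "etc.".

```python
# Hypothetical summary of the cesDoc -> cesAna transduction described above.
# RENAMED, OMITTED and transduce_tag are illustrative names, not project code.

# cesDoc elements that are carried over, with their cesAna names.
RENAMED = {"P": "PAR", "S": "S"}

# cesDoc elements the declaration lists as omitted. The sub-S list in the
# header ends with "etc.", so this set is known to be incomplete.
OMITTED = {"DIV", "QUOTE", "Q", "HEAD", "POEM", "LIST", "DATE", "NAME", "ABBR"}


def transduce_tag(tag):
    """Return the cesAna name for a cesDoc tag, or None if it is omitted.

    Tags not mentioned in the declaration raise ValueError, because the
    header does not say what happens to them.
    """
    name = tag.upper()
    if name in RENAMED:
        return RENAMED[name]
    if name in OMITTED:
        return None
    raise ValueError(f"tag not covered by the declaration: {tag}")


print(transduce_tag("P"))      # kept, renamed
print(transduce_tag("QUOTE"))  # omitted
```

Raising an error for unlisted tags, rather than silently dropping them, keeps the sketch from asserting anything the header does not state.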
Tag declaration:
chunklist = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1297
Elements correspond to P elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6751
Elements correspond to S elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 100358
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token.
orth = 100358
Contains the orthography of the token, as found in the cesDoc source.
disamb = 79862
Contains disambiguated lexical information.
lex = 214368
Contains undisambiguated lexical information.
base = 294230
Base or lemma of a token.
msd = 294230
Morphosyntactic description of a token.
ctag = 20496
Corpus tag (only for punctuation).
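
The element counts in the tag declaration are mutually consistent, which can be checked mechanically. The sketch below copies the figures from this header; the interpretation in the check labels (one disamb per word token, one ctag per punctuation token, one base and one msd per lex or disamb element) is an inference from the arithmetic and is not stated in the header itself.

```python
# Element counts copied from the tag declaration above.
counts = {
    "chunklist": 1, "chunk": 1, "par": 1297, "s": 6751,
    "tok": 100358, "orth": 100358, "disamb": 79862, "lex": 214368,
    "base": 294230, "msd": 294230, "ctag": 20496,
}
# Word count from the Extent field (number of TOK TYPE=WORD elements).
word_count = 79862

# Each check is an assumed relationship, inferred from the numbers only.
checks = {
    "one orth per tok": counts["orth"] == counts["tok"],
    "one disamb per word token": counts["disamb"] == word_count,
    "tok = word tokens + punctuation (ctag)":
        counts["tok"] == counts["disamb"] + counts["ctag"],
    "base = lex + disamb": counts["base"] == counts["lex"] + counts["disamb"],
    "msd = base": counts["msd"] == counts["base"],
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

A mismatch in any of these would suggest either a counting error in the header or a structural assumption that does not hold for the corpus.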

Revision Description

Date: 1997-11-04 (Tomaz Erjavec, IJS)
Date: 1997-11-28 (Vladimír Petkevic, ÚTKL)
Date: 1997-12-21 (Tomaz Erjavec, IJS)

