This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: VP, Status: update, Created: 1997-11-28, Updated: 1997-12-21
File Description
- Title Statement
- Title:
- Multext-East cesAna: Nineteen Eighty-Four, Czech
- Responsibility
- Vladimír Petkevic
(Overall Responsibility)
Milena Hnátková
(Hand-tagging of the first 3 chapters)
(Revision of the tagger results)
Vladimír Petkevic
(Conversion to cesAna DTD)
- Edition:
- MTE Final Release
- Extent:
- 79862 words, 24.4 MB
Note:
wordCount represents the number of TOK TYPE=WORD
elements in the text.
- Publication Statement
- Distributor:
-
Institute of Theoretical and Computational Linguistics,
Faculty of Philosophy, Charles University, Prague
- Address:
- Celetná 13, 110 00 Praha 1, Czech Republic
- Electronic address:
- email: Vladimir.Petkevic@ff.cuni.cz
- Availability:
-
Available for research purposes upon receipt of signed agreement
- Publication date:
- January 1st, 1998
- Source Description
- Full Bibliography
- Title Statement
- Title:
- Multext-East CES1: Nineteen Eighty-Four, Czech
- Publication Statement
- Distributor:
-
Institute of Theoretical and Computational Linguistics,
Faculty of Philosophy, Charles University, Prague
- Address:
- Celetná 13, 110 00 Praha 1, Czech Republic
- Electronic address:
- email: Vladimir.Petkevic@ff.cuni.cz
- Availability:
-
Available for research purposes upon receipt of signed agreement
- Publication date:
- November 1, 1997
- Source Description
- Full Bibliography
- Title Statement
- Title:
-
Electronic form of 1984 by George Orwell in Czech,
obtained via OCR
- Responsibility
-
Vladimír Petkevic
Institute of Theoretical and Computational Linguistics,
Faculty of Philosophy, Charles University, Prague, Czech Republic
(ÚTKL FFUK)
(OCR'ed the novel)
- Publication Statement
- Distributor:
-
Institute of Theoretical and Computational Linguistics,
Faculty of Philosophy, Charles University, Prague, Czech Republic
(ÚTKL FFUK)
- Address:
-
Celetná 13, Praha 1
Czech Republic
- Availability:
-
Available for research purposes upon receipt of signed
agreement
- Publication date:
- 1998
- Source Description
- Structured Bibliography
- Monograph
- Title:
- 1984
- Author:
- George Orwell
- Author:
- Translator: Eva Simecková
- Imprint
- Publication date:
- 1991
- Publisher:
- Nase vojsko
- Place:
- Prague, Czech Republic
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages.
EU Copernicus Project COP106
- Editorial declaration:
- Transduction:
-
In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and
HEAD, POEM, LIST elements have been omitted. cesDoc P
elements are encoded as PAR, and S as S.
cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
- Quotation:
-
Q and QUOTE tags from the cesDoc source are not retained.
- Segmentation:
-
S segmentation is the same as in the cesDoc source (hand-validated).
TOK segmentation was performed with mtseg and manually corrected.
- Tag declaration:
- chunklist = 1
-
Element corresponds to TEXT of the cesDoc source
- chunk = 1
-
Element corresponds to BODY of the cesDoc source
- par = 1297
-
Elements correspond to P elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc P element.
- s = 6751
-
Elements correspond to S elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc S element.
- tok = 100358
-
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
giving the mtseg class of the token.
- orth = 100358
-
Contains the orthography of the token, as found in the
cesDoc source.
- disamb = 79862
-
Contains disambiguated lexical information.
- lex = 214368
-
Contains undisambiguated lexical information.
- base = 294230
-
Base or lemma of a token.
- msd = 294230
-
Morphosyntactic description of a token.
- ctag = 20496
-
Corpus tag (only for punctuation).
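Note:
The element counts in the tag declaration can be cross-checked
against each other. The sketch below is only an illustration: the
counts are copied from the list above, but the relations it tests
(one ORTH per TOK; every TOK carrying either a DISAMB or a CTAG;
one BASE and one MSD per DISAMB or LEX) are assumptions inferred
from the numbers, not statements taken from the cesAna DTD. The
names counts and check_counts are hypothetical.

```python
# Element counts copied from the tag declaration of this header.
counts = {
    "chunklist": 1, "chunk": 1, "par": 1297, "s": 6751,
    "tok": 100358, "orth": 100358, "disamb": 79862,
    "lex": 214368, "base": 294230, "msd": 294230, "ctag": 20496,
}

# Word count given in the Extent field of the file description.
WORD_COUNT = 79862


def check_counts(c, word_count):
    """Return (description, holds) pairs for the assumed relations.

    The relations are hypotheses about how the elements nest;
    they are not taken from the DTD itself.
    """
    return [
        ("one orth per tok", c["orth"] == c["tok"]),
        ("tok = disamb + ctag", c["tok"] == c["disamb"] + c["ctag"]),
        ("disamb = word count", c["disamb"] == word_count),
        ("base = disamb + lex", c["base"] == c["disamb"] + c["lex"]),
        ("msd = base", c["msd"] == c["base"]),
    ]


if __name__ == "__main__":
    for description, holds in check_counts(counts, WORD_COUNT):
        print(f"{description}: {'ok' if holds else 'MISMATCH'}")
```

With the figures listed in this header, all five relations hold,
which suggests the counts are internally consistent.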
Revision Description
- Date: 1997-11-04 (Tomaz Erjavec, IJS)
- Created initial header template and part of the content
- Date: 1997-11-28 (Vladimír Petkevic, ÚTKL)
- Created the specific part of the header content
- Date: 1997-12-21 (Tomaz Erjavec, IJS)
- Converted from ISO Latin-2 to SGML entities
- Modified EDITIONSTMT, BYTECOUNT
Meta-Made by et