This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: LD, Status: update, Created: 1997-11-30, Updated: 1997-12-21
File Description
- Title Statement
- Title:
- Multext-East cesAna: Nineteen Eighty-Four, Bulgarian
- Responsibility
- Ludmila Dimitrova, Lydia Sinapova
(Overall Responsibility)
Ludmila Dimitrova, Kiril Simov
(Hand-tagging of first chapter first part)
Ludmila Dimitrova
(Hand-tagging of second chapter first part, first
chapter second part)
Vladimír Petkevic
(Conversion to cesAna DTD)
- Edition:
- MTE Final Release
- Extent:
- 86020 words, 29.9 MB
Note: wordCount represents the number of TOK TYPE=WORD
elements in the text.
- Publication Statement
- Distributor:
-
Institute of Mathematics and Informatics,
Bulgarian Academy of Sciences, Sofia
- Address:
-
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
- Electronic address:
- email: ludmila@ling.math.acad.bg
- Availability:
-
Available for research purposes upon receipt of signed
agreement
- Publication date:
- January 1, 1998
- Source Description
- Full Bibliography
- Title Statement
- Title:
- Multext-East CES1: Nineteen Eighty-Four, Bulgarian
- Publication Statement
- Distributor:
-
Institute of Mathematics and Informatics,
Bulgarian Academy of Sciences, Sofia
- Address:
-
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
- Electronic address:
- email: ludmila@ling.math.acad.bg
- Availability:
-
Available for research purposes upon receipt of signed
agreement
- Publication date:
- October 1, 1997
- Source Description
- Full Bibliography
- Title Statement
- Title:
- Electronic form of 1984 by George Orwell in
Bulgarian
- Responsibility
-
Ludmila Dimitrova (BAS), Lydia Sinapova (BAS),
Kiril Simov (BAS)
(Typing-in 1984.)
- Publication Statement
- Distributor:
-
Institute of Mathematics and Informatics,
Bulgarian Academy of Sciences, Sofia
- Address:
-
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
- Availability:
-
Available for research purposes upon receipt of signed
agreement
- Publication date:
- 1997
- Source Description
- Structured Bibliography
- Monography
- Title:
- 1984
- Author:
- George Orwell
- Author:
- Translator: Lydia Bozhilova
- Imprint
- Publication date:
- 1989
- Publisher:
- Profizdat
- Place:
- Sofia, Bulgaria
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages.
EU Copernicus Project COP106
- Editorial declaration:
- Transduction:
-
In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and
HEAD, POEM, LIST elements have been omitted. cesDoc P
elements are encoded as PAR, and S as S.
cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
- Quotation:
-
Q and QUOTE tags from the cesDoc source are not retained.
- Segmentation:
-
S segmentation is the same as in the cesDoc source (hand-validated).
TOK segmentation was performed with mtseg and manually corrected.
- Tag declaration:
- chunklist = 1
-
Element corresponds to TEXT of the cesDoc source
- chunk = 1
-
Element corresponds to BODY of the cesDoc source
- par = 1322
-
Elements correspond to P elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc P element.
- s = 6682
-
Elements correspond to S elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc S element.
- tok = 101173
-
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
giving the mtseg class of the token.
- orth = 101173
-
Contains the orthography of the token, as found in the
cesDoc source.
- disamb = 86020
-
Contains disambiguated lexical information.
- lex = 156002
-
Contains undisambiguated lexical information.
- base = 242022
-
Base or lemma of a token.
- msd = 156002
-
Morphosyntactic description of a token.
- ctag = 257175
-
Corpus tag.
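The tag usage counts above can be cross-checked arithmetically. The following is a minimal sketch in Python, not part of the original header: the counts are copied from the Tag declaration and Extent entries, while the names TAG_COUNTS, WORD_COUNT, check_counts and punct_tokens are hypothetical. The relations it tests (for example, that base equals disamb plus lex) are only observations that hold for these numbers; reading them as "one base per disamb and per lex element" is an assumption, since the header does not state it.

```python
# Tag usage counts copied from the Tag declaration list in this header.
TAG_COUNTS = {
    "chunklist": 1,
    "chunk": 1,
    "par": 1322,
    "s": 6682,
    "tok": 101173,
    "orth": 101173,
    "disamb": 86020,
    "lex": 156002,
    "base": 242022,
    "msd": 156002,
    "ctag": 257175,
}

# Word count from the Extent entry (number of TOK TYPE=WORD elements).
WORD_COUNT = 86020


def check_counts(counts, word_count):
    """Return a dict mapping each relation to whether it holds.

    The relations are assumptions inferred from the numbers, not rules
    stated by the header or by the cesAna DTD.
    """
    return {
        "orth == tok": counts["orth"] == counts["tok"],
        "disamb == word count": counts["disamb"] == word_count,
        "msd == lex": counts["msd"] == counts["lex"],
        "base == disamb + lex":
            counts["base"] == counts["disamb"] + counts["lex"],
        "ctag == tok + lex":
            counts["ctag"] == counts["tok"] + counts["lex"],
    }


def punct_tokens(counts, word_count):
    """Tokens that are not words, given that TYPE is WORD or PUNCT."""
    return counts["tok"] - word_count


if __name__ == "__main__":
    for relation, holds in check_counts(TAG_COUNTS, WORD_COUNT).items():
        print(f"{relation}: {'ok' if holds else 'MISMATCH'}")
    print("PUNCT tokens:", punct_tokens(TAG_COUNTS, WORD_COUNT))
```

All five relations hold for the figures given here, and the difference between the tok count and the word count leaves 15153 tokens of TYPE=PUNCT.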
Revision Description
- Date: 1997-12-19 (Vladimír Petkevic, ÚTKL FFUK,
Prague)
- Filled in tags' usage, wordcount and bytecount
- Date: 1997-12-21 (Tomaz Erjavec, IJS)
- Converted from ISO Cyrillic to SGML entities
- Changed ... to ...
- Modified EDITIONSTMT, BYTECOUNT
Meta-Made by et