This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: LD, Status: update, Created: 1997-11-30, Updated: 1997-12-21

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Bulgarian
Responsibility
Ludmila Dimitrova, Lydia Sinapova (Overall Responsibility)
Ludmila Dimitrova, Kiril Simov (Hand-tagging of first chapter first part)
Ludmila Dimitrova (Hand-tagging of second chapter first part, first chapter second part)
Vladimír Petkevic (Conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
86020 words, 29.9 MB
Note: wordCount represents the number of TOK TYPE=WORD elements in the text.
Publication Statement
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8 1113 Sofia, Bulgaria
Electronic address:
email: ludmila@ling.math.acad.bg
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1st, 1998
Source Description
Full Bibliography
Title Statement
Title:
Multext-East CES1: Nineteen Eighty-Four, Bulgarian
Publication Statement
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8 1113 Sofia, Bulgaria
Electronic address:
email: ludmila@ling.math.acad.bg
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
October 1, 1997
Source Description
Full Bibliography
Title Statement
Title:
Electronic form of 1984 by George Orwell in Bulgarian
Responsibility
Ludmila Dimitrova (BAS), Lydia Sinapova (BAS), Kiril Simov (BAS) (Typing-in of 1984)
Publication Statement
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8 1113 Sofia, Bulgaria
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
1997
Source Description
Structured Bibliography
Monography
Title:
1984
Author:
George Orwell
Translator:
Lydia Bozhilova
Imprint
Publication date:
1989
Publisher:
Profizdat
Place:
Sofia, Bulgaria

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
In the cesDoc to cesAna conversion, the DIV, QUOTE and Q tags and the HEAD, POEM and LIST elements have been omitted. cesDoc P elements are encoded as PAR, and S as S. cesDoc tags below the S level (DATE, NAME, ABBR, etc.) are omitted.
Quotation:
Q and QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
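The structural part of the transduction described above can be illustrated with a rough sketch. It is not the project's conversion tool. It assumes an XML-like cesDoc fragment with lowercase element names (the real source is SGML), the sample text and the function name to_cesana_skeleton are invented, and dropping the content of HEAD is one reading of "omitted". Tokenization into TOK elements is left out entirely.

```python
import xml.etree.ElementTree as ET

# Invented cesDoc-like sample (assumption): the real source is SGML,
# and these sentences do not come from the corpus.
CESDOC_SAMPLE = """
<text><body><div id="d1">
  <head>Part One</head>
  <p id="p1"><s id="s1">Then <name>Anna</name> left.</s><s id="s2">It rained.</s></p>
  <p id="p2"><s id="s3">The hall was dark.</s></p>
</div></body></text>
""".strip()


def to_cesana_skeleton(doc_root):
    """Build a chunklist/chunk/par/s skeleton from a cesDoc-like tree.

    Hypothetical helper: P becomes PAR and S stays S, each with a FROM
    attribute pointing at the ID of the source element. DIV is dropped
    because only P elements are visited; HEAD content is skipped for the
    same reason (an assumption about what "omitted" means).
    """
    chunklist = ET.Element("chunklist")
    chunk = ET.SubElement(chunklist, "chunk")
    for p in doc_root.iter("p"):
        par = ET.SubElement(chunk, "par", {"from": p.get("id")})
        for s in p.iter("s"):
            sent = ET.SubElement(par, "s", {"from": s.get("id")})
            # itertext() flattens sub-S tags such as NAME, keeping their text.
            sent.text = "".join(s.itertext())
    return chunklist


skeleton = to_cesana_skeleton(ET.fromstring(CESDOC_SAMPLE))
print(ET.tostring(skeleton, encoding="unicode"))
```

In the real cesAna files each S is further split into TOK elements; the sketch stops at sentence level to stay within what the header states.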
Tag declaration:
chunklist = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1322
Elements correspond to P elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6682
Elements correspond to S elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 101173
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token.
orth = 101173
Contains the orthography of the token, as found in the cesDoc source.
disamb = 86020
Contains disambiguated lexical information.
lex = 156002
Contains undisambiguated lexical information.
base = 242022
Base or lemma of a token.
msd = 156002
Morphosyntactic description of a token.
ctag = 257175
Corpus tag.
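The declared counts can be cross-checked with simple arithmetic. The sketch below copies the figures from the tag declaration and tests a few relations that happen to hold among them. The relations are observations about the numbers, not rules stated by the header, and the names DECLARED and observed_relations are invented for illustration. The punctuation count follows from the header's statements that tokens are of TYPE=WORD or PUNCT and that the word count (86020) is the number of TOK TYPE=WORD elements.

```python
# Counts copied from the tag declaration above.
DECLARED = {
    "chunklist": 1, "chunk": 1, "par": 1322, "s": 6682,
    "tok": 101173, "orth": 101173, "disamb": 86020,
    "lex": 156002, "base": 242022, "msd": 156002, "ctag": 257175,
}

WORD_COUNT = 86020  # from the Extent field


def observed_relations(c):
    """Return arithmetic relations among the counts (observed, not specified)."""
    return {
        "orth == tok": c["orth"] == c["tok"],
        "msd == lex": c["msd"] == c["lex"],
        "disamb == word count": c["disamb"] == WORD_COUNT,
        "base == disamb + lex": c["base"] == c["disamb"] + c["lex"],
        "ctag == tok + lex": c["ctag"] == c["tok"] + c["lex"],
    }


relations = observed_relations(DECLARED)
# Tokens are WORD or PUNCT, so PUNCT tokens are the remainder.
punct_tokens = DECLARED["tok"] - WORD_COUNT

for name, holds in relations.items():
    print(f"{name}: {holds}")
print("PUNCT tokens:", punct_tokens)
```

All five relations hold for the declared figures, and the remainder gives 15153 PUNCT tokens. A plausible reading, offered only as an assumption, is one ORTH per token, one DISAMB per word token, and one BASE per DISAMB or LEX element.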

Revision Description

Date: 1997-12-19 (Vladimír Petkevic, ÚTKL FFUK, Prague)
Date: 1997-12-21 (Tomaz Erjavec, IJS)


Meta-Made by et