This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: DT, Status: update, Created: 1997-11-04, Updated: 1997-12-21

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Romanian
Responsibility
Dan Tufis (Overall Responsibility); Ana-Maria Barbu (Hand-tagging the whole book); Vasile Patrascu (Conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
101508 words, 27.1 MB
Note: wordCount is the number of TOK TYPE=WORD elements in the text. byteCount is in megabytes.
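
The note above defines wordCount as the number of TOK elements with TYPE=WORD. A minimal sketch of that count follows, assuming a small invented fragment written as well-formed XML with lowercase names. The real corpus is SGML, its exact tag and attribute spelling may differ, and the sample words are placeholders, not taken from the corpus. The function name count_tokens is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented cesAna-like fragment for illustration only (assumption:
# lowercase element and attribute names, well-formed XML).
SAMPLE = """
<chunk>
  <par from="p1">
    <s from="s1">
      <tok type="WORD"><orth>Era</orth></tok>
      <tok type="WORD"><orth>o</orth></tok>
      <tok type="WORD"><orth>zi</orth></tok>
      <tok type="PUNCT"><orth>.</orth></tok>
    </s>
  </par>
</chunk>
"""

def count_tokens(xml_text):
    """Return a dict mapping each tok TYPE value to its number of occurrences."""
    root = ET.fromstring(xml_text)
    counts = {}
    for tok in root.iter("tok"):
        kind = tok.get("type", "UNKNOWN")
        counts[kind] = counts.get(kind, 0) + 1
    return counts

counts = count_tokens(SAMPLE)
# wordCount in the sense of the note is the WORD entry.
print(counts.get("WORD", 0), counts)
```

Run against the full text, the WORD entry would be expected to match the stated extent of 101508 words, provided the file can be parsed this way.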
Publication Statement
Distributor:
Center for Advanced Research in Machine Learning, Natural Language Processing and Conceptual Modelling
Address:
Casa Academiei, 13, "13 Septembrie", Bucharest 5, 74311, Romania
Electronic address:
email: tufis@valhalla.racai.ro
Electronic address:
www: http://nl.ijs.si/ME
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1st, 1998
Source Description
Structured Bibliography
Monography
Title:
O mie noua sute optzeci si patru
Author:
George Orwell
Author:
Translator: Mihnea Gafita
Imprint
Publication date:
1991
Publisher:
Editura Univers
Place:
Bucharest

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
The electronic form was obtained by keyboarding at the Center for Advanced Research in Machine Learning, Natural Language Processing and Conceptual Modelling, and was then spell-checked and hand-tagged. In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and HEAD, POEM, LIST elements have been omitted. cesDoc P elements are encoded as PAR, and S as S. cesDoc sub-S level tags (DATE, NAME, ABBR, etc.) are omitted.
Quotation:
Q and QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
Tag declaration:
chunkList = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1343
Elements correspond to P, POEM, LIST, HEAD elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6521
Elements correspond to S, L, ITEM elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 118063
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token.
orth = 118063
Contains the orthography of the token, as found in the cesDoc source.
disamb = 101508
Contains disambiguated lexical information.
lex = 189695
Contains undisambiguated lexical information.
base = 291203
Base or lemma of a token.
msd = 291203
Morphosyntactic description of a token.
ctag = 307758
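
The element counts in the tag declaration can be cross-checked against each other and against the stated extent of 101508 words. The sketch below only does arithmetic on the figures given in this header. The structural reading behind each check (one orth per tok, one disamb per word token, one base and one msd per disamb or lex) is an assumption and is not stated in the header. The last relation is simply an observed numeric coincidence.

```python
# Figures copied from the tag declaration and the Extent entry of this header.
counts = {
    "tok": 118063,
    "orth": 118063,
    "disamb": 101508,
    "lex": 189695,
    "base": 291203,
    "msd": 291203,
    "ctag": 307758,
}
word_count = 101508  # Extent: number of TOK TYPE=WORD elements

# Tokens that are not words (TYPE=PUNCT, given that only two types are listed).
punct_count = counts["tok"] - word_count

checks = {
    # Assumption: every token has exactly one orth.
    "orth == tok": counts["orth"] == counts["tok"],
    # Assumption: every word token has exactly one disamb.
    "disamb == words": counts["disamb"] == word_count,
    # Assumption: every disamb and every lex holds exactly one base and one msd.
    "base == disamb + lex": counts["base"] == counts["disamb"] + counts["lex"],
    "msd == base": counts["msd"] == counts["base"],
    # Observation only: the ctag surplus over msd equals the punctuation count.
    "ctag == msd + punct": counts["ctag"] == counts["msd"] + punct_count,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All five relations hold for the figures listed here, which suggests the counts are internally consistent.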

Revision Description

Date: 1997-11-04 (Tomaz Erjavec, IJS)
Date: 1997-11-06 (Vasile Patrascu)
Date: 1997-12-21 (Tomaz Erjavec, IJS)


Meta-Made by et