This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: DT, Status: update, Created: 1997-11-04, Updated: 1997-12-21
File Description
- Title Statement
- Title:
- Multext-East cesAna: Nineteen Eighty-Four, Romanian
- Responsibility
- Dan Tufis
(Overall Responsibility)
Ana-Maria Barbu
(Hand-tagging the whole book)
Vasile Patrascu
(Conversion to cesAna DTD)
- Edition:
- MTE Final Release
- Extent:
- 101508 words, 27.1 MB
Note: wordCount represents the number of TOK TYPE=WORD
elements in the text. byteCount is in megabytes.
- Publication Statement
- Distributor:
-
Center for Advanced Research in Machine Learning, Natural Language
Processing and Conceptual Modelling
- Address:
- Casa Academiei, 13, "13 Septembrie", Bucharest 5, 74311, Romania
- Electronic address:
- email: tufis@valhalla.racai.ro
- Electronic address:
- www: http://nl.ijs.si/ME
- Availability:
-
Available for research purposes upon receipt of signed agreement
- Publication date:
- January 1st, 1998
- Source Description
- Structured Bibliography
- Monograph
- Title:
- O mie noua sute optzeci si patru
- Author:
- George Orwell
- Author:
- Translator: Mihnea Gafita
- Imprint
- Publication date:
- 1991
- Publisher:
- Editura Univers
- Place:
- Bucharest
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages.
EU Copernicus Project COP106
- Editorial declaration:
- Transduction:
-
The electronic form was keyboarded at the
Center for Advanced Research in Machine Learning, Natural Language
Processing and Conceptual Modelling, then spell-checked and hand-tagged.
In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and
HEAD, POEM, LIST elements have been omitted. cesDoc P
elements are encoded as PAR, and S as S.
cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
- Quotation:
-
Q and QUOTE tags from the cesDoc source were not retained.
- Segmentation:
-
S segmentation is the same as in the cesDoc source (hand-validated).
TOK segmentation was performed with mtseg and manually corrected.
- Tag declaration:
- chunkList = 1
-
Element corresponds to TEXT of the cesDoc source
- chunk = 1
-
Element corresponds to BODY of the cesDoc source
- par = 1343
-
Elements correspond to P, POEM, LIST, HEAD elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc P element.
- s = 6521
-
Elements correspond to S, L, ITEM elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc S element.
- tok = 118063
-
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
giving the mtseg class of the token.
- orth = 118063
-
Contains the orthography of the token, as found in the
cesDoc source.
- disamb = 101508
-
Contains disambiguated lexical information.
- lex = 189695
-
Contains undisambiguated lexical information.
- base = 291203
-
Base or lemma of a token.
- msd = 291203
-
Morphosyntactic description of a token.
- ctag = 307758
-
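The tag counts declared above can be cross-checked mechanically. The following
is a minimal Python sketch, not part of the original header, that tallies
element names in a cesAna-like fragment and counts the TOK TYPE=WORD elements
that wordCount refers to. Several things here are assumptions made for
illustration only: the sample fragment is hand-written, it is well-formed XML
rather than the corpus's own markup, tag and attribute names are written in
lower case, the nesting (tok containing orth, disamb and lex; disamb and lex
containing base, msd and ctag) is inferred from the tag declaration, and all
lemma, MSD and ctag values are placeholders. The function names count_tags and
count_words are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical hand-written fragment. Nesting, attribute names and all
# values are assumptions for illustration, not data from the corpus.
SAMPLE = """
<chunkList>
  <chunk>
    <par from="p1">
      <s from="s1">
        <tok type="WORD" class="TOK">
          <orth>word1</orth>
          <disamb><base>lemma1</base><msd>MSD-A</msd><ctag>CTAG-A</ctag></disamb>
          <lex><base>lemma1</base><msd>MSD-A</msd><ctag>CTAG-A</ctag></lex>
          <lex><base>lemma2</base><msd>MSD-B</msd><ctag>CTAG-B</ctag></lex>
        </tok>
        <tok type="PUNCT" class="PERIOD">
          <orth>.</orth>
        </tok>
      </s>
    </par>
  </chunk>
</chunkList>
"""


def count_tags(xml_text):
    """Return a Counter mapping each element name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def count_words(xml_text):
    """Count tok elements whose type attribute is WORD (the wordCount notion)."""
    root = ET.fromstring(xml_text)
    return sum(1 for tok in root.iter("tok") if tok.get("type") == "WORD")


counts = count_tags(SAMPLE)
words = count_words(SAMPLE)

# Counts copied from the tag declaration in this header.
DECLARED = {
    "tok": 118063, "orth": 118063, "disamb": 101508,
    "lex": 189695, "base": 291203, "msd": 291203,
}

# Arithmetic check on the declared figures: one base and one msd
# per disamb and per lex element would give exactly these totals.
declared_consistent = (
    DECLARED["disamb"] + DECLARED["lex"] == DECLARED["base"] == DECLARED["msd"]
)

for tag in ("tok", "orth", "disamb", "lex", "base", "msd", "ctag"):
    print(tag, counts[tag])
print("WORD tokens:", words)
print("declared counts consistent:", declared_consistent)
```

Run against the real corpus file instead of SAMPLE, the same tally could be
compared with the figures in the tag declaration, provided the file is first
made parseable as XML and the tag-name casing is adjusted to match.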
Revision Description
- Date: 1997-11-04 (Tomaz Erjavec, IJS)
-
- Date: 1997-11-06 (Vasile Patrascu)
-
The Tagusage, Bytecount and Wordcount were updated. The entities
counted as words are those identified by the segmenter, that is:
words, clitics, compounds (counted as one unit, irrespective
of the number of constituents), punctuation, and numbers.
- Date: 1997-12-21 (Tomaz Erjavec, IJS)
- Modified EDITIONSTMT and changed ... to ...
Meta-Made by et