This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: OCS, Status: update, Created: 1997-11-24, Updated: 1997-12-21
File Description
- Title Statement
- Title:
- Multext-East cesAna: Nineteen Eighty-Four, Hungarian
- Responsibility
- Csaba Oravecz
(Overall Responsibility)
Vladimír Petkevic
(Conversion to cesAna DTD)
- Edition:
- MTE Final Release
- Extent:
- 80705 words, 18.4 MB
Note: wordCount represents the number of TOK TYPE=WORD
elements in the text. byteCount is in megabytes.
- Publication Statement
- Distributor:
-
Research Institute for Linguistics, Hungarian Academy of Sciences
- Address:
- Budapest, Színház u. 5-9.
- Electronic address:
- email: oravecz@nytud.hu
- Electronic address:
- www: http://www.nytud.hu
- Availability:
-
Available for research purposes upon receipt of signed agreement
- Publication date:
- January 1st, 1998
- Source Description
- Structured Bibliography
- Monograph
- Title:
- 1984
- Author:
- George Orwell
- Imprint
- Publication date:
- 1989
- Publisher:
- Európa Könyvkiadó
- Place:
- Budapest
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages.
EU Copernicus Project COP106
- Editorial declaration:
- Transduction:
-
In the cesDoc to cesAna conversion, DIV, QUOTE, Q tags and
HEAD, POEM, LIST elements have been omitted. cesDoc P
elements are encoded as PAR, and S as S.
cesDoc sub-S level tags are omitted: DATE, NAME, ABBR, etc.
- Quotation:
-
Q and QUOTE tags from the cesDoc source not retained.
- Segmentation:
-
S segmentation same as in cesDoc source (hand-validated).
TOK segmentation performed with mtseg and manually corrected.
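The transduction rules in the editorial declaration can be written down as a small lookup, shown here as a minimal sketch only. The names `RENAMED`, `OMITTED`, and `transduce_tag` are hypothetical and are not part of mtseg or any MULTEXT-East tool. The source lists the omitted sub-S tags only as "DATE, NAME, ABBR, etc.", so the set below is knowingly incomplete and unknown tags raise an error rather than being guessed. The sketch covers element names only; it makes no claim about what happened to the content of omitted elements.

```python
# Hypothetical lookup tables built from the Transduction note above.
# cesDoc elements that are carried over into cesAna, with their cesAna name.
RENAMED = {"P": "PAR", "S": "S"}

# cesDoc elements the note says were omitted. The sub-S list in the source
# ends in "etc.", so this set is incomplete by design.
OMITTED = {"DIV", "QUOTE", "Q", "HEAD", "POEM", "LIST", "DATE", "NAME", "ABBR"}


def transduce_tag(tag):
    """Return the cesAna element name for a cesDoc tag, or None if omitted."""
    name = tag.upper()
    if name in RENAMED:
        return RENAMED[name]
    if name in OMITTED:
        return None
    # The source does not say how other tags were handled.
    raise ValueError("no transduction rule recorded for tag: " + tag)


if __name__ == "__main__":
    for tag in ("P", "S", "QUOTE", "NAME"):
        print(tag, "->", transduce_tag(tag))
```

Raising on unlisted tags keeps the sketch from silently inventing a rule that the header does not document.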
- Tag declaration:
- chunkList = 1
-
Element corresponds to TEXT of the cesDoc source
- chunk = 1
-
Element corresponds to BODY of the cesDoc source
- par = 1303
-
Elements correspond to P elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc P element.
- s = 6768
-
Elements correspond to S elements of the cesDoc source.
The FROM attribute gives the reference to the ID of the
corresponding cesDoc S element.
- tok = 98426
-
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute
giving the mtseg class of the token.
- orth = 98426
-
Contains the orthography of the token, as found in the
cesDoc source.
- disamb = 80705
-
Contains disambiguated lexical information.
- lex = 111945
-
Contains undisambiguated lexical information.
- base = 192650
-
Base or lemma of a token.
- msd = 192650
-
Morphosyntactic description of a token.
- ctag = 98426
-
Corpus tag.
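The element counts in the tag declaration can be cross-checked against each other and against the word count given under Extent. The following is a minimal sketch: the relationships it tests (one orth and one ctag per tok, one disamb per word token, base and msd each equal to disamb plus lex) are assumptions inferred from the figures themselves, not rules stated in this header. The names `counts`, `word_count`, and `check_counts` are hypothetical.

```python
# Element counts copied from the tag declaration above.
counts = {
    "chunkList": 1, "chunk": 1, "par": 1303, "s": 6768,
    "tok": 98426, "orth": 98426, "disamb": 80705, "lex": 111945,
    "base": 192650, "msd": 192650, "ctag": 98426,
}

# Word count from the Extent statement (number of TOK TYPE=WORD elements).
word_count = 80705


def check_counts(counts, word_count):
    """Return a dict of named consistency checks (assumed, not documented)."""
    return {
        "orth_per_tok": counts["orth"] == counts["tok"],
        "ctag_per_tok": counts["ctag"] == counts["tok"],
        "disamb_per_word": counts["disamb"] == word_count,
        "base_is_disamb_plus_lex":
            counts["base"] == counts["disamb"] + counts["lex"],
        "msd_per_base": counts["msd"] == counts["base"],
    }


# Tokens that are not TYPE=WORD; the header says tokens are WORD or PUNCT.
non_word_tokens = counts["tok"] - word_count

if __name__ == "__main__":
    for name, ok in check_counts(counts, word_count).items():
        print(name, "ok" if ok else "MISMATCH")
    print("non-word tokens:", non_word_tokens)
```

With the figures in this header every check passes, and the difference between tok and the word count is 17721, which would be the number of PUNCT tokens if WORD and PUNCT are the only types used.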
Revision Description
- Date: 1997-11-24 (Csaba Oravecz, RIL)
-
- Date: 1997-12-21 (Tomaz Erjavec, IJS)
- Converted from ISO Latin-2 to SGML entities
- Changed ... to ...
- Modified EDITIONSTMT, BYTECOUNT
Meta-Made by et