This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
generated by Fred using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: OCS, Status: update, Created: 1997-11-24, Updated: 1997-12-21

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Hungarian
Responsibility
Csaba Oravecz (Overall Responsibility); Vladimír Petkevic (Conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
80705 words, 18.4 MB
Note: wordCount represents the number of TOK TYPE=WORD elements in the text. byteCount is in megabytes.
Publication Statement
Distributor:
Research Institute for Linguistics, Hungarian Academy of Sciences
Address:
Budapest, Színház u. 5-9.
Electronic address:
email: oravecz@nytud.hu
Electronic address:
www: http://www.nytud.hu
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1st, 1998
Source Description
Structured Bibliography
Monograph
Title:
1984
Author:
George Orwell
Imprint
Publication date:
1989
Publisher:
Európa Könyvkiadó
Place:
Budapest

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
In the cesDoc-to-cesAna conversion, the DIV, QUOTE and Q tags and the HEAD, POEM and LIST elements have been omitted. cesDoc P elements are encoded as PAR, and S elements as S. cesDoc tags below the S level (DATE, NAME, ABBR, etc.) are omitted.
Quotation:
Q and QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
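
The tag handling described under Transduction and Quotation can be summarised as a small lookup. The following is only a sketch: the function name and the action labels are hypothetical, and the split between "tag dropped" and "whole element dropped" is an assumed reading of the wording ("tags" versus "elements"), not something the header states. The "etc." in the list of sub-S tags is left open, so unlisted tags are reported as such rather than guessed.

```python
# Hypothetical lookup tables derived from the Transduction note.
RENAMED = {"P": "PAR", "S": "S"}                            # cesDoc name -> cesAna name
TAGS_DROPPED = {"DIV", "QUOTE", "Q", "DATE", "NAME", "ABBR"}  # assumed: markup removed
ELEMENTS_DROPPED = {"HEAD", "POEM", "LIST"}                 # assumed: element omitted


def transduce_tag(tag):
    """Return an (action, new_name) pair for a cesDoc tag name."""
    name = tag.upper()
    if name in RENAMED:
        return ("rename", RENAMED[name])
    if name in TAGS_DROPPED:
        return ("drop_tag", None)
    if name in ELEMENTS_DROPPED:
        return ("drop_element", None)
    # The header ends its list with "etc.", so anything else is unknown here.
    return ("unlisted", None)


if __name__ == "__main__":
    for tag in ("p", "s", "q", "head", "foreign"):
        print(tag, transduce_tag(tag))
```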
Tag declaration:
chunkList = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1303
Elements correspond to P elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6768
Elements correspond to S elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 98426
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token.
orth = 98426
Contains the orthography of the token, as found in the cesDoc source.
disamb = 80705
Contains disambiguated lexical information.
lex = 111945
Contains undisambiguated lexical information.
base = 192650
Base or lemma of a token.
msd = 192650
Morphosyntactic description of a token.
ctag = 98426
Corpus tag.
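
The counts in the tag declaration lend themselves to a simple consistency check. The sketch below is an illustration, not part of the MULTEXT-East tooling: the helper names, the regular expression and the sample markup are all hypothetical. It counts start tags in a string and tests a few relations that hold among the declared figures (for example, 80705 + 111945 = 192650). Reading those relations as "one base and one msd per disamb or lex element" is an assumption, not something the header states.

```python
import re
from collections import Counter

# Counts copied from the tag declaration above.
DECLARED = {
    "chunkList": 1, "chunk": 1, "par": 1303, "s": 6768, "tok": 98426,
    "orth": 98426, "disamb": 80705, "lex": 111945, "base": 192650,
    "msd": 192650, "ctag": 98426,
}

# Matches start tags only; "</tok>" is skipped because "/" is not a letter.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.]*)\b[^>]*>")


def count_start_tags(markup):
    """Count start tags by lower-cased element name."""
    return Counter(m.group(1).lower() for m in START_TAG.finditer(markup))


def internal_checks(declared):
    """Relations among the declared counts (assumed, not stated in the header)."""
    return {
        "orth == tok": declared["orth"] == declared["tok"],
        "ctag == tok": declared["ctag"] == declared["tok"],
        "base == disamb + lex": declared["base"] == declared["disamb"] + declared["lex"],
        "msd == base": declared["msd"] == declared["base"],
    }


if __name__ == "__main__":
    # Hypothetical fragment, shaped only loosely like cesAna markup.
    sample = ('<par from="p1"><s from="s1"><tok type=WORD><orth>A</orth></tok>'
              '<tok type=PUNCT><orth>.</orth></tok></s></par>')
    print(count_start_tags(sample))
    print(internal_checks(DECLARED))
```

To compare against a real file, the lower-cased keys of DECLARED would be looked up in the counter returned for the file's contents.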

Revision Description

Date: 1997-11-24 (Csaba Oravecz, RIL)
Date: 1997-12-21 (Tomaz Erjavec, IJS)


Meta-Made by et