This document is an HTML 3.2 rendering of a Corpus Encoding Specification (CES) DTD document,
generated by Fred using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

The rendering uses ISO 8859-1 (Latin-1) encoding.


CES header

Version: 4.1, Type: text, Language: en,
Creator: ET, Status: update, Created: 1997-11-04, Updated: 1997-12-20

File Description

Title Statement
Title:
Multext-East cesAna: Nineteen Eighty-Four, Slovene
Responsibility
Tomaz Erjavec (overall responsibility); Aleksandra Bizjak, Primoz Jakopin (tagging); Vladimír Petkevic (conversion to cesAna DTD)
Edition:
MTE Final Release
Extent:
90768 words, 22.5 MB
Note: wordCount represents the number of TOK TYPE=WORD elements in the text.
Publication Statement
Distributor:
Dept. for Intelligent Systems, Jozef Stefan Institute
Address:
Jamova 39, SI-1000 Ljubljana, Slovenia
Electronic address:
email: tomaz.erjavec@ijs.si
Electronic address:
www: http://nl.ijs.si/ME
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1, 1998
Source Description
Full Bibliography
Title Statement
Title:
Multext-East CES1: Nineteen Eighty-Four, Slovene
Publication Statement
Distributor:
Dept. for Intelligent Systems, Jozef Stefan Institute
Address:
Jamova 39, SI-1000 Ljubljana, Slovenia
Electronic address:
email: tomaz.erjavec@ijs.si
Electronic address:
www: http://nl.ijs.si/ME
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
October 1, 1997
Source Description
Full Bibliography
Title Statement
Title:
The European Corpus Initiative Multilingual Corpus 1: 1984 by George Orwell (Slovene)
Responsibility
Association for Computational Linguistics (Converted from OTA's DTD to ECI DTD)
Publication Statement
Distributor:
ACL
Address:
ACL
Availability:
Available for research purposes upon receipt of signed agreement
Publication date:
1994
Source Description
Full Bibliography
Title Statement
Title:
Orwell's 1984: electronic edition
Responsibility
Oxford Text Archive (The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental and African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced, other than the English text, which was converted to an SGML-like encoding by John Price-Wilkin and subsequently converted automatically to conform to the OTA's DTD by myself and Alan Morrison. The other languages were converted to TEI-conformant SGML by the ECI project in 1993. --LB, Nov 1992)
Edition:
Public Domain TEI edition prepared at the Oxford Text Archive
Publication Statement
Distributor:
Oxford Text Archive
Address:
Oxford University Computing Service, 13 Banbury Road, Oxford OX2 6NN, UK; archive@ox.ac.uk
Availability:
Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed
Publication date:
19 Nov 1992
Source Description
Structured Bibliography
Monograph
Title:
1984
Author:
George Orwell
Translator:
Alenka Puhar
Imprint
Publication date:
1983
Publisher:
Knjiznica Kondor
Publisher:
Mladinska knjiga
Place:
Ljubljana

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Editorial declaration:
Transduction:
In the cesDoc-to-cesAna conversion, the DIV, QUOTE and Q tags and the HEAD, POEM and LIST elements were omitted. cesDoc P elements are encoded as PAR, and S elements as S. cesDoc tags below the S level (DATE, NAME, ABBR, etc.) are also omitted.
Quotation:
Q and QUOTE tags from the cesDoc source are not retained.
Segmentation:
S segmentation is the same as in the cesDoc source (hand-validated). TOK segmentation was performed with mtseg and manually corrected.
Tag declaration:
chunklist = 1
Element corresponds to TEXT of the cesDoc source
chunk = 1
Element corresponds to BODY of the cesDoc source
par = 1288
Elements correspond to P elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc P element.
s = 6689
Elements correspond to S elements of the cesDoc source. The FROM attribute gives the reference to the ID of the corresponding cesDoc S element.
tok = 107770
Tokens are of TYPE=WORD or PUNCT, with the CLASS attribute giving the mtseg class of the token (ABBR, COMP, INIT, TTL).
orth = 107770
Contains the orthography of the token, as found in the cesDoc source (except that COMP tokens have an underscore instead of a blank).
disamb = 90792
Contains disambiguated lexical information for WORDs.
lex = 187562
Contains undisambiguated lexical information for WORDs.
base = 278354
Base or lemma of a WORD.
msd = 278354
Morphosyntactic description of a WORD.
ctag = 16978
Corpus tag of PUNCT tokens.

Revision Description

Date: 1997-11-04 (Tomaz Erjavec, IJS)
Date: 1997-12-11 (Tomaz Erjavec, IJS)
Date: 1997-12-20 (Tomaz Erjavec, IJS)


Meta-Made by et