TEI Headers for
MULTEXT-East cesAna multilingual corpus: Nineteen Eighty-Four


TEI header (corpus)

creator: et
status: update
date: 2000-10-30 (created) 2004-04-09 (updated)

File Description

Title Statement:
Title:
MULTEXT-East cesAna multilingual corpus: Nineteen Eighty-Four
Responsibility Statement:
Tomaž Erjavec, IJS
TEI encoding
Responsibility Statement:
Nancy Ide, Vassar
English data
Dan Tufiş, RACAI
Romanian data
Heiki-Jaan Kaalep, TU
Estonian data
Csaba Oravecz, HAS
Hungarian data
Vladimír Petkevič, ITCL
Czech data
Ludmila Dimitrova, BAS
Bulgarian data
Cvetana Krstev, Duško Vitas
Serbian data
Tomaž Erjavec, IJS
Slovene data
Funder:
EU Copernicus Project COP106 "MULTEXT-East"
Funder:
EU Copernicus Concerted Action "TELRI"
Funder:
EU Copernicus Project PL96-1142 "Concede"
Funder:
Individual partners' grants and contracts.
Edition Statement:
Edition: Version 3 BETA
Extent: 618,879 word tokens
Publications Statement:
Distributor:
MULTEXT-East Web site
Address:
http://nl.ijs.si/ME/V3/
Distributor:
Individual partners, cf. component headers
Availability:

Available for research purposes upon receipt of signed agreement.

Source Description:
Title Statement:
Title:
Multext-East/Concede: Nineteen Eighty-Four, Multilingual
Funder:
EU Copernicus Project PL96-1142 "Concede"
Funder:
EU Copernicus Project COP106 "MULTEXT-East"
Funder:
Individual partners' grants and contracts.
Edition Statement:
Edition: Concede Release
Publications Statement:
Distributor:
MULTEXT-East Web site
Address:
http://nl.ijs.si/ME/V2/
Distributor:
Individual partners
Availability:

Available for research purposes upon receipt of signed agreement.

March 19th, 2001
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four
Funder:
EU Copernicus Project COP106 "MULTEXT-East"
Funder:
EU Copernicus Action "TELRI"
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
TRACTOR: TELRI Research Archive of Computational Tools and Resources
Place of publication:
"East meets West" CD-ROM, ISBN 3-922641-46-6
Address:
http://tractor.bham.ac.uk/
http://www.tractor.de/
Distributor:
MULTEXT-East Web site
Address:
http://nl.ijs.si/ME/CD/
Availability:

Available for research purposes upon receipt of signed agreement.

January 1st, 1998
Source Description:
Bibliography list:
    Title:
    1984
    George Orwell 1949; reprinted 1961
    Publisher:
    New American Library
    Place of publication:
    New York
    Title:
    O mie nouă sute optzeci şi patru
    George Orwell Translator: Mihnea Gafiţa 1991
    Publisher:
    Editura Univers
    Place of publication:
    Bucharest
    Title:
    1984
    George Orwell Translator: Alenka Puhar 1983
    Publisher:
    Knjižnica Kondor
    Publisher:
    Mladinska knjiga
    Place of publication:
    Ljubljana
    Title:
    1984
    George Orwell Translator: Eva Šimečková 1991
    Publisher:
    Naše vojsko
    Place of publication:
    Prague
    Title:
    1984
    George Orwell Translator: Lydia Bozhilova 1989
    Publisher:
    Profizdat
    Place of publication:
    Sofia
    Title:
    1984
    George Orwell Translator: Elias Treeman 1990
    Publisher:
    Loomingu Raamatukogu nr. 48-51
    Publisher:
    Perioodika
    Place of publication:
    Tallinn
    Title:
    1984
    George Orwell 1989
    Publisher:
    Európa Könyvkiadó
    Place of publication:
    Budapest
    Title:
    1984
    George Orwell Translator: Vlada Stojiljković 1989
    Publisher:
    Beogradski izdavačko-grafički zavod
    Place of publication:
    Beograd

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. <http://nl.ijs.si/ME/>

Editorial declaration:

Since the CD-ROM release of the cesAna Orwells, many errors in the linguistic annotation of the individual texts have been corrected.

In the process of conversion to TEI, various format errors were detected and corrected.

Normalisation:

All the novels have their markup normalised: a) structure annotation with DIV and P (attributes ID and TYPE); b) segmentation annotation with S (attribute ID); c) tokenisation annotation with W and C (attribute TYPE); d) linguistic annotation with the W attributes LEMMA and ANA.

All the novels use Unicode XML character entities to represent non-ASCII characters.
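
As an illustration of this convention, the following minimal Python sketch writes non-ASCII text as character references and reads it back. The sample string is a name from the title statement. The exact form of the references used in the corpus files is not stated here, so the decimal numeric form produced below is an assumption.

```python
import html

# Sample text with a non-ASCII character (a name from the title statement).
name = "Tomaž Erjavec"

# Encode to pure ASCII, replacing each non-ASCII character with a decimal
# numeric character reference (assumed form; other forms are equally legal).
ascii_form = name.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_form)

# Decode the references back to Unicode text.
restored = html.unescape(ascii_form)
print(restored == name)
```

Keeping the files in plain ASCII this way means they can be processed by tools that are not aware of the character encoding, at the cost of a decoding step before the text is displayed.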

Quotation:

QUOTE elements have in general been changed to P.

Q markup has been omitted in some novels (see the individual headers); in others it is retained as the punctuation mark ", whose C element is marked with TYPE="open" or TYPE="close".

Segmentation:

Segmentation into paragraphs follows the printed sources; it is therefore not 1-1 with the English original. Segmentation into sentences was performed automatically and then hand-validated.

Tokenisation into words and punctuation symbols was performed on the basis of the MULTEXT-East lexica, mostly with the MULTEXT tool 'mtseg', and then hand-validated.

No end-of-line hyphenation is present in the texts.

The linguistic interpretation of the text consists of marking up the word tokens with their context-disambiguated lemma and MULTEXT-East morphosyntactic description. The texts have undergone different amounts of validation, so their error rates differ.

The two-letter language codes follow ISO 639.

The MULTEXT-East morphosyntactic descriptions (MSDs) follow the revised common tables of lexical specifications MULTEXT-East/Concede. The lexical MSDs have been converted to an FSLIB, a feature-structure library, while their decomposition into features is given in an FLIB, a feature library. The words in the texts have their MSD encoded as the value of the ANA (#IDREF) attribute. This attribute refers to an FS, which, in turn, refers via its #IDREFS attribute FEATS to the Fs that define it.
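
The chain of references described above (ANA on a word, to an FS, to the Fs listed in FEATS) can be followed with a few lines of Python. This is only a sketch: the fragment, the ids, and the feature names and values in it are invented for illustration, element and attribute names are written in lower case on the assumption of an XML serialisation, and `resolve_msd` is a hypothetical helper, not part of any MULTEXT-East tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the two libraries plus one sentence; all ids,
# feature names and values here are invented for illustration only.
SAMPLE = """
<doc>
  <fLib type="Noun">
    <f id="N1.c" name="Type"><sym value="common"/></f>
    <f id="N2.m" name="Gender"><sym value="masculine"/></f>
    <f id="N3.s" name="Number"><sym value="singular"/></f>
  </fLib>
  <fsLib type="Noun">
    <fs id="Ncms" feats="N1.c N2.m N3.s"/>
  </fsLib>
  <s id="S1"><w lemma="brother" ana="Ncms">brother</w><c>.</c></s>
</doc>
"""

def resolve_msd(root, msd_id):
    """Follow fs/@feats to the referenced f elements; return {name: value}."""
    features = {f.get("id"): f for f in root.iter("f")}
    structures = {fs.get("id"): fs for fs in root.iter("fs")}
    fs = structures[msd_id]
    result = {}
    for feat_id in fs.get("feats").split():   # IDREFS are space-separated
        f = features[feat_id]
        result[f.get("name")] = f.find("sym").get("value")
    return result

root = ET.fromstring(SAMPLE)
for w in root.iter("w"):
    print(w.text, w.get("lemma"), resolve_msd(root, w.get("ana")))
```

Storing each feature once in the feature library and referring to it by id keeps the annotation on each word down to a single attribute value, while the full attribute-value decomposition stays recoverable.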

Class declaration:
text = 8
Text
body = 8
Text body
div = 228
Division: part, chapter
head = 3
Title of the Appendix
p = 10438
Paragraph
s = 53303
Sentence
w = 708708
Word token
c = 143992
Punctuation token
fLib = 14
MSD Feature Library
f = 524
Defined features
sym = 524
A feature-value
fsLib = 14
MSD Library
fs = 6279
Valid MSD
Taxonomy:
Category oana:
Nineteen Eighty-Four, Morphosyntactically Annotated
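
Tag-usage figures like those in the class declaration above can be reproduced by tallying element names. The sketch below does this for a small hypothetical fragment; the ids and the text in it are invented, lower-case element names are an assumption about the serialisation, and `tag_usage` is an illustrative helper rather than the tool actually used to produce the counts.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment using the element names from the class declaration.
SAMPLE = """
<text id="Oen">
  <body>
    <div id="Oen.1" type="part">
      <p id="Oen.1.1">
        <s id="Oen.1.1.1"><w>It</w><w>was</w><w>cold</w><c>.</c></s>
        <s id="Oen.1.1.2"><w>Clocks</w><w>struck</w><c>.</c></s>
      </p>
    </div>
  </body>
</text>
"""

def tag_usage(xml_text):
    """Return a Counter mapping each element name to its number of occurrences."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())

counts = tag_usage(SAMPLE)
for tag in ("text", "body", "div", "p", "s", "w", "c"):
    print(f"{tag} = {counts[tag]}")
```

Run over each component text, the same tally gives per-language figures that can be summed and compared with the corpus-level counts as a consistency check.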

Profile Description

Language use:
sl-rozaj: Resian (dialect of Slovene)
be: Byelorussian
bg: Bulgarian
br: Breton
ca: Catalan
co: Corsican
cs: Czech
cy: Welsh
da: Danish
de: German
el: Greek/Latin
en: English
es: Spanish
et: Estonian
eu: Basque
fi: Finnish
fr: French
ga: Irish
gd: Scots Gaelic
gl: Galician
hr: Croatian
hu: Hungarian
hy: Armenian
ik: Inupiak
is: Icelandic
it: Italian
ji: Yiddish
ka: Georgian/Ibero
kl: Greenlandic
la: Latin/Latin
lt: Lithuanian
lv: Latvian;Lettish
mk: Macedonian
mo: Moldavian
nl: Dutch
no: Norwegian
oc: Occitan
pl: Polish
pt: Portuguese
rm: Rhaeto-Romance
ro: Romanian
ru: Russian
sh: Serbo-Croatian
sk: Slovak
sl: Slovene
sq: Albanian
sr: Serbian
sv: Swedish
tr: Turkish
tt: Tatar
uk: Ukrainian

Revision Description



TEI header (msd-library)

creator: et
status: update
date: 2000-10-30 (created) 2004-04-09 (updated)

File Description

Title Statement:
Title:
MULTEXT-East Morphosyntactic Specifications
Responsibility Statement:
Tomaž Erjavec, IJS
Editor
Responsibility Statement:
Nancy Ide, Vassar
English data
Radoslav Pavlov, L. Dimitrova, Ludmila Sinapova, Kiril Simov
Bulgarian specification
Vladimír Petkevič
Czech specification
Heiki-Jaan Kaalep
Estonian specification
Nancy Ide, Greg Priest-Dorman, Tomaž Erjavec, Tamas Varadi
English specification
Laszlo Tihanyi, Tamas Varadi
Hungarian specification
Dan Tufiş, Anna Maria Barbu
Romanian specification
Tomaž Erjavec, Peter Holozan, Vojko Gorjanc, Marko Stabej
Slovene specification
Marko Tadić
Croatian specification
Cvetana Krstev, Duško Vitas
Serbian specification
Han Steenwijk
Resian specification
Funder:
EU Copernicus Project COP106 "MULTEXT-East"
Funder:
EU Copernicus Project PL96-1142 "Concede"
Funder:
Individual partners' grants and contracts.
Edition Statement:

MULTEXT-East Morphosyntactic Specifications, Version 3 BETA

Publications Statement:
Distributor:
MULTEXT-East Web site
Address:
http://nl.ijs.si/ME/V3/msd/
Availability:

Freely available.

March 1st, 2004
Source Description:
Title:
MULTEXT-East Morphosyntactic Specifications, Concede Edition
Tomaž Erjavec
Place of publication:
http://nl.ijs.si/ME/V2/msd/
2001-04-09

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
fLib = 14
Library for Morphosyntactic Specifications, one for each category. Attributes are: type: gives the name of the category (part-of-speech) that the contained features are describing
f = 524
Defined features, one for each category and one for each defined attribute-value pair. Attributes are: id: the identifier of the feature, composed of the part-of-speech code, the feature number, a period and the feature code, e.g. id="A1.f"; select: the languages that the feature-value is appropriate for; name: the name of the attribute, "PoS" for category.
sym = 524
A feature-value. Attributes are: value: the name of the value
fsLib = 14
Library of Morphosyntactic Descriptions, one for each category. Attributes are: type: gives the name of the category (part-of-speech) that the contained MSDs belong to
fs = 6279
Valid morphosyntactic description. Attributes are: id: the lexical/corpus MSD; select: the languages that the MSD is appropriate for; feats: references to the definitions of its attribute-values.
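
The composition of a feature identifier given above (part-of-speech code, feature number, a period, feature code) can be taken apart with a regular expression. The following Python sketch does so for the example id "A1.f". It assumes that the part-of-speech code is a single upper-case letter and that the feature code contains no whitespace; neither is stated in this header, and `parse_feature_id` is a hypothetical helper.

```python
import re
from typing import NamedTuple

class FeatureId(NamedTuple):
    pos: str      # part-of-speech code, e.g. "A"
    number: int   # feature number
    code: str     # feature code, e.g. "f"

# Assumed shape: one upper-case letter, digits, a period, then the code.
_FEATURE_ID = re.compile(r"^([A-Z])(\d+)\.(\S+)$")

def parse_feature_id(identifier: str) -> FeatureId:
    """Split a feature id such as "A1.f" into its three components."""
    match = _FEATURE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a feature id: {identifier!r}")
    pos, number, code = match.groups()
    return FeatureId(pos, int(number), code)

print(parse_feature_id("A1.f"))
```

Identifiers that do not fit the assumed shape raise a ValueError, which makes it easy to spot ids that follow some other convention.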

Revision Description



TEI header (English text)

creator: ET
status: update
date: 1997-12-15 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, English
Responsibility Statement:
Nancy Ide, Vassar
Overall Responsibility
Greg Priest-Dorman, Vassar
Automatic tagging and lemmatisation
János Szenthe, Tamás Váradi, HAS
Manual tagging
Ana-Maria Barbu, Dan Tufiş, RACAI
Tagging correction
Tomaž Erjavec, IJS
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Department of Computer Science, Vassar College
Address:
Poughkeepsie,
New York 12604-0252
USA
Address:
ide@cs.vassar.edu
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

October 30th, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, English
Responsibility Statement:
Nancy Ide
Overall Responsibility
Greg Priest-Dorman
Generation of Lexical Data
Vladimír Petkevič
Conversion to cesAna DTD
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Department of Computer Science, Vassar College
Address:
Poughkeepsie,
New York 12604-0252
USA
Address:
ide@cs.vassar.edu
January 1st, 1998
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, English
Publications Statement:
Distributor:
Department of Computer Science, Vassar College
Address:
Poughkeepsie,
New York 12604-0252
USA
Address:
ide@cs.vassar.edu
October 1, 1997
Source Description:
Title Statement:
Title:
The European Corpus Initiative Multilingual Corpus 1: 1984 by George Orwell (English)
Responsibility Statement:
Association for Computational Linguistics
Converted from OTA's DTD to ECI DTD
Publications Statement:
Distributor:
ACL
Address:
ACL
1994
Source Description:
Title Statement:
Title:
Orwell's 1984: electronic edition
Responsibility Statement:
Oxford Text Archive
The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental And African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced (other than the English text, which was converted to an SGML like encoding by John Price-Wilkin, and subsequently automatically converted to conform to the OTA's dtd by myself and Alan Morrison. The other languages were converted to TEI conformant SGML by the ECI project 1993.) --LB, Nov 1992
Edition Statement:

Public Domain TEI edition prepared at the Oxford Text Archive

Publications Statement:
Distributor:
Oxford Text Archive
Address:
Oxford University Computing Service
13 Banbury Road
Oxford OX2 6NN UK
archive@ox.ac.uk
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

19 Nov 1992
Source Description:
Title:
1984
George Orwell 1949; reprinted 1961
Publisher:
New American Library
Place of publication:
New York

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 29
p = 1287
s = 6737
w = 104286
c = 14138

Revision Description



TEI header (Romanian text)

creator: DT
status: update
date: 1997-11-04 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Romanian
Responsibility Statement:
Dan Tufiş, RACAI
Overall Responsibility
Tomaž Erjavec, IJS
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Centre for Artificial Intelligence, NLP division, Romanian Academy
Address:
13, 13 Septembrie Str.,
Bucharest 5, 74311
Romania
Address:
tufis@valhalla.racai.ro
http://nl.ijs.si/ME/
Availability:

Available for research purposes upon receipt of signed agreement.

October 30th, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Romanian
Responsibility Statement:
Dan Tufiş
Overall Responsibility
Ana-Maria Barbu
Hand-tagging the whole book
Vasile Pătraşcu
Conversion to cesAna DTD
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Center for Advanced Research in Machine Learning, Natural Language Processing and Conceptual Modelling
Address:
Casa Academiei,13,
13 Septembrie,
Bucharest 5, 74311
Romania
Address:
tufis@valhalla.racai.ro
http://nl.ijs.si/ME/
Availability:

Available for research purposes upon receipt of signed agreement.

January 1st, 1998
Source Description:
Title:
O mie nouă sute optzeci şi patru
George Orwell Translator: Mihnea Gafiţa 1991
Publisher:
Editura Univers
Place of publication:
Bucharest

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

The Concede project aimed to develop a unified dictionary encoding schema; the experiments were done with lexical tokens extracted from the multilingual corpus of Orwell's "1984" developed within the MULTEXT-East project. Headword extraction considered various frequency intervals and all word categories (POS), so that different kinds of encoding problems would be revealed. The MULTEXT-East corpus has been significantly improved for the purposes of the Concede project.

Class declaration:
text = 1
body = 1
div = 28
p = 1346
s = 6520
w = 101772
c = 16556
head = 2

Revision Description



TEI header (Czech text)

creator: VP
status: update
date: 1997-11-28 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Czech
Responsibility Statement:
Vladimír Petkevič, ITCL
Overall Responsibility
Tomaž Erjavec, IJS
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague
Address:
Celetná 13
110 00 Praha 1,
Czech Republic
Address:
Vladimir.Petkevic@ff.cuni.cz
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

October 30th, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Czech
Responsibility Statement:
Vladimír Petkevič
Overall Responsibility
Milena Hnátková
Hand-tagging of the first 3 chapters
Revision of the tagger results
Vladimír Petkevič
Conversion to cesAna DTD
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague
Address:
Celetná 13,
110 00 Praha 1,
Czech Republic
Address:
Vladimir.Petkevic@ff.cuni.cz
January 1st, 1998
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, Czech
Publications Statement:
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague
Address:
Celetná 13,
110 00 Praha 1,
Czech Republic
Address:
Vladimir.Petkevic@ff.cuni.cz
November 1, 1997
Source Description:
Title Statement:
Title:
Electronic form of 1984 by George Orwell in Czech, obtained via OCR
Responsibility Statement:
Vladimír Petkevič
OCR'ed the novel
Publications Statement:
Distributor:
Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Prague, Czech Republic (ÚTKL FFUK)
Address:
Celetná 13, Praha 1
Czech Republic
1998
Source Description:
Title:
1984
George Orwell Translator: Eva Šimečková 1991
Publisher:
Naše vojsko
Place of publication:
Prague, Czech Republic

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 29
p = 1298
s = 6752
w = 79870
c = 20498
head = 1

Revision Description



TEI header (Slovene text)

creator: ET
status: update
date: 1997-11-04 (created) 2004-04-06 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Slovene
Responsibility Statement:
Tomaž Erjavec
Overall Responsibility
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Dept. of Knowledge Technologies, Jožef Stefan Institute
Address:
Jamova 39
SI-1000 Ljubljana
Slovenia
Address:
tomaz.erjavec at ijs.si
http://nl.ijs.si/ME/
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

November 1st, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Slovene
Responsibility Statement:
Tomaž Erjavec
Overall Responsibility
Aleksandra Bizjak, Primož Jakopin
Tagging
Tomaž Erjavec
Tagging correction
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Dept. for Intelligent Systems, Jožef Stefan Institute
Address:
Jamova 39
SI-1000 Ljubljana,
Slovenia
Address:
tomaz.erjavec at ijs.si
http://nl.ijs.si/ME/
Availability:

Available for research purposes upon receipt of signed agreement.

January 1st, 1998
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, Slovene
Publications Statement:
Distributor:
Dept. for Intelligent Systems, Jožef Stefan Institute
Address:
Jamova 39
SI-1000 Ljubljana
Slovenia
Address:
tomaz.erjavec@ijs.si
http://nl.ijs.si/ME/
October 1, 1997
Source Description:
Title Statement:
Title:
The European Corpus Initiative Multilingual Corpus 1: 1984 by George Orwell (Slovene)
Responsibility Statement:
Association for Computational Linguistics
Converted from OTA's DTD to ECI DTD
Publications Statement:
Distributor:
ACL
Address:
ACL
1994
Source Description:
Title Statement:
Title:
Orwell's 1984: electronic edition
Responsibility Statement:
Oxford Text Archive
The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental And African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced (other than the English text, which was converted to an SGML like encoding by John Price-Wilkin, and subsequently automatically converted to conform to the OTA's dtd by myself and Alan Morrison. The other languages were converted to TEI conformant SGML by the ECI project 1993.) --LB, Nov 1992
Edition Statement:

Public Domain TEI edition prepared at the Oxford Text Archive

Publications Statement:
Distributor:
Oxford Text Archive
Address:
Oxford University Computing Service
13 Banbury Road
Oxford OX2 6NN UK
archive@ox.ac.uk
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

19 Nov 1992
Source Description:
Title:
1984
George Orwell Translator: Alenka Puhar 1983
Publisher:
Knjižnica Kondor
Publisher:
Mladinska knjiga
Place of publication:
Ljubljana

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 29
p = 1288
s = 6689
w = 90792
c = 21486

Revision Description



TEI header (Serbian text)

creator: CK
status: update
date: 2004-04-06 (created) 2004-04-09 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Serbian
Responsibility Statement:
Cvetana Krstev
Tagging supervision, conversion from INTEX format.
Responsibility Statement:
Tomaž Erjavec
TEI P4 conformance.
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Address:
http://nl.ijs.si/ME/V3/
Distributor:
Computer Science Department, Faculty of Mathematics
Address:
Studentski trg 16
11000 Belgrade
Serbia
Address:
cvetana@matf.bg.ac.yu
Availability:

Available for non-commercial use provided that this Header is included in its entirety with any copy distributed

April 9th, 2004
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, Serbian
Responsibility Statement:
Cvetana Krstev
Error correction, CES1 conformance.
Duško Vitas
Consulting.
Tomaž Erjavec
Encoding harmonisation with the MULTEXT-East '1984' corpus.
Edition Statement:

TELRI Final Release

Publications Statement:
January 1st, 1998
Source Description:
Title Statement:
Title:
Orwell's 1984: electronic edition
Responsibility Statement:
Oxford Text Archive
The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental And African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced (other than the English text, which was converted to an SGML like encoding by John Price-Wilkin, and subsequently automatically converted to conform to the OTA's dtd by myself and Alan Morrison. The other languages were converted to TEI conformant SGML by the ECI project 1993.) --LB, Nov 1992
Edition Statement:

Public Domain TEI edition prepared at the Oxford Text Archive

Publications Statement:
Distributor:
Oxford Text Archive
Address:
Oxford University Computing Service
13 Banbury Road
Oxford OX2 6NN UK
archive@ox.ac.uk
Availability:

Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed

19 Nov 1992
Source Description:
Title:
1984
George Orwell Translator: Vlada Stojiljković Edition: Second edition 1984
Publisher:
Beogradski izdavačko-grafički zavod
Place of publication:
Beograd

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Class declaration:
text = 1
body = 1
div = 28
p = 1293
s = 6677
w = 89829
c = 18976

Revision Description



TEI header (Bulgarian text)

creator: LD
status: update
date: 1997-11-30 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Bulgarian
Responsibility Statement:
Ludmila Dimitrova, BAS
Overall Responsibility
Tomaž Erjavec, IJS
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
Address:
ludmila@ling.math.acad.bg
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

November 1st, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Bulgarian
Responsibility Statement:
Ludmila Dimitrova, Lydia Sinapova
Overall Responsibility
Ludmila Dimitrova, Kiril Simov
Hand-tagging of first chapter first two parts
Ludmila Dimitrova
Correction of disambiguation errors
Vladimír Petkevič
Conversion to cesAna DTD
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
Address:
ludmila@ling.math.acad.bg
October 1, 1997
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, Bulgarian
Publications Statement:
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Address:
Acad G. Bonchev st. bl.8
1113 Sofia, Bulgaria
Address:
ludmila@ling.math.acad.bg
October 1, 1997
Source Description:
Title Statement:
Title:
Electronic form of 1984 in Bulgarian
Responsibility Statement:
Ludmila Dimitrova (BAS)
Lydia Sinapova (BAS)
Kiril Simov (BAS)
Typing-in 1984.
Publications Statement:
Distributor:
Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia
Source Description:
Title:
1984
George Orwell Translator: Lydia Bozhilova 1989
Publisher:
Profizdat
Place of publication:
Sofia, Bulgaria

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 29
p = 1322
s = 6682
w = 86020
Words encode values of corpus tags in FUNCTION attribute
c = 15153

Revision Description



TEI header (Estonian text)

creator: HJK
status: update
date: 1997-11-28 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Estonian
Responsibility Statement:
Heiki-Jaan Kaalep
Overall responsibility
Tomaž Erjavec
Conversion to TEI
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
TÜ arvutuslingvistika uurimisgrupp
Address:
Tiigi 78-203,
Tartu,
Estonia
Address:
hkaalep@psych.ut.ee
http://www.cl.ut.ee
Availability:

Freely available

October 30th, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Estonian
Responsibility Statement:
Heiki-Jaan Kaalep
Overall responsibility, automatic morphological tagging, inter-annotator consistency checking, conversion to CESANA
Külli Habicht, Kadri Muischnek, Heili Orav, Helen Potter, Andriela Rääbis
Manual disambiguation (twice) and harmonization
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
TÜ arvutuslingvistika uurimisgrupp
Address:
Tiigi 78-232,
Tartu,
Estonia
Address:
hkaalep@psych.ut.ee
http://www.cl.ut.ee/
Availability:

Freely available

January 1st, 1998
Source Description:
Title Statement:
Title:
Multext-East CES1: Nineteen Eighty-Four, Estonian
Publications Statement:
Distributor:
TÜ arvutuslingvistika uurimisgrupp
Address:
Tiigi 78-232,
Tartu,
Estonia
Address:
hkaalep@psych.ut.ee
http://www.cl.ut.ee
Availability:

Freely available

October 1, 1997
Source Description:
Title:
1984
George Orwell Translator: Elias Treeman 1990
Publisher:
Loomingu Raamatukogu nr. 48-51
Publisher:
Perioodika
Place of publication:
Tallinn

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 27
p = 1266
s = 6478
w = 75431
c = 19467

Revision Description



TEI header (Hungarian text)

creator: OCS
status: update
date: 1997-11-24 (created) 2004-03-05 (updated)

File Description

Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Hungarian
Responsibility Statement:
Csaba Oravecz, HAS
Overall Responsibility
Tomaž Erjavec, IJS
Conversion to TEI DTD
Edition Statement:
Edition: MULTEXT-East, Version 3 BETA
Publications Statement:
Distributor:
Research Institute for Linguistics, Hungarian Academy of Sciences
Address:
Benczúr u. 33.
Budapest
Address:
oravecz@nytud.hu
http://www.nytud.hu
Availability:

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed

October 30th, 2000
Source Description:
Title Statement:
Title:
Multext-East cesAna: Nineteen Eighty-Four, Hungarian
Edition Statement:
Edition: MULTEXT-East Final Release
Publications Statement:
Distributor:
Research Institute for Linguistics, Hungarian Academy of Sciences
Address:
Benczúr u. 33.
Budapest
Address:
oravecz@nytud.hu
http://www.nytud.hu/
January 1st, 1998
Source Description:
Title:
1984
George Orwell 1989
Publisher:
Európa Könyvkiadó
Place of publication:
Budapest

Encoding Description

Project description:

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

Class declaration:
text = 1
body = 1
div = 29
p = 1303
s = 6768
w = 80708
c = 17718

Revision Description