This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document, produced in the scope of the MULTEXT-East project, by Fred.

Note that this HTML translation does not contain all the information from the cesHeader.

CES header

Creator: MULTEXT-East consortium
Created: 1996-10-31
Updated: 1997-09-25

File Description

Title Statement
MULTEXT-East corpus
Jean Veronis, Nancy Ide, Laboratoire Parole et Langage, Centre National de la Recherche Scientifique, Aix-en-Provence, France (Project management, DTD construction)
Tomaz Erjavec, Dept. for Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia (Corpus workpackage leader)
MTE Final Release
1.761.850 words
24 MB
Publication Statement
Electronic address:
For now:
Electronic address:
Available for research purposes upon receipt of signed agreement
Publication date:
October 1, 1997
Source Description
Full Bibliography
Title Statement
The corpus of the MULTEXT-East project consists of the following components:
  1. Multilingual Parallel "1984" by G. Orwell in English, Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
  2. Multilingual Comparable Fiction in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
  3. Multilingual Comparable Newspapers in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
  4. Multilingual Parallel EUROM extracts in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
Nancy Ide, Greg Priest-Dorman (CNRS/Vassar), Tomaz Erjavec (IJS) (English '1984' corpus component; see component header for details.)
Institute of Mathematics, Bulgarian Academy of Sciences (Bulgarian language corpus components; see bg component headers for details.)
Vladimir Petkevic (UTKL), Jana Klimova (FFUK) (Czech language corpus components; see cs component headers for details.)
Heiki-Jaan Kaalep, Viire Villandi, Heili Orav (Estonian language corpus components; see et component headers for details.)
Csaba Oravecz, Laszlo Tihanyi (RIL) (Hungarian language corpus components; see hu component headers for details.)
Dan Tufis and Stefan Bruda (RACAI), Lidia Diaconu, Calin Diaconu (ICI) (Romanian language corpus components; see ro component headers for details.)
Tomaz Erjavec (IJS), Olga Vukovic (Spica International), Amebis d.o.o (Slovene language corpus components; see sl component headers for details.)
Publication Statement
See individual corpus components
Publication date:
October 1, 1997

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Tag declaration:

Revision Description

Date: 1996-11-02 Tomaz Erjavec, IJS
Date: 1997-10-05 Tomaz Erjavec, IJS

  1. Nineteen Eighty-Four, English
  2. Nineteen Eighty-Four, Bulgarian
  3. Nineteen Eighty-Four, Czech
  4. Nineteen Eighty-Four, Estonian
  5. Nineteen Eighty-Four, Hungarian
  6. Nineteen Eighty-Four, Romanian
  7. Nineteen Eighty-Four, Slovene
  8. Fiction, Bulgarian
  9. Fiction, Czech
  10. Fiction, Estonian
  11. Fiction, Hungarian
  12. Fiction, Romanian
  13. Fiction, Slovene
  14. Newspapers, Bulgarian
  15. Newspapers, Czech
  16. Newspapers, Estonian
  17. Newspapers, Hungarian
  18. Newspapers, Romanian
  19. Newspapers, Slovene
  20. Speech, English
  21. Speech, Bulgarian
  22. Speech, Czech
  23. Speech, Estonian
  24. Speech, Hungarian
  25. Speech, Romanian
  26. Speech, Slovene