This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document, produced in the scope of the MULTEXT-East project, by Fred.
Note that this HTML translation does not contain all the information from the cesHeader.
CES header
Creator: MULTEXT-East consortium
Created: 1996-10-31
Updated: 1997-09-25
File Description
- Title Statement
- Title:
- MULTEXT-East corpus
- Responsibility
-
Jean Veronis,
Nancy Ide
Laboratoire Parole et Langage
Centre National de la Recherche Scientifique
Aix-en-Provence, France
(Project management, DTD construction)
Tomaz Erjavec,
Dept. for Intelligent Systems,
Jozef Stefan Institute,
Ljubljana, Slovenia
(Corpus workpackage leader)
- Edition:
- MTE Final Release
- Extent:
- 1,761,850 words
24 MB
- Publication Statement
- Distributor:
- TELRI? ELRA?
- Address:
-
- Electronic address:
- For now: tomaz.erjavec@ijs.si
- Electronic address:
- http://nl.ijs.si/ME
- Availability:
-
Available for research purposes upon receipt of a signed agreement
- Publication date:
- October 1, 1997
- Source Description
- Full Bibliography
- Title Statement
- Title:
-
The corpus of the MULTEXT-East project consists of the following components:
1. Multilingual Parallel "1984" by G. Orwell in English, Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
2. Multilingual Comparable Fiction in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
3. Multilingual Comparable Newspapers in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
4. Multilingual Parallel EUROM extracts in Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovene
- Responsibility
-
Nancy Ide, Greg Priest-Dorman (CNRS/Vassar),
Tomaz Erjavec (IJS)
(English '1984' corpus component; see component header for details.)
Institute of Mathematics, Bulgarian Academy of Sciences
(Bulgarian language corpus components; see bg component headers for details.)
Vladimir Petkevic (UTKL),
Jana Klimova (FFUK)
(Czech language corpus components; see cs component headers for details.)
Heiki-Jaan Kaalep, Viire Villandi, Heili Orav
(Estonian language corpus components; see et component headers for details.)
Csaba Oravecz, Laszlo Tihanyi (RIL)
(Hungarian language corpus components; see hu component headers for details.)
Dan Tufis and Stefan Bruda (RACAI),
Lidia Diaconu, Calin Diaconu (ICI)
(Romanian language corpus components; see ro component headers for details.)
Tomaz Erjavec (IJS),
Olga Vukovic (Spica International),
Amebis d.o.o.
(Slovene language corpus components; see sl component headers for details.)
- Publication Statement
- Distributor:
- See individual corpus components
- Address:
- See individual corpus components
- Availability:
-
See individual corpus components
- Publication date:
- October 1, 1997
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and
Eastern European Languages.
EU Copernicus Project COP106
- Tag declaration:
- cesdoc = 26
- text = 26
- group = 1
- body = 28
- div = 3309
- p = 27993
- abbr = 6059
- author = 168
- bibl = 168
- byline = 1240
- caption = 145
- cell = 75
- closer = 2
- corr = 1
- date = 1835
- dateline = 220
- distinct = 223
- docauthor = 729
- figdesc = 115
- figure = 115
- foreign = 919
- head = 3262
- hi = 5328
- item = 350
- l = 456
- label = 2
- list = 61
- measure = 18
- mentioned = 1419
- name = 41340
- note = 52
- num = 5238
- opener = 290
- poem = 90
- ptr = 39
- q = 23947
- quote = 756
- ref = 26
- row = 15
- s = 65758
- sp = 251
- speaker = 6
- table = 3
- term = 6
- time = 87
- title = 920
Revision Description
- Date: 1996-11-02 Tomaz Erjavec, IJS
-
- Date: 1997-10-05 Tomaz Erjavec, IJS
-
- Nineteen Eighty-Four, English
- Nineteen Eighty-Four, Bulgarian
- Nineteen Eighty-Four, Czech
- Nineteen Eighty-Four, Estonian
- Nineteen Eighty-Four, Hungarian
- Nineteen Eighty-Four, Romanian
- Nineteen Eighty-Four, Slovene
- Fiction, Bulgarian
- Fiction, Czech
- Fiction, Estonian
- Fiction, Hungarian
- Fiction, Romanian
- Fiction, Slovene
- Newspapers, Bulgarian
- Newspapers, Czech
- Newspapers, Estonian
- Newspapers, Hungarian
- Newspapers, Romanian
- Newspapers, Slovene
- Speech, English
- Speech, Bulgarian
- Speech, Czech
- Speech, Estonian
- Speech, Hungarian
- Speech, Romanian
- Speech, Slovene