The structure of the corpus is explained in the following reports:
The corpus can be found in the corp/
directory (WWW access restricted).
The multilingual corpus is marked up in accordance with the
Corpus Encoding Standard (CES) and consists of three parts:
- Parallel, annotated text corpus: Orwell's 1984.
The introduction to this corpus is provided on a
separate page.
The corpus can be found in the
corp/1984 directory
(WWW access restricted).
- Comparable text corpus.
The corpus can be found in the
corp/comp directory
(WWW access restricted).
This corpus consists of two parts:
- Parallel speech corpus: EUROM passages.
The introduction to this corpus is provided on a
separate page.
The corpus can be found in the
corp/spch directory
(WWW access restricted).
The header of the MULTEXT-East six-language primary data corpus is
also available in HTML.
Last updated 22-Dec-1997 by et