MULTEXT-East Corpora

The structure of the corpus is explained in the following reports:

The corpus can be found in the corp/ directory (WWW access restricted).

The multilingual corpus is marked up in accordance with the Corpus Encoding Standard (CES) and consists of three parts:

  1. Parallel, annotated text corpus: Orwell's 1984.
    The introduction to this corpus is provided on a separate page.
    The corpus can be found in the corp/1984 directory (WWW access restricted).
  2. Comparable text corpus.
    The corpus can be found in the corp/comp directory (WWW access restricted).
    This corpus consists of two parts:
  3. Parallel speech corpus: EUROM passages.
    The introduction to this corpus is provided on a separate page.
    The corpus can be found in the corp/spch directory (WWW access restricted).
The header of the six-language MULTEXT-East primary-data corpus is also provided in HTML.


Last updated 22-Dec-1997 by et