Note that this is an old version; the latest is available from
the MULTEXT-East home page.
This is the home page of Version 3 of the MULTEXT-East resources, a
multilingual dataset for language engineering research and
development. This dataset contains, for Bulgarian, Croatian, Czech,
English, Estonian, Hungarian, Lithuanian, Resian, Romanian, Russian,
Serbian, and Slovene, some or all of the following resources:
The resources comply with the EAGLES and TEI P4 recommendations and
are freely available for research use.
Read about the resources
- Tomaž Erjavec: MULTEXT-East Version 3: Multilingual Morphosyntactic
Specifications, Lexicons and Corpora. In: Proc. of the Fourth Intl.
Conf. on Language Resources and Evaluation, LREC'04, ELRA, Paris, 2004.
- full documentation, with links to the resources
Download
- free download: MULTEXT-East documentation and MULTEXT-East speech corpus
- licenced download: MULTEXT-East morphosyntactic resources and
MULTEXT-East cesDoc corpus
To get access to the licenced resources, please fill out and submit the
on-line licence; a text copy is provided here for reference.
You will then receive by email a username and password for full
access to the resources, which you can browse on-line (links are in
the documentation) or download. The download files unpack into a
mirror of this WWW site.
Page last updated 2010-05-14,
Tomaž Erjavec