MULTEXT-East Resources, Version 3

Note that this is an old version; the latest version is available from the MULTEXT-East home page.

This is the home page of Version 3 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Resian, Romanian, Russian, Serbian, and Slovene, some or all of the following resources:

The resources comply with the EAGLES and TEI P4 recommendations and are freely available for research use.

Read about the resources

  1. Tomaž Erjavec: MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. In: Proc. of the Fourth Intl. Conf. on Language Resources and Evaluation, LREC'04, ELRA, Paris, 2004.
  2. full documentation, with links to the resources


  1. free download: MULTEXT-East documentation and MULTEXT-East speech corpus
  2. licensed download: MULTEXT-East morphosyntactic resources and MULTEXT-East cesDoc corpus

To get access to the licensed resources, please fill out and submit the on-line licence; a text copy is provided here for reference. You will then receive a username and password by email; these give full access to the resources, which you can browse on-line (links are in the documentation) or download. The download files unpack into a mirror of this WWW site.

Page last updated 2010-05-14, Tomaž Erjavec