MULTEXT-East Resources
Version 4 "MondiLex"

This is the home page of Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. For Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene and Ukrainian, the dataset contains some or all of the following resources: morphosyntactic specifications, lexicons and corpora.

The specifications and corpora are encoded in XML following the TEI P5 Guidelines; the schema and its documentation are available in the schema/ directory.
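Because the resources are TEI P5 XML, they can be read with any namespace-aware XML parser. Below is a minimal Python sketch, not an official MULTEXT-East tool. It assumes that word tokens are encoded as TEI <w> elements carrying the standard TEI attributes @lemma and @ana. The function name read_words and the sample fragment, including its tag values, are hypothetical and made up for illustration. Check the actual files and the schema/ directory before relying on this layout.

```python
# Minimal sketch: extract word tokens from a TEI P5 XML document.
# ASSUMPTION (not taken from the MULTEXT-East documentation): tokens are
# TEI <w> elements, optionally with the standard @lemma and @ana attributes.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # namespace URI used by TEI P5


def read_words(xml_text):
    """Return a list of (form, lemma, ana) tuples, one per TEI <w> element."""
    root = ET.fromstring(xml_text)
    words = []
    # Elements in a default namespace must be matched as "{uri}localname".
    for w in root.iter(f"{{{TEI_NS}}}w"):
        form = "".join(w.itertext()).strip()
        words.append((form, w.get("lemma"), w.get("ana")))
    return words


# Hypothetical fragment for illustration only; it is not a complete,
# schema-valid TEI document, and "#tag1"/"#tag2" are placeholder values.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p><s>
    <w lemma="it" ana="#tag1">It</w>
    <w lemma="be" ana="#tag2">was</w>
  </s></p></body></text>
</TEI>"""

if __name__ == "__main__":
    for form, lemma, ana in read_words(SAMPLE):
        print(form, lemma, ana)
```

Matching on the namespaced tag, rather than stripping namespaces, keeps the sketch correct for documents that also contain elements from other XML vocabularies.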

In published research, please acknowledge the use of the MULTEXT-East resources by citing the following paper:

Tomaž Erjavec (2012): MULTEXT-East: Morphosyntactic Resources for Central and Eastern European Languages. Language Resources and Evaluation, 46/1, pp. 131-142.

Publications describing the MULTEXT-East resources:

  1. Tomaž Erjavec (2012): MULTEXT-East: Morphosyntactic Resources for Central and Eastern European Languages.
    Language Resources and Evaluation, 46/1, pp. 131-142.
  2. Tomaž Erjavec (2010): MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
    In: Proc. of LREC 2010, Malta, 19-21 May 2010.
  3. Tomaž Erjavec (2009): MULTEXT-East Morphosyntactic Specifications: Towards Version 4.
    In: Proc. of the MONDILEX Third Open Workshop, Bratislava, Slovakia, 15-16 April 2009.
  4. Tomaž Erjavec (2004): MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
    In: Proc. of the Fourth Intl. Conf. on Language Resources and Evaluation, LREC'04, ELRA, Paris, 2004.

Page last updated 2015-06-15, Tomaž Erjavec