MULTEXT-East Resources
Version 4 "MondiLex"

This is the home page of Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. For Bulgarian, Croatian, Czech, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbian, Slovak, Slovene and Ukrainian, the dataset contains some or all of the following resources: morphosyntactic specifications, lexicons and corpora. The specifications and corpora use the TEI P5 Guidelines for the XML encoding; the schema and its documentation are available in the schema/ directory.

Publications:

  1. Tomaž Erjavec (2012): MULTEXT-East: Morphosyntactic Resources for Central and Eastern European Languages.
    Language Resources and Evaluation, 46/1, pp. 131-142.
  2. Tomaž Erjavec: MULTEXT-East Version 4: Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
    Proc. of the LREC 2010, Malta, 19-21 May, 2010. [PDF]
  3. Tomaž Erjavec: MULTEXT-East Morphosyntactic Specifications: Towards Version 4. In: Proc. of the MONDILEX Third Open Workshop, Bratislava, Slovakia, 15-16 April, 2009. [PDF]
  4. Tomaž Erjavec: MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. In: Proc. of the Fourth Intl. Conf. on Language Resources and Evaluation, LREC'04, ELRA, Paris, 2004. [PDF]

Download

[about]

  1. free download:
  2. licenced download:

To get access to the licenced resources, please read the licence under which they are available and, if you agree with it, send an email requesting the resources to Tomaž Erjavec. You will then receive by email a username and password for full access to the resources, which you can browse on-line or download. The download files unpack into a mirror of this WWW site.

In published research please acknowledge the use of MULTEXT-East resources by citing the following paper:

Tomaž Erjavec (2012): MULTEXT-East: Morphosyntactic Resources for Central and Eastern European Languages. Language Resources and Evaluation, 46/1, pp. 131-142.


Page last updated 2012-12-22, Tomaž Erjavec