MULTEXT-East Resources, Version 6 "CLARIN"

This is the home page of Version 6 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. For Albanian, Bulgarian, Chechen, Czech, Damaskini, English, Estonian, Hungarian, Lithuanian, Macedonian, Persian, Polish, Resian, Romanian, Russian, Serbo-Croatian, Slovak, Slovene, Torlak, and Ukrainian, this dataset contains the MULTEXT-East morphosyntactic specifications, Version 6, which define harmonised word-level syntactic features and their mapping to MSD tagsets:

The other resources are still at Version 4 (they will be upgraded in time) and consist of:

The specifications and corpora use the TEI Guidelines for their XML encoding; the schema and its documentation are available in the schema/ directory.

In published research please acknowledge the use of MULTEXT-East resources by citing one of the following papers:


Page last updated 2022-03-24, Tomaž Erjavec