A brief description of this release is given in the ElsNews article
"The MULTEXT-East Resources Revisited", and a longer one in:
Tomaž Erjavec:
Harmonised Morphosyntactic Tagging for Seven Languages and Orwell's 1984.
In the Proceedings of the 6th Natural Language Processing Pacific Rim
Symposium, NLPRS'01, pp. 487-492, 2001.
Version 2 of the MULTEXT-East language resources contains, for
English, Romanian, Czech, Slovene, Bulgarian, Estonian, and Hungarian:
- The revised and expanded MULTEXT- and EAGLES-based
lexical morphosyntactic specifications,
in print form (HTML, PDF, LaTeX) and as TEI-encoded feature structures.
The specifications are freely available for downloading or
browsing at http://nl.ijs.si/ME/V2/msd/
- The morphosyntactic lexica, totalling at least 15,000
lemmas per language, where each entry contains the word-form, its
lemma, and its morphosyntactic description; also included is a
high-precision, automatically generated 7-way multilingual
lexicon.
- The corrected and TEI-encoded
"1984" morphosyntactically annotated corpus,
with about 100,000 words per language. The corpus
includes 2-way and 7-way sentence alignments in CES (Corpus Encoding
Standard).
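
As an illustration of the lexicon entries described above (word-form, lemma, morphosyntactic description), the following is a minimal sketch of how such entries might be read. It assumes a plain-text layout of one entry per line with three tab-separated fields; that layout, the names LexiconEntry and parse_lexicon, and the sample tags are assumptions made for the example and should be checked against the distributed lexica and the specifications.

```python
from typing import Iterable, Iterator, NamedTuple


class LexiconEntry(NamedTuple):
    """One lexicon entry: word-form, lemma, morphosyntactic description (MSD)."""
    word_form: str
    lemma: str
    msd: str


def parse_lexicon(lines: Iterable[str]) -> Iterator[LexiconEntry]:
    """Yield entries from lines of the ASSUMED form 'word-form<TAB>lemma<TAB>MSD'."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        yield LexiconEntry(*fields)


# Illustrative sample only; the MSD strings are placeholders for this sketch.
sample = ["walks\twalk\tVmip3s\n", "\n", "walk\twalk\tNcns\n"]
entries = list(parse_lexicon(sample))
```

Each parsed entry exposes its three parts by name (for example, entries[0].lemma), which keeps later grouping by lemma or by description straightforward.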
The lexica and corpus are freely available for research use. To
obtain them, please fill out and submit the
license agreement.
Page http://nl.ijs.si/ME/V2/, last updated 2002-12-09, Tomaž Erjavec