MULTEXT-East Morphosyntactic Specifications, Version 4


The purpose of this document is to provide harmonised morphosyntactic specifications for sixteen languages: Bulgarian, Croatian, Czech, Estonian, English, Hungarian, Macedonian, Persian, Polish, Romanian, Russian, Ukrainian, Serbian, Slovak, Slovene, and the Resian dialect of Slovene.

These specifications are based on the proposals for lexicon specifications from the MULTEXT Project [mt:D161B] and from the EAGLES Lexicon sub-group on Morphosyntactic annotation [eagles:morphsyn], [eagles:morphana].

The MULTEXT-East specifications were initially produced in 1995 within the EU MULTEXT-East project and, slightly modified, were made available in 1998 on the TELRI CD-ROM "A Compendium of Multilingual Resources" [telri:CD], [lrec98:mtelex].

The second version, the so-called "Concede edition" of 2001 [elsnews01:v2], [mte:nlprs], added a new language, Croatian, and extended the common tables with new attributes and values for Romanian and Slovene. The tables for Slovene were also localised. In addition, the common tables were made available in XML, as TEI P4 feature libraries.

Version 3, released in 2004 [mte:slav], added two more languages, Serbian and Resian (a dialect of Slovene), and fixed some minor errors found in the Concede edition.

The present Version 4, the "MONDILEX edition" [bib.mte-v4], adds six new languages: Russian, Macedonian and Persian, and, thanks to the support of the EU MONDILEX project, Slovak, Polish and Ukrainian. The specifications have also been converted from their old LaTeX format to XML conforming to a schema based on the Text Encoding Initiative Guidelines, TEI P5.

Pisa, October 1995

Ljubljana, December 1997

Ljubljana, March 2001

Ljubljana, May 2004

Ljubljana, May 2010
Date: 2010-05-12
This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 licence.