Specifications and Notation for MULTEXT-East Lexicon Encoding
Bulgarian: R.Pavlov, L.Dimitrova, L.Sinapova, K.Simov
English: N.Ide, G.Priest-Dorman, T.Erjavec
Hungarian: L.Tihanyi, T.Váradi, C.Oravecz
Romanian: D.Tufiş, A.M.Barbu
Slovene: T.Erjavec, P.Holozan, V.Gorjanc, M.Stabej
Multext-East / Concede
March 21st, 2001
Supported by EU projects
Multext-East, Concede and TELRI
Morphosyntactic specifications V.2 Specifications and Notation for Lexicon Encoding
This document (i) provides harmonised lexical specifications for eight languages -- Bulgarian, Croatian, Czech, Estonian, English, Hungarian, Romanian, Slovene -- and (ii) formulates the notation used in the lexicons and annotated corpora contributed by the language groups.
This edition of the lexical specifications is a revised and expanded version of the MULTEXT-East report D11F. That report was, in turn, based on two proposals for lexicon specifications: one presented in the Multext D1-6-1B Deliverable of the Multext Project (Bel, Calzolari and Monachini, eds., 1995) and one in the Eagles document of the Lexicon sub-group on Morphosyntactic annotation (Monachini and Calzolari 1995).
The current 'Concede' version adds a new language, Croatian, and extends the common tables with further attributes and values for Romanian and Slovene. The tables for Slovene have also been localised. The LaTeX document has been converted to Latin-2 encoding, so that the language-specific characters are displayed correctly in print and in the HTML version of this document.
Pisa, October 1995
Ljubljana, December 1997
Ljubljana, March 2001