Document in ISO Latin 2

Specifications and Notation for MULTEXT-East Lexicon Encoding

Bulgarian: R.Pavlov, L.Dimitrova, L.Sinapova, K.Simov
Croatian: M.Tadić
Czech: V.Petkevič
Estonian: H.J.Kaalep
English: N.Ide, G.Priest-Dorman, T.Erjavec
Hungarian: L.Tihanyi, T.Váradi, C.Oravecz
Romanian: D.Tufiş, A.M.Barbu
Slovene: T.Erjavec, P.Holozan, V.Gorjanc, M.Stabej

Tomaž Erjavec

Multext-East / Concede

March 21st, 2001

Supported by EU projects

Multext-East, Concede and TELRI

Morphosyntactic specifications V.2 Specifications and Notation for Lexicon Encoding


The purpose of this document is to (i) provide harmonised lexical specifications for eight languages -- Bulgarian, Croatian, Czech, Estonian, English, Hungarian, Romanian, Slovene -- and (ii) formulate the relevant notation that is used in the lexicons and annotated corpora contributed by the language groups.

This edition of the lexical specifications is a revised and expanded version of the MULTEXT-East report D11F. That report was, in turn, based on two proposals for lexicon specifications: one presented in the Multext D1-6-1B Deliverable of the Multext Project (Bel, Calzolari and Monachini, eds., 1995) and one in the Eagles document of the Lexicon sub-group on Morphosyntactic annotation (Monachini and Calzolari 1995).

The current 'Concede' version adds a new language, Croatian, and extends the common tables with further attributes and values (for Romanian and Slovene). The tables for Slovene have also been localised. The LaTeX document has been converted to Latin-2 encoding, so that language-specific characters are displayed correctly in print and in the HTML version of this document.

Pisa, October 1995

Ljubljana, December 1997

Ljubljana, March 2001
