COP project 106 Multext-East Deliverable D1.1 F -- Introduction

The purpose of this document is to (i) provide harmonized lexical specifications for the seven languages involved in the Multext-East project -- Bulgarian, Czech, Estonian, English, Hungarian, Romanian, Slovene -- and (ii) formulate the relevant notation to be used in the lexicons contributed by each language group.

The two proposals for lexicon specifications presented in the Multext D1-6-1B Deliverable of the Multext Project (Bel, Calzolari and Monachini eds. 1995) and in the Eagles document of the Lexicon sub-group on Morphosyntactic annotation (Monachini and Calzolari 1995) -- the latter being the basis of the former -- together constitute the starting point and the model for the work and results presented here.

The Multext-East partners have evaluated the two proposals for their coverage of the partners' languages, have added the specifications needed to encode the peculiarities of those languages, and have produced concrete applications of the proposed specifications. The work was done through a cyclical process of adjustment and re-application, giving rise to three versions of this document. The initial IM1 version was coordinated by PISA. This was followed by a significantly revised and validated version at project milestone M. This final F version incorporates some further changes to the language-specific tables, adds sections on the tagsets used by the languages, and adds Multext-East harmonised data for the English language. The production of the M and F versions was coordinated by LJUBLJANA.

The result is the common proposal for lexicon specifications of the Central & Eastern European languages contained in Chapter 2. Following the Eagles and Multext models, the specifications are presented as sets of attribute-value pairs, displayed in tabular format, and the proposed notation follows the "string of characters in fixed positions" strategy.
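As a rough illustration of the fixed-position strategy, the sketch below decodes a string in which the first character names the category and each following position holds a one-character code for one attribute. The attribute names, value codes, the `-` placeholder for a non-applicable attribute, and the helper name `decode` are all assumptions made for this example; the actual tables are those given in Chapter 2.

```python
# Illustrative attribute table for a single category. The attribute names and
# value codes are assumptions for this sketch, not the project's tables.
NOUN_SPEC = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
]

# Category code (first character of the string) -> (category name, attributes).
SPECS = {"N": ("Noun", NOUN_SPEC)}

NOT_APPLICABLE = "-"  # assumed placeholder for an attribute that does not apply


def decode(tag):
    """Map a fixed-position string to a dict of attribute -> value."""
    if not tag or tag[0] not in SPECS:
        raise ValueError(f"unknown category in tag {tag!r}")
    category, attributes = SPECS[tag[0]]
    codes = tag[1:]
    if len(codes) > len(attributes):
        raise ValueError(f"tag {tag!r} has too many positions")
    result = {"Category": category}
    # Position i of the string always refers to attribute i of the category.
    for (name, values), code in zip(attributes, codes):
        if code == NOT_APPLICABLE:
            continue  # attribute left unspecified for this entry
        if code not in values:
            raise ValueError(f"invalid code {code!r} for {name} in {tag!r}")
        result[name] = values[code]
    return result


print(decode("Ncms"))
print(decode("Nc-p"))
```

Because the meaning of a character depends only on its position, a lexicon entry can be checked against the tables one position at a time, and trailing attributes can simply be omitted.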

The applications of the proposal to the seven languages are given in Chapter 3. They have been contributed respectively by:

N.Ide, G.Priest-Dorman and T.Erjavec;
D.Tufis and A.M.Barbu;
T.Erjavec and P.Holozan;
R.Pavlov, L.Dimitrova, L.Sinapova and K.Simov;

The language application parts present the lexicon specifications category by category and are structured as follows:

a section describing the features and values pertinent to a given category in the form of tables with examples from the language in question;
a section providing the allowed combinations of values for the particular language.

Pisa, October 1995
Ljubljana, August 1997
Ljubljana, December 1997

