
Introduction

COP project 106 MULTEXT-East Deliverable D1.1 M --- Introduction

The present document is the Milestone M version of Deliverable D1.1, produced within the framework of the MULTEXT-East project. It is a significantly revised version of the Deliverable D1.1 prepared for the Intermediate Milestone (IM1).

Its purpose is (i) to provide harmonized lexical specifications for the six languages involved in the project (Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene) and (ii) to define the notation to be used in the lexicons that each language group contributes as resources for the tools that will perform automatic corpus tagging.

Two lexicon specification proposals together constitute the starting point and the model for the work and results presented here: the one presented in MULTEXT Deliverable D1-6-1B of the MULTEXT project (Bel, Calzolari and Monachini eds. 1995), and the one in the document of the EAGLES Lexicon sub-group on morphosyntactic annotation (Monachini and Calzolari 1995), on which the MULTEXT proposal is based.

The partners evaluated how well the two proposals cover their languages, added the specifications needed to encode the peculiarities of those languages, and produced concrete applications of the proposed specifications. The work proceeded through a cyclical process of adjustment and re-application, with continuous exchanges between the task coordinators and the partners.

This cycle led to the common proposal for lexicon specifications of the Central and Eastern European languages contained in the present deliverable (Chapter 2). Following the EAGLES and MULTEXT models, the specifications are presented as sets of attribute-value pairs displayed in tabular format, and the proposed notation follows the "string of characters in fixed positions" strategy: each position in the string corresponds to one attribute, and the character at that position encodes the attribute's value.
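The fixed-position strategy can be illustrated with a minimal sketch. It assumes a hypothetical, simplified attribute table for nouns (the names `NOUN_ATTRIBUTES`, `encode` and `decode`, the attribute order and the one-letter codes are all illustrative and are not taken from the actual tables in Chapter 2), and it marks an attribute without a value with a hyphen.

```python
# Hypothetical, simplified attribute table for one category (Noun).
# The list order fixes the character positions; codes are illustrative only.
NOUN_ATTRIBUTES = [
    ("Type", {"common": "c", "proper": "p"}),
    ("Gender", {"masculine": "m", "feminine": "f", "neuter": "n"}),
    ("Number", {"singular": "s", "plural": "p"}),
    ("Case", {"nominative": "n", "genitive": "g", "dative": "d", "accusative": "a"}),
]

NOT_APPLICABLE = "-"  # assumed placeholder for an attribute with no value


def encode(category_code, table, features):
    """Encode a dict of attribute-value pairs as a fixed-position string."""
    chars = [category_code]  # position 0 holds the category
    for attribute, values in table:
        value = features.get(attribute)
        chars.append(values[value] if value is not None else NOT_APPLICABLE)
    return "".join(chars)


def decode(tag, table):
    """Recover attribute-value pairs from a fixed-position string."""
    features = {}
    # Positions 1..n are read in the same order as the table.
    for (attribute, values), char in zip(table, tag[1:]):
        if char == NOT_APPLICABLE:
            continue
        reverse = {code: value for value, code in values.items()}
        features[attribute] = reverse[char]
    return features


full = {"Type": "common", "Gender": "masculine", "Number": "singular", "Case": "nominative"}
print(encode("N", NOUN_ATTRIBUTES, full))                                     # Ncmsn
print(encode("N", NOUN_ATTRIBUTES, {"Type": "proper", "Number": "plural"}))  # Np-p-
print(decode("Np-p-", NOUN_ATTRIBUTES))  # {'Type': 'proper', 'Number': 'plural'}
```

Because meaning is carried by position, the same letter can safely stand for different values of different attributes: in this sketch "p" is both "proper" (Type) and "plural" (Number).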

The applications of the proposal to the six languages are given in Chapter 3. They were contributed by:

Bulgarian:
R. Pavlov, L. Dimitrova, L. Sinapova and K. Simov;
Czech:
V. Petkevic;
Estonian:
H. J. Kaalep;
Hungarian:
L. Tihanyi;
Romanian:
D. Tufis and A. M. Barbu;
Slovene:
T. Erjavec and P. Holozan.

Each language application presents the lexicon specifications category by category and is structured as follows:

(a)
a first section describing the features and values pertinent to a given category, in the form of tables with examples from the language under analysis;
(b)
a second section showing the combinations of values, i.e. how the different values combine to yield all the possible lexicon specifications for the items belonging to that category.
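Deriving the possible specifications of a category from its value tables can be sketched as a Cartesian product over the positions, optionally filtered by language-specific restrictions. This is a minimal illustration under stated assumptions: `POSITIONS`, `all_combinations` and the restriction on proper nouns are hypothetical and stand in for whatever constraints a real language application records.

```python
from itertools import product

# Hypothetical value codes per position for a Noun category (illustrative only).
POSITIONS = [
    ("Type", ["c", "p"]),
    ("Gender", ["m", "f", "n"]),
    ("Number", ["s", "p"]),
]


def all_combinations(category_code, positions, allowed=lambda combo: True):
    """Yield every fixed-position tag whose value combination passes `allowed`."""
    names = [name for name, _ in positions]
    # product() enumerates every choice of one code per position, in table order.
    for codes in product(*(values for _, values in positions)):
        combo = dict(zip(names, codes))
        if allowed(combo):
            yield category_code + "".join(codes)


every_tag = list(all_combinations("N", POSITIONS))
print(len(every_tag), every_tag[:3])  # 12 ['Ncms', 'Ncmp', 'Ncfs']


# Hypothetical restriction: proper nouns occur only in the singular.
def singular_proper_only(combo):
    return not (combo["Type"] == "p" and combo["Number"] == "p")


restricted = list(all_combinations("N", POSITIONS, singular_proper_only))
print(len(restricted))  # 9
```

Keeping the restriction as a separate predicate mirrors the two-part structure above: the value tables stay unchanged, and only the list of valid combinations depends on the constraints of each language.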

Pisa, October 1995
Ljubljana, August 1996








Tomaz Erjavec
Wed Oct 16 12:08:36 MDT 1996