Next: Background Considerations Up: Multext D1.6.1 B Previous: Contents

Introduction

The work carried out in this task aims at formulating harmonized specifications and at proposing a notation for the lexica and the tagsets, to be contributed by each language group involved in the MULTEXT Project.

MULTEXT's general aim is to develop tools for corpus annotation that contribute to the standardization of this kind of work in both academic and industrial environments. These tools will be provided with resources for six different languages to ensure their validity. The resources used to feed the tools include lexical lists for the six languages, containing the information needed to run the tools. The tools that will use lexica are mainly those that perform morphological analysis and generation, and lexical lookup tools.

MULTEXT proposes to deliver a morphological tool together with basic morphological rules and a number of base-form entries, duly coded with respect to the rules. The morphological tool is intended to expand these base forms into word-form lists with the corresponding morphosyntactic information. These word-forms will, in turn, be used by the tagger, provided that a correspondence is defined between the morphosyntactic information and the tags to be used by the tagger. The morphological tool must guarantee the extensibility of the MULTEXT tools, as it is intended to be used by end-users to enlarge the lexical material handled by the tools. The morphological analysis is also expected to be able to "guess" at least the category of unknown words and, where possible, their morphosyntactic features. Within MULTEXT, therefore, "lexical list" refers to a list of forms with related information: both the base-form lexica, coded in such a way as to feed the morphological tool, and the word-form lexica, containing the information relevant for corpus annotation.
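The flow described above (base forms coded against rules, expansion into word-forms with morphosyntactic information, and a mapping from that information to tagger tags) can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration: the rule classes, the suffix rules, the description strings, the tag names, and the function names are not the MULTEXT notation or tagset, and real morphological rules are far richer than suffix concatenation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordForm:
    form: str   # inflected word-form
    lemma: str  # base form it was expanded from
    msd: str    # hypothetical morphosyntactic description string


# Hypothetical toy rules: inflection class -> (suffix, description) pairs.
RULES = {
    "noun_s": [("", "Noun,singular"), ("s", "Noun,plural")],
    "verb_reg": [("", "Verb,base"), ("s", "Verb,3sg"), ("ed", "Verb,past")],
}

# Hypothetical correspondence between descriptions and tagger tags.
TAG_MAP = {
    "Noun,singular": "NOUN_SG",
    "Noun,plural": "NOUN_PL",
    "Verb,base": "VERB_BASE",
    "Verb,3sg": "VERB_3SG",
    "Verb,past": "VERB_PAST",
}


def expand(lemma, rule_class):
    """Expand one coded base form into its word-forms."""
    return [WordForm(lemma + suffix, lemma, msd) for suffix, msd in RULES[rule_class]]


def build_lexicon(base_forms):
    """Build a word-form lexicon (form -> analyses) from coded base forms."""
    lexicon = {}
    for lemma, rule_class in base_forms:
        for word_form in expand(lemma, rule_class):
            lexicon.setdefault(word_form.form, []).append(word_form)
    return lexicon


def tags_for(form, lexicon):
    """Return the sorted set of tagger tags a word-form may receive."""
    return sorted({TAG_MAP[wf.msd] for wf in lexicon.get(form, [])})


if __name__ == "__main__":
    lexicon = build_lexicon([("book", "noun_s"), ("book", "verb_reg")])
    print(tags_for("books", lexicon))
```

In this toy setting the form "books" receives two analyses, one nominal and one verbal, which is the kind of ambiguity a tagger would then resolve in context. A form absent from the lexicon yields an empty list here; the "guess" for unknown words mentioned above is not modelled.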

At the first workpackage coordinators' meeting, held in Paris, and as also reported in D1.6.1 (September 1994), it was agreed that, in view of the urgent need for lexical lists for the creation of the tools, lexical lists of word-forms in a particular format could be supplied already in the first phase, leaving for the second phase the development of the base-form morphological lexica that serve as input for the morphological tool. These word-form lexical lists were generated from the resources already available at the different sites. Further work will be done to ensure complete mappability between the output of the morphological tool and the formalism proposed for lexical lists.

The present report is mainly devoted to defining the information associated with the word-form lists, referred to from now on as "lexical descriptions". We provide here the notation to be used in each language's lists to describe a given word-form. Major effort has been devoted to ensuring compatibility between the three types of information to be associated with a given word: morphological information, morphosyntactic lexical description, and TAG label.

The present report is divided into four sections:


