The notation format proposed
to represent lexical descriptions consists of
linear strings of characters representing the morphosyntactic information to
be associated with word-forms. The string is constructed  following the
philosophy of the Intermediate Format proposed in the EAGLES Corpus proposal
(Leech and Wilson, 1994), i.e. of having agreed symbols in predefined and
fixed positions: the positions of a string of characters are numbered 0,
1,2, etc. in the following way:
a. the agreed character at position 0 encodes part-of-speech;
 
b. each character at positions 1, 2, ..., n encodes the value of one
attribute (person, gender, number, etc.);
 
c. if an attribute does not apply, the corresponding position in the string
contains a special marker, in our case `-' (hyphen).
Example: Ncms- (noun,common,masculine,singular,nocase)
This notation  adopts  the  EAGLES Intermediate  Format with  a  small
revision:  the Intermediate Format  encodes information by
means of  digits,  while in MULTEXT characters of a mnemonic    nature
are preferred.
It  is  worth  noting here  that this  representation is proposed  for
word-form lists which will be used for a specific application, i.e.
corpus annotation.  We have
foreseen these  lexical descriptions as containing a full description of
lexical items. As
noted  above,  the  sets of tags, to be used properly for automatic
corpus annotation tools,  are expected to contain  less  information.
These lexical descriptions can be seen as notational variants of the feature-based notation in the form of attribute-value pairs. In fact, the string notation proposed, e.g.
Ex.: Ncms- (noun,common,masculine,singular,nocase)
is completely equivalent to a feature-structure representation:
Ex.: {cat=noun, type=common, gender=masculine, number=singular, case=none}
or
         {cat=noun, type=common, gender=masculine, number=singular}
The above feature structures are often also represented as follows:
                    +-                   -+
                    | Cat:    Noun        |
                    | Type:   common      |
                    | Gender: masculine   |
                    | Number: singular    |
                    +-                   -+
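The equivalence between the positional string and the attribute-value notation can be illustrated with a minimal sketch. It assumes the attribute order given in the Nouns table of this document; the function name `decode_noun` and the dictionary layout are hypothetical and are not part of the MULTEXT tools.

```python
# Noun attributes in positional order (positions 1-4), taken from the
# "Nouns (N)" table of this document. The layout is an illustrative assumption.
NOUN_ATTRIBUTES = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative", "a": "accusative"}),
]

NOT_APPLICABLE = "-"  # special marker for a void attribute-value pair


def decode_noun(description):
    """Turn a positional noun description into attribute-value pairs."""
    if not description or description[0] != "N":
        raise ValueError("not a noun description: %r" % description)
    codes = description[1:]
    if len(codes) != len(NOUN_ATTRIBUTES):
        raise ValueError("wrong number of positions: %r" % description)
    features = {"cat": "noun"}
    for (attribute, values), code in zip(NOUN_ATTRIBUTES, codes):
        if code == NOT_APPLICABLE:
            continue  # leave the attribute out of the feature structure
        if code not in values:
            raise ValueError("unknown code %r for %s" % (code, attribute))
        features[attribute] = values[code]
    return features


print(decode_noun("Ncms-"))
```

Here the hyphen is rendered by omitting the attribute, which corresponds to the second of the two feature structures given above; emitting an explicit case=none instead would correspond to the first.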
Formal  characteristics  relevant  for  our  applications  have been
kept.
Using position in the string to encode attributes places no
restrictions on the set of characters to be used as values. Conversely,
if we wanted to keep the formal characteristic of order-independent
notation, we would have to make sure that the characters meant to
represent attribute-values are not ambiguous. As attributes and values
are linked by positional criteria, the need for a special marker for
void attribute-value pairs is evident if we want to keep descriptions
coherent. Thus, the ``Ncms-" style can be viewed as a short-hand
notation convenient for some users and straightforwardly mappable to
the information used in unification-based attribute-value pair
formalisms.
When comparing MULTEXT lexical description representation format with other notations one must keep in mind that they are intended to describe word-forms, and are used in very large lexical lists which contain word-forms. It seems to us relevant to comment on this point because, although it can be justified (and we will do so below) that the same formal operations can be declared in both styles, there is little evidence for justifying the need of operations such as negation and disjunction of features and values when applying them to tagged word-forms as a result of corpus annotation.
We call this marker `not-applicable', and, as stated above, its
function is just to keep the relationship established between
attributes and values. It might be used for the following cases (it
has been proposed to use the `not-applicable' marker also to encode a
feature that is not applicable in a particular language; however, this
decision is still under discussion due to the facts reported in the
section ``Comparison of attributes/values used by languages"):
a. not applicable given a particular combination of attributes/values,
i.e. although the attribute applies to the category in a given
language, it does not apply to a particular subclass of the category.
b. not applicable to a particular lexical item, although the attribute
applies to the rest of its paradigm.
Example: in the description of pronouns, for personal pronouns the grammatical person is to be encoded, but for demonstrative pronouns it is avoided; in this case '-' is applied following (a). On the other hand, gender cannot be informative for some personal pronouns, but it is still relevant for other personal pronouns; the application of `-' follows (b):
Pd-ms   "Este"       Pronoun, demonstrative, masculine, singular.
Pp1-s   "Yo"         Pronoun, personal, first, singular.
Pp1mp   "Nosotros"   Pronoun, personal, first, masculine, plural.
Pp1fp   "Nosotras"   Pronoun, personal, first, feminine, plural.
Their uses are clearly not equivalent, but meaningful differences would only occur in highly typed theories of lexical description. To illustrate this point, let us take the following type system for pronouns:
TYPES           SUBTYPES             ATTRIBUTES     VALUES
Pronoun                               gender        masculine
                                                    feminine
                                      number        singular
                                                    plural
               Demonstrative
               Personal                person          1
                                                       2
                                                       3
For this system, gender and number attributes belong to the set of
features which describe all pronouns. Person will only belong to the
set of features which describe personal pronouns - in addition to
gender and to number. Applied to this type system,
case  (a)  would
mean that the  attribute-value  pair does not
belong to the  set  of features which  describe a subtype,  while  (b)
would mean  indeterminacy  of  a  given  word-form  (which  could be
expressed  as  a  disjunction  of all the  values for  the  particular
attribute or  leaving a void for
the  value, being open to unification;
this choice mainly depends  on the  purpose of the  description,  e.g.
syntactic parsing).
              |phon                este|
              |cat     |gender    masc||
              |   'dem'|number    sing||

              |phon                yo   |
              |cat      |gender    []  ||
              |   'pers'|number    sing||
              |         |person    1   ||
In  simpler flat type systems where distinctions are made only for the
generic type ``pronoun",  both cases a. and b. will be
treated by unification mechanisms in the same way.
From the conversion point  of  view,  we have to be concerned with the
output of the MULTEXT morphological tool, as it  will  be  the source of
word-form  lexical lists.  The
Mmorph  tool  does not  incorporate a highly
hierarchical typing system and thus no problems are expected  in
converting Mmorph output into lexical descriptions  of the proposed
format, if desired.  The
results from applying the Mmorph tool
will probably (it  strongly  depends  on  implementation
strategies) be the following:
1. an attribute that is not present in the description attached to the
word-form;
 
2. a disjunction expression, e.g. {gender=masc|fem};
 
3. an additional, third possible value, e.g. {gender=none}.
The simplest case for conversion would be the third one, as automatic non-intelligent conversion is then possible. In the first two cases the conversion routine will have to make some inferences from type declarations. It is also expected that, when converting from other lexical sources, special conversion routines will have to be used. As seen above, the conversion from the ``Ncms-" lexical description notation into another unification-based format will only be difficult if the target formalism is a highly typed system. If this is not the case, the ``not-applicable" marker will have to be converted into a special value or into nothing, leaving the attribute open. For conversion into a highly typed system it might be useful to have cases (a) and (b) marked by different characters, in order to guide an intelligent conversion routine to the desired results.
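A conversion routine covering the three cases listed above might be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be a plain dictionary of attribute-value pairs in which a disjunction is written with `|` and the third case uses the literal value `none`; the actual Mmorph output format is not described here, and the name `encode_noun` is hypothetical. Only the noun attributes are handled.

```python
from itertools import product

# Noun attributes in positional order, with value names mapped to codes
# (taken from the "Nouns (N)" table; the input format is an assumption).
NOUN_ATTRIBUTES = [
    ("type", {"common": "c", "proper": "p"}),
    ("gender", {"masculine": "m", "feminine": "f", "neuter": "n"}),
    ("number", {"singular": "s", "plural": "p"}),
    ("case", {"nominative": "n", "genitive": "g", "dative": "d", "accusative": "a"}),
]


def encode_noun(features):
    """Return every positional description covered by a feature dictionary."""
    per_position = []
    for attribute, codes in NOUN_ATTRIBUTES:
        value = features.get(attribute)
        if value is None or value == "none":
            # Case 1 (attribute absent) and case 3 (explicit none value)
            per_position.append(["-"])
        else:
            # Case 2: a disjunction yields one description per disjunct;
            # a single value is treated as a one-element disjunction.
            per_position.append([codes[v] for v in value.split("|")])
    return ["N" + "".join(combo) for combo in product(*per_position)]


print(encode_noun({"type": "common", "gender": "masculine|feminine",
                   "number": "singular"}))
```

Expanding a disjunction into several descriptions is one possible design choice; mapping it to the not-applicable marker instead would correspond to treating it as case (b) above.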
The tags (see the examples below) used to exemplify issues and
problems to be dealt with in the mapping between lexical descriptions
and corpus tags come from the tagsets proposed in the
language-specific applications of four of the MULTEXT partners.
These tagsets (which differ from one another, because they were
constructed on the basis of tagging practices already used by the
partners) should be considered as a preliminary proposal to be
discussed for harmonization and refined after experimentation with the
MULTEXT tagger.
Mapping of these lexical descriptions into corpus tags has also
been taken into account. It is also considered desirable to see whether
under-informative corpus tags can be directly mapped to the lexical
descriptions each one subsumes.
Decisions about corpus tags are language dependent. The information to
be encoded  depends on  the  ability of  a given tool to  disambiguate
between  different potential lexical descriptions  for a
given word-form.  We have already  mentioned  the  key concepts to  be
applied for defining sets of  corpus tags  in the preceding  sections.
Therefore   one  can  first  assume  that  the  mapping  from  lexical
descriptions onto corpus tags can be  done with conversion tables which
relate two different items: corpus tags and lexical descriptions. These
tables  are  likely to be  modified many  times  in  the course of the
project, based on experimentation with the disambiguation tool.
An example of such mappings is:
Lex.spec.  TAG  Definition
Pp1msa-    P1S  Personal pronoun, first person, masc. sing. accusative
Px1msa-    P1S  Reflexive pronoun, first person, masc. sing. accusative
Pp1fsa-    P1S  Personal pronoun, first person, fem. sing. accusative
Px1fsa-    P1S  Reflexive pronoun, first person, fem. sing. accusative
Pp1msd-    P1S  Personal pronoun, first person, masc. sing. dative
Px1msd-    P1S  Reflexive pronoun, first person, masc. sing. dative
Pp1fsd-    P1S  Personal pronoun, first person, fem. sing. dative
Px1fsd-    P1S  Reflexive pronoun, first person, fem. sing. dative
All these lexical descriptions correspond to the Spanish form ``me". For this word-form the tag P1S, which conflates all the possible lexical descriptions, has been chosen on the assumption that an automatic tool would have problems in selecting the correct analysis among all the lexical descriptions. The correct analysis of this word-form would require syntactic analysis.
The mapping from the lexical descriptions to the corpus tags should be
applicative,  that is, ``each lexical description should map to one and
only one corpus tag,  while it is not possible to do
the  reverse" due to the
limitations of current tagging
techniques. The situation where corpus tags
are  more  precise  than  a lexical  description  (i.e.  one lexical tag
corresponds to more than one corpus tag) should be, in principle,
avoided.
In order to avoid redundancy in the conversion tables and to make tag optimization work easier, it has been proposed to study the possibility of having intermediate representations which prepare the conflation of information and which facilitate automatic mapping from lexical descriptions onto tags. This intermediate internal notation makes use of ``regular expressions" which incorporate operators in order to sum up the information referred to by different lexical descriptions and conflated in a given tag. For the example given above, the resulting regular expression may incorporate two operators: ``match any" (.) and ``list" ([]); other possible operators proposed are ``disjunction" (|) and ``negation".
P[px]1.s[ad]-   P1S
However, the application of such regular expressions is still being studied, as their use imposes some requirements on the conflation of lexical descriptions and on the construction of corpus tags. An example will illustrate the issues to be taken into account. In Spanish, the first and third person forms of some tenses are homographs. This can be taken into account when conflating information:
Verbal paradigm                regular exp.   TAG
cantaba, comi'a, veni'a        Vmii[13]s-     VMIIS
cantari'a, comeri'a, vendri'a  Vmcs[13]s-     VMCSS
cante, coma, venga             Vmsp[13]s-     VMSPS
cantara, comiera, viniera      Vmsi[13]s-     VMSIS
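The use of such regular expressions as a conversion table can be sketched with Python's standard `re` module, whose ``match any" (.) and ``list" ([]) operators behave as described here. The table below simply collects the pronoun and Spanish verb expressions given above; the names `CONVERSION_RULES` and `corpus_tag` are hypothetical.

```python
import re

# (regular expression, corpus tag) pairs copied from the examples above.
CONVERSION_RULES = [
    ("P[px]1.s[ad]-", "P1S"),
    ("Vmii[13]s-", "VMIIS"),
    ("Vmcs[13]s-", "VMCSS"),
    ("Vmsp[13]s-", "VMSPS"),
    ("Vmsi[13]s-", "VMSIS"),
]


def corpus_tag(description):
    """Map one lexical description to the single corpus tag covering it."""
    tags = [tag for pattern, tag in CONVERSION_RULES
            if re.fullmatch(pattern, description)]
    if len(tags) != 1:
        # No rule, or more than one rule, covers this description.
        raise ValueError("%r maps to %d tags" % (description, len(tags)))
    return tags[0]


for description in ["Pp1msa-", "Px1fsd-", "Vmii1s-", "Vmii3s-"]:
    print(description, corpus_tag(description))
```

Raising an error when zero or several rules match is a design choice of this sketch; it makes gaps and overlaps in the table visible instead of silently picking a tag.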
For Italian, the conflation of information on homographs also in the verbal paradigm may cause problems to the applicative principle mentioned above:
Verbal paradigm lex.descr.    regular exp.           TAG
premiate        Vmip2p-       Vm([ims]p2p-)|(ps-pf)  VMP2IMCPP
                Vmmp2p-
                Vmsp2p-
                Vmps-pf
leggete         Vmip2p-       Vm[im]p2p-             VMP2IMP
                Vmmp2p-
leggiate        Vmsp2p-       Vmsp2p-                VP2CP
lette           Vmps-pf       Vmps-pf                VFPPR
As can be seen, if we use tags such as the ones above
which are based on the principle ``one graphical form - one tag",
there is a violation of the applicative principle,
i.e.  the same  lexical description  will correspond to two  different
tags, because of different conflation clusters.
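The violation can be made explicit by inverting the Italian table above and listing every lexical description that ends up with more than one tag. The following is a minimal sketch; the data structure and the name `applicative_violations` are illustrative assumptions.

```python
from collections import defaultdict

# word-form -> (lexical descriptions, corpus tag), copied from the table above.
ITALIAN_TAGGED_FORMS = {
    "premiate": (["Vmip2p-", "Vmmp2p-", "Vmsp2p-", "Vmps-pf"], "VMP2IMCPP"),
    "leggete": (["Vmip2p-", "Vmmp2p-"], "VMP2IMP"),
    "leggiate": (["Vmsp2p-"], "VP2CP"),
    "lette": (["Vmps-pf"], "VFPPR"),
}


def applicative_violations(tagged_forms):
    """Return the lexical descriptions that correspond to several tags."""
    tags_by_description = defaultdict(set)
    for descriptions, tag in tagged_forms.values():
        for description in descriptions:
            tags_by_description[description].add(tag)
    return {description: sorted(tags)
            for description, tags in tags_by_description.items()
            if len(tags) > 1}


for description, tags in applicative_violations(ITALIAN_TAGGED_FORMS).items():
    print(description, tags)
```

With the four word-forms above, each of the four lexical descriptions is reported, since every one of them is shared between ``premiate" and another form carrying a different tag.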
In  general,
it is observed
that the use of operators in regular expressions
results in  a form of marking the information which is not
going to be expressed in the corpus tag. Thus, tags would have to contain
less information than the regular expression and hence than the lexical
description.
Another issue to be considered is the following. Having tags with little lexical information, as in the following French example, may lead to another problematic issue in cases where such regular expressions are also used in helping to recover all possible lexical information from a given ``under-specified" corpus tag. The mapping from the regular expression onto lexical descriptions will also have to take into account the word-form in order to reject possible descriptions which do not correspond to the tagged word-forms. Below are some examples from the proposed verbal tags and regular expressions:
TAG       Regular expression  Lexical descriptions  Possible word-forms
VM1P      Vm[iscm][pifs]1p--  Vmip1p--            venons
                              Vmii1p--            venions
                              Vmif1p--            viendrons
                               ...                  ....
Let us suppose that the word ``venons" is tagged as ``VM1P". If we want
to know which lexical descriptions the tag can refer to, the explosion
of the information contained in the regular expression will also give
lexical descriptions which do not correspond to the word ``venons", but
to other words. Regular expressions can only map a given tag for a
word-form into all possible lexical descriptions for that word-form if
the information conflated only reflects ambiguities due to homography.
Only with this criterion for defining tags will all the possible
lexical descriptions subsumed by the corpus tag and expressed in the
regular expression be true of a given tagged word.
If the criterion for conflating information is limited to homograph ambiguities, we see - as in the following example - that all possible lexical descriptions expanded from the regular expression are true of a given word-form.
TAG       Regular expression  Lexical descriptions  Possible word-forms
VSXICP    Vm(sp.s)|(ip2s)-         Vmip2s-             ami
                                   Vmsp1s-             ami
                                   Vmsp2s-             ami
                                   Vmsp3s-             ami
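The difference between exploding a tag on its own and exploding it for a given word-form can be sketched with the French example discussed above. The three-entry lexicon is only the fragment shown in the French table, and the function names are hypothetical.

```python
import re

# Tag -> regular expression, and a lexicon fragment of
# (word-form, lexical description) pairs, both from the French example.
TAG_PATTERNS = {"VM1P": "Vm[iscm][pifs]1p--"}
LEXICON = [
    ("venons", "Vmip1p--"),
    ("venions", "Vmii1p--"),
    ("viendrons", "Vmif1p--"),
]


def descriptions_for_tag(tag):
    """All lexicon descriptions subsumed by the tag, whatever the word-form."""
    pattern = TAG_PATTERNS[tag]
    return [description for _, description in LEXICON
            if re.fullmatch(pattern, description)]


def descriptions_for_tagged_word(word_form, tag):
    """Only the descriptions subsumed by the tag that belong to this word-form."""
    pattern = TAG_PATTERNS[tag]
    return [description for form, description in LEXICON
            if form == word_form and re.fullmatch(pattern, description)]


print(descriptions_for_tag("VM1P"))
print(descriptions_for_tagged_word("venons", "VM1P"))
```

The first call returns descriptions belonging to ``venions" and ``viendrons" as well, while the second keeps only the one that is true of ``venons"; when a tag conflates only homograph ambiguities, as in the ``ami" example, the two results coincide.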
As mentioned in the section ``Comparison of attributes/values used by
languages", the application of the proposed operators in regular
expressions for avoiding redundancy is, in some cases, not needed if
lexical descriptions already encode the possibility of having, for a
given word-form, more than one possible lexical description. This is
the case with the proposed values ``common" for gender, ``invariant"
for number (in Italian), or ``object" for case (in French pronouns).
Almost all the languages treated in MULTEXT have nouns, adjectives and determiners (among others) which have the same word-form for both feminine and masculine agreement. The Italian group has proposed a value for gender named ``common" which avoids having to write two different entries with the same word-form but with different lexical descriptions. In fact, this use of a special value anticipates the possible use of the proposed operators in the regular expression.
word-form      lexical description regular expression  TAG
insegnante     Nccs-               Nccs-               NNS
could also be expressed as:
word-form      lexical description regular expression  TAG
insegnante     Ncms-               Nc[mf]s- or Nc.s-   NNS
               Ncfs-               or Nc(m|f)s
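The relation between the two encodings of ``insegnante" can be shown by expanding the conflated value back into the values it stands for. This sketch assumes that the code for ``common" gender is `c`, as in the example Nccs-, and that gender occupies position 2 of a noun description; the function name is hypothetical.

```python
# Position of the gender attribute in a noun description (Nouns table).
GENDER_POSITION = 2
# Assumed code "c" for the proposed Italian value "common" (as in Nccs-).
CONFLATED_GENDER = {"c": ["m", "f"]}


def expand_common_gender(description):
    """Replace a conflated gender code by each gender value it covers."""
    code = description[GENDER_POSITION]
    if code not in CONFLATED_GENDER:
        return [description]  # nothing to expand
    return [description[:GENDER_POSITION] + gender
            + description[GENDER_POSITION + 1:]
            for gender in CONFLATED_GENDER[code]]


print(expand_common_gender("Nccs-"))
```

The two descriptions obtained are exactly those matched by the regular expression Nc[mf]s- in the second table.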
The need for regular expressions in the mapping between lexical
descriptions and corpus tags, as well as their consequences, must
still be settled. It should be noted that regular expressions can be
regarded
as a  convenient way  to  map the  lexical descriptions to the  corpus
tags   since, in  many cases,  the information in the  lexicon is more
precise  than the information we can/want  to have in  the corpus  tag
set.  Such a mapping  still seems very  interesting because there  are
many corpus  tag  systems,  even  for the same language,  which makes it
extremely  difficult  to  relate  the  one  to   the  other.   Regular
expressions  could act as a common reference for the different systems
to make comparison  easy.  Besides,  regular  expressions  could  make
translations  between the lexical description  and corpus  tags
easier and
enable the automatic generation of conversion tables.
The categories listed below with the relevant attributes and values are
based on EAGLES documents and are the results of a first testing based
on a proposal made by Veronis et al. 1994 for lexical specifications in
MULTEXT.
As has already been mentioned in the section ``Background
considerations", proposing features for describing lexical items
of different languages, with the aim of defining a set which can be
said to be ``common" to all of them, is a complex task. The underlying
philosophy for this task has therefore been to lead the different groups
to a pragmatic solution in which a ``harmonized" set of features
could be reached.
The groups have first worked out
their lexical descriptions taking as input
EAGLES and Veronis et al. (1994) documents. The very general criterion
was to encode those proposed features which were considered relevant for
the language in question. MULTEXT therefore also followed the EAGLES
bottom-up methodology in trying to define extensively the features
``used" in the lexical descriptions for each group's language,
as this procedure makes evident which features are commonly used. After
this phase, whose result can now be seen in the section ``Comparison of
attribute/values used by the groups", a new phase is envisaged in order
to accommodate language-specific considerations into a general model to
be used by MULTEXT. This accommodation must take into account
extensibility to other languages and also application-motivated
arguments, as well as internal coherence. For this new phase more
specific criteria would be desirable with respect to the addition of
new features to the EAGLES Level-1 set. The
aimed result is a ``harmonized" set of features which properly describe
lexical items of the different languages. 
Following the general aim of the project, these harmonized
specifications - and the related resources - will contribute to the
standardization of corpus annotation work. They are supposed to
serve as a user-oriented additional characteristic of our tool package,
in the sense that end-users will have a common ground for inspecting
and understanding the resources and tool results, to a large extent
independently of the language. This common set of features will also be
a common ground for comparing the results of different annotation
tools, because, as mentioned in the previous section, the existence
of many lexical description systems currently makes it difficult to
compare results.
Therefore the categories and features listed below are the
common reference for the work done by the
different groups. Further discussion on this first proposal is to be
found in the section ``Comparison of the attributes/values used by the
groups" which is in turn to define criteria for changing this first
proposal.
Tables of categories
=============== ====
Part-of-Speech  Code
=============== ====
Noun            N
Verb            V
Adjective       A
Pronoun         P
Determiner      D    (for those who do not have a separate category
Article         T     for Articles, these are included in Determiner)
Adverb          R
Adposition      S
Conjunction     C
Numeral         M
Interjection    I
Unique          U
Residual        X
Abbreviation    Y
=============== ====
Each character at  positions  1,  2,  etc.  encodes the  value  of one
attribute  (person,  gender,  number,  etc.),  according to the tables
given below.
2.2.2 Attribute/value tables
----------------------------
Abbreviations used:
  P   Position (starts with 0 for encoding PoS values)
  ATT Attribute name
  VAL Value
  C   Code
1. Nouns (N)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           common         c
                 proper         p
- -------------- -------------- -
2 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
3 Number         singular       s
                 plural         p
- -------------- -------------- -
4 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
= ============== ============== =
2. Verbs (V)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           main           m
                 auxiliary      a
                 modal          o
- -------------- -------------- -
2 Mood/VForm     indicative     i
                 subjunctive    s
                 imperative     m
                 conditional    c
                 infinitive     n
                 participle     p
                 gerund         g
                 supine         s
                 base           b
- -------------- -------------- -
3 Tense          present        p
                 imperfect      i
                 future         f
                 past           s
- -------------- -------------- -
4 Person         first          1
                 second         2
                 third          3
- -------------- -------------- -
5 Number         singular       s
                 plural         p
- -------------- -------------- -
6 Gender         masculine      m
                 feminine       f
                 neuter         n
= ============== ============== =
3. Adjectives (A)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           qualificative  f
                 ordinal        o
                 cardinal       c
                 indefinite     i
                 possessive     s
- -------------- -------------- -
2 Degree         positive       p
                 comparative    c
                 superlative    s
- -------------- -------------- -
3 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
4 Number         singular       s
                 plural         p
- -------------- -------------- -
5 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
= ============== ============== =
4. Pronouns (P)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           personal       p
                 demonstrative  d
                 indefinite     i
                 possessive     s
                 interrogative  t
                 relative       r
                 exclamative    e
                 reflexive      x
                 reciprocal     l
- -------------- -------------- -
2 Person         first          1
                 second         2
                 third          3
- -------------- -------------- -
3 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
4 Number         singular       s
                 plural         p
- -------------- -------------- -
5 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
                 oblique        o
                 object         j
- -------------- -------------- -
6 Possessor      singular       s
                 plural         p
= ============== ============== =
5. Determiners (D)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           demonstrative  d
                 indefinite     i
                 possessive     s
                 interrogative  t
- -------------- -------------- -
2 Person         first          1
                 second         2
                 third          3
- -------------- -------------- -
3 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
4 Number         singular       s
                 plural         p
- -------------- -------------- -
5 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
                 oblique        o
- -------------- -------------- -
6 Possessor      singular       s
                 plural         p
= ============== ============== =
6. Articles (T)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           definite       d
                 indefinite     i
- -------------- -------------- -
2 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
3 Number         singular       s
                 plural         p
- -------------- -------------- -
4 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
= ============== ============== =
7. Adverbs (R)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           general        g
                 particle       p
- -------------- -------------- -
2 Degree         positive       p
                 comparative    c
                 superlative    s
= ============== ============== =
8. Adpositions (S)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           preposition    p
                 postposition   t
                 circumposition c
- -------------- -------------- -
2 Formation      simple         s
                 compound       c
= ============== ============== =
9. Conjunctions (C)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           coordinating   c
                 subordinating  s
= ============== ============== =
10. Numerals (M)
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           cardinal       c
                 ordinal        o
- -------------- -------------- -
2 Gender         masculine      m
                 feminine       f
                 neuter         n
- -------------- -------------- -
3 Number         singular       s
                 plural         p
- -------------- -------------- -
5 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
= ============== ============== =
11. Interjections (I)
12. Unique membership class (U)
13. Residual (X)
14. Abbreviations (Y)
The  following tables  reflect  the  attributes  and values  used  for
lexical description in MULTEXT.  They take into account input supplied
by different groups  which the  reader  can  find  further detailed in
specific  language application annexes (see section 5).
It is worth noting  that the
tables reflect both
features used by five groups using lexical descriptions
as proposed  in previous  versions of this  document and also
features for Dutch
resulting   from  morphological  generation  with  the
``mmorph"  tool.   The
comparison  is certainly
of  help in  order to
have    a  clear picture of  the  level of consensus
reached with respect to harmonization when elaborating lexical lists.
It has already been mentioned in previous sections that the project is
working  towards  defining  criteria  for the  application  of  EAGLES
guidelines  on   standardization   of   lexical  resources   for   easy
re-usability.
Looking at  the  tables  below  some very  general issues  arise  with
respect to the  application  criteria  of  the  general  tables  to  the
particular languages and the interpretation of the general guidelines.
Until now groups have been working on the assumption that they must
encode the recommended Level-1 EAGLES features if they are relevant
to their languages. The possibility of adding new values and new
attribute/value pairs was also foreseen, in case the recommended
features were not enough to describe lexical items with fine
granularity. This was also found useful in view of supplying lexical
material to be used by tools other than the MULTEXT ones. This
openness has led to a number of inconsistencies with respect to
application criteria, which we summarize in the points below. A
decision with respect to general criteria for application must be
reached in the next phase. The issues concerning harmonization which
arise from comparing the application sections follow.
There is an unbalanced treatment of features considered  as ``general"
for the different categories.  The presence of  a particular attribute
seems to be mainly justified for two reasons:
- representative in most of the studied languages;
 
- linguistic tradition.
We see in the comparative tables that a particular language is allowed
to add a new
attribute because of its relevance  for  the lexical description of  a
given category (i.e.
when the  language  items belonging to  that category are
inflected or marked with respect  to it).  The most evident case is the
proposal  made  in  order  to  encode  Possessor-gender  (among  other
features)  for  Pronouns and  Determiners.  It  is obvious  that  this
feature cannot be used  by languages which do not have different forms
regarding this particular distinction.
On  the other  hand,  note that ``case"  as  a feature  recommended as
``general"  for describing Nouns,  Adjectives,  Pronouns,  Determiners,
Articles and  Numerals,  is in fact used  only for Pronouns by most of
the  languages,  and  only  German  can  apply  it  for  the  rest  of
the categories.
By ``unbalanced treatment" we mean that features used by just a few
languages, or even by only one, are treated differently: some of them are
considered ``general" and others ``language specific".
The possibility of adding language specific attributes and values where
relevant for a given description has had a further consequence: the
procedure followed in this task has shown that it is not easy to reach a
consensus on harmonizing the specific features and values proposed by
individual languages. One case of such proliferation of features is seen
in the comparative table for Determiners. One of the groups suggests
having language specific types for ``definite article" and ``indefinite
article", while other groups prefer a general type ``article" together
with another attribute, i.e. Quantification or Definiteness, to encode
this distinction at a lower level. The particular features suggested by
the groups which, in our opinion, could be adapted to the EAGLES model
will be discussed during the next phase.
In connection with this openness with respect to adding attributes and
values, we would like to point out the case of the values ``common" and
``invariant", added by the Italian descriptions to all inflected nominal
categories for the attributes ``gender" and ``number" respectively
(where a disjunction of values could be used instead). Most of the
languages could easily adopt these values for forms which are identical
for masculine/feminine or singular/plural agreement features, but this
issue certainly has to be clarified and discussed further.
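The alternative mentioned above, a disjunction of values in place of
``common" and ``invariant", can be illustrated with a small sketch in
Python. It is only an illustration under stated assumptions: the function
name expand and the table CONFLATED are hypothetical, and positions 2
(Gender) and 3 (Number) are those of the Nouns comparison table.

```python
from itertools import product

# Hypothetical expansion table for noun tags only:
# position -> {conflated code: the agreed codes it stands for}.
CONFLATED = {
    2: {"c": "mf"},  # Italian l-s. gender value "common"
    3: {"n": "sp"},  # Italian l-s. number value "invariant"
}


def expand(tag):
    """Return every fully specified noun tag covered by a conflated tag."""
    if not tag.startswith("N"):
        raise ValueError("this sketch only covers noun tags")
    # For each position, either the set of values behind a conflated code
    # or the single character found in the tag.
    choices = [CONFLATED.get(pos, {}).get(ch, ch) for pos, ch in enumerate(tag)]
    return ["".join(combo) for combo in product(*choices)]


print(expand("Nccn-"))
```

Here the single conflated entry Nccn- stands for the four entries Ncms-,
Ncmp-, Ncfs- and Ncfp-, which is the kind of multiplication of entries
that the conflated values are meant to avoid.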
A decision on the ``fine granularity" of lexical descriptions should
probably be reached. In fact, the same strategy, that is, conflating into
a new value of a given attribute a homography which would otherwise cause
an explosion of entries, appears in another category as well. The French
group has suggested conflating the accusative and dative values of
pronominal case into a generic value ``object". The proposed division
would also apply to other Romance languages, but it might compromise the
``fine granularity" which the project aimed at for lexical descriptions.
A certain confrontation between two different traditions can also be
observed: some groups propose adding a new attribute to characterise an
element, while others propose adding a new value to label a new class
under a general, already available attribute such as ``type". Adding a
new attribute corresponds to the practice of unification-based grammars,
while a label for a class corresponds to the so-called ``taxonomic"
theories. An example of this confrontation is the proposal of an
attribute/value ``wh" for marking relative particles in different
categories: pronouns, determiners, adverbs. EAGLES level-1 seems to
prefer separating relatives by means of a different value of the
attribute ``type" of pronouns and determiners. Marking the relative
character of a given pronominal as an additional feature would help, for
instance, to characterise specifically items such as the English
``whose" or the Spanish ``cuyo, cuya, cuyos, cuyas", which are normally
described as Possessive relative pronouns. Under the current
classification a decision must be made to put them under either the
Possessive or the Relative value of the attribute ``type". It must also
be mentioned that no special treatment is available for relative
adverbs, which are not taken as a separate class under Adverb type in
the EAGLES proposal. Thus, the comparison suggests that either a new
attribute ``wh" for adverbs or, as suggested by the German group, a new
value for interrogative (and also for relative) adverbs should be
devised.
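The difference between the two traditions can be made concrete with a
minimal sketch in Python for the English ``whose". The dictionaries and
the helper names is_relative and is_possessive are hypothetical and serve
only to illustrate the point; they are not part of any MULTEXT tool.

```python
# "Taxonomic" style: a single "type" slot, so "whose" has to be filed
# under one class only (here, arbitrarily, under Relative).
whose_taxonomic = {"cat": "pronoun", "type": "relative"}

# Attribute style: "wh" is an independent attribute, so the item can be
# possessive by type and relative by "wh" at the same time.
whose_attribute = {"cat": "pronoun", "type": "possessive", "wh": "relative"}


def is_relative(entry):
    """True if the entry is marked as relative by either mechanism."""
    return entry.get("wh") == "relative" or entry.get("type") == "relative"


def is_possessive(entry):
    """True if the entry is classified as possessive under "type"."""
    return entry.get("type") == "possessive"


for entry in (whose_taxonomic, whose_attribute):
    print(is_relative(entry), is_possessive(entry))
```

Under the single ``type" slot one of the two properties is lost, whereas
the additional attribute keeps both available for retrieval.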
As we have seen, the EAGLES recommendations lack in some cases the
fine-grained distinctions which the groups working in MULTEXT consider
desirable for our applications. Another example is raised by German and
English. The groups dealing with these languages have suggested a
specific value for comparative conjunctions, and the suggestion could
also be applied to the rest of the languages. This addition seems
reasonable on the grounds that it is an important feature with respect
to distributional criteria and can be of great importance for tagging
purposes. Again, some guidelines must be defined for considering the
addition of features not contained in EAGLES level-1, but it is worth
noting that several of the features added for language specific reasons
could be considered applicable to the rest of the languages.
We recommend a new round of discussions on the new features suggested in
specific language applications, to see whether they can be of use in our
concrete application and are applicable to the rest of the languages.
Once this discussion has led to conclusions, the approved features must
be included in the general model. Besides linguistic considerations,
having an agreed set of general features is of great concern for the
chosen notation style in lexical descriptions, since each attribute
occupies a fixed position in the string.
There must be regulations with respect to the encoding of language
specific attributes by the other groups, or on the ways of
differentiating them from those of the general model. This is especially
relevant if general conversion routines are to be developed. For the
sake of theoretical coherence, the treatment given to these features
must take into account the above mentioned ``unbalanced treatment of
features". Some other doubts remain in connection with theoretical
coherence and the applied nature of the lexica to be supplied. We
mention only one of them, to illustrate the kind of issues which must be
taken into account in the next phase. It has to do with the agreement
features of person, number and gender: are they to be encoded with
respect to grammatical agreement or with respect to semantic
differentiations? As it is now, following the EAGLES recommendations, it
seems as if only semantic considerations are taken into account, i.e.
the Possessive-person of determiners is taken to be the ``possessor
person" for most of the languages, although this feature does not in
fact trigger agreement.
A decision must be taken with respect to these cases, and more specific
guidelines must be established for the further development of lexical
descriptions. It seems from the comparison made that the general
criterion ``relevant for your language" is not enough. New guidelines
must also take the application side into account.
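One possible way of keeping language specific attributes distinguishable
from those of the general model, as a general conversion routine would
require, is sketched below in Python. The names GENERAL_NOUN_ATTRIBUTES
and split_noun_features are hypothetical; the attribute list is that of
the general part of the Nouns comparison table, and Sem-gender is the
Dutch language specific attribute from the same table.

```python
# Attributes of the general model for Nouns (positions 1-4 of the table).
GENERAL_NOUN_ATTRIBUTES = ("type", "gender", "number", "case")


def split_noun_features(features):
    """Separate general-model attributes from language specific ones."""
    general = {a: v for a, v in features.items()
               if a in GENERAL_NOUN_ATTRIBUTES}
    specific = {a: v for a, v in features.items()
                if a not in GENERAL_NOUN_ATTRIBUTES and a != "cat"}
    return general, specific


# A Dutch noun description using the language specific Sem-gender.
dutch_noun = {"cat": "noun", "type": "common", "number": "singular",
              "sem-gender": "F"}
general, specific = split_noun_features(dutch_noun)
print(general)
print(specific)
```

The sketch only separates attributes; language specific values of a
general attribute (such as the Italian ``common" for gender) would need
a comparable list of agreed values per attribute.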
Comparison tables
Abbreviations used:
    P   = Position (starts with 0 for encoding PoS values)
    ATT = Attribute name
    VAL = Value
    C   = Code
    x   = value used by a given group (any character other than x
          means that the language group in question encodes the value,
          in its application, with that character instead of the
          agreed code).
          The code column is left empty for the language specific
          attributes/values of Dutch: they are attested among the
          attributes and values of the Dutch implementation of Mmorph,
          where they are not represented by single-character codes.
    l-s., l-spc. = language specific attribute or value
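The convention for the group columns can be read mechanically, as in the
following minimal Python sketch. The function name codes_by_group and
the six-character string of marks are assumptions made for the
illustration; the two rows used are ``main" and ``modal" from the Verbs
table.

```python
GROUPS = ["IT", "DE", "ES", "FR", "NL", "EN"]


def codes_by_group(agreed_code, marks):
    """Map each group to the code it actually uses for one table row.

    marks holds one character per group, in the order of GROUPS:
    "x" = the group uses the agreed code, " " = the group does not use
    the value, any other character = the group's own code for the value.
    """
    result = {}
    for group, mark in zip(GROUPS, marks):
        if mark == " ":
            continue
        result[group] = agreed_code if mark == "x" else mark
    return result


print(codes_by_group("m", "xxxxxv"))  # Verbs, Type, main
print(codes_by_group("o", " xx  m"))  # Verbs, Type, modal
```

For the ``main" row every group uses the agreed code m except English,
which uses v; for the ``modal" row German and Spanish use the agreed
code o, while English uses m.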
1. Nouns (N)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           common         c        x  x  x  x  x  x
                 proper         p        x  x  x  x  x  x
- -------------- -------------- -
2 Gender         masculine      m        x  x  x  x     x
                 feminine       f        x  x  x  x     x
                 neuter         n           x           x
        l-s.     common         c        x
        l-s.     De                                  x
        l-s.     Het                                 x
        l-s.     None                                x
- -------------- -------------- -
3 Number         singular       s        x  x  x  x  x  x
                 plural         p        x  x  x  x  x  x
        l-s.     invariant      n        x
- -------------- -------------- -
4 Case           nominative     n           x
                 genitive       g           x
                 dative         d           x
                 accusative     a           x
= ============== ============== =
5 Sem-gender     M                                   x
                 F                                   x
                 N                                   x
- -------------- -------------- -
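As an illustration of how the positions of the Nouns table map back to
attribute-value pairs, the following Python sketch decodes a noun string
such as Ncms-. The names NOUN_POSITIONS and decode_noun are
hypothetical, and only the agreed values of positions 1 to 4 are
covered; language specific values are left out of the sketch.

```python
# Positions 1-4 of the Nouns table: attribute name and agreed codes.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive",
              "d": "dative", "a": "accusative"}),
]


def decode_noun(tag):
    """Turn a positional noun string into attribute-value pairs."""
    if not tag.startswith("N"):
        raise ValueError("not a noun description: " + tag)
    features = {"cat": "noun"}
    for (attribute, codes), ch in zip(NOUN_POSITIONS, tag[1:]):
        if ch == "-":  # attribute does not apply
            continue
        features[attribute] = codes[ch]
    return features


print(decode_noun("Ncms-"))
```

The result for Ncms- is the feature structure with cat=noun,
type=common, gender=masculine and number=singular, with no entry for
case.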
2. Verbs (V)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           main           m        x  x  x  x  x  v
                 auxiliary      a        x  x  x  x  x  x
                 modal          o           x  x        m
        l-s.     copula                              x
        l-s.     impersonal                          x
- -------------- -------------- -
2 Mood/VForm     indicative     i        x  x  x  x
                 subjunctive    s        x  x  x  x
                 imperative     m        x  x  x  x
                 conditional    c        x     x  x
                 infinitive     n        x  x  x  x  x
                 participle     p        x  x  x  x
                 gerund         g        x     x
                 supine         s
                 base           b                       x
        l-s. inf. + particle    u           x
        l-s. ImPart                                  x
        l-s. Past participle
        l-s. Present participle
        l-s. PerfPart                                x
        l-s. Fin                                     x
- -------------- -------------- -
3 Tense          present        p        x  x  x  x  x  x
                 imperfect      i        x  x  x  x
                 future         f        x     x  x
                 past           s        x     x  x  x  x
- -------------- -------------- -
4 Person         first          1        x  x  x  x  x  x
                 second         2        x  x  x  x  x  x
                 third          3        x  x  x  x  x  x
- -------------- -------------- -
5 Number         singular       s        x  x  x  x  x  x
                 plural         p        x  x  x  x  x  x
- -------------- -------------- -
6 Gender         masculine      m        x     x  x
                 feminine       f        x     x  x
                 neuter         n
        l-s.     common         c        x
= ============== ============== =
7 Clitic l-s.    no             n        x  x
                 yes            y        x  x
- -------------- -------------- -
8 Clitic  l-s.   both           t              x
                 accusative     a              x
                 dative         d              x
- -------------- -------------- -
3. Adjectives (A)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           qualificative  f           x  x  x
                 ordinal        o           x     x
                 cardinal       c           x     x
                 indefinite     i                 x
                 possessive     s           x  x  x
        l-s.     part1          1           x
                 part2          2           x
- -------------- -------------- -
2 Degree         positive       p        x  x  x  x  x  x
                 comparative    c        x  x  x  x  x  x
                 superlative    s        x  x  x     x  x
- -------------- -------------- -
3 Gender         masculine      m        x  x  x  x
                 feminine       f        x  x  x  x
                 neuter         n           x
        l-s.     common         c        x
- -------------- -------------- -
4 Number         singular       s        x  x  x  x
                 plural         p        x  x  x  x
        l-spc.   invariant      n        x
- -------------- -------------- -
5 Case           nominative     n           x
                 genitive       g           x
                 dative         d           x
                 accusative     a           x
= ============== ============== =
6 Position l-spc. attributive   a                       x
                  predicative   p                       x
- -------------- -------------- -
4. Pronouns (P)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           personal       p        x  x  x  x  x  x
                 demonstrative  d        x  x  x  x  x
                 indefinite     i        x  x  x  x
                 possessive     s        x     x  x     x
                 interrogative  t        x  x  x  x  x
                 relative       r        x  x  x  x  x
                 exclamative    e        x
                 reflexive      x           x        x  x
                 reciprocal     l                    x
      l-s.       general        g                       x
      l-s.       quantificational                    x
- -------------- -------------- -
2 Person         first          1        x  x  x  x  x  x
                 second         2        x  x  x  x  x  x
                 third          3        x  x  x  x  x  x
- -------------- -------------- -
3 Gender         masculine      m        x  x  x  x
                 feminine       f        x  x  x  x
                 neuter         n           x  x  x
        l-s.     common         c        x
- -------------- -------------- -
4 Number         singular       s        x  x  x  x  x  x
                 plural         p        x  x  x  x  x  x
        l-s.     invariant      n        x
- -------------- -------------- -
5 Case           nominative     n           x  x  x
                 genitive       g           x
                 dative         d           x  x
                 accusative     a           x  x
                 oblique        o              x  x
                 object         j                 x
        l-s.     1                                   x
        l-s.     4                                   x
- -------------- -------------- -
6 Possessor      singular       s              x  x     x
                 plural         p              x  x     x
= ============== ============== =
7 Wh             Not-wh         n                       x
                 Relative       r                       x
                 Int            q                       x
- -------------- -------------- -
8 Poss-person    First          1                       x
                 Second         2                       x
                 Third          3                       x
- -------------- -------------- -
9 Poss-gender    Masculine      m                       x
                 Feminine       f                       x
                 Neuter         n                       x
- -------------- -------------- -
10 Sem-gender    M                                   x
                 F                                   x
                 N                                   x
- -------------- -------------- -
5. Determiners (D)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           demonstrative  d        x  x  x  x  x
                 indefinite     i        x  x  x  x
                 possessive     s        x  x  x  x  x  x
                 interrogative  t        x  x  x  x
                 exclamative    e        x
                 relative       r        x
                 article        a                 x  x
        l-s.     Def-article    t                       x
        l-s.     Indef-article  a                       x
        l-s.     General        g                       x
        l-s.     quantificational                    x
- -------------- -------------- -
2  Person        first          1        x  x  x  x
                 second         2        x  x  x  x
                 third          3        x  x  x  x
- -------------- -------------- -
3  Gender        masculine      m        x  x  x  x  x
                 feminine       f        x  x  x  x  x
                 neuter         n           x  x     x
        l-s      common         c        x
- -------------- -------------- -
4  Number        singular       s        x  x  x  x  x  x
                 plural         p        x  x  x  x  x  x
        l-s      invariant      n        x
- -------------- -------------- -
5  Case          nominative     n           x
                 genitive       g           x
                 dative         d           x
                 accusative     a           x
                 oblique        o
- -------------- -------------- -
6  Possessor     singular       s              x  x
                 plural         p              x  x
= ============== ============== =
7  Quantif./or   definite       d                 x  x
   Defness       indefinite     i                 x  x
- -------------- -------------- -
8  Wh            Not-wh         n                       x
                 Relative       r                       x
                 Int/Excl       q                       x
- -------------- -------------- -
9 Poss-person    First          1                       x
                 Second         2                       x
                 Third          3                       x
- -------------- -------------- -
10 Poss-gender   Masculine      m                       x
                 Feminine       f                       x
                 Neuter         n                       x
- -------------- -------------- -
6. Articles (T)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           definite       d        x  x  x
                 indefinite     i        x  x  x
- -------------- -------------- -
2 Gender         masculine      m        x  x  x
                 feminine       f        x  x  x
                 neuter         n        x  x  x
        l-s.     common         c        x
- -------------- -------------- -
3 Number         singular       s        x  x  x
                 plural         p        x  x  x
- -------------- -------------- -
4 Case           nominative     n           x
                 genitive       g           x
                 dative         d           x
                 accusative     a           x
= ============== ============== =
7. Adverbs (R)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           general        g              x  x
                 particle       p              x  x
        l-s.     degree         d           x
        l-s.     interrogative  i           x
        l-s.     conjunction    c           x
        l-s.     modal          m           x
        l-s.     pronom         p           x
        l-s.     temporal       t           x
        l-s.     place          l           x
- -------------- -------------- -
2 Degree         positive       p        x  x  x  x     x
                 comparative    c           x  x  x     x
                 superlative    s        x  x  x        x
        l-s.     negative       n                 x
= ============== ============== =
3 Function       mod                                    x
                 spe                                    x
- -------------- -------------- -
4 Wh-ness        interrogative  q                       x
                 relative       r                       x
                 no             n                       x
- -------------- -------------- -
8. Adpositions (S)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1  Type          preposition    p        x  x  x  x  x  x
                 postposition   t           x        x  x
                 circumposition c           x
        l-s.     part1          a           x
        l-s.     part2          z           x
- -------------- -------------- -
2 Formation      simple         s        x  x  x
                 compound       c        x  x
= ============== ============== =
3 Gender         masculine      m        x
                 feminine       f        x
                 common         c        x
- -------------- -------------- -
4 Number         singular       s        x
                 plural         p        x
- -------------- -------------- -
9. Conjunctions (C)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           coordinating   c        x  x  x  x  x  x
                 subordinating  s        x  x  x  x  x  x
      l-spc.     compar         v           x           x
      l-spc.     infinitive     i           x
      l-spc.     part1          a           x
      l-spc.     part2          z           x
= ============== ============== =
2  ctype         finite         f                       x
                 that           t                       x
                 subjunctive    s                       x
- -------------- -------------- -
3 coord-posit.   initial        i                       x
                 non-initial    n                       x
- -------------- -------------- -
10. Numerals (M)
                                Features used by the groups
                                        IT DE ES FR NL EN
= ============== ============== =
P ATT            VAL            C
= ============== ============== =
1 Type           cardinal       c        x     x  x     x
                 ordinal        o        x     x        x
- -------------- -------------- -
2 Gender         masculine      m        x     x  x
                 feminine       f        x     x  x
                 neuter         n
- -------------- -------------- -
3 Number         singular       s        x     x  x
                 plural         p        x     x  x
- -------------- -------------- -
5 Case           nominative     n
                 genitive       g
                 dative         d
                 accusative     a
= ============== ============== =
                                Categories used by the groups
                                        IT DE ES FR NL EN
11. Interjections (I)                    x  x  x  x  x
12. Unique membership class (U)
13. Residual (X)                         x  x     x
14. Particle (Q)                            x
15. Punctuation (F)                      x  x
16. Abbreviations (Y)                       x