In MULTEXT, the notation has been chosen to follow current NLP practice, in which information is represented in attribute-value formalisms, and to follow the principle that it should also be self-explanatory to human readers. The requirement that these descriptions be able to encode language-specific characteristics has also been taken into account. To sum up, the proposed notation format has the following main characteristics:
These characteristics make the proposed lexical notation equivalent to the attribute-value pairs used in current unification formalisms (see the D1-6-1B Deliverable for further details).
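To illustrate what equivalence with unification formalisms buys, here is a minimal sketch of unification over flat attribute-value descriptions. It assumes descriptions are plain dictionaries with no nesting or shared structure, and the attribute names (`category`, `gender`, etc.) and the function `unify` are hypothetical, chosen for illustration rather than taken from MULTEXT.

```python
from typing import Dict, Optional

# A flat attribute-value description: attribute name -> value.
# Attribute names and values are illustrative, not an official MULTEXT inventory.
AVM = Dict[str, str]


def unify(a: AVM, b: AVM) -> Optional[AVM]:
    """Unify two flat attribute-value descriptions.

    Returns the merged description, or None if some attribute
    carries conflicting values in a and b.
    """
    result = dict(a)
    for attr, value in b.items():
        if attr in result and result[attr] != value:
            return None  # clash: same attribute, different values
        result[attr] = value
    return result


if __name__ == "__main__":
    lexical = {"category": "noun", "type": "common", "gender": "masculine"}
    context = {"category": "noun", "number": "singular"}
    print(unify(lexical, context))  # compatible: information is merged
    print(unify(lexical, {"gender": "feminine"}))  # clash: None
```

Compatible descriptions merge into a more specific one, while a conflicting value for the same attribute makes unification fail; real unification formalisms additionally handle nested and re-entrant structures, which this sketch ignores.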
The linear character strings representing the morphosyntactic information associated with word-forms are constructed following the philosophy of the Intermediate Format proposed in the EAGLES Corpus proposal (Leech and Wilson, 1994), i.e. agreed symbols occupy predefined, fixed positions. The positions in a string are numbered 0, 1, 2, etc., as follows:
Example: Ncms- (noun, common, masculine, singular, no case)
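The positional scheme can be read mechanically: position 0 gives the category, and each later position gives the value of one attribute of that category. The sketch below decodes such a string into attribute-value pairs. The attribute order for nouns (type, gender, number, case) follows the example Ncms-; the other value letters (`p` for proper or plural, `f` for feminine) and the names `NOUN_ATTRIBUTES`, `CATEGORIES` and `decode` are assumptions made for illustration, not the official MULTEXT tables. A hyphen, as in the final position of Ncms-, is treated as "attribute does not apply" and is left out of the result.

```python
from typing import Dict, List, Tuple

# Illustrative positional table for nouns only (an assumption, not the official
# MULTEXT tables). Position 0 holds the category letter; positions 1, 2, ...
# hold attribute values in the order shown in the example "Ncms-".
NOUN_ATTRIBUTES: List[Tuple[str, Dict[str, str]]] = [
    ("type", {"c": "common", "p": "proper"}),      # position 1
    ("gender", {"m": "masculine", "f": "feminine"}),  # position 2
    ("number", {"s": "singular", "p": "plural"}),  # position 3
    ("case", {}),  # position 4: no case values listed in this sketch
]

CATEGORIES = {"N": ("noun", NOUN_ATTRIBUTES)}


def decode(msd: str) -> Dict[str, str]:
    """Turn a positional string such as 'Ncms-' into attribute-value pairs."""
    category, attributes = CATEGORIES[msd[0]]
    if len(msd) - 1 > len(attributes):
        raise ValueError(f"too many positions in {msd!r}")
    result = {"category": category}
    for (attr, values), code in zip(attributes, msd[1:]):
        if code == "-":
            continue  # '-' marks an attribute that does not apply
        result[attr] = values[code]  # KeyError signals an unknown letter
    return result


print(decode("Ncms-"))
```

Because meaning is tied to position, the same letter can stand for different values in different slots: in this sketch `p` means "proper" at position 1 and "plural" at position 3.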
This notation adopts the EAGLES Intermediate Format with one small revision: the Intermediate Format encodes information by means of digits, whereas MULTEXT uses mnemonic letters instead.
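Since each position has its own fixed inventory of values, a digit-based string and a mnemonic one carry the same information, and converting between them is a per-position table lookup. The sketch below shows this for nouns. The digit codes are invented for illustration and are NOT the actual EAGLES Intermediate Format tables; the mnemonic letters follow the example Ncms-, and the names `DIGIT_TO_MNEMONIC`, `to_mnemonic` and `to_digits` are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-position codebooks for nouns: digit code -> mnemonic letter.
# The digit assignments are invented for illustration and are NOT the EAGLES
# Intermediate Format tables. '0' stands for "not applicable" here.
DIGIT_TO_MNEMONIC: List[Dict[str, str]] = [
    {"1": "N"},                      # position 0: category
    {"1": "c", "2": "p"},            # position 1: type
    {"1": "m", "2": "f", "0": "-"},  # position 2: gender
    {"1": "s", "2": "p", "0": "-"},  # position 3: number
    {"0": "-"},                      # position 4: case
]

# Inverse tables; each per-position table is one-to-one, so inversion is safe.
MNEMONIC_TO_DIGIT = [{v: k for k, v in table.items()} for table in DIGIT_TO_MNEMONIC]


def to_mnemonic(digits: str) -> str:
    """Map each digit to the mnemonic letter defined for its position."""
    return "".join(DIGIT_TO_MNEMONIC[i][d] for i, d in enumerate(digits))


def to_digits(mnemonic: str) -> str:
    """Map each mnemonic letter back to the digit defined for its position."""
    return "".join(MNEMONIC_TO_DIGIT[i][c] for i, c in enumerate(mnemonic))


print(to_mnemonic("11110"))  # Ncms-
print(to_digits("Ncms-"))    # 11110
```

The conversion is lossless in both directions, which is why the choice of mnemonic characters affects only readability, not expressive power.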
Note that this representation is proposed for word-form lists intended for a specific application, namely corpus annotation.