MULTEXT-East Morphosyntactic Specifications, Version 4

1.3. Notation

In MULTEXT notation, information is represented in an attribute-value formalism, following the idea that it should also be self-explanatory for human readers. At the same time, a relatively compact encoding has been maintained. The notation format of the Morphosyntactic Descriptions (MSDs) has the following main characteristics:

These characteristics make the proposed MSD notation similar to the attribute-value pairs used in unification-based formalisms (see the MULTEXT D1-6-1B Deliverable [mt:D161B] for further details).

The linear strings of characters representing the morphosyntactic descriptions are constructed following the philosophy of the Intermediate Format proposed in the Eagles Corpus proposal [eagles:morphana], i.e. having agreed symbols in predefined, fixed positions. The positions of a string of characters are numbered 0, 1, 2, etc. in the following way:

Example: Ncms- (Noun, common, masculine, singular, no case)
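
The positional reading of the example can be sketched in Python. This is only an illustration: the attribute order (Type, Gender, Number, Case) and the value codes in the tables are assumptions inferred from the single example Ncms-, and the names CATEGORIES, ATTRIBUTES and decode_msd are hypothetical. The actual tables are defined per language in the specifications.

```python
# Hypothetical tables, inferred only from the example "Ncms-".
# Real attribute and value tables are language-specific.
CATEGORIES = {"N": "Noun"}
ATTRIBUTES = {
    "N": [
        ("Type", {"c": "common"}),
        ("Gender", {"m": "masculine"}),
        ("Number", {"s": "singular"}),
        ("Case", {}),
    ],
}


def decode_msd(msd):
    """Decode a positional MSD string into an attribute -> value dict."""
    category = msd[0]  # position 0 holds the category code
    result = {"CATEGORY": CATEGORIES[category]}
    # Positions 1, 2, ... each hold the one-character value of one attribute.
    for (name, values), code in zip(ATTRIBUTES[category], msd[1:]):
        # '-' marks a position whose attribute is not applicable.
        result[name] = None if code == "-" else values[code]
    return result


print(decode_msd("Ncms-"))
```

Here a value of None stands for the '-' marker in the final position.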

This notation adopts the Eagles Intermediate Format with a small revision: the Intermediate Format encodes information by means of digits, while in MULTEXT characters of a mnemonic nature are preferred.

The marker '-' has special semantics and means 'not-applicable'. It is used in the following cases:

It should also be noted that in the MSDs trailing hyphens have been omitted, as this leads to a more compact encoding. Hence codes like Ncms- are written as Ncms.
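
A minimal Python sketch of this convention, assuming hypothetical helper names compact and expand: trailing hyphens are stripped for the compact form, and can be restored when the number of positions defined for the category is known. That length is an assumption supplied by the caller; it is not given in this section.

```python
def compact(msd):
    """Drop trailing '-' markers; hyphens inside the string are kept."""
    return msd.rstrip("-")


def expand(msd, length):
    """Pad a compact MSD with '-' up to `length` positions.

    `length` is assumed to be the number of positions (category code
    included) defined for the category in the language at hand.
    """
    return msd.ljust(length, "-")


print(compact("Ncms-"))    # Ncms
print(expand("Ncms", 5))   # Ncms-
```

Only trailing markers are affected: a hyphen in the middle of a code still occupies its fixed position, so the remaining characters keep their meaning.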

Date: 2010-05-12
This work is licensed under the Creative Commons licence Attribution-ShareAlike 3.0.