MULTEXT-East Morphosyntactic Specifications, Version 4
1.3. Notation
In MULTEXT notation, the information is represented in an attribute-value formalism,
following the idea that it should also be self-explanatory for human readers. At the
same time, a relatively compact encoding is maintained. The notation format of the
Morphosyntactic Descriptions (MSDs) has the following main characteristics:
- attributes are marked by positions;
- values are represented by a single character;
- a special marker reflects the non-applicability of a given attribute.
These characteristics make the proposed MSD notation similar to the attribute-value
pairs used in unification-based formalisms (see the MULTEXT D1-6-1B
Deliverable [mt:D161B] for further details).
The linear strings of characters representing the morphosyntactic descriptions are
constructed following the philosophy of the Intermediate Format proposed in the Eagles
Corpus proposal [eagles:morphana], i.e. of having agreed symbols in predefined and
fixed positions. The positions of a string of characters are numbered 0, 1, 2, etc.
in the following way:
- the agreed character at position 0 encodes the part-of-speech;
- each character at positions 1, 2, ..., n encodes the value of one attribute (person,
  gender, number, etc.);
- if an attribute does not apply, the corresponding position in the string
  contains a special marker, the hyphen (‘-’).
Example: Ncms- (Noun, common, masculine, singular, no case)
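The positional scheme can be illustrated with a short Python sketch that decodes the example string. This is not part of the MULTEXT specification: the function name decode_msd and the lookup tables are hypothetical, and the tables cover only the category, attributes and values that appear in the example Ncms- (a real table would come from the language-specific chapters).

```python
# Illustrative tables only (assumption): they cover just the example "Ncms-".
NOT_APPLICABLE = "-"

# Position 0 selects the category; positions 1..n follow the attribute order.
CATEGORIES = {
    "N": ("Noun", ["Type", "Gender", "Number", "Case"]),
}

# One character per value; only the values used in the example are listed.
VALUE_NAMES = {
    "Type": {"c": "common"},
    "Gender": {"m": "masculine"},
    "Number": {"s": "singular"},
    "Case": {},
}


def decode_msd(msd):
    """Map each position of an MSD string to an attribute-value pair."""
    category, attributes = CATEGORIES[msd[0]]
    decoded = {"Category": category}
    # Pair each attribute with the character at its fixed position.
    for attribute, code in zip(attributes, msd[1:]):
        if code == NOT_APPLICABLE:
            decoded[attribute] = None  # the attribute does not apply
        else:
            # Fall back to the raw character for values missing from the table.
            decoded[attribute] = VALUE_NAMES[attribute].get(code, code)
    return decoded


print(decode_msd("Ncms-"))
```

Here None stands for the hyphen marker, so the Case attribute of the example decodes to None while the other three positions decode to their mnemonic values.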
This notation adopts the Eagles Intermediate Format with a small revision: the
Intermediate Format encodes information by means of digits, while in MULTEXT characters
of a mnemonic nature are preferred.
The marker ‘-’ has special semantics: it means 'not applicable'. It is used in
the following cases:
- the attribute is not relevant to a particular language, e.g. Gender in Estonian;
- the attribute is not applicable to a particular combination of attribute-values,
  i.e. although the attribute is used by a category in a given language, it does not
  apply to a particular subclass of the category; e.g., Person applies to Pronouns,
  but not to the Type demonstrative;
- the attribute is not applicable to a particular lexical item, i.e. although the
  attribute applies to the rest of its paradigm, it does not apply to that item;
  e.g., Gender in the paradigm of English Personal Pronouns applies only to the
  3rd person (I, you vs. she, he).
It should also be noted that in the MSDs trailing hyphens have been omitted, as
this leads to a more compact encoding. Hence codes like Ncms- are written
as Ncms.
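The omission of trailing hyphens can be sketched in Python as follows. The function names compact_msd and expand_msd are hypothetical, and attribute_count (the number of attributes defined for the category) is an assumption that a real implementation would take from the language-specific tables.

```python
NOT_APPLICABLE = "-"


def compact_msd(msd):
    """Drop trailing not-applicable markers; hyphens inside the string stay."""
    return msd.rstrip(NOT_APPLICABLE)


def expand_msd(msd, attribute_count):
    """Pad a compact MSD back to one category position plus attribute_count."""
    return msd.ljust(1 + attribute_count, NOT_APPLICABLE)


print(compact_msd("Ncms-"))    # trailing hyphen removed
print(compact_msd("Nc-s-"))    # the hyphen at position 2 is preserved
print(expand_msd("Ncms", 4))   # padded back to five positions
```

Only trailing markers can be dropped safely: a hyphen in the middle of the string holds the position of the attributes that follow it, so removing it would shift their values to the wrong attribute.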