5.6.1 Multext morphology formalism
The main reason for having morphology is to facilitate maintenance of
lexical lists, both during the project and afterwards. It follows that
the rule formalism for morphology itself should not be a source of
complexity. The designers of the Multext morphology tool have
therefore decided to use fairly common, well-known methods, avoiding
any adventurous modernism.
The tool has two parts: morphosyntax and morphographemics.
For morphosyntax, a user-friendly version of context-free grammar is
used. The friendliness lies in the possibility of annotating rules
with features, and of setting features to the same value through
shared variables.
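The idea of feature-annotated rules with shared variables can be
sketched in a few lines of Python. This is an illustration only and is
not the Multext rule syntax: the `?name' variable convention, the
categories NStem and NSuffix, and the functions match and apply_rule
are all assumptions made for the example.

```python
# Hypothetical illustration; NOT the Multext rule formalism.

def is_var(value):
    """Treat strings starting with '?' as variables (an assumed convention)."""
    return isinstance(value, str) and value.startswith("?")


def match(pattern, features, bindings):
    """Match a feature pattern against concrete features.

    Returns the extended variable bindings, or None if a value clashes.
    """
    bindings = dict(bindings)
    for attr, want in pattern.items():
        have = features.get(attr)
        if have is None:
            return None
        if is_var(want):
            # A variable must receive the same value everywhere it occurs.
            if bindings.setdefault(want, have) != have:
                return None
        elif want != have:
            return None
    return bindings


def apply_rule(rule, daughters):
    """Apply one annotated context-free rule to a list of daughter feature sets."""
    mother, patterns = rule
    if len(patterns) != len(daughters):
        return None
    bindings = {}
    for pattern, features in zip(patterns, daughters):
        bindings = match(pattern, features, bindings)
        if bindings is None:
            return None
    # Instantiate the mother's features from the collected bindings.
    return {a: bindings[v] if is_var(v) else v for a, v in mother.items()}


# N -> NStem NSuffix: Gender is passed up from the stem, Number from the suffix.
RULE = (
    {"cat": "N", "Number": "?n", "Gender": "?g"},
    [{"cat": "NStem", "Gender": "?g"}, {"cat": "NSuffix", "Number": "?n"}],
)

word = apply_rule(
    RULE,
    [{"cat": "NStem", "Gender": "Het"}, {"cat": "NSuffix", "Number": "Pl"}],
)
```

Using the same variable in two places is what forces the two features
to carry the same value; a rule that names `?n' on two daughters fails
when the daughters disagree.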
For morphographemics, a version of two-level morphology is used.
The system can be used to generate a word list from an input word
list, as well as to look words up in a given word list.
A full and detailed description of the rule formalism is given in the
report on the Multext morphology tool (Report nr. 2.3.1B) and in the
manuals accompanying the tool. Extensive examples can be found in the
report on Multext morphology resources (Report nr. 5.3.1B).
5.6.2 Dutch word classes
The Dutch word classification used for Multext follows as closely as
possible the proposal in Deliverable A, section 1.6.1.
It is best summarized in terms of (the relevant selection from) the
types and attributes used in the actual Dutch description (Report
5.3.1B, section on Dutch).
There are 10 word class types:
V      : Vtype Vform Person Number Tense
N      : Ntype Semgender Gender Number
A      : Inflected Degree
Adp    : AdpType
Det    : DetType Number Gender Defness
Pron   : PronType Number Defness Person Semgender Case
Adv    : Nil
Num    : Nil
Conj   : ConjType
Interj : Nil
where:
Vtype     : Main Aux Copula Impersonal
Tense     : Pres Past
Person    : 1 2 3
Number    : Sg Pl
Vform     : Inf ImPart PerfPart Fin
Ntype     : Common Proper
Semgender : M F N
Gender    : De Het none
Degree    : Pos Compar Super
Inflected : 0 1
AdpType   : Post Pre
DetType   : Article Quantificational Possessive Demonstrative
Defness   : Def Indef
PronType  : Reciprocal Reflexive Personal Relative Demonstrative
            Quantificational Interrogative
Case      : 1 4
ConjType  : Coord Subord
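For readers who want to check lexical entries against this
classification, the two lists above can be transcribed into a small
Python schema. The dictionary layout and the validate function are
assumptions made for illustration and are not part of the Multext
tools; all values are kept as strings, and `Nil' is rendered as an
empty attribute list.

```python
# Word class types and their attributes, transcribed from the lists above.
WORD_CLASSES = {
    "V": ["Vtype", "Vform", "Person", "Number", "Tense"],
    "N": ["Ntype", "Semgender", "Gender", "Number"],
    "A": ["Inflected", "Degree"],
    "Adp": ["AdpType"],
    "Det": ["DetType", "Number", "Gender", "Defness"],
    "Pron": ["PronType", "Number", "Defness", "Person", "Semgender", "Case"],
    "Adv": [],
    "Num": [],
    "Conj": ["ConjType"],
    "Interj": [],
}

# Permitted values per attribute, transcribed from the lists above.
ATTRIBUTE_VALUES = {
    "Vtype": ["Main", "Aux", "Copula", "Impersonal"],
    "Tense": ["Pres", "Past"],
    "Person": ["1", "2", "3"],
    "Number": ["Sg", "Pl"],
    "Vform": ["Inf", "ImPart", "PerfPart", "Fin"],
    "Ntype": ["Common", "Proper"],
    "Semgender": ["M", "F", "N"],
    "Gender": ["De", "Het", "none"],
    "Degree": ["Pos", "Compar", "Super"],
    "Inflected": ["0", "1"],
    "AdpType": ["Post", "Pre"],
    "DetType": ["Article", "Quantificational", "Possessive", "Demonstrative"],
    "Defness": ["Def", "Indef"],
    "PronType": ["Reciprocal", "Reflexive", "Personal", "Relative",
                 "Demonstrative", "Quantificational", "Interrogative"],
    "Case": ["1", "4"],
    "ConjType": ["Coord", "Subord"],
}


def validate(word_class, features):
    """Return a list of problems; an empty list means the description fits."""
    if word_class not in WORD_CLASSES:
        return [f"unknown word class: {word_class}"]
    problems = []
    for attr, value in features.items():
        if attr not in WORD_CLASSES[word_class]:
            problems.append(f"{word_class} has no attribute {attr}")
        elif value not in ATTRIBUTE_VALUES[attr]:
            problems.append(f"{attr} cannot take the value {value}")
    return problems


# `meisje' as a common noun with syntactic gender Het and semantic gender F.
meisje_problems = validate(
    "N", {"Ntype": "Common", "Semgender": "F", "Gender": "Het", "Number": "Sg"}
)
```

The check is deliberately permissive: it accepts partial descriptions
and only reports attributes or values that fall outside the lists.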
Dutch is not a morphologically rich language. One distinction that
sets it apart from most other Multext languages is that between
`syntactic' and `semantic' gender. A good example is `meisje' (girl).
Its syntactic gender is `het' (`het meisje', *`de meisje'), but its
semantic gender is female (`het meisje(i) dacht dat
ze(i)/*hij(i)/??het een jongetje was', i.e. `the girl thought that
she/*he/??it was a little boy'). The example shows that articles agree
with their nouns in syntactic gender and that pronouns (usually) agree
with their antecedents in semantic gender.
Otherwise, the distinctions given differ from those of other
languages mainly in what Dutch does not express.
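The two kinds of gender agreement described above can be made concrete
with a minimal Python sketch. The Noun class and the two lookup tables
are assumptions made for illustration; they cover only definite
singular articles and third person singular subject pronouns, and they
ignore the fact that pronoun agreement holds only `usually'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    form: str
    gender: str     # syntactic gender: "De" or "Het"
    semgender: str  # semantic gender: "M", "F" or "N"


# Definite singular article, selected by SYNTACTIC gender.
ARTICLE = {"De": "de", "Het": "het"}

# Third person singular subject pronoun, selected by SEMANTIC gender.
PRONOUN = {"M": "hij", "F": "ze", "N": "het"}


def definite_article(noun):
    """The article agrees with the noun in syntactic gender."""
    return ARTICLE[noun.gender]


def anaphoric_pronoun(noun):
    """The pronoun (usually) agrees with its antecedent in semantic gender."""
    return PRONOUN[noun.semgender]


meisje = Noun("meisje", gender="Het", semgender="F")
phrase = f"{definite_article(meisje)} {meisje.form}"  # "het meisje"
pronoun = anaphoric_pronoun(meisje)                   # "ze"
```

Keeping the two attributes separate on the noun is what allows `het
meisje' and `ze' to coexist without contradiction.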
The distinction between pronouns and determiners has been implemented
as follows:
for any X that could be either a det or a pron:
if X distributes like NP, then X is a pronoun;
if X distributes like Det (i.e. NP-initial), then X is a determiner
It is hoped that this is a good starting point for tagging, but this
remains to be seen. As a consequence of this rule, a word like `mijn'
(my), which is often called a pronoun, is analysed as a determiner
here.
Decisions of this kind on function words can easily be changed, since
few words are affected: in the lexicon supplied for Dutch (Report nr.
5.4.1B), 59 words are classified as pronouns and 27 as determiners.
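The determiner/pronoun rule given above amounts to a simple mapping
from distribution to word class, sketched here in Python. The function
name classify and the distribution labels "NP" and "Det" are
hypothetical; the sketch assumes that the distribution of each word
has already been established by other means, and that a word observed
in both distributions receives both tags.

```python
def classify(distributions):
    """Map the observed distributions of a det/pron candidate to word classes.

    `distributions` is a set containing "NP" (the word distributes like a
    full noun phrase) and/or "Det" (the word occurs NP-initially).
    """
    tags = set()
    if "NP" in distributions:
        tags.add("Pron")
    if "Det" in distributions:
        tags.add("Det")
    return tags


# `mijn' occurs NP-initially (`mijn boek'), so it is tagged as a determiner.
mijn_tags = classify({"Det"})
```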