Application to Dutch

5.6.1 Multext morphology formalism


The main reason for having morphology is to facilitate  maintenance of
lexical lists, both during the project and afterwards. It follows that
the rule  formalism  for morphology itself  should not be a  source of
complexity.   The  designers  of  the  Multext  morphology  tool  have
therefore decided to use fairly common,  well-known methods,  avoiding
any adventurous modernism.

The tool has two parts: morphosyntax and morphographemics.

For morphosyntax, a user-friendly version of context-free grammar is
used. The friendliness lies in the possibility of annotating rules
with features, and of constraining features to share the same value
through variables.
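
How a variable can force features to share a value is sketched below
in Python. This is an illustration only and does not use the Multext
rule syntax, which is defined in Report nr. 2.3.1B. The tuple encoding
of rules, the `?' prefix for variables, and the category names NStem
and NSuffix are all assumptions made for the example.

```python
def is_var(x):
    """A variable is a string starting with '?' (convention of this sketch)."""
    return isinstance(x, str) and x.startswith("?")


def unify(pattern, features, bindings):
    """Match a feature pattern against concrete features.

    Returns the extended variable bindings, or None if the match fails.
    """
    new = dict(bindings)
    for attr, want in pattern.items():
        have = features.get(attr)
        if have is None:
            return None
        if is_var(want):
            # A variable seen before must keep the value it was bound to.
            if want in new and new[want] != have:
                return None
            new[want] = have
        elif want != have:
            return None
    return new


def apply_rule(rule, daughters):
    """Apply one annotated context-free rule to a sequence of daughters.

    rule      = (mother_cat, mother_pattern, [(cat, pattern), ...])
    daughters = [(cat, features), ...]
    Returns (mother_cat, mother_features), or None if the rule does not apply.
    """
    mother_cat, mother_pattern, rhs = rule
    if len(rhs) != len(daughters):
        return None
    bindings = {}
    for (cat, pattern), (dcat, feats) in zip(rhs, daughters):
        if cat != dcat:
            return None
        bindings = unify(pattern, feats, bindings)
        if bindings is None:
            return None
    mother = {}
    for attr, value in mother_pattern.items():
        mother[attr] = bindings.get(value, value) if is_var(value) else value
    return mother_cat, mother


# Hypothetical rule: N[Number=?n, Gender=?g] -> NStem[Gender=?g] NSuffix[Number=?n]
RULE = ("N", {"Number": "?n", "Gender": "?g"},
        [("NStem", {"Gender": "?g"}), ("NSuffix", {"Number": "?n"})])

# `boek' + `en': the noun inherits Gender from the stem and Number from the suffix.
result = apply_rule(RULE, [("NStem", {"Gender": "Het"}),
                           ("NSuffix", {"Number": "Pl"})])
```

The same variable name on two daughters would act as an agreement
constraint: the rule applies only if both carry the same value.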

For morphographemics, a version of two-level morphology is used.

The system can be used  to generate  a word  list  from  an input word
list, as well as to look words up in a given word list.

The full description and detail of the rule formalism are given in the
report on the Multext morphology  tool  (Report nr.  2.3.1B)  and  the
manuals accompanying the tool.  Extensive exemplification can be found
in the report on Multext morphology resources, Report nr. 5.3.1B.


5.6.2 Dutch word classes

The Dutch word classification used for Multext approximates as closely
as possible the proposal in Deliverable A, section 1.6.1.

It  can be summarized best  in terms of  (the relevant selection from)
the types and attributes used in the actual Dutch description  (Report
5.3.1B, section on Dutch).

There are 10 word class types:

V              : Vtype Vform  Person Number Tense
N              : Ntype Semgender Gender Number
A              : Inflected Degree
Adp            : AdpType
Det            : DetType Number Gender Defness
Pron           : PronType Number Defness Person Semgender Case
Adv            : Nil
Num            : Nil
Conj           : ConjType
Interj         : Nil

where:

Vtype          : Main Aux Copula Impersonal
Tense          : Pres Past
Person         : 1 2 3
Number         : Sg Pl
Vform          : Inf ImPart PerfPart Fin
Ntype          : Common Proper
Semgender      : M F N
Gender         : De Het none
Degree         : Pos Compar Super
Inflected      : 0 1
AdpType        : Post Pre
DetType        : Article Quantificational Possessive Demonstrative
Defness        : Def Indef
PronType       : Reciprocal Reflexive Personal Relative Demonstrative
                 Quantificational Interrogative
Case           : 1 4
ConjType       : Coord Subord
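
The inventory above can be read as a small schema. The following
Python sketch transcribes it and checks a tag against it. The
dictionary encoding and the function name check_tag are assumptions of
this example and are not part of the Multext tools; the sketch also
assumes that attributes may be left unspecified.

```python
# Values per attribute, transcribed from the listing above.
ATTRIBUTE_VALUES = {
    "Vtype": {"Main", "Aux", "Copula", "Impersonal"},
    "Tense": {"Pres", "Past"},
    "Person": {"1", "2", "3"},
    "Number": {"Sg", "Pl"},
    "Vform": {"Inf", "ImPart", "PerfPart", "Fin"},
    "Ntype": {"Common", "Proper"},
    "Semgender": {"M", "F", "N"},
    "Gender": {"De", "Het", "none"},
    "Degree": {"Pos", "Compar", "Super"},
    "Inflected": {"0", "1"},
    "AdpType": {"Post", "Pre"},
    "DetType": {"Article", "Quantificational", "Possessive", "Demonstrative"},
    "Defness": {"Def", "Indef"},
    "PronType": {"Reciprocal", "Reflexive", "Personal", "Relative",
                 "Demonstrative", "Quantificational", "Interrogative"},
    "Case": {"1", "4"},
    "ConjType": {"Coord", "Subord"},
}

# Attributes per word class; an empty list corresponds to Nil.
WORD_CLASSES = {
    "V": ["Vtype", "Vform", "Person", "Number", "Tense"],
    "N": ["Ntype", "Semgender", "Gender", "Number"],
    "A": ["Inflected", "Degree"],
    "Adp": ["AdpType"],
    "Det": ["DetType", "Number", "Gender", "Defness"],
    "Pron": ["PronType", "Number", "Defness", "Person", "Semgender", "Case"],
    "Adv": [],
    "Num": [],
    "Conj": ["ConjType"],
    "Interj": [],
}


def check_tag(word_class, features):
    """Return a list of problems; an empty list means the tag fits the schema."""
    if word_class not in WORD_CLASSES:
        return [f"unknown word class: {word_class}"]
    problems = []
    allowed = WORD_CLASSES[word_class]
    for attr, value in features.items():
        if attr not in allowed:
            problems.append(f"{attr} is not an attribute of {word_class}")
        elif value not in ATTRIBUTE_VALUES[attr]:
            problems.append(f"{value} is not a value of {attr}")
    return problems


# A singular common noun with syntactic gender `het' and semantic gender F.
ok = check_tag("N", {"Ntype": "Common", "Semgender": "F",
                     "Gender": "Het", "Number": "Sg"})
```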


Dutch is not a morphologically rich language. One distinction it makes
that most other Multext languages do not is that between `syntactic'
and `semantic' gender. A good example is `meisje' (girl). Its
syntactic gender is `het' (`het meisje', *`de meisje') but its
semantic gender is female (`het meisje(i) dacht dat
ze(i)/*hij(i)/??het een jongetje was': `the girl thought that
she/*he/??it was a little boy'). The example shows that articles agree
with their nouns in syntactic gender and that pronouns (usually) agree
with their antecedents in semantic gender.
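
The two agreement patterns can be made concrete with a minimal Python
sketch. The class Noun and the two lookup tables are assumptions of
this example, restricted to the definite singular article and the
third person singular subject pronoun; the pronoun function returns
only the usual choice and does not model the exceptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    form: str
    gender: str     # syntactic gender: "De" or "Het"
    semgender: str  # semantic gender: "M", "F" or "N"


# Definite singular article, selected by syntactic gender.
ARTICLE = {"De": "de", "Het": "het"}

# Third person singular subject pronoun, selected by semantic gender.
PRONOUN = {"M": "hij", "F": "ze", "N": "het"}


def definite_article(noun):
    """Articles agree with the noun in syntactic gender."""
    return ARTICLE[noun.gender]


def anaphoric_pronoun(noun):
    """Pronouns (usually) agree with the antecedent in semantic gender."""
    return PRONOUN[noun.semgender]


meisje = Noun("meisje", gender="Het", semgender="F")
article = definite_article(meisje)   # "het"
pronoun = anaphoric_pronoun(meisje)  # "ze"
```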

Otherwise, the Dutch classification differs from that of the other
languages mainly in what Dutch does not express.

The distinction between pronouns and determiners  has been implemented
as follows:

for any X that could be either a det or a pron:
if X distributes like NP, then X is a pronoun;
if X distributes like Det (i.e. NP-initial), then X is a determiner
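
Stated as a function, the criterion looks as follows. This is a Python
sketch only: the function name and the string labels "NP" and "Det"
for the observed distribution are assumptions of the example, and it
is assumed that the test is applied to one use of a word at a time.

```python
def classify_function_word(distribution):
    """Classify one use of a word that could be a determiner or a pronoun.

    distribution: "NP" if the word occupies the position of a full noun
    phrase, "Det" if it stands NP-initially before a noun.
    """
    if distribution == "NP":
        return "Pron"
    if distribution == "Det":
        return "Det"
    raise ValueError(f"unexpected distribution: {distribution!r}")


# `mijn' stands NP-initially, so it comes out as a determiner.
mijn = classify_function_word("Det")
```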

As a consequence of this criterion, a word like `mijn', which is often
called a pronoun, is analysed as a determiner here. It is hoped that
this is a good starting point for tagging, but that remains to be
seen.

Decisions like this one on function words can easily be changed,
because few words are involved: in the lexicon supplied for Dutch
(Report nr. 5.4.1B), 59 words are classified as pronouns and 27 as
determiners.