next up previous
Next: References Up: Learning Slovene Declensions with Previous: Experiments and results

Conclusions

We have presented the results of an initial attempt to learn rules for generating inflectional forms of Slovene nouns. In particular, FOIDL was applied to learn rules for the genitive singular of proper and common nouns, separately for each of the three genders. Given that FOIDL was supplied with very limited background knowledge, the results obtained are quite satisfactory: the accuracy of the induced rules for the genitive singular is approximately 99% for the feminine, 95% for the neuter, and 85% for the masculine gender. The errors can be traced to two causes, both stemming from insufficient information being available to FOIDL.

First, while Slovene orthography is much closer to the phonological form of a word than English orthography is, it is sometimes still not enough to predict whether a given phonologically determined alternation should take place. For example, the -e- elision should take place only where the e is a schwa, and this cannot be predicted from the spelling alone: compare, e.g., angel/angela and triangel/triangla. To cover such cases, a phonological representation has to be substituted for, or added to, the orthographic one. The background knowledge of FOIDL could then be extended to take phonological regularities into account, e.g., by distinguishing vowels from consonants, thus leading to better generalizations.
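The angel/triangel contrast can be made concrete with a small sketch. It is only an illustration in Python, not FOIDL output: the lexicon `PHONOLOGICAL`, the use of `@` to mark a schwa, the transcriptions themselves, and the helper names are all assumptions introduced here. The sketch also assumes that the transcription is aligned letter by letter with the spelling.

```python
# Hypothetical phonological lexicon; '@' marks a schwa. The transcriptions
# are illustrative assumptions, aligned character by character with the
# spelling, and are not taken from a real Slovene lexicon.
PHONOLOGICAL = {
    "angel": "angel",        # final-syllable e is a full vowel
    "triangel": "triang@l",  # final-syllable e is a schwa
}

VOWELS = set("aeiou@")


def is_vowel(ch: str) -> bool:
    """Background predicate separating vowels from consonants."""
    return ch in VOWELS


def genitive_from_spelling(lemma: str) -> str:
    """Orthography-only rule: append -a, never elide."""
    return lemma + "a"


def genitive_with_phonology(lemma: str) -> str:
    """Elide the e only where the transcription shows a schwa
    followed by a single final consonant."""
    phon = PHONOLOGICAL[lemma]
    if len(phon) >= 2 and phon[-2] == "@" and not is_vowel(phon[-1]):
        return lemma[:-2] + lemma[-1] + "a"
    return lemma + "a"


for noun in ("angel", "triangel"):
    print(noun, genitive_from_spelling(noun), genitive_with_phonology(noun))
```

With spelling alone, both nouns end in -gel and receive the same treatment, so one of them is necessarily generated incorrectly. Only the added representation separates the two cases.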

Second, FOIDL makes use only of the form of the lemma. For Slovene this is insufficient: additional background knowledge about the lexical content of the lemma is necessary in order to generate word-forms correctly. In particular, lexical morphosyntactic information must be incorporated into the lexicon and made use of by the induction algorithm. The most obvious step is to add the paradigm class to the lexical entries, but other necessary information includes animacy for masculine nouns and the origin of the noun, i.e., whether it is native or foreign.

Given the preliminary nature of the work presented here, many other directions for further work can be pointed out. As regards the induction methodology, at least two improvements of FOIDL seem to be needed. First, efficiency appears to be a major problem, limiting the size of the training sets that can be considered to approximately 500 examples. Second, post-processing of the induced decision lists is needed in order to remove irrelevant literals.
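One possible form of such post-processing is sketched below in Python. This is not a procedure implemented in FOIDL; it is a simple greedy pass, assumed here for illustration, that drops a literal whenever the decision list still reproduces every training example without it. The rule representation, the literal names, and the toy feminine training pairs are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Literal = Tuple[str, Callable[[str], bool]]         # (name, test on the lemma)
Rule = Tuple[List[Literal], Callable[[str], str]]   # (conditions, transformation)


def apply_list(rules: List[Rule], lemma: str) -> Optional[str]:
    """Decision list semantics: the first rule whose conditions all hold fires."""
    for conditions, transform in rules:
        if all(test(lemma) for _, test in conditions):
            return transform(lemma)
    return None


def prune(rules: List[Rule], training: List[Tuple[str, str]]) -> List[Rule]:
    """Greedily drop literals whose removal keeps all training examples correct."""
    pruned = [(list(conditions), transform) for conditions, transform in rules]
    for conditions, _ in pruned:
        i = 0
        while i < len(conditions):
            removed = conditions.pop(i)
            if all(apply_list(pruned, x) == y for x, y in training):
                continue                   # literal was irrelevant, stays removed
            conditions.insert(i, removed)  # literal is needed, restore it
            i += 1
    return pruned


# Toy decision list for the feminine genitive singular (illustrative only).
rules: List[Rule] = [
    ([("ends_with_a", lambda w: w.endswith("a")),
      ("longer_than_two", lambda w: len(w) > 2)],   # deliberately irrelevant
     lambda w: w[:-1] + "e"),
    ([], lambda w: w + "i"),                        # default rule
]
training = [("miza", "mize"), ("voda", "vode"),
            ("knjiga", "knjige"), ("stvar", "stvari")]

pruned = prune(rules, training)
print([[name for name, _ in conds] for conds, _ in pruned])
```

The pass is only as reliable as the training set: a literal that looks irrelevant on 500 examples may still matter for unseen nouns, so in practice a held-out set would be the safer criterion.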

On a larger scale, the work presented here should be extended to complete paradigms, i.e., rules should be learned for forming all 17 oblique forms of Slovene nouns (from the base form), and to other major word categories. An additional and useful extension would be to learn rules for morphological analysis rather than synthesis. These would generate, e.g., the nominative form of a noun from its genitive form. Finally, it would be interesting to consider the discovery of morphological phenomena in a more modular fashion: instead of simply generating or analyzing a form with one type of clause, the morphological phenomena of morphotactics and morphophonology could be considered separately.
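The relation between synthesis and analysis can be illustrated with a minimal Python sketch. Note that it inverts a hand-written suffix list rather than learning analysis rules, which is what is actually proposed above. The table `SUFFIX_RULES` and both function names are assumptions made for this example, and the rules cover only a toy fragment of the feminine genitive singular.

```python
from typing import List, Optional

# (nominative ending, genitive ending), tried in order as in a decision list.
# A toy, hand-written fragment, not a set of induced rules.
SUFFIX_RULES = [("a", "e"), ("", "i")]


def synthesize(lemma: str) -> Optional[str]:
    """Synthesis: nominative to genitive, first matching rule wins."""
    for nom, gen in SUFFIX_RULES:
        if lemma.endswith(nom):
            return lemma[:len(lemma) - len(nom)] + gen
    return None


def analyze(genitive: str) -> List[str]:
    """Analysis: propose every lemma that synthesis maps back to the given form."""
    candidates = []
    for nom, gen in SUFFIX_RULES:
        if genitive.endswith(gen):
            lemma = genitive[:len(genitive) - len(gen)] + nom
            if synthesize(lemma) == genitive:
                candidates.append(lemma)
    return candidates


print(analyze("mize"), analyze("stvari"))
```

`analyze` returns a list because analysis is in general one-to-many: several lemmas may share an oblique form, which is one reason why learning analysis rules directly may be preferable to inverting synthesis rules.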

Acknowledgements

 

This work was supported in part by the ESPRIT IV project 20237 ILP2.



Tomaz Erjavec