
Introduction

Slovene belongs to the South Slavic branch of the Slavic languages, with which it shares a rich system of inflections. Nouns in Slovene have lexically defined syntactic gender and inflect for number and case. The genders distinguished are masculine, feminine and neuter; the numbers are singular, dual and plural; and the cases are nominative, genitive, dative, accusative, locative and instrumental. Three numbers times six cases give 18 morphologically distinct forms comprising the paradigm of a noun. The situation is complicated, much as in Latin, by nouns belonging to various paradigm classes, of which there are three for the masculine gender, three for the feminine and two for the neuter. Finally, each paradigm class exhibits alternations, some of which are determined by the morphosyntactic properties of the noun (e.g., animacy), some by its morphophonological makeup, and some of which are idiosyncratic to the noun in question.
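The count of 18 paradigm slots follows directly from crossing the three numbers with the six cases. A minimal sketch of that enumeration (the names `NUMBERS`, `CASES` and `paradigm_slots` are illustrative, not taken from the project):

```python
from itertools import product

# Values as listed in the text above.
NUMBERS = ("singular", "dual", "plural")
CASES = ("nominative", "genitive", "dative",
         "accusative", "locative", "instrumental")


def paradigm_slots():
    """Return every (number, case) slot of a noun paradigm."""
    return list(product(NUMBERS, CASES))


slots = paradigm_slots()
print(len(slots))  # 3 numbers x 6 cases = 18 slots
```

Gender is not part of the product because it is fixed lexically for each noun rather than inflected.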

Within the MULTEXT-East project [1], a lexicon for the Slovene language was developed. It comprises all the inflectional word-forms of approximately 15,000 lemmas appearing in the corpus of the project, giving a total of more than 500,000 word-forms. Each lexicon entry is composed of three fields, as can be seen in Table 1, which gives the complete paradigm of the lemma golob ('pigeon').

Table 1: Paradigm of golob ('pigeon')

The first field is the word-form, the second the lemma (base form), and the third the morphosyntactic description of the word-form, in the first case above expanding to Noun, common, masculine, singular, nominative.
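The three-field entry format described above can be sketched as a small parser. This is an illustration only: the compact tag string `Ncmsn`, the whitespace-separated layout, and the decoding tables are assumptions made for the example, covering just the attribute values named in this section; the project's actual encoding may differ.

```python
from typing import List, NamedTuple


class LexiconEntry(NamedTuple):
    word_form: str
    lemma: str
    msd: str  # morphosyntactic description, one character per attribute


# Hypothetical position-by-position decoding tables (assumed, not from the source).
POSITIONS = [
    ("category", {"N": "Noun"}),
    ("type", {"c": "common"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "d": "dual", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def parse_entry(line: str) -> LexiconEntry:
    """Split one lexicon line into word-form, lemma and description."""
    word_form, lemma, msd = line.split()
    return LexiconEntry(word_form, lemma, msd)


def expand_msd(msd: str) -> List[str]:
    """Expand a compact description into readable attribute values."""
    return [table[code] for (_, table), code in zip(POSITIONS, msd)]


entry = parse_entry("golob golob Ncmsn")  # assumed line layout
print(", ".join(expand_msd(entry.msd)))
```

Under these assumptions, the nominative singular entry for golob expands to the reading given in the text: Noun, common, masculine, singular, nominative.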

This paper is concerned with the problem of learning morphological rules for forming particular inflectional forms of nouns given the lemma. The singular genitive was chosen because it usually exhibits the greatest number of alternations and is given in dictionaries as the second leading form, i.e., the nominative singular together with the genitive singular is sufficient to determine the correct full paradigm for almost all nouns. Section 2 describes in more detail the data used in our experiments.

FOIDL [4] is used to learn rules for generating the genitive forms of nouns. FOIDL is an inductive logic programming (ILP) [3] system that learns first-order decision lists, i.e.,  ordered lists of clauses. A brief description of FOIDL and its application to learning past tense is given in Section 3.
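To make the notion of a decision list concrete, the sketch below applies an ordered list of rules to a lemma and returns the result of the first rule that matches, so that specific exceptions placed early override general rules placed later. The rules are hand-written for illustration and are not the rules induced by FOIDL; the representation as Python condition/transform pairs is likewise an assumption standing in for first-order clauses.

```python
from typing import Callable, List, Tuple

# A rule is a (condition, transformation) pair over the lemma string.
Rule = Tuple[Callable[[str], bool], Callable[[str], str]]

# Illustrative, hand-written rules ordered from most to least specific.
RULES: List[Rule] = [
    (lambda w: w == "oče", lambda w: "očeta"),             # lexical exception
    (lambda w: w.endswith("a"), lambda w: w[:-1] + "e"),   # miza -> mize
    (lambda w: w.endswith("o"), lambda w: w[:-1] + "a"),   # mesto -> mesta
    (lambda w: True, lambda w: w + "a"),                   # default: golob -> goloba
]


def genitive(lemma: str, rules: List[Rule] = RULES) -> str:
    """Return the genitive singular given by the first matching rule."""
    for condition, transform in rules:
        if condition(lemma):
            return transform(lemma)
    raise ValueError(f"no rule applies to {lemma!r}")


for noun in ("golob", "miza", "mesto", "oče"):
    print(noun, "->", genitive(noun))
```

Because evaluation stops at the first match, the final catch-all rule acts as a default and is only reached when no more specific rule applies.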

Section 4 describes the experiments with FOIDL. The induced rules are discussed, as well as their performance on unseen cases. Section 5 concludes with a discussion and some directions for further work.



Tomaz Erjavec