Dan Tufis & al * Corpora and Corpus-Based Morpho-Lexical Processing
Eliminating the neuter gender from the lexicon encoding requires some explanation. Traditionally, Romanian grammar books distinguish three genders: masculine, feminine and neuter. However, there are few reasons, if any, not to drop the neuter value and adopt a simpler two-gender system. From the inflectional point of view, neuter nouns/adjectives behave as masculine nouns/adjectives in the singular and as feminine ones in the plural. There is no intrinsic semantic feature specific to neuter nouns: inanimacy is by no means specific to them, since plenty of masculine and feminine nouns also denote inanimate things. Preserving the masculine/feminine/neuter distinction therefore creates more problems than it solves [13]. Because of agreement rules, adjectives can take masculine, feminine and neuter gender. At the lexicon level, encoding all three would require about 33% more adjective entries. At the lookup level, considering gender alone, any adjective would be two-way ambiguous (masculine/neuter in the singular and feminine/neuter in the plural). Note, however, that neuter nouns or adjectives can easily be identified if needed: those tagged masculine in the singular and feminine in the plural are what traditional Romanian linguistics calls neuter nouns/adjectives.
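The identification rule just stated can be sketched in Python over a toy lexicon of (word form, lemma, MSD) triples. The entries and the function name neuter_lemmas are illustrative assumptions, not the authors' lexicon; MSD positions follow the noun attribute table in this section (Type, Gender, Number, Case, Definiteness).

```python
from collections import defaultdict

# Toy lexicon (hypothetical): (word form, lemma, MSD).
# MSD layout for nouns: N, Type, Gender, Number, Case, Definiteness.
LEXICON = [
    ("scaun", "scaun", "Ncmsrn"),   # 'chair', singular, tagged masculine
    ("scaune", "scaun", "Ncfprn"),  # 'chairs', plural, tagged feminine
    ("băiat", "băiat", "Ncmsrn"),   # 'boy', singular
    ("băieți", "băiat", "Ncmprn"),  # 'boys', plural
    ("casă", "casă", "Ncfsrn"),     # 'house', singular
    ("case", "casă", "Ncfprn"),     # 'houses', plural
]


def neuter_lemmas(lexicon):
    """Return lemmas tagged masculine in the singular and feminine in the plural."""
    # For each lemma, collect the gender codes seen in each number.
    genders = defaultdict(lambda: {"s": set(), "p": set()})
    for _form, lemma, msd in lexicon:
        if not msd.startswith("N") or len(msd) < 4:
            continue  # skip non-nouns and MSDs without a Number position
        gender, number = msd[2], msd[3]  # positions 2 and 3 of the noun table
        if number in ("s", "p"):
            genders[lemma][number].add(gender)
    return sorted(
        lemma
        for lemma, g in genders.items()
        if g["s"] == {"m"} and g["p"] == {"f"}
    )


print(neuter_lemmas(LEXICON))  # ['scaun']
```

Only scaun ('chair'), a traditionally neuter noun, satisfies the rule; băiat stays masculine in both numbers and casă stays feminine.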
The tables on the next pages present the MSD encoding for Romanian in the following format. The category code is an uppercase letter (N, V, A, P, D, T, R, S, C, M, Q, Y, I, X) identifying one of the 14 parts of speech considered in the MULTEXT-East encoding schema; every MSD begins with one of these 14 letters. The Attribute Position column gives the position, in the linear MSD encoding, of the attribute named in the Attribute column. The Value column lists the allowable values of that attribute. Their codes, given in parentheses, may appear in an MSD headed by the appropriate category code, at the position given in the Attribute Position column. For example, omul ('the man') is encoded as Ncmsry: noun (N), common (c), masculine (m), singular (s), direct case (r), definite (y). This linear encoding is a relatively efficient and compact way to represent flat attribute-value matrices.
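A minimal Python sketch of this positional scheme: an MSD is read as a category letter followed by one value code per attribute position, and can be expanded into a flat attribute-value matrix and linearized back. The spec covers only positions 1-3 of the noun table; the names SPEC, decode_msd and encode_msd are hypothetical, and we assume the MULTEXT-East convention that '-' marks a non-applicable attribute and that trailing '-' characters are omitted.

```python
# Hypothetical positional spec: for each category code, the attributes in MSD
# order (position 1, 2, ...), each with its value codes. Only positions 1-3 of
# the noun table are included here; other categories are omitted.
SPEC = {
    "N": [
        ("Type", {"c": "common", "p": "proper"}),
        ("Gender", {"m": "masculine", "f": "feminine"}),
        ("Number", {"s": "singular", "p": "plural"}),
    ],
}


def decode_msd(msd, spec=SPEC):
    """Expand a linear MSD into a flat attribute-value matrix (a dict)."""
    attributes = spec[msd[0]]
    avm = {"Category": msd[0]}
    # zip pairs each position with its code; positions beyond the spec are ignored.
    for (name, values), code in zip(attributes, msd[1:]):
        if code != "-":  # assumed: '-' marks a non-applicable attribute
            avm[name] = values[code]
    return avm


def encode_msd(avm, spec=SPEC):
    """Linearize an attribute-value matrix back into an MSD string."""
    category = avm["Category"]
    codes = []
    for name, values in spec[category]:
        inverse = {v: c for c, v in values.items()}
        codes.append(inverse[avm[name]] if name in avm else "-")
    # Assumed: trailing '-' (unspecified final attributes) are dropped.
    return category + "".join(codes).rstrip("-")


print(decode_msd("Ncfs"))
# {'Category': 'N', 'Type': 'common', 'Gender': 'feminine', 'Number': 'singular'}
print(encode_msd({"Category": "N", "Type": "proper"}))  # Np
```

Keeping the spec as data, rather than hard-coding positions in the functions, means each table in this section would only add one entry to SPEC.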
Noun (N)

Attribute Position  Attribute     Value          Example
1                   Type          common (c)     carte
                                  proper (p)     Ion
2                   Gender        masculine (m)  băiatul
                                  feminine (f)   casa
3                   Number        singular (s)   fată
                                  plural (p)     fete
4                   Case          direct (r)     omul
                                  oblique (o)    omului
                                  vocative (v)   omule
5                   Definiteness  yes (y)        omul
                                  no (n)         om
6                   Clitic        no (n)         sora
                                  yes (y)        soru-mea
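Because the Value column fixes the allowable codes at each position, the noun table can also be used to check MSDs mechanically. The sketch below (hypothetical names NOUN_TABLE and invalid_positions; '-' again assumed to mark a non-applicable attribute) reports every position whose code is not allowed. Since the two-gender system has no neuter value, a code such as n at the Gender position is rejected.

```python
# Noun (N) attribute table from this section: position -> (attribute, allowed codes).
NOUN_TABLE = {
    1: ("Type", {"c", "p"}),
    2: ("Gender", {"m", "f"}),
    3: ("Number", {"s", "p"}),
    4: ("Case", {"r", "o", "v"}),
    5: ("Definiteness", {"y", "n"}),
    6: ("Clitic", {"n", "y"}),
}


def invalid_positions(msd):
    """Return (position, attribute, code) triples whose code is not allowed."""
    if not msd.startswith("N"):
        raise ValueError(f"not a noun MSD: {msd!r}")
    if len(msd) - 1 > len(NOUN_TABLE):
        raise ValueError(f"too many positions: {msd!r}")
    errors = []
    for pos, code in enumerate(msd[1:], start=1):
        name, allowed = NOUN_TABLE[pos]
        if code != "-" and code not in allowed:  # assumed: '-' = not applicable
            errors.append((pos, name, code))
    return errors


print(invalid_positions("Ncmsry"))  # [] (omul)
print(invalid_positions("Ncnsrn"))  # [(2, 'Gender', 'n')]: no neuter value
```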