3.16.1.1. Derivation
Without handling derivation, a satisfactory morphological analysis of
Hungarian is not possible. The HUMOR system, a general-purpose
morphological analyzer, handles derivation, and its results can be
converted to Multext format, since at the syntactic level the
morphological origin of a stem is generally irrelevant. The resulting
word class is determined by the rightmost derivational suffix. The
suffix characters are attached literally to the word.
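As a minimal sketch of the rule that the rightmost derivational suffix fixes the word class, the Python below attaches suffixes literally and keeps only the resulting class. The tag table is a hypothetical subset of the HUMOR tags listed in this section, the class letters are assumed, and stem alternations (e.g. utca -> utcá) are ignored; this is an illustration, not HUMOR's actual implementation.

```python
# Hypothetical subset of the HUMOR derivation tags from this section's tables:
# tag -> (accepted input classes, resulting class).
# Class letters (assumed): N noun, A adjective, V verb, Num numeral.
DERIVATIONS = {
    "IKEP": ({"N"}, "A"),      # ház+i      -> házi
    "SKEP": ({"N"}, "A"),      # gyerek+es  -> gyerekes
    "COL": ({"N", "A"}, "N"),  # barát+ság  -> barátság
    "FAK": ({"A"}, "V"),       # szép+ít    -> szépít
    "IF": ({"V"}, "N"),        # olvas+ás   -> olvasás
    "KIEM": ({"Num"}, "A"),    # hatod+ik   -> hatodik
}


def derive(stem: str, pos: str, suffixes: list[tuple[str, str]]) -> tuple[str, str]:
    """Attach (tag, suffix) pairs left to right and return (word, class).

    Suffix characters are concatenated literally; stem alternations such as
    utca -> utcá are assumed to be handled before this step. Each suffix
    overwrites the class, so the rightmost derivational suffix decides the
    final word class and the origin is not kept.
    """
    word = stem
    for tag, suffix in suffixes:
        accepted, result = DERIVATIONS[tag]
        if pos not in accepted:
            raise ValueError(f"{tag} cannot attach to class {pos}")
        word += suffix
        pos = result
    return word, pos


print(derive("szép", "A", [("FAK", "ít"), ("IF", "és")]))  # ('szépítés', 'N')
```

Because each suffix overwrites the class, szép[A] + ít[FAK] + és[IF] comes out as a noun: the intermediate verb szépít leaves no trace in the output.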
The tables below list the derivations that the analyzer recognizes.
Instead of recording the origin of a derived word, we output only its
resulting class. (Suffix tags used in HUMOR are in upper case; actual
suffixes are in lower case.)
Noun -> Adjective
| Type | Form | Example | Gloss |
| BELI | beli | ház+beli | property_of_living_in_the_house |
| FAJTA | fajta | más+fajta | of_some_other_kind |
| FELE | féle | bútor+féle | similar_to_furniture |
| FORMA | forma | tojás+forma | egg_shaped |
| SZERU | szerű | tej+szerű | milk+y |
| IKEP | i | ház+i | home_(e.g._home-made) |
| SKEP | s, as, os, es, ös | gyerek+es | child+ish |
| UKEP | ú, ű, jú, jű | arc+ú | (red)-face+d |
| FFOSZ | tlan, tlen, atlan, etlen, talan, telen | - | 'devoid_of', '-less' |
| MER | nyi | kanál+nyi | spoon+ful |
Noun, Adjective -> Noun
| Type | Form | Example | Gloss |
| COL | ság, ség | barát+ság | friend+ship |
Noun -> Noun
| Type | Form | Example | Gloss |
| DIM | cska, acska, ecske, öcske, ocska | utcá+cska | little_street |
| FEM | né | Kovács+né | Mrs._Kovács |
Noun -> Verb
| Type | Form | Example | Gloss |
| FI | z, az, oz, ez, öz | autó+z | go_by_car |
Verb -> Adjective
| Type | Form | Example | Gloss |
| IFOSZT | atlan, etlen | felel+etlen | sg_not_being_answered |
| MIF | ó, ő | felel+ő | sy_who_answers |
| MIB | t, ott, ett, ött | felel+t | the_answered_(question) |
| MIA | andó, endő | felel+endő | sg_that_should_be_answered |
| NIVALO | anivaló, enivaló, nivaló | néz+nivaló | sg_that_should_be_seen |
Verb -> Adverb
| Type | Form | Example | Gloss |
| HIN | va, ve | olvas+va | (while)_reading_(the_book) |
Numeral -> Adjective
| Type | Form | Example | Gloss |
| KIEM | ik | hatod+ik | six+th |
| LAGOS | lagos, leges | másod+lagos | second+ary |
Verb -> Noun
| Type | Form | Example | Gloss |
| IF | ás, és | olvas+ás | read+ing_(gerund) |
| DES | hatnék, hetnék | olvas+hatnék | the_intention_of_reading |
Adjective -> Verb
| Type | Form | Example | Gloss |
| FAK | ít | szép+ít | make_it_pretty_(in_compounds_only) |
| MI | od, ed, öd | vállas+od+ik | becomes_broad-shouldered |
| MIGY | kod, ked, köd | okos+kod+ik | plays_the_smart_(frequently) |
Verb -> Verb
| Type | Form | Example | Gloss |
| MUV | at, et, tat, tet | olvas+tat | makes_him_read |
| GYAK | gat, get, ogat, eget, öget | olvas+gat | he_reads_frequently |
| HAT | hat, het | olvas+hat | he_may_read |
| VISSZ | ód, őd | old+ód+ik | dissolves |
| SZENV | tatik, tetik | olvas+tatik | (the_book)_is_being_read |
Examples:
If 'szemetelés' (littering, the action of throwing away litter) is not
in the dictionary, we derive it from the verb szemetel ('to litter'):
szemetel[V] + és[IF] (where IF = Verb -> Noun).
Instead of adding an extra attribute to the verb to express that it
carries a derivational suffix, we simply output the result of the
analysis and conversion: szemetelés[N].
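The fallback described in this example (dictionary lookup first, derivation only when the word is missing) can be sketched as follows. The lexicon, the function name `analyse` and the `[?]` marker for unknown words are hypothetical; only the IF allomorphs ás/és come from the Verb -> Noun table.

```python
# Toy lexicon (hypothetical): lemma -> class letter, as in szemetel[V].
LEXICON = {"szemetel": "V", "olvas": "V", "ház": "N"}

# Allomorphs of the IF tag (Verb -> Noun), taken from the Verb -> Noun table.
IF_SUFFIXES = ("ás", "és")


def analyse(word: str) -> str:
    """Return 'word[POS]' with only the resulting class, never the origin."""
    # 1. Dictionary lookup first: a listed word needs no derivation.
    if word in LEXICON:
        return f"{word}[{LEXICON[word]}]"
    # 2. Fallback: strip an IF suffix and look up the remaining verb stem.
    for suffix in IF_SUFFIXES:
        if word.endswith(suffix) and LEXICON.get(word[: -len(suffix)]) == "V":
            # e.g. szemetel[V] + és[IF] -> szemetelés[N]
            return f"{word}[N]"
    # Unknown to this sketch; a real analyzer would try its other rules.
    return f"{word}[?]"


print(analyse("szemetelés"))  # szemetelés[N]
```

If szemetelés were added to `LEXICON`, step 1 would answer directly and the derivation would never be attempted.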
In Hungarian, some derivational suffixes may follow an inflectional
suffix. In these cases the inflection and the derivation together form
a compound derivation: a new stem is generated from the
stem + inflection + derivation segments, and the resulting part of
speech is determined by the derivation.
| Type | Form | Example | Gloss |
| Nc-sn--ns1-+FAM | ék | apá+m+ék | my_father_and_his_family |
| Nc-su--u---+IKEP | i | asztal+onként+i | sg_done_at_each_table |
| Afc-sn--n----+KIEM | ik | nagy+obb+ik | the_bigger_one |
Some adverbs can take case endings. Since case inflections can derive
adverbs from nouns, these constructions can also be handled as
derivations: the new stem is the stem + inflection combination, and
the part of speech is adverb.
| Type | Form | Example | Gloss |
| Ag----+ablativ | tól | akkor+tól | since_then |
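Both constructions above (an inflection followed by a derivation, and an adverb followed by a case ending) build a new stem from all segments and take the part of speech from the last one. A minimal sketch under that reading; the `RESULT_CLASS` table, the placeholder tags `STEM`/`INFL` and the `Adv` label are assumptions, not HUMOR or Multext identifiers.

```python
# Hypothetical table: tag of the final segment -> resulting class.
# "Adv" is an assumed label for the adverb + case-ending construction.
RESULT_CLASS = {"FAM": "N", "IKEP": "A", "KIEM": "A", "ablativ": "Adv"}


def compound_derivation(segments: list[tuple[str, str]]) -> tuple[str, str]:
    """Build a new stem from (form, tag) segments: stem, inflection(s), derivation.

    The new stem is the literal concatenation of all segments; the part of
    speech is taken from the tag of the last segment only.
    """
    new_stem = "".join(form for form, _tag in segments)
    return new_stem, RESULT_CLASS[segments[-1][1]]


# "STEM" and "INFL" are placeholder tags for the non-final segments.
print(compound_derivation([("apá", "STEM"), ("m", "INFL"), ("ék", "FAM")]))
# ('apámék', 'N')
```

The inflectional segment (here the possessive m) disappears into the new stem, so later processing sees apámék as an ordinary noun.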
3.16.1.2. Compounding
Compounding is handled very similarly to derivation. The word class of
the rightmost member always determines the result. If that member also
contains a derivation, the result is the word class that the
derivation determines.
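A sketch of the compounding rule, assuming each member is represented by its stem class plus the classes produced by its own derivational suffixes (a hypothetical `Member` structure, not a HUMOR data type). Only the rightmost member is consulted.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """One constituent of a compound (hypothetical representation)."""
    form: str  # surface form of the member
    pos: str   # class of the member's underived stem
    # classes produced by the member's derivational suffixes, left to right
    derived: list[str] = field(default_factory=list)


def compound_pos(members: list[Member]) -> str:
    """Word class of a compound, decided by the rightmost member."""
    last = members[-1]
    # A derived rightmost member contributes the class of its last derivation.
    return last.derived[-1] if last.derived else last.pos


print(compound_pos([Member("tandíj", "N"), Member("befizetés", "V", ["N"])]))  # N
```

For tandíjbefizetés the rightmost member befizetés has the verb stem befizet and the derivation és[IF], so the compound is a noun; derivations inside non-final members never affect the result.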
Examples:
- Hungarian is similar to German in this respect, but we impose a
competence limit: we do not join more than two nouns. These two words
may themselves be compounds, but they must be lexicalized forms. For
example, rendőregyenruha (police uniform) is valid, since it is built
from rendőr (police officer) and egyenruha (uniform), where
rendőr = rend (order) + őr (guard) and egyenruha = egyen (uni-) + ruha
(clothes).
- Of course, the situation is more complicated, since nouns can be
derived from verbs, adjectives or other nouns, and these derivations
can be parts of compounds: e.g. autóbuszmegálló (bus stop), where
meg+áll+ó is derived from the verb áll (stand). Although megálló is
lexicalized and may therefore be listed in dictionaries, the
morphological rule is productive, so compounding also works for
constructions like tandíjbefizetés (paying the tuition fee) =
tan (tuition) + díj (fee) + be (in) + fizet (pay) + és (-ment).
- Like derivation, compounding is not mirrored in the Multext-format
morphological analysis; a converter produces an acceptable
segmentation in the corpus. The word class (POS) of these compounds
is the word class of the resulting compound
(e.g. tandíjbefizetés/N 'the action of paying in the tuition fee').
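The two-noun competence limit can be illustrated with a toy splitter that accepts a word only if it divides into exactly two lexicon entries. The lexicon and `split_compound` are hypothetical. The sketch assumes it is called only for words not found whole in the dictionary, and it ignores productive derivation such as befizetés, which a fuller version would combine with a derivation step.

```python
from typing import Optional

# Toy lexicon of lexicalized nouns (hypothetical; a real system would use the
# HUMOR dictionary). Lexicalized compounds such as rendőr are listed whole.
NOUNS = {"rend", "őr", "rendőr", "egyen", "ruha", "egyenruha"}


def split_compound(word: str) -> Optional[tuple[str, str]]:
    """Split word into exactly two lexicalized nouns, or return None.

    Enforces the two-noun competence limit: each part may itself be a
    compound, but only if it is listed in the lexicon as a whole.
    Intended for words not found whole in the lexicon.
    """
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in NOUNS and right in NOUNS:
            return left, right
    return None


parts = split_compound("rendőregyenruha")
print("+".join(parts) if parts else "no analysis")  # rendőr+egyenruha
```

Taking the first valid split keeps the sketch short; a real analyzer would need to rank competing splits. A made-up string such as rendőrruhaőr, whose only analysis would need three free members, gets no split.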