MULTEXT-East Morphosyntactic Specifications, Version 4

3.16.1. Hungarian Introduction

Derivation

Without handling derivation, a satisfactory morphological analysis of Hungarian is not possible. The HUMOR system, a general-purpose morphological analyzer, handles derivation, and its results can be converted to the Multext format because, at the syntactic level, the morphological origin of a stem is generally irrelevant. The resulting word class is defined by the rightmost derivational suffix. The suffix characters are attached literally to the word.

The analyzer recognizes the derivations listed below, but the output records only the resulting word class, not the origin. (Suffix tags used in HUMOR are in upper case; actual suffixes are in lower case.)

Noun -> Adjective

Type Form Example Gloss
BELI beli ház+beli property_of_living_in_the_house
FAJTA fajta más+fajta of_some_other_kind
FELE féle bútor+féle similar_to_furniture
FORMA forma tojás+forma egg_shaped
SZERU szerű tej+szerű milk+y
IKEP i ház+i home_(e.g._made)
SKEP s, as, os, es, ös gyerek+es child+ish
UKEP ú, ű, jú, jű arc+ú (red)-face+d
FFOSZ tlan, tlen, atlan, etlen, talan, telen - 'devoid_of', _'-less'
MER nyi kanál+nyi spoon+full

Noun, A -> Noun

Type Form Example Gloss
COL ság, ség barát+ság friend+ship

Noun -> Noun

Type Form Example Gloss
DIM cska, acska, ecske, öcske, ocska utcá+cska little_street
FEM né Kovács+né Mrs._Kovács

Noun -> Verb

Type Form Example Gloss
FI z, az, oz, ez, öz autó+z go_by_car

Verb -> Adjective

Type Form Example Gloss
IFOSZT atlan, etlen felel+etlen sg_not_being_answered
MIF ó, ő felel+ő sy_who_answers
MIB t, ott, ett, ött felel+t the_answered_(question)
MIA andó, endő felel+endő sg_that_should_be_answered
NIVALÖ anivaló, enivaló, nivaló néz+nivaló sg_that_should_be_seen

Verb -> Adverb

Type Form Example Gloss
HIN va, ve olvas+va (while)_reading_(the_book)

Numeral -> Adjective

Type Form Example Gloss
KIEM ik hatod+ik six+th
LAGOS lagos, leges másod+lagos second+ary

Verb -> Noun

Type Form Example Gloss
IF ás, és olvas+ás read+ing_(gerund)
DES hatnék, hetnék olvas+hatnék the_intention_of_reading

Adj -> Verb

Type Form Example Gloss
FAK ít szép+ít make_it_pretty_(in_compounds_only)
MI od, ed, öd vállas+od+ik becomes_strong
MIGY kod, ked, köd okos+kod+ik plays_the_smart_(frequently)

Verb -> Verb

Type Form Example Gloss
MUV at, et, tat, tet olvas+tat makes_him_read
GYAK gat, get, ogat, eget, öget olvas+gat he_reads_frequently
HAT hat, het olvas+hat he_may_read
VISSZ ód, őd old+ódik dissolves
SZENV tatik, tetik olvas+tatik makes_the_book_being_read


If 'szemetelés' (littering, the action of throwing away litter) is not in the dictionary, we derive it from the verb szemetel ('to litter'): szemetel[V] + és[IF] (where IF=Verb2Noun).

Instead of giving the verb an extra attribute expressing that it has a derivational suffix, we simply give the result of the analysis and conversion: szemetelés[N].
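The conversion step described above can be illustrated with a minimal sketch. It is not the HUMOR implementation: the function name `convert`, the `(form, tag)` segment representation, and the small `RESULT_CLASS` table are assumptions made for illustration, with the tag-to-class pairs copied from the derivation tables in this section.

```python
# Resulting word class for a few HUMOR derivational tags, taken from the
# tables in this section. The dictionary itself is an illustrative
# assumption, not part of HUMOR.
RESULT_CLASS = {
    "IF": "N",    # Verb -> Noun, e.g. olvas+ás
    "COL": "N",   # Noun, A -> Noun, e.g. barát+ság
    "SKEP": "A",  # Noun -> Adjective, e.g. gyerek+es
    "MIF": "A",   # Verb -> Adjective, e.g. felel+ő
    "FI": "V",    # Noun -> Verb, e.g. autó+z
    "HAT": "V",   # Verb -> Verb, e.g. olvas+hat
}


def convert(segments):
    """Collapse an analysis into (surface_form, word_class).

    segments is a list of (form, tag) pairs with the stem first; the
    stem's tag is its word class. The rightmost derivational suffix
    determines the resulting class; the origin is discarded.
    """
    surface = "".join(form for form, _ in segments)
    word_class = segments[0][1]
    for _, tag in segments[1:]:
        if tag in RESULT_CLASS:
            # Later (more rightward) derivations overwrite earlier ones.
            word_class = RESULT_CLASS[tag]
    return surface, word_class


print(convert([("szemetel", "V"), ("és", "IF")]))
```

Only the final class survives in the output, which mirrors the choice made here of emitting szemetelés[N] rather than marking the verb with a derivation attribute.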

In Hungarian, some derivations may follow an inflectional suffix. In these cases the inflectional suffix and the derivation together form a compound derivation: a new stem is generated from the stem + inflection + derivation segments, and the resulting part of speech is determined by the derivation.

Type Form Example Gloss
Nc-sn--ns1-+FAM ék apá+m+ék some_people_with_my_father
Nc-su--u---+IKEP i asztal+onként+i sg._done_by_each_every_table
Afc-sn--n----+KIEM ik nagy+obb+ik the_bigger_one
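
The compound derivations in the table above can be sketched as follows. This is an illustrative assumption rather than the converter's actual code: the `Stem` class, the `rederive` function and the `DERIVED_CLASS` table are hypothetical names, and only the two tags whose resulting class is stated in the tables of this section (IKEP and KIEM, both yielding adjectives) are included.

```python
from dataclasses import dataclass

# Resulting class of the derivations whose class the tables in this
# section state explicitly (illustrative subset).
DERIVED_CLASS = {"IKEP": "A", "KIEM": "A"}


@dataclass
class Stem:
    form: str
    word_class: str


def rederive(stem, inflection, derivation, tag):
    """Fold stem + inflection + derivation into a single new stem.

    The inflectional suffix becomes part of the new stem, and the word
    class of the new stem is the one determined by the derivation tag.
    """
    return Stem(stem + inflection + derivation, DERIVED_CLASS[tag])


print(rederive("asztal", "onként", "i", "IKEP"))
print(rederive("nagy", "obb", "ik", "KIEM"))
```

The inflected form no longer appears as a separate segment: asztal+onként+i is treated as one adjectival stem, asztalonkénti.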

There are adverbs that may take case endings. Since case inflections derive adverbs from nouns, these constructions can be handled as derivations: the stem is the stem + inflection combination, and the part of speech is adverb.

Type Form Example Gloss
Ag----+ablativ tól akkor+tól since_then

Compounding

Compounding is handled in a very similar way to derivation. The rightmost word class is always the resulting one. If the compound also contains a derivation, the result is the word class that the derivation determines.


  1. Hungarian is similar to German, but we have a competence limit: we do not put together more than two nouns. These two words might themselves be compounds, but they must be lexicalized forms. For example, rendőregyenruha (police uniform) is valid since it is put together from rendőr (police) and egyenruha (uniform), where rendőr = rend (order) + őr (guard) and egyenruha = egyen (uni) + ruha (clothes).
  2. Of course, the situation is more complicated, since nouns can be derived from verbs, adjectives or other nouns, and these derivations can be parts of compounds: e.g. autóbuszmegálló (bus stop), where meg+áll+ó is derived from the verb áll (stand). Although megálló is lexicalized, so dictionaries might contain it, the morphological rule is productive, so compounding works well for constructions like tandíjbefizetés (paying the tuition fee) = tan (tuition) + díj (fee) + be (in) + fizet (pay) + és (ment).
  3. Compounding, like derivation, will not be mirrored in the Multext-format morphological analysis; a converter will produce an acceptable segmentation in the corpus. The word class (POS) for these compounds will be the word class of the output compound (e.g. tandíjbefizetés/N 'the action of paying in the tuition fee').
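
The two-member limit and the rightmost-class rule from the points above can be sketched as follows. This is a toy illustration under stated assumptions, not the analyzer's algorithm: `LEXICON` is a hypothetical four-entry dictionary of lexicalized forms, and `analyse_compound` is a hypothetical helper that accepts a word only if it splits into exactly two lexicon entries.

```python
# Hypothetical toy lexicon of lexicalized forms and their word classes.
# Members such as rendőr may themselves be compounds, but they are
# looked up as whole lexicalized units.
LEXICON = {
    "rendőr": "N",
    "egyenruha": "N",
    "autóbusz": "N",
    "megálló": "N",
}


def analyse_compound(word):
    """Return (left, right, word_class) for a two-member compound.

    The word class is that of the rightmost member. Returns None when
    the word cannot be split into exactly two lexicalized forms.
    """
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in LEXICON and right in LEXICON:
            return left, right, LEXICON[right]
    return None


print(analyse_compound("rendőregyenruha"))
print(analyse_compound("autóbuszmegálló"))
```

A productive member such as befizetés would first have to be produced by the derivation step before a lookup like this could combine it with tandíj; the sketch leaves that step out.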

Date: 2010-05-12
This work is licensed under the Creative Commons licence Attribution-ShareAlike 3.0.