La'szlo' Tihanyi, 04-30-1996.
Although working morphological analyzers exist for Hungarian, the number of possible word forms is still only an estimate. The main reason is that, unlike in other languages, a large number of Hungarian word forms lie on the border between acceptable and unacceptable, as will be exemplified below. Morphological systems handle this problem by overgeneration. In the following we try to give a more accurate estimate of the number of possible word forms.
First we estimate the total number, and then look at one example word and its concrete figures.
1. Total number of word forms in Hungarian.
The calculation uses word counts taken from the dictionary of the HUMOR morphological analyzer.
A  verbs:                                    9.400   (in dictionary)
B  verbal prefix:                               50   (estimated average)
C  prefixed verbs:                         147.000   (B*A)
D  v to v derivations:                          14   (estimated average)
E  v to n derivations:                          29   (estimated average)
F  verbs derived from verbs:             2.205.000   (C+D*C)
G  nouns derived from verbs:             4.263.000   (C*E)
H  nouns:                                   50.000   (in dictionary)
I  adjectives:                              10.800   (in dictionary)
J  total nominals:                          60.800   (H+I, numerals not counted)
K  n to n derivations:                          91   (estimated average)
L  n to v derivations:                          72   (estimated average)
M  verbs derived from noms:              4.337.600   (L*J)
N  nouns derived from noms:              5.532.800   (K*J)
O  others (adverbs etc.):                    2.300   (in dictionary)
P  verbs total:                          6.552.000   (A+F+M)
Q  nominals total:                       9.845.800   (H+G+N)
W  verbal inflections:                          59   (in dictionary)
Z  nominal inflections:                        924   (in dictionary)
R  verbal inflection combinations:     386.568.000   (W*P)
S  nominal infl. combinations:       9.096.780.000   (Z*Q)
T  grand total:                      9.483.348.000   (R+S+O)
U  grand total with clitics:        18.966.396.000   (2*T)
That is about 20 billion word forms.
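The chain of formulas in the table can be checked with a short script. The sketch below is only an illustration in Python: it takes the dictionary counts and estimated averages as printed, and takes C (147.000) as an input rather than recomputing it, since B*A with the listed values of A and B would give 470.000. Evaluating the formulas literally yields intermediate values that differ slightly from the printed ones (for example, L*J evaluates to 4.377.600), but the order of magnitude, just under 20 billion, is unchanged.

```python
# Inputs as printed in the table (the dots there are thousands separators).
A = 9_400                 # verbs in the dictionary
C = 147_000               # prefixed verbs, taken as printed (assumption: not recomputed as B*A)
D, E = 14, 29             # average v-to-v and v-to-n derivations
H, I = 50_000, 10_800     # nouns and adjectives in the dictionary
K, L = 91, 72             # average n-to-n and n-to-v derivations
O = 2_300                 # others (adverbs etc.)
W, Z = 59, 924            # verbal and nominal inflections

J = H + I                 # total nominals
F = C + D * C             # verbs derived from verbs
G = C * E                 # nouns derived from verbs
M = L * J                 # verbs derived from nominals
N = K * J                 # nouns derived from nominals
P = A + F + M             # verbs total
Q = H + G + N             # nominals total
R = W * P                 # verbal inflection combinations
S = Z * Q                 # nominal inflection combinations
T = R + S + O             # grand total
U = 2 * T                 # grand total with the clitic

for name, value in [("P", P), ("Q", Q), ("R", R), ("S", S), ("T", T), ("U", U)]:
    print(f"{name} = {value:,}")
```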
2. Estimation by an example
As we can see, B, D, E, K and L are averages; the concrete numbers for a given word can differ. As an example we take the verb 'ver', which means 'beat', 'hit', 'defeat', 'whack' or 'lap'.
2.1. Prefixed forms
First we estimate the number of prefixed forms of the example verb. Of the 81 possible prefixed forms, we find 53 that are meaningful. The main problem is that it is impossible to say exactly which morphological constructions are acceptable and which are ungrammatical: the border is fuzzy. It is very likely that different speakers would make different judgements about the good and bad forms in the following list.
Good forms are glossed in English; bad ones are marked with *.
agyonver          slay (by hitting)
ala'ver           hit_under
ala'bbver         hit_lower
be-bever          hit_in (several times)
bele-belever      hit_in (several times)
belever           hit_in
bele'ver          hit_in
bever             hit_in
el-elver          whack (several times)
elver             whack
fel-felver        awake (several times)
felver            awake
fe'lrever         ring_a_bell
fo:lver           awake
fo:le'ver         hit_above
hazaver           chase_home
helyrever         mend_by_hitting
hozza'ver         hit_against
ha'traver         hit_back
idever            hit_here
keresztbever      hit_across
keresztu:lver     beat_through
kette'ver         hit_to_fall_apart
ki-kiver          whack (several times)
kiver             whack (a child's bottom)
ko:rbever         hit_around
ko:ru:lver        hit_around
ko:zbever         hit_in_between
ko:ze'ver         hit_in_between
le-lever          beat (several times)
lever             beat (an army)
meg-megver        defeat (several times)
megver            defeat (in sport)
melle'ver         hit_close_to (a spike)
mo:ge'ver         hit_behind
nekiver           hit_against
odaver            hit_against
rea'ver           hit_on
ra'ver            hit_on
sze'jjelver       defeat (an army)
sze'tver          defeat (an army)
tova'bbver        continues_the_hitting
to:nkrever        beat (by a big margin)
tu'lver           hit_more_than_needed
uta'naver         hit_again_to_fix
vissza-visszaver  repel (several times)
visszaver         repel (an)
ve'gigver         hit_along
o:sszever         beat (in fight)
a'tver            mislead
u'jraver          hit_again
telever           nail/spike_full (the surface/area)
teliver           nail/spike_full (the surface/area)

*abbaver, *alulver, *bennver, *egybever, *egyu:ttver, *ele'ver,
*elo"rever, *elo"ver, *felu:lver, *fennver, *fe'lbever, *fo:lu:lver,
*fo:l-lever, *fo:nnver, *ki-bever, *ku:lo:nver, *ko:zrever, *rajtaver,
*szembever, *szertever, *tovaver, *utolver, *viszontver, *ve'gbever,
*ve'ghezver, *ve'grever, *a'ltalver, *u'jja'ver
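The gap between what a system generates and what speakers accept can be illustrated with a toy sketch. It is not how HUMOR or any other analyzer works internally: the function name `overgenerate` and the small data sets are hypothetical, and the prefixes and judgements are a handful copied from the list above.

```python
STEM = "ver"

# A few prefixes taken from the list above (only a sample, not all 81).
PREFIXES = ["agyon", "be", "el", "meg", "abba", "alul", "tova"]

# Forms from the sample that are glossed as good in the list above.
ACCEPTED = {"agyonver", "bever", "elver", "megver"}


def overgenerate(stem, prefixes):
    """Combine every prefix with the stem, with no acceptability filter."""
    return [prefix + stem for prefix in prefixes]


candidates = overgenerate(STEM, PREFIXES)
good = [form for form in candidates if form in ACCEPTED]
bad = [form for form in candidates if form not in ACCEPTED]

print("generated:", len(candidates))
print("good:", good)
print("starred:", bad)
```

An overgenerating system returns all the candidates; the split into good and starred forms needs a speaker's judgement, which is exactly where the border becomes fuzzy.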
2.2. Derivations
The number of verbal derivations also varies from word to word. The system again allows all combinations (89), but we found only 50 'good' ones for the verb 'ver'.
Here again it is not possible to rule out ill-formed derivations, since the border is fuzzy: a given form may be judged good, possible, unlikely or bad.
Here we have 9 'verb to verb' derivations (marked with V in front of the English explanation) and 40 'verb to noun' (marked with N or A).
vereget             V  hit (several times)
veregethet          V  is allowed to hit several times
veregetheto"        A  can be hit several times
veregetheto"bb      A  can be beaten more than others several times
veregetheto"se'g    N  the possibility of hitting several times
veregete's          N  the action of hitting (several times)
veregete'si         A  connected with hitting (several times)
veregeto"           N  somebody who hits (several times)
verendo"            A  somebody who should be beaten
verendo"bb          A  somebody who should be beaten rather than somebody else
legverendo"bb       A  somebody who should be beaten most
veret               V  make somebody beaten
verethet            V  allow somebody to make somebody else be beaten
veretheto"          A  allowed to be beaten
veretheto"se'g      N  the state of being allowed to be beaten
veretlen            A  has not been beaten so far
veretlenebb         A  having fewer defeats than somebody else
legveretlenebb      A  having the fewest defeats
verete's            N  the action of beat (on a ...)
vereto"             A  the man who made somebody else beat others
verhet              V  allowed to beat
verhetetlen         A  unbeatable
verhetetlenebb      A  more unbeatable than others
legverhetetlenebb   A  most unbeatable
verhetetlense'g     N  the state of being unbeatable
verheto"            A  can be beaten
verheto"bb          A  can be beaten more easily than others
verheto"se'g        N  the state of being beatable
verheto"se'gi       A  connected with the state of being beatable
vernivalo'          A  something that is to be beaten
vert                A  somebody who is beaten
vertebb             A  somebody who is beaten more than others
vertse'g            N  the state of being beaten
vere's              N  the action of beating
vere'ses            A  having beats
vere'si             A  connected with beating
vere'snyi           A  measure of one beat
vere'su"            A  having some kind of stamp (coin)
vero"               A  somebody who beats
vero"dik            V  laps against something ()
vero"dget           V  laps several times
vero"dgethet        V  allowed to lap against several times
vero"dgete's        N  the action of lapping several times
vero"dgeto"         N  something that laps several times
vero"dhet           V  is allowed to lap several times
vero"de's           N  the action of lapping against
vero"de'si          A  connected with lapping
vero"do"            A  lapping
vero"do:tt          A  has been lapped

*veregethete's, *veregethete'si, *veregetheto"i, *veregeto"i,
*verendo"se'g, *verethete's, *verethete'si, *veretheto"i,
*verete'si, *vereto"i, *verhetetlense'geskede's, *verhete'si, *verheto"i,
*vere'sesed, *vere'seskedhetne'k, *vere'sesse'g, *vere'sesi't, *vere'sesi'te's,
*vere'sesi'to", *vero"dgethete's, *vero"dgethete'si, *vero"dgetheto", *vero"dgetheto"i,
*vero"dgete'si, *vero"dgeto"i, *vero"dhete's, *vero"dhete'si, *vero"dheto",
*vero"do"i, *vero"leges, *vero"legesse'g, *vero"se'g, *vero"s,
*vero"i, *vero"ibb, *veretlenedik, *verhete's, *vere'sesebb, *vero"bb
2.3.1 Verbal inflections
The verbs may have 59 inflections in Hungarian.
3 (Person) * 2 (Number) * 4 (Present indicative, Present imperative, Present conditional, Past indicative) * 2 (Transitivity) + 4 (1s2s) + 7 (Infinitive) = 59. This number is constant for every verb.
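The count of 59 can be reproduced by enumerating the paradigm slots. The sketch below is illustrative only. The labels "indefinite" and "definite" for the two transitivity values are an assumption (the text only says Transitivity), as is reading the 4 1s2s forms as one per tense/mood and the 7 infinitives as one bare form plus one per person and number.

```python
from itertools import product

persons = ("1", "2", "3")
numbers = ("sg", "pl")
tense_moods = (
    "present indicative",
    "present imperative",
    "present conditional",
    "past indicative",
)
transitivity = ("indefinite", "definite")  # assumed labels for the two values

# Every finite slot: person x number x tense/mood x transitivity.
finite_slots = list(product(persons, numbers, tense_moods, transitivity))

# 1s2s forms (first person singular subject, second person object):
# assumed to be one per tense/mood.
one_sg_two_forms = len(tense_moods)

# Infinitives: assumed to be the bare infinitive plus one form per person/number.
infinitive_forms = 1 + len(persons) * len(numbers)

total = len(finite_slots) + one_sg_two_forms + infinitive_forms
print(len(finite_slots), one_sg_two_forms, infinitive_forms, total)
```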
The list of inflected forms of the example verb 'ver':
verek, versz, ver, veru:nk, vertek, vernek
verem, vered, veri, verju:k, veritek, verik
vertem, verte'l, vert, vertu:nk, vertetek, vertek
vertem, verted, verte, vertu:k, verte'tek, verte'k
verne'k, verne'l, verne, verne'nk, verne'tek, verne'nek
verne'm, verne'd, verne', verne'nk, verne'tek, verne'k
verjek, verje'l, verjen, verju:nk, verjetek, verjenek
verjem, verjed, verje, verju:k, verje'tek, verje'k
verlek, vertelek, verne'lek, verjelek
verni, vernem, verned, vernie, vernu:nk, vernetek, verniu:k
There are 6 further adverbs that are derived from verbs, but here we do not count them since they are infrequent.
vertedben, vertemben, vertetekben, vertu:kben, vertu:nkben, verte'ben
2.3.2. Nominal inflections
The nouns in Hungarian may have 924 inflections.
See one example, on the derived form 'vere's' (N beat), in Appendix 3.
That list already contains elements that may seem strange at the very least, but all of them are grammatical.
2.4. Clitics
Hungarian has only one clitic, the question particle '-e', which may follow any Hungarian word. The final number should therefore be multiplied by two.
2.5. Total number for the example word
Now we can calculate the actual numbers for the verb 'ver':
For this we assume that all prefixed forms may have all derivations.
C  prefixed forms:                           53   see 2.1.
D  v to v derivations:                        9   see 2.2.
E  v to n derivations:                       40   see 2.2.
F  verbs derived from verbs:                540   (C+1)*(D+1)
G  nouns derived from verbs:              2.160   (C+1)*E
W  verbal inflections:                       59   see 2.3.1
Z  nominal inflections:                     924   see 2.3.2
R  verbal inflection combinations:       31.860   (W*F)
S  nominal infl. combinations:        1.995.840   (Z*G)
T  Total1:                            2.027.700   (R+S)
U  Total forms with clitics:          4.055.400   (2*T)
So we have more than 4 million forms for a single verb. This is why we cannot supply any kind of list-type dictionary: a dictionary containing only this one word is already bigger than what can be printed, or handled by computer programs that expect a word form list of a language. (If we have a clitic preprocessor, as in the MULTEXT project, then the number is only 2 million forms per verb.)
The actual number is much bigger than the 2 million calculated above, because Hungarian also has compounding. Its contribution, however, is beyond what we can even estimate.
These are the compound forms from the Explanatory Dictionary of Hungarian:
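The figures for 'ver' can be reproduced with a few lines of Python. This is a minimal sketch: the function name `forms_for_verb` and its parameter names are hypothetical. The noun count is computed as (C+1)*E, which is the formula that reproduces the printed 2160; the +1 stands for the unprefixed verb, as in the F row.

```python
def forms_for_verb(prefixed, v_to_v, v_to_n, verbal_infl=59, nominal_infl=924):
    """Estimate the word forms of one verb from the counts in sections 2.1-2.3.

    Assumes, as the text does, that every prefixed form may take every derivation.
    """
    bases = prefixed + 1                 # prefixed forms plus the bare verb
    verbs = bases * (v_to_v + 1)         # F: each base, underived or v-to-v derived
    nouns = bases * v_to_n               # G: each base with each v-to-n derivation
    verbal = verbal_infl * verbs         # R
    nominal = nominal_infl * nouns       # S
    total = verbal + nominal             # T
    return {
        "F": verbs,
        "G": nouns,
        "R": verbal,
        "S": nominal,
        "T": total,
        "U": 2 * total,                  # doubled for the clitic
    }


ver = forms_for_verb(prefixed=53, v_to_v=9, v_to_n=40)
for key, value in ver.items():
    print(f"{key} = {value:,}")
```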
csapravere's, csipkevere's, csordakivere's, dio'vere's, hulla'mvere's,
hi'dvere's, hi'rvere's, istenvere's, je'gvere's, ko:te'lvere's, ka'rtyakevere's,
pe'nzvere's, szi'vvere's, sa'torvere's, e'rvere's, a'rvere's
However, since compounding is fairly productive, an indefinite number of new forms can be generated idiosyncratically.