COP project 106 MULTEXT-East Deliverable D1.1 M --- Slovene
The application to Slovene has been elaborated by Tomaz Erjavec and Peter Holozan.
Acknowledgements:
The authors thank Velimir Gjurin, France Zagar,
Vladimír Petkevic, Lydia Sinapova and
David Stermole for their much appreciated comments and suggestions.
All errors, of course, remain our own.
All Slovene diacritical characters used have been encoded as the base letter followed by '<': c< for č, s< for š, z< for ž.

1. Noun (N)

1.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            common         c stol              obc<i samostalnik
                   proper         p Janez             lastni samostalnik
-- --------------- -------------- - ----------------- -------------------------
2  Gender          masculine      m stol              mos<ki spol
                   feminine       f miza              z<enski spol
                   neuter         n sonce             srednji spol
-- --------------- -------------- - ----------------- -------------------------
3  Number          singular       s stol              ednina
                   plural         p stoli             mnoz<ina
                   dual           d stola             dvojina
-- --------------- -------------- - ----------------- -------------------------
4  Case            nominative     n stol, medved      imenovalnik
                   genitive       g stola, medveda    rodilnik
                   dative         d stolu             dajalnik
                   accusative     a stol, medveda     toz<ilnik
                   locative       l stolu             mestnik
                   instrumental   i stolom            orodnik
** ****************************** * ----------------- -------------------------
5  Definiteness    -
-- --------------- -------------- -
6  Clitic          -
-- --------------- -------------- -
7  Animate         -
-- --------------- -------------- -
8  Owner_Number    -
-- --------------- -------------- -
9  Owner_Person    -
-- --------------- -------------- -
10 Owned_Number    -
== =============== ============== =

1.2 Combinations

*** ***** ***** ***** ***** ==================================================
PoS Type  Gend  Numb  Case  Examples
*** ***** ***** ***** ***** ==================================================
N   c     n     s     n     iskanje ('gerund' classified as a noun)
N   p     [mf]  s     n     Janez, Micka
N   c     [mfn] s     n     stol, lipa, sonce
N   c     [mfn] s     g     stola, lipe, sonca
N   c     [mfn] s     d     stolu, lipi, soncu
N   c     [mfn] s     a     stol, lipo, sonce
N   c     [mfn] s     l     stolu, lipi, soncu
N   c     [mfn] s     i     stolom, lipo, soncem
N   c     [mfn] p     n     stoli, lipe, sonca
N   c     [mfn] p     g     stolov, lip, sonc
N   c     [mfn] p     d     stolom, lipam, soncem
N   c     [mfn] p     a     stole, lipe, sonca
N   c     [mfn] p     l     stolih, lipah, soncih
N   c     [mfn] p     i     stoli, lipami, sonci
N   c     [mfn] d     n     stola, lipi, sonci
N   c     [mfn] d     g     stolov, lip, sonc
N   c     [mfn] d     d     stoloma, lipama, soncema
N   c     [mfn] d     a     stola, lipi, sonci
N   c     [mfn] d     l     stolih, lipah, soncih
N   c     [mfn] d     i     stoloma, lipama, soncema
*** ***** ***** ***** ***** ==================================================

2. Verb (V)

2.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            main           m delati            glavni glagol
                   auxiliary      a imeti             pomoz<ni glagol
                   modal          o hoteti            naklonski glagol
                   copula         c biti              pomoz<nik 'biti'
-- --------------- -------------- - ----------------- -------------------------
2  VForm           indicative     i delam             sedanjik
                   imperative     m delaj             velelnik
                   conditional    c bi                pogojnik od 'biti'
                   infinitive     n delati            nedoloc<nik
                   participle     p delal,(po)delan   delez<nik
                   supine         u delat             namenilnik
-- --------------- -------------- - ----------------- -------------------------
3  Tense           present        p delam, delaj      sedanjik, velelnik
                   future         f bom               prihodnjik od 'biti'
                   past           s delal             opisni delez<nik na -l
-- --------------- -------------- - ----------------- -------------------------
4  Person          first          1 delam             prva oseba
                   second         2 delas<            druga oseba
                   third          3 dela              tretja oseba
-- --------------- -------------- - ----------------- -------------------------
5  Number          singular       s delam             ednina
                   plural         p delamo            mnoz<ina
                   dual           d delata            dvojina
-- --------------- -------------- - ----------------- -------------------------
6  Gender          masculine      m delal             mos<ki spol
                   feminine       f delala            z<enski spol
                   neuter         n delalo            srednji spol
** ****************************** * ----------------- -------------------------
7  Voice           active         a delam, delal      tvorni nac<in
                   passive        p (po)delan         trpni nac<in
-- --------------- -------------- - ----------------- -------------------------
8  Negative        no             n hoc<em            nezanikan
                   yes            y noc<em            zanikan
-- --------------- -------------- - ----------------- -------------------------
9  Definiteness    -
-- --------------- -------------- -
10 Clitic          -
-- --------------- -------------- -
11 Case            -
-- --------------- -------------- -
12 Animate         -
-- --------------- -------------- -
13 Clitic_s        -
== =============== ============== =

Notes:
1. The verb 'to be' in all its functions is Type=c; auxiliary verbs
   (Type=a) include neither 'to be' nor the modal verbs.
2. The "past participle" is actually used to form all the compound active
   tenses (future, past, pluperfect) and is encoded analytically as
   VForm=participle, Tense=past, Voice=active.
3. The passive participle is encoded analytically as VForm=participle,
   Tense='-', Voice=passive. This encoding will only be used for the passive
   participle in the predicative position, e.g. 'on je bil tepen'/he was
   beaten. Case-marked passive participles in the attributive position
   (e.g. 'tepen pes'/beaten dog) and those in a non-nominative case
   (e.g. 'hrano imam skuhano'/I have the food cooked[acc]) will be
   classified as (qualificative) adjectives.
4. The adjectival (e.g. 'stokajoc<'/moaning) and adverbial
   (e.g. 'lez<e'/lying down) participles are classified as adjectives and
   adverbs respectively.
5. Negative is always marked as 'n' except for two verbs:
   'noc<em'/to not_want, 'nisem'/to not_be

2.2 Combinations

*** ***** ***** ***** ***** ***** ***** ----- ----- =======================
PoS Type  VFrm  Tens  Pers  Numb  Gend  Voic  Neg   Examples
*** ***** ***** ***** ***** ***** ***** ----- ----- =======================
V   m     n     -     -     -     -     -     n     delati
V   a     n     -     -     -     -     -     n     imeti
V   o     n     -     -     -     -     -     n     hoteti
V   c     n     -     -     -     -     -     n     biti
V   c     c     -     -     -     -     -     n     bi
V   c     i     f     [123] s     -     -     n     bom, bos<, bo
V   m     i     p     1     s     -     a     n     delam delas< dela
V   m     m     p     2     [sp]  -     -     n     delaj delajte
V   m     u     p     [123] s     -     -     n     delat
V   m     p     s     -     s     [mf]  a     n     delal delala
V   m     p     -     -     s     [mf]  p     n     aretiran aretirana
V   a     n     p     1     s     -     a     y     nimam
*** ***** ***** ***** ***** ***** ***** ----- ----- =======================

3. Adjective (A)

3.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            qualificative  f lep               kakovostni
                   possessive     s stric<ev          svojilni
   l.s.            ordinal        o slovenski         vrstni
-- --------------- -------------- - ----------------- -------------------------
2  Degree          positive       p lep               osnovnik
                   comparative    c leps<i            primerjalnik
                   superlative    s najleps<i         presez<nik
-- --------------- -------------- - ----------------- -------------------------
3  Gender          masculine      m lep               mos<ki spol
                   feminine       f lepa              z<enski spol
                   neuter         n lepo              srednji spol
-- --------------- -------------- - ----------------- -------------------------
4  Number          singular       s lep               ednina
                   plural         p lepi              mnoz<ina
                   dual           d lepa              dvojina
-- --------------- -------------- - ----------------- -------------------------
5  Case            nominative     n lep, lepi         imenovalnik
                   genitive       g lepega            rodilnik
                   dative         d lepemu            dajalnik
                   accusative     a lep, lepega       toz<ilnik
                   locative       l lepemu            mestnik
                   instrumental   i lepim             orodnik
** ****************************** * ----------------- -------------------------
6  Definiteness    -
-- --------------- -------------- -
7  Clitic          -
-- --------------- -------------- -
8  Animate         -
-- --------------- -------------- -
9  Formation       -
-- --------------- -------------- -
10 Owner_Number    -
-- --------------- -------------- -
11 Owner_Person    -
-- --------------- -------------- -
12 Owned_Number    -
== =============== ============== =

Notes:
The three deverbative adjectival participles are classified as qualificative adjectives.
They differ from other adjectives in having Degree set to '-'.

3.2 Combinations

*** ***** ***** ***** ***** ***** ============================================
PoS Type  Degr  Gend  Numb  Case  Examples
*** ***** ***** ***** ***** ***** ============================================
A   s     -     m     s     n     stric<ev
A   o     -     m     s     n     slovenski
A   f     [pcs] m     s     n     lep/lepi leps<i najleps<i
A   f     p     [mfn] s     n     zdrav/zdravi zdrava zdravo
A   f     -     m     s     n     otekel (obraz) premagan (sovrag) govorec< (pes)
A   f     p     m     [spd] n     lep lepa lepo
*** ***** ***** ***** ***** ***** ============================================

4. Pronoun (P)

4.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            personal       p jaz               osebni
                   demonstrative  d ta                kazalni
                   indefinite     i nekdo             nedoloc<nostni
                   possessive     s moj               svojilni
                   interrogative  q kdo               vpras<alni
                   relative       r kdor              oziralnostni
                   reflexive      x se                povratni
                   negative       z nihc<e            nikalni
                   general        g vsak              celostni
-- --------------- -------------- - ----------------- -------------------------
2  Person          first          1 moj               prva oseba
                   second         2 tvoj              druga oseba
                   third          3 njegov            tretja oseba
-- --------------- -------------- - ----------------- -------------------------
3  Gender          masculine      m moj               mos<ki spol
                   feminine       f moja              z<enski spol
                   neuter         n moje              srednji spol
-- --------------- -------------- - ----------------- -------------------------
4  Number          singular       s moj               ednina
                   plural         p moji              mnoz<ina
                   dual           d moja              dvojina
-- --------------- -------------- - ----------------- -------------------------
5  Case            nominative     n moj               imenovalnik
                   genitive       g mojega            rodilnik
                   dative         d mojemu            dajalnik
                   accusative     a moj/mojega        toz<ilnik
                   locative       l mojemu            mestnik
                   instrumental   i mojim             orodnik
-- --------------- -------------- - ----------------- -------------------------
6  Owner_Number    singular       s moj               ednina svojine
                   plural         p nas<              mnoz<ina svojine
                   dual           d najin             dvojina svojine
-- --------------- -------------- - ----------------- -------------------------
7  Owner_Gender    masculine      m njegov            mos<ki spol svojine
                   feminine       f njen              z<enski spol svojine
                   neuter         n njegov            srednji spol svojine
** ****************************** * ----------------- -------------------------
8  Clitic          no             n mene              beseda
                   yes            y me                naslonka
-- --------------- -------------- - ----------------- -------------------------
9  Referent_Type   personal       p sebe, se, si      osebni povratni
                   possessive     s svoj              svojilni povratni
-- --------------- -------------- - ----------------- -------------------------
10 Syntactic_Type  nominal        n kdo               samostalnis<ki
                   adjectival     a kateri            pridevnis<ki
-- --------------- -------------- - ----------------- -------------------------
11 Definiteness    -
-- --------------- -------------- -
12 Animate         -
-- --------------- -------------- -
13 Clitic_s        -
-- --------------- -------------- -
14 Pronoun_Form    -
-- --------------- -------------- -
15 Owner_Person    -
-- --------------- -------------- -
16 Owned_Number    -
== =============== ============== =

Notes:
1. The Type taxonomy for pronouns differs from that of Slovene grammars;
   for the most part, classes of the Slovene grammars were merged. In
   particular, the indefinite and relative types cover several pronoun
   types of the Slovene grammars:
   (nedoloc<nostni = nedoloc<ni + poljubnostni + mnogostni + istostni + drugostni)
   (oziralnostni = oziralni + oziralno poljubnostni)
2. Type=relative includes the so-called 'relative general' (oziralno
   poljubnostni) pronouns. They can consist of two words, the second being
   'koli'. Variant spellings exist: 'kolikorkoli', 'kolikor koli'.
   Possible solution: they can be treated as compounds.
3. Type=reflexive encompasses all reflexive pronouns (sebe, se, sebi, si,
   svoj) as well as 'se' in its role as the obligatory 'constituent' of
   reflexive verbs. Personal and possessive reflexives are further
   distinguished via the Referent_Type attribute. 'se' in all its roles
   will be marked as the reflexive personal clitic pronoun. It is also
   morphologically ambiguous, as it can be the accusative or the genitive
   case. 'se' can also appear as a suffix, e.g. 'zase'/for_oneself, with
   the variant spelling 'za se'. It is not clear how this case should be
   treated.
4. Owner_Gender is relevant only for third person singular possessive
   pronouns, and distinguishes only masculine and feminine forms. It will
   be '-' in all other cases.
5. As in Slovene grammars, pronouns are distinguished according to whether
   they have a nominal or an adjectival function. All pronoun types except
   demonstrative and possessive can be nominal, and all except personal
   can be adjectival. In a future version of the specifications, this
   attribute should be given a lower position, as it is used by three
   languages.
6. Referent_Type is used to distinguish personal reflexives (which include
   'se' in all its functions) from the possessive reflexives ('svoj').

4.2 Combinations

*** ***** ***** ***** ***** ***** ***** ***** ----- ----- ----- ================
PoS Type  Pers  Gend  Numb  Case  PosN  PosG  Clit  Ref   Syn   Examples
*** ***** ***** ***** ***** ***** ***** ***** ----- ----- ----- ================
P   x     -     -     -     [ga]  -     -     y     p     n     se
P   x     -     -     -     [ga]  -     -     n     p     n     sebe
P   x     -     -     -     d     -     -     y     p     n     si
P   x     -     -     -     [dl]  -     -     n     p     n     sebi
P   x     -     m     s     a     s     -     n     s     a     svoj
P   p     1     -     s     n     -     -     n     -     n     jaz
P   s     1     m     s     [na]  s     -     -     -     a     moj
P   s     2     m     s     [na]  s     -     -     -     a     tvoj
P   s     3     m     s     [na]  s     [mn]  -     -     a     njegov
P   d     -     m     s     n     -     -     -     -     a     1)
P   i     -     -     s     n     -     -     -     -     n     2)
P   i     -     m     s     n     -     -     -     -     a     3)
P   q     -     m     s     n     -     -     -     -     n     4)
P   q     -     m     s     n     -     -     -     -     a     5)
P   q     -     -     -     -     -     -     -     -     a     koliko
P   r     -     m     s     n     -     -     -     -     n     6)
P   r     -     m     s     n     -     -     -     -     a     7)
P   z     -     -     s     n     -     -     -     -     n     nihc<e, nobeden
P   z     -     m     s     n     -     -     -     -     a     nikakrs<en, noben
P   g     -     -     s     n     -     -     -     -     n     vsakdo
P   g     -     m     s     n     -     -     -     -     a     vsak
*** ***** ***** ***** ***** ***** ***** ***** ----- ----- ----- ================

1) tak, taks<en, takle, ta, tisti, oni, toliki
2) nekdo, kdo, marsikdo, eden, kateri, vsak, marsikateri
3) kak(r)s<en, enak, drugac<en, neki, kateri, marsikateri, isti, drug,
   nekoliko, kateri, nic<(?), vsi, dokaj(?)
4) kdo, kateri
5) c<igav, kaks<en, koliks<en, kateri
6) kdor, kar, kateri (...koli)
7) kakrs<en, kateri, c<igar, kolikor (...koli)

5. Determiner (D)

Not applicable.

6. Article (T)

Not applicable.

Notes:
This category (Slovene: 'c<len') exists for Slovene, but it encompasses only
two words ('ta'/this, 'en'/one), and even these are used only colloquially.
Furthermore, the context in which these two words can appear as articles is
identical to the one in which they appear as a pronoun or numeral
respectively. As it would be impossible to disambiguate between their two
meanings, this category will not be used for Slovene.

7. Adverb (R)

7.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            general        g potihoma          prislov
-- --------------- -------------- - ----------------- -------------------------
2  Degree          positive       p malo              osnovnik
                   comparative    c manj              primernik
                   superlative    s najmanj           presez<nik
** ****************************** * ----------------- -------------------------
3  Clitic          -
-- --------------- -------------- -
4  Number          -
-- --------------- -------------- -
5  Person          -
== =============== ============== =

Notes:
Some pronouns can also have an adverbial function. This issue has not been
considered here.

7.2 Combinations

*** ***** ***** ============================================================
PoS Type  Degr  Examples
*** ***** ***** ============================================================
R   g     p     malo
R   g     c     manj
R   g     s     najmanj
R   g     -     absolutno, anglesko, potihoma, ...
*** ***** ***** ============================================================

8. Adposition (S)

8.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            preposition    p nad               predlog
-- --------------- -------------- - ----------------- -------------------------
2  Formation       simple         s nad, pod          (enostaven) predlog
                   compound       c zase, nanj        predlog s pripon. zaimkom
** ****************************** * ----------------- -------------------------
3  Case            genitive       g brez              rodilnik
   (req. by prep.) dative         d k                 dajalnik
                   accusative     a po                toz<ilnik
                   locative       l pri               mestnik
                   instrumental   i s                 orodnik
-- --------------- -------------- - ----------------- -------------------------
4  Clitic          -
== =============== ============== =

Notes:
Slovene has prepositions only, with some arguable exceptions
(e.g. 'navkljub'/in spite of), which can be pre- or postpositions.
These would need further study.

8.2 Combinations

*** ***** ***** ----- ======================================================
PoS Type  Form  Case  Examples
*** ***** ***** ----- ======================================================
S   p     s     g     brez
S   p     s     d     k
S   p     s     [al]  po
S   p     s     l     pri
S   p     s     [gi]  s
S   p     c     -     zase, nase, nanj, nanjo, nanju,...
*** ***** ***** ----- ======================================================

9. Conjunction (C)

9.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            coordinating   c in                priredni veznik
                   subordinating  s da                podredni veznik
** ****************************** * ----------------- -------------------------
2  Formation       simple         s in, da, ...       enodelni, dvodelni
                   compound       c kljub_temu_da     vec<besedni
-- --------------- -------------- - ----------------- -------------------------
3  Coord_Type      -
-- --------------- -------------- -
4  Sub_Type        -
-- --------------- -------------- -
5  Clitic          -
-- --------------- -------------- -
6  Number          -
-- --------------- -------------- -
7  Person          -
== =============== ============== =

Notes:
1. According to Slovene grammar, some conjunctions are 'two-part', but
   these can often be either single or two-part conjunctions with identical
   first and second conjunct, e.g. ali ... ali; such conjunctions are
   treated as ordinary, 'one-part' conjunctions, i.e. as Formation=simple.
2. Some conjunctions are classified as 'multi-word' conjunctions,
   e.g. 'kljub temu da'/in spite of. These conjunctions should be merged by
   the tokeniser, e.g. kljub_temu_da, and marked as Formation=compound.

9.2 Combinations

*** ***** ----- ============================================================
PoS Type  Form  Examples
*** ***** ----- ============================================================
C   c     s     in, ali
C   s     s     da, ki
C   s     c     kljub temu da
*** ***** ----- ============================================================

10. Numeral (M)

10.1 Lexicon

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            cardinal       c en                glavni s<tevnik
                   ordinal        o prvi              vrstilni s<tevnik
                   multiple       m enojen            mnoz<ilni s<tevnik
                   special        s dvoje, ...        ostali s<tevniki
-- --------------- -------------- - ----------------- -------------------------
2  Gender          masculine      m en                mos<ki spol
                   feminine       f ena               z<enski spol
                   neuter         n eno               srednji spol
-- --------------- -------------- - ----------------- -------------------------
3  Number          singular       s en                ednina
                   plural         p tri               mnoz<ina
                   dual           d dva               dvojina
-- --------------- -------------- - ----------------- -------------------------
4  Case            nominative     n en                imenovalnik
                   genitive       g enega             rodilnik
                   dative         d enemu             dajalnik
                   accusative     a en / enega        toz<ilnik
                   locative       l enemu             mestnik
                   instrumental   i enim              orodnik
** ****************************** * ----------------- -------------------------
5  Form            digit          d 1984              arabska s<tevilka
                   roman          r MCMLXXXIV         rimska s<tevilka
                   letter         l tisoc<devetsto    s<tevnik
-- --------------- -------------- - ----------------- -------------------------
6  Definiteness    -
-- --------------- -------------- -
7  Clitic          -
-- --------------- -------------- -
8  Class           -
-- --------------- -------------- -
9  Animate         -
-- --------------- -------------- -
10 Owner_Number    -
-- --------------- -------------- -
11 Owner_Person    -
-- --------------- -------------- -
12 Owned_Number    -
== =============== ============== =

Notes:
Numerals in Slovene can function as nouns, adjectives or adverbs, and
grammars describe them as subtypes of these categories. The above
classification therefore runs counter to established practice.

10.2 Combinations

*** ***** ***** ***** ***** ----- ============================================
PoS Type  Gend  Numb  Case  Form  Examples
*** ***** ***** ***** ***** ----- ============================================
M   -     -     -     -     d     1984
M   -     -     -     -     r     MCMLXXXIV
M   c     m     s     n     l     en, dva, trije, s<tirje, pet
M   o     m     s     n     l     prvi, drugi, tretji, c<etrti, peti
M   m     m     s     n     l     enojen, dvojen, trojen, c<etveren, peter
M   s     m     s     n     l     dvoje, troje, dvoj
M   c     [mfn] [spd] n     l     en, ena, eno, trije, tri, troje, dva, dve
*** ***** ***** ***** ***** ----- ============================================

11. Interjection (I)

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            -                jejhata           medmet
-- --------------- -------------- - ----------------- -------------------------
2  Formation       -
== =============== ============== =

12. Residual (X)

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
                   -                sic, $, a+b, ...  Ostanki
== =============== ============== =

13. Abbreviation (Y)

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Syntactic_Type  -                TAM               okrajs<ava
-- --------------- -------------- - ----------------- -------------------------
2  Gender          -
-- --------------- -------------- -
3  Number          -
-- --------------- -------------- -
4  Case            -
-- --------------- -------------- -
5  Definiteness    -
== =============== ============== =

Notes:
This category usually functions as a proper noun or adjective, so gender,
number and case could be assigned to it. However, cases where abbreviations
are declined (i.e. mark their inflection with an ending, e.g. 'TAM-a') are
rare, so including this information would only greatly increase the
ambiguity of abbreviations.

14. Particle (Q)

== =============== ============== = ================= =========================
P  ATT             VAL            C Example           Slovene term
== =============== ============== = ================= =========================
1  Type            -                spet              c<lenek
-- --------------- -------------- - ----------------- -------------------------
2  Formation       -
-- --------------- -------------- -
3  Clitic          -
== =============== ============== =
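
Illustration (not part of the specification):
The Lexicon tables assign each attribute a position (P) and each value a one-character code (C), and the Combinations tables use bracketed sets such as [mfn] to abbreviate several values. The Python sketch below shows one way these tables might be read mechanically, using the noun attributes of section 1.1. It assumes that a tag is written as the PoS letter followed by the codes in position order (e.g. 'Ncmsn'); that written form, and the names NOUN_ATTRIBUTES, decode_noun and expand_row, are assumptions made for this sketch and are not defined by the tables above.

```python
from itertools import product

# Noun attributes in position order, copied from the 1.1 Lexicon table.
# Assumption of this sketch: a tag is the PoS letter 'N' followed by one
# code per position, e.g. 'Ncmsn'.
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural", "d": "dual"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_noun(tag):
    """Map a noun tag such as 'Ncmsn' to a dict of attribute values."""
    if not tag.startswith("N"):
        raise ValueError("not a noun tag: %r" % tag)
    codes = tag[1:]
    if len(codes) > len(NOUN_ATTRIBUTES):
        raise ValueError("too many positions in %r" % tag)
    decoded = {}
    for (name, values), code in zip(NOUN_ATTRIBUTES, codes):
        if code == "-":
            continue  # '-' marks an attribute that does not apply
        if code not in values:
            raise ValueError("bad code %r for %s in %r" % (code, name, tag))
        decoded[name] = values[code]
    return decoded


def expand_row(cells):
    """Expand one Combinations row, e.g. ['N', 'c', '[mfn]', 's', 'n']."""
    options = []
    for cell in cells:
        if cell.startswith("[") and cell.endswith("]"):
            options.append(list(cell[1:-1]))  # '[mfn]' -> ['m', 'f', 'n']
        else:
            options.append([cell])
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    print(decode_noun("Ncmsn"))
    print(expand_row(["N", "c", "[mfn]", "s", "n"]))
```

With the noun row 'N c [mfn] s n', expand_row yields the three tags 'Ncmsn', 'Ncfsn' and 'Ncnsn', corresponding to the examples stol, lipa and sonce. The same approach would extend to the other categories by adding one attribute list per PoS letter.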