COP project 106 MULTEXT-East Deliverable D1.1 M --- Slovene
The application to Slovene has been elaborated by Tomaz Erjavec and Peter Holozan.
Acknowledgements:
The authors thank Velimir Gjurin, France Zagar,
Vladimír Petkevic, Lydia Sinapova and
David Stermole for their much appreciated comments and suggestions.
All errors of course remain our own.
All Slovene diacritical characters are encoded as the base letter followed by '<':
'c<', 's<' and 'z<' stand for c, s and z with a caron.
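A minimal sketch of decoding this ASCII convention, assuming the only digraphs
are 'c<', 's<' and 'z<' (as seen in forms such as 'obc<i', 'mos<ki' and
'z<enski' in the tables below). The mapping and the function name are
illustrative assumptions, not part of the specification.

```python
# Assumed mapping, inferred from the encoded forms used in this document.
CARON = {
    "c<": "č", "s<": "š", "z<": "ž",
    "C<": "Č", "S<": "Š", "Z<": "Ž",
}


def decode_diacritics(text: str) -> str:
    """Replace each '<'-marked letter with the corresponding caron letter."""
    for ascii_form, letter in CARON.items():
        text = text.replace(ascii_form, letter)
    return text


print(decode_diacritics("mos<ki spol"))
```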
1. Noun (N)
1.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type common c stol obc<i samostalnik
proper p Janez lastni samostalnik
- -------------- -------------- - ----------------- -------------------------
2 Gender masculine m stol mos<ki spol
feminine f miza z<enski spol
neuter n sonce srednji spol
- -------------- -------------- - ----------------- -------------------------
3 Number singular s stol ednina
plural p stoli mnoz<ina
dual d stola dvojina
- -------------- -------------- - ----------------- -------------------------
4 Case nominative n stol, medved imenovalnik
genitive g stola, medveda rodilnik
dative d stolu dajalnik
accusative a stol, medveda toz<ilnik
locative l stolu mestnik
instrumental i stolom orodnik
* ***************************** * ----------------- -------------------------
5 Definiteness -
- -------------- -------------- -
6 Clitic -
- -------------- -------------- -
7 Animate -
- -------------- -------------- -
8 Owner_Number -
- -------------- -------------- -
9 Owner_Person -
---------------- -------------- -
10Owned_Number -
=================================
1.2 Combinations
*** **** **** **** **** =====================================================
PoS Type Gend Numb Case Examples
*** **** **** **** **** =====================================================
N c n s n iskanje ('gerund' classified as a Noun)
N p [mf] s n Janez, Micka
N c [mfn] s n stol, lipa, sonce
N c [mfn] s g stola, lipe, sonca
N c [mfn] s d stolu, lipi, soncu
N c [mfn] s a stol, lipo, sonce
N c [mfn] s l stolu, lipi, soncu
N c [mfn] s i stolom, lipo, soncem
N c [mfn] p n stoli, lipe, sonca
N c [mfn] p g stolov, lip, sonc
N c [mfn] p d stolom, lipam, soncem
N c [mfn] p a stole, lipe, sonca
N c [mfn] p l stolih, lipah, soncih
N c [mfn] p i stoli, lipami, sonci
N c [mfn] d n stola, lipi, sonci
N c [mfn] d g stolov, lip, sonc
N c [mfn] d d stoloma, lipama, soncema
N c [mfn] d a stola, lipi, sonci
N c [mfn] d l stolih, lipah, soncih
N c [mfn] d i stoloma, lipama, soncema
*** **** **** **** **** =====================================================
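The rows of the Combinations table can be read position by position against
the Lexicon table. The following is a minimal sketch of that lookup for
nouns, assuming a row is given as whitespace-separated codes (e.g.
"N c m s g"). The names NOUN_ATTRIBUTES and decode_noun are hypothetical.

```python
# Positions 1-4 of the noun Lexicon table, with their value codes.
NOUN_ATTRIBUTES = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural", "d": "dual"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_noun(row: str) -> dict:
    """Turn a row such as 'N c m s g' into attribute/value pairs."""
    codes = row.split()
    if len(codes) != 5 or codes[0] != "N":
        raise ValueError(f"not a noun row: {row!r}")
    decoded = {}
    for (name, values), code in zip(NOUN_ATTRIBUTES, codes[1:]):
        if code not in values:
            raise ValueError(f"unknown code {code!r} for {name}")
        decoded[name] = values[code]
    return decoded


print(decode_noun("N c m s g"))
```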
2. Verb (V)
2.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type main m delati glavni glagol
auxiliary a imeti pomoz<ni glagol
modal o hoteti naklonski glagol
copula c biti pomoz<nik 'biti'
- -------------- -------------- - ----------------- -------------------------
2 VForm indicative i delam sedanjik
imperative m delaj velelnik
conditional c bi pogojnik od 'biti'
infinitive n delati nedoloc<nik
participle p delal,(po)delan delez<nik
supine u delat namenilnik
- -------------- -------------- - ----------------- -------------------------
3 Tense present p delam, delaj sedanjik, velelnik
future f bom prihodnjik od 'biti'
past s delal opisni delez<nik na -l
- -------------- -------------- - ----------------- -------------------------
4 Person first 1 delam prva oseba
second 2 delas< druga oseba
third 3 dela tretja oseba
- -------------- -------------- - ----------------- -------------------------
5 Number singular s delam ednina
plural p delamo mnoz<ina
dual d delata dvojina
- -------------- -------------- - ----------------- -------------------------
6 Gender masculine m delal mos<ki spol
feminine f delala z<enski spol
neuter n delalo srednji spol
********************************* ----------------- -------------------------
7 Voice active a delam, delal tvorni nac<in
passive p (po)delan trpni nac<in
- -------------- -------------- - ----------------- -------------------------
8 Negative no n hoc<em nezanikan
yes y noc<em zanikan
- -------------- -------------- - ----------------- -------------------------
9 Definiteness -
- -------------- -------------- -
10Clitic -
- -------------- -------------- -
11Case -
- -------------- -------------- -
12Animate -
- -------------- -------------- -
13Clitic_s -
=================================
Notes:
1. The verb 'to be' in all its functions is Type=c;
auxiliary verbs (Type=a) include neither 'to be' nor the modal verbs.
2. The "past participle" is actually used to form all the compound
active tenses (future, past, pluperfect) and is encoded
analytically as VForm=participle, Tense=past, Voice=active.
3. The passive participle is encoded analytically as
VForm=participle, Tense='-', Voice=passive.
This encoding will only be used for the passive participle in the
predicative position, e.g. 'on je bil tepen'/he was beaten. The
case-marked passive participles in the attributive position (e.g.
'tepen pes'/beaten dog) or those in non-nominative case (e.g.
'hrano imam skuhano'/I have the food cooked[acc]) will be
classified as (qualitative) adjectives.
4. The adjectival (e.g. 'stokajoc<'/moaning) and adverbial (e.g.
'lez<e'/lying down) participles are classified as adjectives and
adverbs respectively.
5. Negative is always marked as 'n' except for two verbs:
'noc<em'/to not_want, 'nisem'/to not_be
2.2 Combinations
*** **** **** **** **** **** **** ---- ---- =======================
PoS Type VFrm Tens Pers Numb Gend Voic Neg Examples
*** **** **** **** **** **** **** ---- ---- =======================
V m n - - - - - n delati
V a n - - - - - n imeti
V o n - - - - - n hoteti
V c n - - - - - n biti
V c c - - - - - n bi
V c i f [123] s - - n bom, bos<, bo
V m i p [123] s - a n delam delas< dela
V m m p 2 [sp] - - n delaj delajte
V m u p [123] s - - n delat
V m p s - s [mf] a n delal delala
V m p - - s [mf] p n aretiran aretirana
V a i p 1 s - a y nimam
*** **** **** **** **** **** **** ---- ---- =======================
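In the Combinations tables, a bracketed cell such as [123] or [sp] abbreviates
several alternative values in one column. A minimal sketch of expanding such a
row into its individual combinations, assuming whitespace-separated cells; the
function name expand_row is hypothetical.

```python
from itertools import product


def expand_row(row: str) -> list:
    """Expand bracketed alternatives, e.g. '[123]' into '1', '2', '3'."""
    columns = []
    for cell in row.split():
        if cell.startswith("[") and cell.endswith("]"):
            columns.append(list(cell[1:-1]))  # one option per character
        else:
            columns.append([cell])
    return [" ".join(combo) for combo in product(*columns)]


for line in expand_row("V c i f [123] s - - n"):
    print(line)
```

Each expanded line is then a fully specified combination, which is convenient
when checking a lexicon entry against the permitted set.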
3. Adjective (A)
3.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type qualificative f lep kakovostni
possessive s stric<ev svojilni
l.s. ordinal o slovenski vrstni
- -------------- -------------- - ----------------- -------------------------
2 Degree positive p lep osnovnik
comparative c leps<i primerjalnik
superlative s najleps<i presez<nik
- -------------- -------------- - ----------------- -------------------------
3 Gender masculine m lep mos<ki spol
feminine f lepa z<enski spol
neuter n lepo srednji spol
- -------------- -------------- - ----------------- -------------------------
4 Number singular s lep ednina
plural p lepi mnoz<ina
dual d lepa dvojina
- -------------- -------------- - ----------------- -------------------------
5 Case nominative n lep, lepi imenovalnik
genitive g lepega rodilnik
dative d lepemu dajalnik
accusative a lep, lepega toz<ilnik
locative l lepemu mestnik
instrumental i lepim orodnik
* ***************************** * ----------------- -------------------------
6 Definiteness -
- -------------- -------------- -
7 Clitic -
- -------------- -------------- -
8 Animate -
- -------------- -------------- -
9 Formation -
- -------------- -------------- -
10Owner_Number -
- -------------- -------------- -
11Owner_Person -
---------------- -------------- -
12Owned_Number -
=============================== =
Notes:
The three deverbative adjectival participles are classified as
qualificative adjectives. They differ from other adjectives by having
Degree:-
3.2 Combinations
*** **** **** **** **** **** ================================================
PoS Type Degr Gend Numb Case Examples
*** **** **** **** **** **** ================================================
A s - m s n stric<ev
A o - m s n slovenski
A f [pcs] m s n lep/lepi leps<i najleps<i
A f p [mfn] s n zdrav/zdravi zdrava zdravo
A f - m s n otekel (obraz) premagan (sovrag) govorec< (pes)
A f p m [spd] n lep lepi lepa
*** **** **** **** **** **** ================================================
4. Pronoun (P)
4.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type personal p jaz osebni
demonstrative d ta kazalni
indefinite i nekdo nedoloc<nostni
possessive s moj svojilni
interrogative q kdo vpras<alni
relative r kdor oziralnostni
reflexive x se povratni
negative z nihc<e nikalni
general g vsak celostni
- -------------- -------------- - ----------------- -------------------------
2 Person first 1 moj prva oseba
second 2 tvoj druga oseba
third 3 njegov tretja oseba
- -------------- -------------- - ----------------- -------------------------
3 Gender masculine m moj mos<ki spol
feminine f moja z<enski spol
neuter n moje srednji spol
- -------------- -------------- - ----------------- -------------------------
4 Number singular s moj ednina
plural p moji mnoz<ina
dual d moja dvojina
- -------------- -------------- - ----------------- -------------------------
5 Case nominative n moj imenovalnik
genitive g mojega rodilnik
dative d mojemu dajalnik
accusative a moj/mojega toz<ilnik
locative l mojemu mestnik
instrumental i mojim orodnik
- -------------- -------------- - ----------------- -------------------------
6 Owner_Number singular s moj ednina svojine
plural p nas< mnoz<ina svojine
dual d najin dvojina svojine
- -------------- -------------- - ----------------- -------------------------
7 Owner_Gender masculine m njegov mos<ki spol svojine
feminine f njen z<enski spol svojine
neuter n njegov srednji spol svojine
********************************* ----------------- -------------------------
8 Clitic no n mene beseda
yes y me naslonka
- -------------- -------------- - ----------------- -------------------------
9 Referent_Type personal p sebe, se, si osebni povratni
possessive s svoj svojilni povratni
- -------------- -------------- - ----------------- -------------------------
10Syntactic_Type nominal n kdo samostalnis<ki
adjectival a kateri pridevnis<ki
- -------------- -------------- - ----------------- -------------------------
11Definiteness -
- -------------- -------------- -
12Animate -
- -------------- -------------- -
13Clitic_s -
- -------------- -------------- -
14Pronoun_Form -
- -------------- -------------- -
15Owner_Person -
- -------------- -------------- -
16Owned_Number -
=================================
Notes:
1. The Type taxonomy for pronouns differs from that of Slovene
grammars. Mostly, classes from Slovene grammars were merged. In particular,
the indefinite and relative types cover several pronoun types
from Slovene grammars:
(nedoloc<nostni = nedoloc<ni+poljubnostni+mnogostni+istostni+drugostni)
(oziralnostni = oziralni + oziralno poljubnostni)
2. Type=relative includes so-called 'relative general' (oziralno poljubnostni)
pronouns. They can consist of two words, where the second is 'koli'.
Variant spellings exist: 'kolikorkoli', 'kolikor koli'.
Possible solution: they can be treated as compounds.
3. Type=reflexive encompasses all reflexive pronouns (sebe, se, sebi,
si, svoj) as well as 'se' in its role as the obligatory 'constituent' of
reflexive verbs. Personal and possessive reflexives are further
distinguished via the Referent_Type attribute.
'se' in all its roles will be marked as the reflexive personal
clitic pronoun.
It is also morphologically ambiguous, as it can be accusative or genitive.
'se' can also appear as a suffix: e.g. 'zase'/for_oneself with
variant spelling 'za se'. It is not clear what to do with this case.
4. Owner_Gender is relevant only for third person singular
possessive pronouns, and distinguishes only masculine and feminine
forms. It will be '-' in all other cases.
5. As in Slovene grammars, pronouns are distinguished by whether they have a
nominal or an adjectival function. All pronoun types except demonstrative and
possessive can be nominal, and all except personal can be
adjectival.
In a future version of the specifications, this attribute should
be given a lower position, as it is used by three languages.
6. Referent_Type is used to distinguish personal reflexives (which include
'se' in all its functions) from the possessive reflexives ('svoj').
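The constraint in note 5 on which pronoun types may be nominal or adjectival
can be sketched as a small check. The type codes are those of the Lexicon
table above; the function name and the set names are hypothetical.

```python
# Pronoun Type codes from the Lexicon table (position 1).
PRONOUN_TYPES = {"p", "d", "i", "s", "q", "r", "x", "z", "g"}

# Note 5: demonstrative (d) and possessive (s) cannot be nominal;
# personal (p) cannot be adjectival.
NOT_NOMINAL = {"d", "s"}
NOT_ADJECTIVAL = {"p"}


def allowed_syntactic_types(pronoun_type: str) -> set:
    """Return the Syntactic_Type codes ('n', 'a') allowed for a pronoun Type."""
    if pronoun_type not in PRONOUN_TYPES:
        raise ValueError(f"unknown pronoun type: {pronoun_type!r}")
    allowed = set()
    if pronoun_type not in NOT_NOMINAL:
        allowed.add("n")
    if pronoun_type not in NOT_ADJECTIVAL:
        allowed.add("a")
    return allowed


print(sorted(allowed_syntactic_types("q")))
```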
4.2 Combinations
*** **** **** **** **** **** **** **** ---- --- --- ================
PoS Type Pers Gend Numb Case PosN PosG Clit Ref Syn Examples
*** **** **** **** **** **** **** **** ---- --- --- ================
P x - - - [ga] - - y p n se
P x - - - [ga] - - n p n sebe
P x - - - d - - y p n si
P x - - - [dl] - - n p n sebi
P x - m s a s - n s a svoj
P p 1 - s n - - n - n jaz
P s 1 m s [na] s - - - a moj
P s 2 m s [na] s - - - a tvoj
P s 3 m s [na] s [mn] - - a njegov
P d - m s n - - - - a 1)
P i - - s n - - - - n 2)
P i - m s n - - - - a 3)
P q - m s n - - - - n 4)
P q - m s n - - - - a 5)
P q - - - - - - - - a koliko
P r - m s n - - - - n 6)
P r - m s n - - - - a 7)
P z - - s n - - - - n nihc<e, nobeden
P z - m s n - - - - a nikakrs<en, noben
P g - - s n - - - - n vsakdo
P g - m s n - - - - a vsak
**** **** **** **** **** **** **** **** --- --- --- ================
1) tak, taks<en, takle, ta, tisti, oni, toliki
2) nekdo, kdo, marsikdo, eden, kateri, vsak, marsikateri
3) kak(r)s<en, enak, drugac<en, neki, kateri, marsikateri,
isti, drug, nekoliko, kateri, nic<(?), vsi, dokaj(?)
4) kdo, kateri
5) c<igav, kaks<en, koliks<en, kateri
6) kdor, kar, kateri (...koli)
7) kakrs<en, kateri, c<igar, kolikor (...koli)
5. Determiner (D)
Not applicable.
6. Article (T)
Not applicable.
Notes:
This category (Slovene: 'c<len') exists for Slovene, but it
encompasses only two words ('ta'/this, 'en'/one) and even these are
used only colloquially. Furthermore, the context in which these two
words can appear as articles is identical to the one in which they
appear as a pronoun or numeral respectively. As it would be impossible
to disambiguate between their two meanings, this category will not be
used for Slovene.
7. Adverb (R)
7.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type general g potihoma prislov
- -------------- -------------- - ----------------- -------------------------
2 Degree positive p malo osnovnik
comparative c manj primernik
superlative s najmanj presez<nik
********************************* ----------------- -------------------------
3 Clitic -
- -------------- -------------- -
4 Number -
- -------------- -------------- -
5 Person -
=================================
Notes:
Some pronouns can also have an adverbial function. This issue has not
been considered here.
7.2 Combinations
*** **** **** ==============================================================
PoS Type Degr Examples
*** **** **** ==============================================================
R g p malo
R g c manj
R g s najmanj
R g - absolutno, angles<ko, potihoma, ...
*** **** **** ==============================================================
8. Adposition (S)
8.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type preposition p nad predlog
- -------------- -------------- - ----------------- -------------------------
2 Formation simple s nad, pod (enostaven) predlog
compound c zase, nanj predlog s pripon. zaimkom
********************************* ----------------- -------------------------
3 Case
(req.by prep.) genitive g brez rodilnik
dative d k dajalnik
accusative a po toz<ilnik
locative l pri mestnik
instrumental i s orodnik
- -------------- -------------- - ----------------- -------------------------
4 Clitic -
=================================
Notes:
Slovene has prepositions only, with some arguable exceptions
(e.g. 'navkljub'/in spite of), which can be pre- or postpositions.
These would need further study.
8.2 Combinations
*** **** **** ---- =========================================================
PoS Type Form Case Examples
*** **** **** ---- =========================================================
S p s g brez
S p s d k
S p s [al] po
S p s l pri
S p s [gi] s
S p c - zase, nase, nanj, nanjo, nanju,...
*** **** **** ---- =========================================================
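A minimal sketch of how the simple prepositions listed above could be mapped
to the case or cases they require. The dictionary holds only the five examples
from the table, and the names PREPOSITION_CASES and preposition_rows are
hypothetical.

```python
# Case codes required by each example preposition (from the table above).
PREPOSITION_CASES = {
    "brez": "g",
    "k": "d",
    "po": "al",
    "pri": "l",
    "s": "gi",
}


def preposition_rows(word: str) -> list:
    """Return one 'S Type Formation Case' row per case the preposition takes."""
    cases = PREPOSITION_CASES.get(word)
    if cases is None:
        return []  # not one of the listed examples
    return [f"S p s {case}" for case in cases]


print(preposition_rows("po"))
```

A preposition that governs two cases, such as 'po', yields two rows, which
mirrors the bracketed [al] notation in the table.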
9. Conjunction (C)
9.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type coordinating c in priredni veznik
subordinating s da podredni veznik
********************************* ----------------- -------------------------
2 Formation simple s in, da, ... enodelni, dvodelni
compound c kljub_temu_da vec<besedni
- -------------- -------------- - ----------------- -------------------------
3 Coord_Type -
- -------------- -------------- -
4 Sub_Type -
- -------------- -------------- -
5 Clitic -
- -------------- -------------- -
6 Number -
- -------------- -------------- -
7 Person -
=================================
Notes:
1. According to Slovene grammar, some conjunctions are 'two-part', but these
can often be either single or two-part conjunctions with identical first
and second conjunct, e.g. ali ... ali; such conjunctions are treated
as ordinary, 'one-part' conjunctions, i.e. as Formation=simple.
2. Conjunctions are also classified into 'multi-word' conjunctions,
e.g. 'kljub temu da'/in spite of. These conjunctions should be merged by
the tokeniser, e.g. kljub_temu_da, and marked as Formation=compound.
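The merging step described in note 2 can be sketched as follows. This is only
an illustration under stated assumptions: the list of compounds contains just
the one example given here, matching is case-insensitive, and the function
name merge_compounds is hypothetical.

```python
# Multi-word conjunctions to merge; only the example from the note is listed.
COMPOUND_CONJUNCTIONS = [("kljub", "temu", "da")]


def merge_compounds(tokens: list) -> list:
    """Join each listed multi-word conjunction into one underscore token."""
    merged = []
    i = 0
    while i < len(tokens):
        for compound in COMPOUND_CONJUNCTIONS:
            n = len(compound)
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window == compound:
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No compound starts here; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_compounds(["Kljub", "temu", "da", "dela"]))
```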
9.2 Combinations
*** **** ---- ===============================================================
PoS Type Form Examples
*** **** ---- ===============================================================
C c s in, ali
C s s da, ki
C s c kljub_temu_da
*** **** ====================================================================
10. Numeral (M)
10.1 Lexicon
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type cardinal c en glavni s<tevnik
ordinal o prvi vrstilni s<tevnik
multiple m enojen mnoz<ilni s<tevnik
special s dvoje, ... ostali s<tevniki
- -------------- -------------- - ----------------- -------------------------
2 Gender masculine m en mos<ki spol
feminine f ena z<enski spol
neuter n eno srednji spol
- -------------- -------------- - ----------------- -------------------------
3 Number singular s en ednina
plural p tri mnoz<ina
dual d dva dvojina
- -------------- -------------- - ----------------- -------------------------
4 Case nominative n en imenovalnik
genitive g enega rodilnik
dative d enemu dajalnik
accusative a en / enega toz<ilnik
locative l enemu mestnik
instrumental i enim orodnik
********************************* ----------------- -------------------------
5 Form digit d 1984 arabska s<tevilka
roman r MCMLXXXIV rimska s<tevilka
letter l tisoc<devetsto s<tevnik
- -------------- -------------- - ----------------- -------------------------
6 Definiteness -
- -------------- -------------- -
7 Clitic -
- -------------- -------------- -
8 Class -
- -------------- -------------- -
9 Animate -
- -------------- -------------- -
10Owner_Number -
- -------------- -------------- -
11Owner_Person -
- -------------- -------------- -
12Owned_Number -
= ============== ============== =
Notes:
Numerals in Slovene can function as nouns, adjectives or adverbs, and
are in grammars described as subtypes of these categories. Therefore
the above classification runs counter to the established practice.
10.2 Combinations
*** **** **** **** **** ---- ===============================================
PoS Type Gend Numb Case Form Examples
*** **** **** **** **** ---- ===============================================
M - - - - d 1984
M - - - - r MCMLXXXIV
M c m s n l en, dva, trije, s<tirje, pet
M o m s n l prvi, drugi, tretji, c<etrti, peti
M m m s n l enojen, dvojen, trojen, c<etveren, peter
M s m s n l dvoje, troje, dvoj
M c [mfn] [spd] n l en, ena, eno, trije, tri, troje, dva, dve
*** **** **** **** **** ====================================================
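The Form attribute separates numerals written with digits, with Roman
numerals, and with letters. A minimal sketch of assigning it, assuming the
token is already known to be a numeral and that Roman numerals are written in
upper case in standard subtractive notation; the function name numeral_form is
hypothetical.

```python
import re

# Standard Roman numerals up to 3999, upper case only (an assumption).
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def numeral_form(token: str) -> str:
    """Return the Form code: 'd' (digit), 'r' (roman) or 'l' (letter)."""
    if token.isdigit():
        return "d"
    if token and ROMAN.fullmatch(token):
        return "r"
    return "l"


for token in ["1984", "XIV", "pet"]:
    print(token, numeral_form(token))
```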
11. Interjection (I)
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type - jejhata medmet
- -------------- -------------- - ----------------- -------------------------
2 Formation -
=================================
12. Residual (X)
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
P ATT - sic, $, a+b, ... Ostanki
=================================
13. Abbreviation (Y)
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Syntactic_Type - TAM okrajs<ava
- -------------- -------------- - ------------------- ------------------------
2 Gender -
- -------------- -------------- -
3 Number -
- -------------- -------------- -
4 Case -
- -------------- -------------- -
5 Definiteness -
=================================
Notes:
This category usually functions as a proper noun or adjective,
therefore gender, number and case could be assigned to it. However,
cases where abbreviations are declined (i.e. mark their inflection
with an ending, e.g. 'TAM-a') are rare, so including this information
would only greatly increase the ambiguity of abbreviations.
14. Particle (Q)
= ============== ============== = ================= =========================
P ATT VAL C Example Slovene term
= ============== ============== = ================= =========================
1 Type - spet c<lenek
- -------------- -------------- - ----------------- -------------------------
2 Formation -
- -------------- -------------- -
3 Clitic -
=================================