Application to Slovene

COP project 106 MULTEXT-East Deliverable D1.1 M --- Slovene

The application to Slovene was prepared by Tomaz Erjavec and Peter Holozan.

The authors thank Velimir Gjurin, France Zagar, Vladimír Petkevic, Lydia Sinapova and David Stermole for their much appreciated comments and suggestions. All errors of course remain our own.

All Slovene diacritical characters used have been encoded in the following way:

  1. The Slovene 'hachek' diacritic (ˇ) is marked by the corresponding nondiacritic counterpart followed by '<'. The following possibilities exist:
    c< s< z<
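
The encoding above can be reversed mechanically. The following is a minimal sketch, not part of the specification: the function names are hypothetical, and the upper-case pairs (C<, S<, Z<) are an assumption, since only the lower-case ones are listed.

```python
# Pairs taken from the list above; the upper-case entries are an assumption.
HACHEK = {
    "c<": "č", "s<": "š", "z<": "ž",
    "C<": "Č", "S<": "Š", "Z<": "Ž",
}


def decode_hachek(text):
    """Replace every 'letter + <' pair by the corresponding Slovene letter."""
    for encoded, letter in HACHEK.items():
        text = text.replace(encoded, letter)
    return text


def encode_hachek(text):
    """Inverse of decode_hachek: write each hachek letter as 'letter + <'."""
    for encoded, letter in HACHEK.items():
        text = text.replace(letter, encoded)
    return text


if __name__ == "__main__":
    print(decode_hachek("mnoz<ina"))
    print(encode_hachek("množina"))
```

Applied to the Slovene terms in the tables below, decode_hachek turns e.g. 'mnoz<ina' into 'množina'.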

1. Noun (N)

1.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           common         c stol              obc<i samostalnik        
                 proper         p Janez             lastni samostalnik       
- -------------- -------------- - ----------------- -------------------------
2 Gender         masculine      m stol              mos<ki spol              
                 feminine       f miza              z<enski spol             
                 neuter         n sonce             srednji spol             
- -------------- -------------- - ----------------- -------------------------
3 Number         singular       s stol              ednina                   
                 plural         p stoli             mnoz<ina                 
                 dual           d stola             dvojina                  
- -------------- -------------- - ----------------- -------------------------
4 Case           nominative     n stol,  medved     imenovalnik              
                 genitive       g stola, medveda    rodilnik                 
                 dative         d stolu             dajalnik                 
                 accusative     a stol,  medveda    toz<ilnik                
                 locative       l stolu             mestnik                  
                 instrumental   i stolom            orodnik                  
* ***************************** * ----------------- -------------------------
5 Definiteness                  -
- -------------- -------------- -
6 Clitic                        -
- -------------- -------------- -
7 Animate                       -
- -------------- -------------- -
8 Owner_Number                  -
- -------------- -------------- -
9 Owner_Person                  -
---------------- -------------- -
10Owned_Number                  -

1.2 Combinations

*** **** **** **** **** =====================================================
PoS Type Gend Numb Case Examples
*** **** **** **** **** =====================================================
 N   c    n     s    n  iskanje         ('gerund' classified as a Noun)
 N   p   [mf]   s    n  Janez, Micka
 N   c   [mfn]  s    n  stol,    lipa,   sonce
 N   c   [mfn]  s    g  stola,   lipe,   sonca
 N   c   [mfn]  s    d  stolu,   lipi,   soncu
 N   c   [mfn]  s    a  stol,    lipo,   sonce
 N   c   [mfn]  s    l  stolu,   lipi,   soncu
 N   c   [mfn]  s    i  stolom,  lipo,   soncem
 N   c   [mfn]  p    n  stoli,   lipe,   sonca
 N   c   [mfn]  p    g  stolov,  lip,    sonc
 N   c   [mfn]  p    d  stolom,  lipam,  soncem
 N   c   [mfn]  p    a  stole,   lipe,   sonca
 N   c   [mfn]  p    l  stolih,  lipah,  soncih
 N   c   [mfn]  p    i  stoli,   lipami, sonci
 N   c   [mfn]  d    n  stola,   lipi,   sonci
 N   c   [mfn]  d    g  stolov,  lip,    sonc
 N   c   [mfn]  d    d  stoloma, lipama, soncema
 N   c   [mfn]  d    a  stola,   lipi,   sonci
 N   c   [mfn]  d    l  stolih,  lipah,  soncih
 N   c   [mfn]  d    i  stoloma, lipama, soncema
*** **** **** **** **** =====================================================
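
The rows above use bracketed sets such as [mfn] as shorthand for several attribute combinations at once. The following sketch shows how such a row could be expanded. It assumes (this is not stated in this section) that the one-character codes are simply concatenated in position order to form a tag; the name expand_row is hypothetical.

```python
import re
from itertools import product
from typing import List


def expand_row(row: str) -> List[str]:
    """Expand a combination row, e.g. 'N c [mfn] s n', into concrete tags."""
    slots = []
    for field in row.split():
        bracket = re.fullmatch(r"\[(.+)\]", field)
        # A bracketed field offers one code per character; others are fixed.
        slots.append(list(bracket.group(1)) if bracket else [field])
    return ["".join(combo) for combo in product(*slots)]


if __name__ == "__main__":
    tags = expand_row("N c [mfn] s n")
    # Pair each tag with the example in the same column of the table.
    for tag, example in zip(tags, ["stol", "lipa", "sonce"]):
        print(tag, example)
```

The order of the expanded tags follows the order of the codes inside the brackets, which is also the order of the examples in each row.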

2. Verb (V)

2.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           main           m delati            glavni glagol            
                 auxiliary      a imeti             pomoz<ni glagol          
                 modal          o hoteti            naklonski glagol         
                 copula         c biti              pomoz<nik 'biti'         
- -------------- -------------- - ----------------- -------------------------
2 VForm          indicative     i delam             sedanjik                 
                 imperative     m delaj             velelnik                 
                 conditional    c bi                pogojnik od 'biti'       
                 infinitive     n delati            nedoloc<nik              
                 participle     p delal,(po)delan   delez<nik                
                 supine         u delat             namenilnik               
- -------------- -------------- - ----------------- -------------------------
3 Tense          present        p delam, delaj      sedanjik, velelnik       
                 future         f bom               prihodnjik od 'biti'     
                 past           s delal             opisni delez<nik na -l   
- -------------- -------------- - ----------------- -------------------------
4 Person         first          1 delam             prva oseba               
                 second         2 delas<            druga oseba              
                 third          3 dela              tretja oseba             
- -------------- -------------- - ----------------- -------------------------
5 Number         singular       s delam             ednina                   
                 plural         p delamo            mnoz<ina                 
                 dual           d delata            dvojina                  
- -------------- -------------- - ----------------- -------------------------
6 Gender         masculine      m delal             mos<ki spol              
                 feminine       f delala            z<enski spol             
                 neuter         n delalo            srednji spol             
********************************* ----------------- -------------------------
7 Voice          active         a delam, delal      tvorni nac<in            
                 passive        p (po)delan         trpni nac<in             
- -------------- -------------- - ----------------- -------------------------
8 Negative       no             n hoc<em            nezanikan                
                 yes            y noc<em            zanikan                  
- -------------- -------------- - ----------------- -------------------------
9 Definiteness                  -
- -------------- -------------- -
10Clitic                        -
- -------------- -------------- -
11Case                          -
- -------------- -------------- -
12Animate                       -
- -------------- -------------- -
13Clitic_s                      -


1. The verb 'to be' in all its functions is Type=c.
   Auxiliary verbs (Type=a) include neither 'to be' nor the modal verbs.
2. The "past participle" is actually used for forming all the compound
   active tenses (future, past, pluperfect) and is encoded
   analytically as VForm=participle, Tense=past, Voice=active.
3. The passive participle is encoded analytically as 
   VForm=participle, Tense='-', Voice=passive. 
   This encoding will only be used for the passive participle in the
   predicative position, e.g. 'on je bil tepen'/he was beaten. 
   Case-marked passive participles in the attributive position (e.g. 
   'tepen pes'/beaten dog) or those in a non-nominative case (e.g. 
   'hrano imam skuhano'/I have the food cooked[acc]) will be
   classified as (qualificative) adjectives. 
4. The adjectival (e.g. 'stokajoc<'/moaning) and adverbial (e.g. 
   'lez<e'/lying down) participles are classified as adjectives and 
   adverbs respectively.
5. Negative is always marked as 'n' except for the negated forms of three
   verbs: 'noc<em'/to not_want, 'nisem'/to not_be, 'nimam'/to not_have.

2.2 Combinations

*** **** **** **** **** **** **** ---- ---- =======================
PoS Type VFrm Tens Pers Numb Gend Voic Neg  Examples
*** **** **** **** **** **** **** ---- ---- =======================
 V   m     n    -    -    -    -    -   n   delati
 V   a     n    -    -    -    -    -   n   imeti
 V   o     n    -    -    -    -    -   n   hoteti
 V   c     n    -    -    -    -    -   n   biti
 V   c     c    -    -    -    -    -   n   bi
 V   c     i    f  [123]  s    -    -   n   bom, bos<, bo         
 V   m     i    p  [123]  s    -    a   n   delam delas< dela
 V   m     m    p    2  [sp]   -    -   n   delaj delajte
 V   m     u    -    -    -    -    -   n   delat
 V   m     p    s    -    s   [mf]  a   n   delal delala
 V   m     p    -    -    s   [mf]  p   n   aretiran aretirana
 V   a     i    p    1    s    -    a   y   nimam
*** **** **** **** **** **** **** ---- ---- =======================
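
Read together with the lexicon table in 2.1, each row above is a sequence of eight one-character codes, one per attribute position. The following sketch decodes such a sequence into attribute-value pairs. The codes are copied from the 2.1 table; the function name decode_verb and the convention of passing the eight codes as one string (without the PoS letter V) are assumptions made for illustration.

```python
# One entry per position 1-8 of the verb lexicon table in 2.1.
VERB_ATTRS = [
    ("Type", {"m": "main", "a": "auxiliary", "o": "modal", "c": "copula"}),
    ("VForm", {"i": "indicative", "m": "imperative", "c": "conditional",
               "n": "infinitive", "p": "participle", "u": "supine"}),
    ("Tense", {"p": "present", "f": "future", "s": "past"}),
    ("Person", {"1": "first", "2": "second", "3": "third"}),
    ("Number", {"s": "singular", "p": "plural", "d": "dual"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Voice", {"a": "active", "p": "passive"}),
    ("Negative", {"n": "no", "y": "yes"}),
]


def decode_verb(codes):
    """Map eight one-character codes to a dict of attribute values.

    Positions holding '-' (not applicable) are left out of the result.
    """
    if len(codes) != len(VERB_ATTRS):
        raise ValueError("expected %d codes, got %d" % (len(VERB_ATTRS), len(codes)))
    features = {}
    for code, (name, values) in zip(codes, VERB_ATTRS):
        if code != "-":
            features[name] = values[code]
    return features


if __name__ == "__main__":
    # The row for 'delal': V m p s - s m a n
    print(decode_verb("mps-sman"))
```

An unknown code raises a KeyError, which is usually what is wanted when checking a lexicon against the table.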

3. Adjective (A)

3.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           qualificative  f lep               kakovostni               
                 possessive     s stric<ev          svojilni                 
           l.s.  ordinal        o slovenski         vrstni                   
- -------------- -------------- - ----------------- -------------------------
2 Degree         positive       p lep               osnovnik                 
                 comparative    c leps<i            primernik                
                 superlative    s najleps<i         presez<nik               
- -------------- -------------- - ----------------- -------------------------
3 Gender         masculine      m lep               mos<ki spol              
                 feminine       f lepa              z<enski spol             
                 neuter         n lepo              srednji spol             
- -------------- -------------- - ----------------- -------------------------
4 Number         singular       s lep               ednina                   
                 plural         p lepi              mnoz<ina                 
                 dual           d lepa              dvojina                  
- -------------- -------------- - ----------------- -------------------------
5 Case           nominative     n lep, lepi         imenovalnik              
                 genitive       g lepega            rodilnik                 
                 dative         d lepemu            dajalnik                 
                 accusative     a lep, lepega       toz<ilnik                
                 locative       l lepemu            mestnik                  
                 instrumental   i lepim             orodnik                  
* ***************************** * ----------------- -------------------------
6 Definiteness                  -
- -------------- -------------- -
7 Clitic                        -
- -------------- -------------- -
8 Animate                       -
- -------------- -------------- -
9 Formation                     -
- -------------- -------------- -
10Owner_Number                  -
- -------------- -------------- -
11Owner_Person                  -
---------------- -------------- -
12Owned_Number                  -
=============================== =


The three deverbative adjectival participles are classified as
qualificative adjectives. They differ from other adjectives by having

3.2 Combinations

*** **** **** **** **** **** ================================================
PoS Type Degr Gend Numb Case Examples
*** **** **** **** **** **** ================================================
 A   s     -    m    s    n  stric<ev
 A   o     -    m    s    n  slovenski
 A   f   [pcs]  m    s    n  lep/lepi leps<i najleps<i
 A   f     p  [mfn]  s    n  zdrav/zdravi zdrava zdravo
 A   f     -    m    s    n  otekel (obraz) premagan (sovrag) govorec< (pes)
 A   f     p    m  [spd]  n  lep lepi lepa
*** **** **** **** **** **** ================================================

4. Pronoun (P)

4.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           personal       p jaz               osebni                   
                 demonstrative  d ta                kazalni                  
                 indefinite     i nekdo             nedoloc<nostni           
                 possessive     s moj               svojilni                 
                 interrogative  q kdo               vpras<alni               
                 relative       r kdor              oziralnostni             
                 reflexive      x se                povratni                 
                 negative       z nihc<e            nikalni
                 general        g vsak              celostni
- -------------- -------------- - ----------------- -------------------------
2 Person         first          1 moj               prva oseba               
                 second         2 tvoj              druga oseba              
                 third          3 njegov            tretja oseba             
- -------------- -------------- - ----------------- -------------------------
3 Gender         masculine      m moj               mos<ki spol              
                 feminine       f moja              z<enski spol             
                 neuter         n moje              srednji spol             
- -------------- -------------- - ----------------- -------------------------
4 Number         singular       s moj               ednina                   
                 plural         p moji              mnoz<ina                 
                 dual           d moja              dvojina                  
- -------------- -------------- - ----------------- -------------------------
5 Case           nominative     n moj               imenovalnik              
                 genitive       g mojega            rodilnik                 
                 dative         d mojemu            dajalnik                 
                 accusative     a moj/mojega        toz<ilnik                
                 locative       l mojemu            mestnik                  
                 instrumental   i mojim             orodnik                  
- -------------- -------------- - ----------------- -------------------------
6 Owner_Number   singular       s moj               ednina svojine           
                 plural         p nas<              mnoz<ina svojine         
                 dual           d najin             dvojina svojine          
- -------------- -------------- - ----------------- -------------------------
7 Owner_Gender   masculine      m njegov            mos<ki spol svojine      
                 feminine       f njen              z<enski spol svojine     
                 neuter         n njegov            srednji spol svojine
********************************* ----------------- -------------------------
8 Clitic         no             n mene              beseda                   
                 yes            y me                naslonka                 
- -------------- -------------- - ----------------- -------------------------
9 Referent_Type  personal       p sebe, se, si      osebni povratni  
                 possessive     s svoj              svojilni povratni
- -------------- -------------- - ----------------- -------------------------
10Syntactic_Type nominal        n kdo               samostalnis<ki            
                 adjectival     a kateri            pridevnis<ki              
- -------------- -------------- - ----------------- ------------------------- 
11Definiteness                  -
- -------------- -------------- -
12Animate                       -
- -------------- -------------- -
13Clitic_s                      -
- -------------- -------------- -
14Pronoun_Form                  -
- -------------- -------------- -
15Owner_Person                  -
- -------------- -------------- -
16Owned_Number                  -


1. The Type taxonomy for pronouns differs from the one in Slovene
   grammars; mostly, classes of the Slovene grammars were merged. In
   particular, the indefinite and relative types cover several of the
   pronoun types found in Slovene grammars:
   (nedoloc<nostni = nedoloc<ni+poljubnostni+mnogostni+istostni+drugostni) 
   (oziralnostni = oziralni + oziralno poljubnostni) 
2. Type=relative includes the so-called 'relative general' (oziralno poljubnostni) 
   pronouns. They can consist of two words, where the second is 'koli'. 
   Variant spellings exist: 'kolikorkoli', 'kolikor koli'.
   Possible solution: they can be treated as compounds.
3. Type=reflexive encompasses all reflexive pronouns (sebe, se, sebi,
   si, svoj) as well as 'se' in its role as the obligatory 'constituent' of 
   reflexive verbs. Personal and possessive reflexives are further
   distinguished via the Referent_Type attribute.
   'se' in all its roles will be marked as the reflexive personal
   clitic pronoun.
   It is also morphologically ambiguous, as it can be in the accusative or
   genitive case.
   'se' can also appear as a suffix, e.g. 'zase'/for_oneself, with the
   variant spelling 'za se'. It is not clear how to treat this case.
4. Owner_Gender is relevant only for third person singular
   possessive pronouns, and distinguishes only masculine and feminine 
   forms. It will be '-' in all other cases.
5. As in Slovene grammars, pronouns are distinguished according to whether
   they have a nominal or an adjectival function. All pronoun types except
   demonstrative and possessive can be nominal, and all except personal can be
   adjectival.
   In a future version of the specifications, this attribute should
   be given a lower position, as it is used by three languages.
6. Referent_Type is used to distinguish personal reflexives (which include
   'se' in all its functions) from the possessive reflexives ('svoj').
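
Notes 4 and 6 can be read as consistency constraints on a pronoun entry. The following sketch checks them on a feature dictionary. It is one possible reading of the notes, not part of the specification: the function name check_pronoun, the dictionary layout and the wording of the messages are hypothetical, and the rule that Referent_Type stays '-' for non-reflexive pronouns is inferred from the combinations table in 4.2.

```python
def check_pronoun(features):
    """Return a list of violations of notes 4 and 6 (empty if consistent).

    `features` maps attribute names to their one-character codes;
    a missing attribute is treated like '-'.
    """
    problems = []
    ptype = features.get("Type", "-")

    # Note 4: Owner_Gender only for third person singular possessives.
    third_sg_possessive = (
        ptype == "s"
        and features.get("Person", "-") == "3"
        and features.get("Owner_Number", "-") == "s"
    )
    if features.get("Owner_Gender", "-") != "-" and not third_sg_possessive:
        problems.append("Owner_Gender set outside 3rd person singular possessives")

    # Note 6: Referent_Type separates personal from possessive reflexives.
    referent = features.get("Referent_Type", "-")
    if ptype == "x" and referent == "-":
        problems.append("reflexive pronoun lacks Referent_Type")
    if ptype != "x" and referent != "-":
        problems.append("Referent_Type set on a non-reflexive pronoun")
    return problems


if __name__ == "__main__":
    njegov = {"Type": "s", "Person": "3", "Owner_Number": "s", "Owner_Gender": "m"}
    print(check_pronoun(njegov))
```
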

4.2 Combinations

*** **** **** **** **** **** **** **** ---- --- --- ================
PoS Type Pers Gend Numb Case PosN PosG Clit Ref Syn Examples
*** **** **** **** **** **** **** **** ---- --- --- ================
 P    x    -    -    -  [ga]   -    -    y   p   n   se
 P    x    -    -    -  [ga]   -    -    n   p   n   sebe
 P    x    -    -    -   d     -    -    y   p   n   si
 P    x    -    -    -  [dl]   -    -    n   p   n   sebi
 P    x    -    m    s    a    s    -    n   s   a   svoj
 P    p    1    -    s    n    -    -    n   -   n   jaz
 P    s    1    m    s  [na]   s    -    -   -   a   moj
 P    s    2    m    s  [na]   s    -    -   -   a   tvoj
 P    s    3    m    s  [na]   s  [mn]   -   -   a   njegov
 P    d    -    m    s    n    -    -    -   -   a   1)
 P    i    -    -    s    n    -    -    -   -   n   2)
 P    i    -    m    s    n    -    -    -   -   a   3)
 P    q    -    m    s    n    -    -    -   -   n   4)
 P    q    -    m    s    n    -    -    -   -   a   5)
 P    q    -    -    -    -    -    -    -   -   a   koliko
 P    r    -    m    s    n    -    -    -   -   n   6)
 P    r    -    m    s    n    -    -    -   -   a   7)
 P    z    -    -    s    n    -    -    -   -   n   nihc<e, nobeden
 P    z    -    m    s    n    -    -    -   -   a   nikakrs<en, noben
 P    g    -    -    s    n    -    -    -   -   n   vsakdo
 P    g    -    m    s    n    -    -    -   -   a   vsak
*** **** **** **** **** **** **** **** ---- --- --- ================

1) tak, taks<en, takle, ta, tisti, oni, toliki
2) nekdo, kdo, marsikdo, eden, kateri, vsak, marsikateri
3) kak(r)s<en,  enak, drugac<en, neki, kateri, marsikateri,
   isti, drug, nekoliko, kateri, nic<(?), vsi, dokaj(?)
4) kdo, kateri
5) c<igav, kaks<en, koliks<en, kateri
6) kdor, kar, kateri (...koli)
7) kakrs<en, kateri, c<igar, kolikor (...koli)

5. Determiner (D)

Not applicable.

6. Article (T)

Not applicable.


This category (Slovene: 'c<len') exists for Slovene, but it
encompasses only two words ('ta'/this, 'en'/one) and even these are
used only colloquially.  Furthermore, the context in which these two
words can appear as articles is identical to the one in which they
appear as a pronoun or numeral respectively. As it would be impossible
to disambiguate between their two meanings, this category will not be
used for Slovene.

7. Adverb (R)

7.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           general        g potihoma          prislov                  
- -------------- -------------- - ----------------- -------------------------
2 Degree         positive       p malo              osnovnik                 
                 comparative    c manj              primernik                
                 superlative    s najmanj           presez<nik               
********************************* ----------------- -------------------------
3 Clitic                        -
- -------------- -------------- -
4 Number                        -
- -------------- -------------- -
5 Person                        -


Some pronouns can also have an adverbial function. This issue has not
been considered here.

7.2 Combinations

*** **** **** ==============================================================
PoS Type Degr Examples
*** **** **** ==============================================================
 R    g    p  malo
 R    g    c  manj
 R    g    s  najmanj
 R    g    -  absolutno, angles<ko, potihoma, ...
*** **** **** ==============================================================

8. Adposition (S)

8.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           preposition    p nad               predlog                  
- -------------- -------------- - ----------------- -------------------------
2 Formation      simple         s nad, pod          (enostaven) predlog
                 compound       c zase, nanj        predlog s pripon. zaimkom
********************************* ----------------- -------------------------
3 Case                          
  (prep.)        genitive       g brez              rodilnik                 
                 dative         d k                 dajalnik                 
                 accusative     a po                toz<ilnik                
                 locative       l pri               mestnik                  
                 instrumental   i s                 orodnik                  
- -------------- -------------- - ----------------- -------------------------
4 Clitic                        -


Slovene has prepositions only, with some arguable exceptions
(e.g. 'navkljub'/in spite of), which can be pre- or postpositions.
These would need further study.

8.2 Combinations

*** **** **** ---- =========================================================
PoS Type Form Case Examples
*** **** **** ---- =========================================================
 S    p    s    g   brez
 S    p    s    d   k
 S    p    s   [al] po
 S    p    s    l   pri
 S    p    s   [gi] s
 S    p    c    -   zase, nase, nanj, nanjo, nanju,...
*** **** **** ---- =========================================================
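
The Case column above records which cases a simple preposition can govern. The following sketch stores the five rows as a lookup table. Only the prepositions listed in the table are covered; the names PREP_CASES and governs are hypothetical.

```python
# Preposition -> codes of the cases it can govern, copied from the table.
PREP_CASES = {
    "brez": "g",
    "k": "d",
    "po": "al",
    "pri": "l",
    "s": "gi",
}


def governs(preposition, case):
    """True if the table lists `case` (g, d, a, l or i) for `preposition`."""
    return case in PREP_CASES.get(preposition, "")


if __name__ == "__main__":
    print(governs("po", "l"))
    print(governs("k", "g"))
```

A preposition that is missing from the table yields False for every case, so the lookup should be extended before it is used on running text.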

9. Conjunction (C)

9.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           coordinating   c in                priredni veznik          
                 subordinating  s da                podredni veznik          
********************************* ----------------- -------------------------
2 Formation      simple         s in, da, ...       enodelni, dvodelni 
                 compound       c kljub_temu_da     vec<besedni 
- -------------- -------------- - ----------------- -------------------------
3 Coord_Type                    -
- -------------- -------------- -
4 Sub_Type                      -
- -------------- -------------- -
5 Clitic                        -
- -------------- -------------- -
6 Number                        -
- -------------- -------------- -
7 Person                        -


1. According to Slovene grammar, some conjunctions are 'two-part', but these 
   can often be used either as single or as two-part conjunctions with
   identical first and second parts, e.g. ali ... ali; such conjunctions are
   treated as ordinary, 'one-part' conjunctions, i.e. as Formation=simple.

2. Conjunctions also include 'multi-word' conjunctions,
   e.g. 'kljub temu da'/in spite of. These conjunctions should be merged by
   the tokeniser, e.g. kljub_temu_da, and marked as Formation=compound.
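
The merging step described in note 2 could look roughly as follows. This is a sketch under stated assumptions, not the project's tokeniser: the list MULTIWORD is a hypothetical lexicon in which only 'kljub temu da' is taken from the text, and matching is case-insensitive by assumption.

```python
from typing import List

# Hypothetical lexicon of multi-word conjunctions; only this entry is attested above.
MULTIWORD = [("kljub", "temu", "da")]


def merge_conjunctions(tokens: List[str]) -> List[str]:
    """Join each known multi-word conjunction into one underscore-linked token."""
    merged = []
    i = 0
    while i < len(tokens):
        for words in MULTIWORD:
            window = tokens[i:i + len(words)]
            if tuple(t.lower() for t in window) == words:
                merged.append("_".join(window))  # keeps the original casing
                i += len(words)
                break
        else:
            # No multi-word conjunction starts here; copy the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


if __name__ == "__main__":
    print(merge_conjunctions(["kljub", "temu", "da", "dela"]))
```

The merged token can then be looked up in the lexicon as a single conjunction with Formation=compound.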

9.2 Combinations

*** **** ---- ===============================================================
PoS Type Form Examples
*** **** ---- ===============================================================
 C    c    s   in, ali
 C    s    s   da, ki
 C    s    c   kljub_temu_da
*** **** ---- ===============================================================

10. Numeral (M)

10.1 Lexicon

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type           cardinal       c en                glavni s<tevnik          
                 ordinal        o prvi              vrstilni s<tevnik        
                 multiple       m enojen            mnoz<ilni s<tevnik       
                 special        s dvoje, ...        ostali s<tevniki         
- -------------- -------------- - ----------------- -------------------------
2 Gender         masculine      m en                mos<ki spol              
                 feminine       f ena               z<enski spol             
                 neuter         n eno               srednji spol             
- -------------- -------------- - ----------------- -------------------------
3 Number         singular       s en                ednina                   
                 plural         p tri               mnoz<ina                 
                 dual           d dva               dvojina                  
- -------------- -------------- - ----------------- -------------------------
4 Case           nominative     n en                imenovalnik              
                 genitive       g enega             rodilnik                 
                 dative         d enemu             dajalnik                 
                 accusative     a en / enega        toz<ilnik                
                 locative       l enemu             mestnik                  
                 instrumental   i enim              orodnik                  
********************************* ----------------- -------------------------
5 Form           digit          d 1984              arabska s<tevilka
                 roman          r MCMLXXXIV         rimska s<tevilka
                 letter         l tisoc<devetsto    s<tevnik
- -------------- -------------- - ----------------- -------------------------
6 Definiteness                  -
- -------------- -------------- -
7 Clitic                        -
- -------------- -------------- -
8 Class                         -
- -------------- -------------- -
9 Animate                       -
- -------------- -------------- - 
10Owner_Number                  -
- -------------- -------------- -
11Owner_Person                  -
- -------------- -------------- -
12Owned_Number                  -
= ============== ============== =


Numerals in Slovene can function as nouns, adjectives or adverbs, and
are described in grammars as subtypes of these categories. Therefore
the above classification runs counter to the established practice.

10.2 Combinations

*** **** **** **** **** ---- ===============================================
PoS Type Gend Numb Case Form Examples
*** **** **** **** **** ---- ===============================================
 M    -    -    -    -    d  1984
 M    -    -    -    -    r  MCMLXXXIV
 M    c    m    s    n    l  en, dva, trije, s<tirje, pet
 M    o    m    s    n    l  prvi, drugi, tretji, c<etrti, peti
 M    m    m    s    n    l  enojen, dvojen, trojen, c<etveren, peter
 M    s    m    s    n    l  dvoje, troje, dvoj
 M    c  [mfn][spd]  n    l  en, ena, eno, trije, tri, troje, dva, dve
*** **** **** **** **** ---- ===============================================
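
The Form attribute (digit, roman, letter) can be guessed from the shape of the token alone. The following is a rough sketch: the function name numeral_form is hypothetical, the Roman test only checks that the token consists of the letters I, V, X, L, C, D, M (it does not verify that the numeral is well formed), and the token is assumed to be already known to be a numeral.

```python
ROMAN_LETTERS = set("IVXLCDM")


def numeral_form(token):
    """Return the Form code of a numeral token: 'd', 'r' or 'l'."""
    if token.isdigit():
        return "d"  # written with digits, e.g. 1984
    if token and set(token) <= ROMAN_LETTERS:
        return "r"  # upper-case Roman letters only (shape check, not validation)
    return "l"      # spelled out in letters


if __name__ == "__main__":
    for token in ["1984", "MCMLXXXIV", "pet"]:
        print(token, numeral_form(token))
```

Since the check is purely shape-based, an upper-case word made up of these seven letters would also be labelled 'r'; a real tagger would need the lexicon to decide.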

11. Interjection (I)

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type                          - jejhata           medmet                   
- -------------- -------------- - ----------------- -------------------------
2 Formation                     -

12. Residual (X)

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
P ATT                           - sic, $, a+b, ...  Ostanki

13. Abbreviation (Y)

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Syntactic_Type                - TAM               okrajs<ava               
- -------------- -------------- - ----------------- -------------------------
2 Gender                        -
- -------------- -------------- -
3 Number                        -
- -------------- -------------- -
4 Case                          -
- -------------- -------------- -
5 Definiteness                  -


This category usually functions as a proper noun or adjective,
therefore gender, number and case could be assigned to it. However,
cases where abbreviations are declined (i.e. mark their inflection
with an ending, e.g. 'TAM-a') are rare, so including this information
would only greatly increase the ambiguity of abbreviations.

14. Particle (Q)

= ============== ============== = ================= =========================
P ATT            VAL            C Example           Slovene term             
= ============== ============== = ================= =========================
1 Type                          - spet              c<lenek                  
- -------------- -------------- - ----------------- -------------------------
2 Formation                     -
- -------------- -------------- -
3 Clitic                        -

Tomaz Erjavec
Wed Oct 16 12:08:36 MDT 1996