imp25k

Dept. of Knowledge Technologies, JSI

TEI Header

§file description
§title statement
§title

imp25k lexicon of historical Slovene
§principal researcher
§name Tomaž Erjavec
§address

Department of Knowledge Technologies

Jožef Stefan Institute

Jamova cesta 39

SI-1000 Ljubljana

Slovenia

§statement of responsibility
§name Maja Žorga Dulmin
§responsibility

Linguistic annotation leader.
§statement of responsibility
§name Darja Fišer
§responsibility

Linguistic annotation, preparation of annotator materials.
§statement of responsibility
§name Tina Benčina
§name Katja Cingerle
§name Metod Čepar (ZRC SAZU)
§name Alenka Jelovšek (ZRC SAZU)
§name Urška Kamenšek
§name Nina Mikulin
§name Zala Šmid
§responsibility

Linguistic annotation.
§edition statement
§edition 1.1
§extent 28,000 headwords
§publication statement
§distributor
§address

Department of Knowledge Technologies

Jožef Stefan Institute

Jamova cesta 39

SI-1000 Ljubljana

Slovenia

§publication place http://nl.ijs.si/imp/
§availability

This work is licensed under the Creative Commons Attribution 4.0 licence. Credit should be given to the original authors of the digital resource; in scientific publications, this means citing the publication or publications describing this digital resource. The bibliography is available from http://nl.ijs.si/imp/.

§date 2014-09-13
§source description

This lexicon was automatically derived from two corpora of historical Slovene, namely the fully manually annotated goo300k corpus and selected words from the larger foo3M corpus.

§encoding description
§project description

EU project IMPACT: ‘Improving Access to Text’ (2010–2012).

National and University Library and Jožef Stefan Institute: Production of Ground Truth Dataset of Historical Slovene (for the "NUK" and "FPGN" sigla).

§project description

Google research award ‘Language Models for Historical Slovene’ (2011–2012).

§editorial practice declaration
§standard values

The two-letter language codes follow ISO 639 and are defined in the language usage element. The exceptions are the codes "sl-bohoric", "sl-dajnko" and "sl-metelko", designating Slovene written in the Bohorič, Dajnko and Metelko alphabets respectively.

Coarse-grained morphosyntactic descriptions follow the IMP morphosyntactic specification; cf. http://nl.ijs.si/imp/msd/

§tagging declaration
§namespace

name = http://www.tei-c.org/ns/1.0
§tag usage

gi = text occurs = 1
text
§tag usage

gi = body occurs = 1
text body
§tag usage

gi = entry occurs = 28034
entry
§tag usage

gi = form occurs = 168452
form information group
§tag usage

gi = orth occurs = 168452
orthographic form
§tag usage

gi = lbl occurs = 266407
label
§tag usage

gi = gramGrp occurs = 31137
grammatical information group
§tag usage

gi = pos occurs = 61250
part of speech
§tag usage

gi = gram occurs = 170053
grammatical information
§tag usage

gi = gloss occurs = 2700
gloss
§tag usage

gi = cit occurs = 210775
cited quotation
§tag usage

gi = quote occurs = 210775
quotation
§tag usage

gi = oVar occurs = 212705
orthographic-variant reference
§tag usage

gi = milestone occurs = 125072
milestone
§tag usage

gi = bibl occurs = 213253
bibliographic citation
§tag usage

gi = author occurs = 147104
author
§tag usage

gi = title occurs = 210775
title
§tag usage

gi = date occurs = 210775
date
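The occurrence counts in the tag usage declarations can in principle be recomputed from the lexicon file itself. The following Python sketch makes two assumptions. First, the counts cover the text element and its descendants, not the TEI header. Second, all counted elements are in the TEI namespace declared above. The function name tag_usage and the toy entries are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"


def tag_usage(xml_text: str) -> Counter:
    """Count TEI elements by local name within <text> (header excluded)."""
    root = ET.fromstring(xml_text)
    text_tag = f"{{{TEI_NS}}}text"
    # Assumption: the root is either <TEI> with a <text> child, or <text> itself.
    text = root if root.tag == text_tag else root.find(text_tag)
    counts: Counter = Counter()
    if text is None:
        return counts
    # iter() yields <text> itself plus all of its descendants.
    for el in text.iter():
        if el.tag.startswith(f"{{{TEI_NS}}}"):
            counts[el.tag.split("}", 1)[1]] += 1
    return counts


# Toy document with hypothetical entries.
sample = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <text><body>
    <entry><form><orth>a</orth></form></entry>
    <entry><form><orth>b</orth></form><form><orth>c</orth></form></entry>
  </body></text>
</TEI>"""

for gi, n in sorted(tag_usage(sample).items()):
    print(f"gi = {gi} occurs = {n}")
```

Each printed line has the same "gi = ... occurs = ..." shape as the declarations above. This makes it easy to compare the output against the header.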
§feature system declaration
§feature library

§feature

name = CATEGORY id = N0-en corresponds to = S0-sl
symbolic value

value = Noun
§feature

name = Type id = N1.c-en corresponds to = S1.o-sl
symbolic value

value = common
§feature

name = Type id = N1.p-en corresponds to = S1.l-sl
symbolic value

value = proper
§feature

name = Gender id = N2.m-en corresponds to = S2.m-sl
symbolic value

value = masculine
§feature

name = Gender id = N2.f-en corresponds to = S2.z-sl
symbolic value

value = feminine
§feature

name = Gender id = N2.n-en corresponds to = S2.s-sl
symbolic value

value = neuter
§feature

name = CATEGORY id = V0-en corresponds to = G0-sl
symbolic value

value = Verb
§feature

name = Type id = V1.m-en corresponds to = G1.g-sl
symbolic value

value = main
§feature

name = Type id = V1.a-en corresponds to = G1.p-sl
symbolic value

value = auxiliary
§feature

name = Aspect id = V2.e-en corresponds to = G2.d-sl
symbolic value

value = perfective
§feature

name = Aspect id = V2.p-en corresponds to = G2.n-sl
symbolic value

value = progressive
§feature

name = Aspect id = V2.b-en corresponds to = G2.v-sl
symbolic value

value = biaspectual
§feature

name = CATEGORY id = A0-en corresponds to = P0-sl
symbolic value

value = Adjective
§feature

name = Type id = A1.g-en corresponds to = P1.p-sl
symbolic value

value = general
§feature

name = Type id = A1.s-en corresponds to = P1.s-sl
symbolic value

value = possessive
§feature

name = Type id = A1.p-en corresponds to = P1.d-sl
symbolic value

value = participle
§feature

name = Degree id = A2.p-en corresponds to = P2.n-sl
symbolic value

value = positive
§feature

name = Degree id = A2.c-en corresponds to = P2.p-sl
symbolic value

value = comparative
§feature

name = Degree id = A2.s-en corresponds to = P2.s-sl
symbolic value

value = superlative
§feature

name = CATEGORY id = R0-en corresponds to = R0-sl
symbolic value

value = Adverb
§feature

name = Type id = R1.g-en corresponds to = R1.s-sl
symbolic value

value = general
§feature

name = Type id = R1.r-en corresponds to = R1.d-sl
symbolic value

value = participle
§feature

name = Degree id = R2.p-en corresponds to = R2.n-sl
symbolic value

value = positive
§feature

name = Degree id = R2.c-en corresponds to = R2.r-sl
symbolic value

value = comparative
§feature

name = Degree id = R2.s-en corresponds to = R2.s-sl
symbolic value

value = superlative
§feature

name = CATEGORY id = P0-en corresponds to = Z0-sl
symbolic value

value = Pronoun
§feature

name = CATEGORY id = M0-en corresponds to = K0-sl
symbolic value

value = Numeral
§feature

name = Form id = M1.d-en corresponds to = K1.a-sl
symbolic value

value = digit
§feature

name = Form id = M1.r-en corresponds to = K1.r-sl
symbolic value

value = roman
§feature

name = Form id = M1.l-en corresponds to = K1.b-sl
symbolic value

value = letter
§feature

name = CATEGORY id = S0-en corresponds to = D0-sl
symbolic value

value = Preposition
§feature

name = CATEGORY id = C0-en corresponds to = V0-sl
symbolic value

value = Conjunction
§feature

name = CATEGORY id = Q0-en corresponds to = L0-sl
symbolic value

value = Particle
§feature

name = CATEGORY id = I0-en corresponds to = M0-sl
symbolic value

value = Interjection
§feature

name = CATEGORY id = Y0-en corresponds to = O0-sl
symbolic value

value = Abbreviation
§feature

name = CATEGORY id = X0-en corresponds to = N0-sl
symbolic value

value = Residual
§feature

name = Type id = X1.f-en corresponds to = N1.j-sl
symbolic value

value = foreign
§feature

name = Type id = X1.t-en corresponds to = N1.t-sl
symbolic value

value = typo
§feature

name = Type id = X1.p-en corresponds to = N1.p-sl
symbolic value

value = program
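The feature identifiers above appear to follow a positional pattern: a category letter, an attribute position and a value code, then a language suffix. For example, N1.c-en corresponds to S1.o-sl. Assuming that reading, an English coarse-grained MSD can be translated into its Slovene counterpart position by position. The Python sketch below shows this. Its tables are transcribed from the feature library, while the names CATEGORY_SL, VALUE_SL and to_slovene are hypothetical.

```python
# English category letter -> Slovene category letter (e.g. N0-en -> S0-sl).
CATEGORY_SL = {
    "N": "S", "V": "G", "A": "P", "R": "R", "P": "Z", "M": "K",
    "S": "D", "C": "V", "Q": "L", "I": "M", "Y": "O", "X": "N",
}

# (English category, position, English code) -> Slovene code,
# e.g. N1.c-en -> S1.o-sl becomes ("N", 1, "c") -> "o".
VALUE_SL = {
    ("N", 1, "c"): "o", ("N", 1, "p"): "l",
    ("N", 2, "m"): "m", ("N", 2, "f"): "z", ("N", 2, "n"): "s",
    ("V", 1, "m"): "g", ("V", 1, "a"): "p",
    ("V", 2, "e"): "d", ("V", 2, "p"): "n", ("V", 2, "b"): "v",
    ("A", 1, "g"): "p", ("A", 1, "s"): "s", ("A", 1, "p"): "d",
    ("A", 2, "p"): "n", ("A", 2, "c"): "p", ("A", 2, "s"): "s",
    ("R", 1, "g"): "s", ("R", 1, "r"): "d",
    ("R", 2, "p"): "n", ("R", 2, "c"): "r", ("R", 2, "s"): "s",
    ("M", 1, "d"): "a", ("M", 1, "r"): "r", ("M", 1, "l"): "b",
    ("X", 1, "f"): "j", ("X", 1, "t"): "t", ("X", 1, "p"): "p",
}


def to_slovene(msd: str) -> str:
    """Translate an English coarse-grained MSD such as 'Ncm' into Slovene.

    Raises KeyError for codes not declared in the feature library.
    """
    cat = msd[0]
    out = [CATEGORY_SL[cat]]
    # Positions after the category letter are numbered from 1, as in the ids.
    for pos, code in enumerate(msd[1:], start=1):
        out.append(VALUE_SL[(cat, pos, code)])
    return "".join(out)


for msd in ["Ncm", "Vme", "Agp", "Rr", "Md", "Xf"]:
    print(msd, "->", to_slovene(msd))
```

Under this reading, Ncm yields Som, Vme yields Ggd and Xf yields Nj. These results agree with the correspondences listed in the feature-value library.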
§feature-value library

§feature structure

id = Ncm corresponds to = Som
CATEGORY = Noun, Type = common, Gender = masculine
§feature structure

id = Ncf corresponds to = Soz
CATEGORY = Noun, Type = common, Gender = feminine
§feature structure

id = Ncn corresponds to = Sos
CATEGORY = Noun, Type = common, Gender = neuter
§feature structure

id = Npm corresponds to = Slm
CATEGORY = Noun, Type = proper, Gender = masculine
§feature structure

id = Npf corresponds to = Slz
CATEGORY = Noun, Type = proper, Gender = feminine
§feature structure

id = Npn corresponds to = Sls
CATEGORY = Noun, Type = proper, Gender = neuter
§feature structure

id = Va corresponds to = Gp
CATEGORY = Verb, Type = auxiliary
§feature structure

id = Vme corresponds to = Ggd
CATEGORY = Verb, Type = main, Aspect = perfective
§feature structure

id = Vmp corresponds to = Ggn
CATEGORY = Verb, Type = main, Aspect = progressive
§feature structure

id = Vmb corresponds to = Ggv
CATEGORY = Verb, Type = main, Aspect = biaspectual
§feature structure

id = Agp corresponds to = Ppn
CATEGORY = Adjective, Type = general, Degree = positive
§feature structure

id = Agc corresponds to = Ppp
CATEGORY = Adjective, Type = general, Degree = comparative
§feature structure

id = Ags corresponds to = Pps
CATEGORY = Adjective, Type = general, Degree = superlative
§feature structure

id = App corresponds to = Pdn
CATEGORY = Adjective, Type = participle, Degree = positive
§feature structure

id = Asp corresponds to = Psn
CATEGORY = Adjective, Type = possessive, Degree = positive
§feature structure

id = Rgp corresponds to = Rsn
CATEGORY = Adverb, Type = general, Degree = positive
§feature structure

id = Rgc corresponds to = Rsr
CATEGORY = Adverb, Type = general, Degree = comparative
§feature structure

id = Rgs corresponds to = Rss
CATEGORY = Adverb, Type = general, Degree = superlative
§feature structure

id = Rr corresponds to = Rd
CATEGORY = Adverb, Type = participle
§feature structure

id = P corresponds to = Z
CATEGORY = Pronoun
§feature structure

id = Md corresponds to = Ka
CATEGORY = Numeral, Form = digit
§feature structure

id = Mr corresponds to = Kr
CATEGORY = Numeral, Form = roman
§feature structure

id = Ml corresponds to = Kb
CATEGORY = Numeral, Form = letter
§feature structure

id = S corresponds to = D
CATEGORY = Preposition
§feature structure

id = C corresponds to = V
CATEGORY = Conjunction
§feature structure

id = Q corresponds to = L
CATEGORY = Particle
§feature structure

id = I corresponds to = M
CATEGORY = Interjection
§feature structure

id = Y corresponds to = O
CATEGORY = Abbreviation
§feature structure

id = X corresponds to = N
CATEGORY = Residual
§feature structure

id = Xf corresponds to = Nj
CATEGORY = Residual, Type = foreign
§feature structure

id = Xt corresponds to = Nt
CATEGORY = Residual, Type = typo
§feature structure

id = Xp corresponds to = Np
CATEGORY = Residual, Type = program
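The feature-value library lists each MSD as a bundle of attribute-value pairs. The feature identifiers suggest that the first letter gives the category. Each later letter fills the attribute at that position: Type first, then Gender, Aspect or Degree, with Form for numerals. Assuming that reading, an MSD can be expanded mechanically, as in the illustrative Python sketch below. The helper names expand and describe are hypothetical, and the tables cover only the English MSDs declared above.

```python
CATEGORY = {
    "N": "Noun", "V": "Verb", "A": "Adjective", "R": "Adverb",
    "P": "Pronoun", "M": "Numeral", "S": "Preposition", "C": "Conjunction",
    "Q": "Particle", "I": "Interjection", "Y": "Abbreviation", "X": "Residual",
}

# Attribute name at positions 1, 2, ... for each category (inferred from ids
# such as N1.c-en for Type and N2.m-en for Gender).
ATTRIBUTES = {
    "N": ["Type", "Gender"], "V": ["Type", "Aspect"], "A": ["Type", "Degree"],
    "R": ["Type", "Degree"], "M": ["Form"], "X": ["Type"],
}

DEGREE = {"p": "positive", "c": "comparative", "s": "superlative"}
VALUES = {
    ("N", "Type"): {"c": "common", "p": "proper"},
    ("N", "Gender"): {"m": "masculine", "f": "feminine", "n": "neuter"},
    ("V", "Type"): {"m": "main", "a": "auxiliary"},
    ("V", "Aspect"): {"e": "perfective", "p": "progressive", "b": "biaspectual"},
    ("A", "Type"): {"g": "general", "s": "possessive", "p": "participle"},
    ("A", "Degree"): DEGREE,
    ("R", "Type"): {"g": "general", "r": "participle"},
    ("R", "Degree"): DEGREE,
    ("M", "Form"): {"d": "digit", "r": "roman", "l": "letter"},
    ("X", "Type"): {"f": "foreign", "t": "typo", "p": "program"},
}


def expand(msd: str) -> dict:
    """Expand an MSD such as 'Ncm' into its attribute-value pairs."""
    cat = msd[0]
    names = ATTRIBUTES.get(cat, [])
    if len(msd) - 1 > len(names):
        raise ValueError(f"too many positions for category {cat!r}: {msd!r}")
    fs = {"CATEGORY": CATEGORY[cat]}
    # zip stops at the shorter sequence, so 'Va' yields only Type.
    for name, code in zip(names, msd[1:]):
        fs[name] = VALUES[(cat, name)][code]
    return fs


def describe(msd: str) -> str:
    """Render in the same style as the feature-value library entries."""
    return ", ".join(f"{name} = {value}" for name, value in expand(msd).items())


print(describe("Ncm"))
print(describe("Va"))
```

Because Python dictionaries keep insertion order, describe reproduces the attribute order used in the entries above. For example, it prints "CATEGORY = Verb, Type = auxiliary" for Va.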
§text-profile description
§language usage
§language

ident = sl
§term

Slovene
§language

ident = sl-bohoric
§term

Slovene written using the Bohorič alphabet
§language

ident = sl-dajnko
§term

Slovene written using the Dajnko alphabet
§language

ident = sl-metelko
§term

Slovene written using the Metelko alphabet
§language

ident = de
§term

German
§language

ident = la
§term

Latin
§language

ident = en
§term

English
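The language codes used in the lexicon are meant to be defined in the language usage element. A simple consistency check is therefore to compare every xml:lang value in the file with the declared idents. The Python sketch below assumes the standard TEI encoding of these declarations (langUsage/language/@ident) and the TEI namespace declared above. The function name undeclared_langs and the toy document, including its undeclared "hr" code, are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def undeclared_langs(xml_text: str) -> list:
    """Return xml:lang values used in the document but not declared in langUsage."""
    root = ET.fromstring(xml_text)
    declared = {
        lang.get("ident")
        for lang in root.iter(f"{{{TEI_NS}}}language")
        if lang.get("ident") is not None
    }
    used = {el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG) is not None}
    return sorted(used - declared)


sample = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><profileDesc><langUsage>
    <language ident="sl">Slovene</language>
    <language ident="sl-bohoric">Slovene written using the Bohorič alphabet</language>
  </langUsage></profileDesc></teiHeader>
  <text><body>
    <entry xml:lang="sl-bohoric"/>
    <entry xml:lang="sl"/>
    <entry xml:lang="hr"/>
  </body></text>
</TEI>"""

print(undeclared_langs(sample))
```

Reading the declared idents from the header, instead of hard-coding them, keeps the check valid if languages are added in a later edition.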
§revision description
§change Tomaž Erjavec: Generation of the lexicon from the corpus.
§date 2015-05-25
§change Tomaž Erjavec: Fixed an error in occurrence counts; made a few corrections in the corpus.
§date 2014-09-13
§change Tomaž Erjavec: Generation of the lexicon from the corpus for V1.0.
§date 2014-01-09


Date: 2015-05-25

The copyright for the text of this edition is governed by the Creative Commons Attribution 3.0 licence.