DSI

TEI Header

§file description
§title statement
§title

type = main
DSI-ana Linguistically annotated corpus of Informatics 2003-2019
§statement of responsibility
§name Katarina Puc
§responsibility

Acquisition of source texts.
§statement of responsibility
§name Tomaž Erjavec
§responsibility

Conversion to TEI P5, linguistic annotation.
§edition statement
§edition V5.0
§extent
§measure

unit = texts
1,776 articles
§measure

unit = words
4,335,534 words
§measure

unit = tokens
5,245,073 tokens
§publication statement
§distributor CLARIN.SI
§publication place http://hdl.handle.net/11356/1239
§date 2019-07-25
§availability

§anonymous block

CLARIN.SI Academic End-User Licence Agreement ACA ID-BY-NC-INF-NORED v1.0
§source description

Proceedings of the conferences "Days of Slovene Informatics" (2003-2019)

Proceedings of the conference "Informatics in Public Administration" (2015-2018)

Journal "Applied Informatics" (2010-2019)

§encoding description
§project description

The corpus was compiled as a development aid for the on-line terminological dictionary of informatics, iSlovar.

The design of the corpus is described in: Špela Vintar, Tomaž Erjavec. iKorpus in luščenje izrazja za Islovar. Zbornik Šeste konference Jezikovne tehnologije, IJS, Ljubljana, 2008, pp. 65-69.

§tagging declaration
§namespace

name = http://www.tei-c.org/ns/1.0
§tag usage

gi = text occurs = 30
text
§tag usage

gi = body occurs = 30
text body
§tag usage

gi = div occurs = 1776
text division
§tag usage

gi = head occurs = 1776
heading
§tag usage

gi = docAuthor occurs = 3005
document author
§tag usage

gi = forename occurs = 3005
forename
§tag usage

gi = surname occurs = 3005
surname
§tag usage

gi = p occurs = 125550
paragraph
§tag usage

gi = s occurs = 266940
s-unit
§tag usage

gi = w occurs = 4444265
word
§tag usage

gi = pc occurs = 800808
punctuation character
§tag usage

gi = c occurs = 4321684
character
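
The tag usage counts can be cross-checked against the extent: the `w` and `pc` counts sum to the declared token count (4,444,265 + 800,808 = 5,245,073). The following is a minimal Python sketch of such a check, assuming corpus files in the TEI namespace declared above. The sample sentence and the helper name `count_tags` are illustrative only and are not part of the corpus distribution.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"  # namespace from the tagging declaration

def count_tags(xml_text):
    """Count TEI-namespace elements by local name (hypothetical helper)."""
    counts = Counter()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree renders namespaced tags as "{namespace}local".
        ns, _, local = elem.tag.rpartition("}")
        if ns == "{" + TEI_NS:
            counts[local] += 1
    return counts

# Invented two-token sentence, used only to exercise the helper.
SAMPLE = (
    '<s xmlns="http://www.tei-c.org/ns/1.0">'
    '<w lemma="korpus">Korpus</w><pc>.</pc>'
    '</s>'
)

sample_counts = count_tags(SAMPLE)
sample_tokens = sample_counts["w"] + sample_counts["pc"]

# Figures declared in the header: w + pc should equal the token count.
DECLARED = {"w": 4444265, "pc": 800808}
declared_tokens = DECLARED["w"] + DECLARED["pc"]
```

Run over a full corpus file, the same helper should yield counts comparable to the `occurs` values listed above.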
§listPrefixDef
§prefixDef

ident = mte

Private URIs with this prefix point to feature-structure elements defining the Slovene MULTEXT-East V6 MSDs.
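
In TEI, a prefix definition pairs a match pattern with a replacement pattern, so that a private URI of the form `mte:...` can be expanded into a full pointer. The patterns themselves are not shown in this header, so the values in the sketch below are placeholders, and `expand_private_uri` is a hypothetical helper. Note also that TEI writes back-references as `$1`, whereas the Python sketch uses `\1`.

```python
import re

# Placeholder values: the header above gives only ident = mte.
PREFIX_DEFS = {
    "mte": {
        "match": r"(.+)",
        "replacement": r"msd-definitions.xml#\1",  # hypothetical target file
    }
}

def expand_private_uri(uri):
    """Expand 'prefix:value' using PREFIX_DEFS; return other URIs unchanged."""
    prefix, sep, value = uri.partition(":")
    if not sep or prefix not in PREFIX_DEFS:
        return uri
    rule = PREFIX_DEFS[prefix]
    # Anchor the pattern so it must match the whole value after the prefix.
    return re.sub("^" + rule["match"] + "$", rule["replacement"], value)

expanded = expand_private_uri("mte:Ncmsn")
```

URIs with an unregistered scheme, such as the handle given in the publication statement, pass through unchanged.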

§application information

MSD tagging and lemmatisation with ReLDI Tagger trained for Slovene, available from https://github.com/clarinsi/reldi-tagger.

§application

ident = reldi-tagger
§label ReLDI tagger
§revision description
§change

when = 2019-07-25
Tomaž Erjavec: Added the 2017-2019 volumes and metadata on the contributions; the entire corpus was converted and annotated anew.
§change

when = 2017-03-22
Tomaž Erjavec: Added the 2016 volume; the entire corpus was converted and annotated anew.
§change

when = 2016-02-24
Tomaž Erjavec: Added the 2015 volume.
§change

when = 2015-03-15
Tomaž Erjavec: Added the 2014 volume.
§change

when = 2013-10-15
Tomaž Erjavec: Added the 2013 volume.
§change

when = 2012-12-23
Tomaž Erjavec: Changed the structure; added the 2012 volume.


Date: 2019-09-06

Copyright in the text of this edition is governed by the Creative Commons Attribution 4.0 licence.