VAYNA

TEI Header

§file description
§title statement
§title

type = main
Corpus "Attacks on the Yugoslav National Army" (1989)
§title

type = sub
Linguistically annotated version
§statement of responsibility
§personal name Igor Žagar
§personal name Peter Tancig
§responsibility

Project leads, sample definition
§statement of responsibility
§personal name Tomaž Erjavec
§responsibility

Conversion of corpus to TEI, linguistic processing
§edition statement
§edition 1.1
§extent
§measure

unit = text
360 texts
§publication statement www.clarin.si

This work is licensed under the Creative Commons Attribution 4.0 International License.

§publisher
§organization name

Slovenian research infrastructure CLARIN.SI
§identifying number

type = handle
http://hdl.handle.net/11356/1237
§publication place http://hdl.handle.net/11356/1237
§availability

§licence http://creativecommons.org/licenses/by/4.0/
§date

when = 2019-06-01
§source description

The corpus and its analysis are described in: Tancig, Peter, Žagar, Igor: Računalniško podprta analiza velikih tekstualnih baz podatkov: Primer napadov na JNA. Zbornik V. kongresa Zveze društev za uporabno jezikoslovje Jugoslavije, Ljubljana 1989, pp. 51-56. URN:NBN:SI:doc-XGCMAHI4

§encoding description
§project description

The corpus was compiled to empirically test the claims that Slovene media were attacking the Yugoslav National Army.

§sampling declaration
§editorial practice declaration

The corpus was typed in by students from the original articles and therefore contains typos and omissions. The typed-in texts were formatted for the needs of the OKUS concordancer. This format was then semi-automatically normalised and converted to TEI. In this step, any typos that were found were corrected, words hyphenated across lines were merged, and hyphens used as sentence punctuation were converted to '—'.
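The two mechanical normalisation steps mentioned above (merging hyphenated words and converting punctuation hyphens to dashes) can be sketched as follows. This is only an illustration: the actual rules used for the corpus are not documented here, so both regular expressions and the function name `normalise` are assumptions.

```python
import re

# Assumption: a hyphen directly before a line break splits one word in two.
LINE_BREAK_HYPHEN = re.compile(r"(\w)-\n\s*(\w)")
# Assumption: a hyphen with whitespace on both sides is sentence punctuation.
SPACED_HYPHEN = re.compile(r"(?<=\s)-(?=\s)")


def normalise(text: str) -> str:
    """Merge words hyphenated across line breaks, then turn spaced hyphens into dashes."""
    text = LINE_BREAK_HYPHEN.sub(r"\1\2", text)
    # "\u2014" is the dash character the editorial declaration refers to.
    return SPACED_HYPHEN.sub("\u2014", text)
```

Calling `normalise("napa-\ndi na JNA - primer")` joins `napa-` and `di` into `napadi` and replaces the spaced hyphen with a dash. Hyphens inside compounds (no surrounding whitespace) are left untouched, which is why a manual check would still be needed in a semi-automatic workflow.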

§tagging declaration
§namespace

name = http://www.tei-c.org/ns/1.0
§tag usage

gi = text occurs = 1
text
§tag usage

gi = body occurs = 1
text body
§tag usage

gi = author occurs = 251
author
§tag usage

gi = div occurs = 360
text division
§tag usage

gi = bibl occurs = 355
bibliographic citation
§tag usage

gi = title occurs = 355
title
§tag usage

gi = publisher occurs = 350
publisher
§tag usage

gi = date occurs = 345
date
§tag usage

gi = p occurs = 3899
paragraph
§tag usage

gi = s occurs = 11460
s-unit
§tag usage

gi = name occurs = 10123
name
§tag usage

gi = c occurs = 256777
character
§tag usage

gi = w occurs = 259501
word
§tag usage

gi = pc occurs = 41165
punctuation character
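The tag usage counts listed above can be checked against the corpus file itself. The sketch below assumes the corpus is a single TEI XML document whose header stores these values as `tagUsage` elements with `gi` and `occurs` attributes, and that the counts refer to elements inside `text`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_elements(root: ET.Element) -> Counter:
    """Count TEI elements inside <text> (including <text> itself) by local name."""
    counts = Counter()
    text = root.find(f"{{{TEI_NS}}}text")
    if text is None:
        return counts
    for el in text.iter():
        # Skip comments/processing instructions and foreign namespaces.
        if isinstance(el.tag, str) and el.tag.startswith(f"{{{TEI_NS}}}"):
            counts[el.tag.split("}", 1)[1]] += 1
    return counts


def declared_usage(root: ET.Element) -> dict:
    """Read the gi -> occurs mapping from the header's tagUsage elements."""
    return {
        tu.get("gi"): int(tu.get("occurs"))
        for tu in root.iter(f"{{{TEI_NS}}}tagUsage")
    }


def mismatches(root: ET.Element) -> dict:
    """Return {gi: (declared, actual)} for every count that disagrees."""
    actual = count_elements(root)
    return {
        gi: (n, actual[gi])
        for gi, n in declared_usage(root).items()
        if actual[gi] != n
    }
```

For a file on disk, `mismatches(ET.parse(path).getroot())` returns an empty dictionary when the header and the text agree.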
§listPrefixDef
§prefixDef

ident = mte

Private URIs with this prefix point to feature-structure elements defining the Slovene MULTEXT-East Version 6 MSDs.
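A private URI with the `mte` prefix is expanded by matching the part after the prefix against a pattern and substituting it into a replacement pattern, as TEI prefix definitions generally work. The sketch below illustrates that mechanism only: the match pattern `(.+)`, the replacement pattern `msd-fslib.xml#$1`, and the example value `Ncmsn` are assumptions, since the header as shown gives only the prefix name.

```python
import re


def resolve_private_uri(uri: str, ident: str,
                        match_pattern: str, replacement_pattern: str) -> str:
    """Expand 'ident:local' using a TEI-style match/replacement pattern pair."""
    prefix = ident + ":"
    if not uri.startswith(prefix):
        return uri  # not a private URI with this prefix
    local = uri[len(prefix):]
    m = re.fullmatch(match_pattern, local)
    if m is None:
        return uri  # local part does not fit the declared pattern
    # TEI writes back-references as $1, $2, ...; substitute the captured groups.
    return re.sub(r"\$(\d)", lambda d: m.group(int(d.group(1))), replacement_pattern)


# Hypothetical patterns, for illustration only.
resolved = resolve_private_uri("mte:Ncmsn", "mte", "(.+)", "msd-fslib.xml#$1")
```

With these assumed patterns, `resolved` points at the fragment `#Ncmsn` of the hypothetical file `msd-fslib.xml`, which is where the feature-structure definition of that MSD would be looked up.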

§application information

MSD tagging and lemmatisation with ReLDI Tagger trained for Slovene, available from https://github.com/clarinsi/reldi-tagger.

Named entity recognition done with the Janes NER program, trained for Slovene and available at https://github.com/clarinsi/janes-ner.

§application

ident = reldi-tagger
§label ReLDI tagger
§application

ident = janes-ner
§label NER system for South Slavic languages
§classification declarations
§taxonomy
§description

Text types
§category

id = report
§description

Report
§category

id = comment
§description

Comment
§category

id = letter
§description

Letter
§text-profile description
§language usage
§language

ident = sl
Slovenian
§language

ident = en
English


Date: 2019-09-29

Copyright for the text of this edition is governed by the Creative Commons Attribution 4.0 licence.