TEI Header

§file description
§title statement
§title
JRC DGT Translation Memory: sl,en,de,fr,it
§statement of responsibility
§name Tomaž Erjavec, IJS
§responsibility
Conversion to TEI, word-level linguistic annotation
§edition statement
§edition V0.4
§extent 165 million words<term>
§publication statement
§availability

The corpus is available under the same conditions as the original DGT-TM database.

§source description http://langtech.jrc.ec.europa.eu/
§bibliographic citation
§title JRC DGT-TM: Translation Memory in 22 languages
§author Directorate-General for Translation
§publisher JRC
§date 2004-2011
§bibliographic citation Steinberger Ralf, Andreas Eisele, Szymon Klocek, Spyridon Pilos, Patrick Schlüter<author> (2012<date>). DGT-TM: A freely Available Translation Memory in 22 Languages.<title> Proceedings of the 8th international conference on Language Resources and Evaluation (LREC'2012), Istanbul, 21-27 May 2012.
§encoding description
§project description

For the purposes of this corpus, the Slovene, English, German, French and Italian parts were extracted from the JRC DGT-TM, and each language was automatically annotated at the word level with lemma and PoS tag.

§editorial practice declaration
§interpretation

The text has been automatically tokenised, part-of-speech tagged and lemmatised. For Slovene, the ToTrTaLe tool was used, while the other languages were processed with TreeTagger. Two tags are given for each word. For Slovene, @ctag gives the reduced SPOOK tag, while @ana gives the complete JOS morphosyntactic tag. For the other languages, @ctag gives the TreeTagger PoS tag, while @ana gives its mapping to the equivalent SPOOK tag.
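
As a rough illustration of the two-tag scheme described above, the following Python sketch reads word tokens and collects their @ctag and @ana values. Only the @ctag and @ana attribute names come from this header; the element names <s> and <w>, the @lemma attribute, and all tag values in the sample are assumptions made for the example, not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <s>/<w> element names and the @lemma attribute are
# assumptions; only @ctag and @ana are named in the header. The tag values
# are placeholders, not real corpus annotations.
SAMPLE = """<s>
  <w lemma="translation" ctag="NN" ana="#Nc">translations</w>
  <w lemma="be" ctag="VBP" ana="#Vm">are</w>
</s>"""


def read_tokens(xml_text):
    """Return a (form, lemma, ctag, ana) tuple for each <w> element."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("ctag"), w.get("ana"))
        for w in root.iter("w")
    ]


tokens = read_tokens(SAMPLE)
for form, lemma, ctag, ana in tokens:
    print(form, lemma, ctag, ana)
```

If the corpus files declare the TEI namespace, the element lookup would need the namespace-qualified name instead of the bare "w" used in this sketch.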

§text-profile description
§text classification
§keywords
scheme = local
§term
legislation/law
§language usage
§language
ident = sl
§term
Slovene
§language
ident = de
§term
German
§language
ident = en
§term
English
§language
ident = fr
§term
French
§language
ident = it
§term
Italian
§revision description
§change Tomaž Erjavec<name>: Added texts from 2011 and de,fr,it.
§date 2013-01-14
§change Tomaž Erjavec<name>: First version of en+sl corpus, corpus header.
§date 2012-11-04