The IJS-ELAN corpus contains 1 million words from 15 parallel Slovene-English / English-Slovene texts. The composition, annotation, encoding and availability of the corpus are meant to facilitate the development of language technology and studies in bilingual terminology extraction, primarily for the Slovene language.
The corpus is freely available for download, but please acknowledge its use in any publications by citing the relevant papers describing the version of the corpus you used.
You can also use the on-line concordancer on the IJS-ELAN corpus.
Two versions of the corpus are available:
The IJS-ELAN corpus was produced at the Dept. of Intelligent Systems, Institute Jozef Stefan. Thanks go to Spela Vintar, Roman Maurer and Andrej Skubic for acquiring and aligning portions of the corpus, and to the company Amebis, d.o.o. for lexically annotating the corpus. The compilation of the corpus was partially financed by a subcontract to the EU MLIS 121 project ELAN, by a subcontract to the Copernicus Joint Project CONCEDE, and by the grant MZT L2-0461-0106 from the Ministry of Science and Technology of Slovenia.