Natural Language Server

This server hosts language resources and services, primarily for Slovene but also for other languages. The resources have been produced at JSI or in projects where JSI was a partner. The resources are typically stored in XML following the Text Encoding Initiative Guidelines and made as freely available as possible: the resources (such as corpora) can often be downloaded under a Creative Commons licence.

This server has been operational since 1994 and was among the first HTTP servers in Slovenia.

Dept. of Knowledge Technologies, Jožef Stefan Institute

Digital humanities

  • CLARIN.SI: Slovenian language research infrastructure
  • KonText and noSketch Engine: concordances over many corpora
  • Janes: Slovene user-generated content
  • IMP: Digital library, corpus and lexicon of historical Slovene
  • eZB, scholarly digital editions:
    • eZISS: digital critical editions of Slovenian literature
    • NRSS: unknown manuscripts of the 17th and 18th centuries
    • eZMono: Digital monographs
  • jaSlo: Japanese-Slovene online learner's dictionary

Language technologies

  • CLARIN.SI repository: downloadable language resources (datasets)
  • CLARIN.SI @ GitHub: open source language annotation tools (software)
  • SDJT: the Slovenian Language Technologies Society
  • MULTEXT-East: Multilingual corpora, lexica and morphosyntactic specifications
  • JOS: Manually annotated Slovene corpora and tagset specifications
  • ToTaLe: Lemmatising and PoS tagging Slovene texts
  • sloWNet: the Slovene WordNet
  • SDT: Slovene dependency treebank


Tomaž Erjavec, JSI, Dept. of Knowledge Technologies
tomaz.erjavec at