The JANES Project

JANES – Jezikoslovna analiza nestandardne slovenščine (Linguistic Analysis of Nonstandard Slovene; official title: Resources, Tools and Methods for the Research of Nonstandard Internet Slovene, J6-6842) was a national research project funded by the Slovenian Research Agency from 1 July 2014 to 30 January 2018. The project team, led by Darja Fišer, consisted of eight researchers from the Faculty of Arts (University of Ljubljana) and the Jožef Stefan Institute.

The project built a large corpus of internet Slovene containing tweets, forum posts, blog texts, and comments on news articles and on Wikipedia pages and users. The corpus served as the basis for linguistic analyses of nonstandard Slovene. It also helped improve language-technology tools for processing texts written in nonstandard Slovene: datasets were manually annotated, and the tools were then trained on them. Finally, the project compiled a glossary of internet Slovene.

The resources developed in the project were made openly available under Creative Commons licenses to facilitate the analysis of modern Slovene and to enable further development of more robust language-technology applications for Slovene, which are necessary for the full functionality of Slovene in the digital age. The methods developed in the project are also applicable to related languages.

The project was divided into three work packages:

  1. Compiling a corpus of internet Slovene
  2. Corpus analysis of internet Slovene
  3. Development of tools and resources for processing internet Slovene