The JANES Project
JANES – Jezikoslovna analiza nestandardne slovenščine (Linguistic Analysis of Nonstandard Slovene, with the official title Resources, Tools and Methods for the Research of Nonstandard Internet Slovene (J6-6842)) is a national research project funded by the Slovenian Research Agency between 1 July 2014 and 30 June 2017. The project team, led by Darja Fišer, consists of eight researchers from the Faculty of Arts (University of Ljubljana) and the Jožef Stefan Institute.
The project is building a large corpus of internet Slovene containing tweets, forum posts, blog texts, and comments on news articles and on Wikipedia pages and users. The corpus serves as the basis for linguistic analyses of nonstandard Slovene and helps improve language-technology tools for processing texts written in nonstandard Slovene. The project also aims to build a glossary of internet Slovene.
The developed resources will be made openly available under Creative Commons License 4.0. They will facilitate the analysis of modern Slovene and enable the development of more robust language-technology applications for Slovene, which are necessary for the full functionality of Slovene in the digital age. The methods developed in the project are also applicable to related languages.
The project is divided into three work packages:
- Compiling a corpus of internet Slovene
- Corpus analysis of internet Slovene
- Development of tools and resources for processing internet Slovene