The department works in various areas of computational
linguistics, natural language processing and Human Language
Technologies (HLT), focusing to a large extent on the Slovene
language. Areas of expertise include standards for text encoding,
linguistic annotation of textual data, development and processing of
mono- and multilingual language corpora, machine learning of language
structure, text mining, information retrieval and extraction,
terminology extraction, computer-aided translation, computational
lexicography and production of complex digital editions.
We actively promote the development of HLT for the Slovene
language. We are among the founding members of the Slovenian Language
Technologies Society, which organises biennial conferences, and
the language resources we produce are encoded according to
international standards (in particular, TEI) and are freely downloadable for
research use.
Related areas of research at the department are Text and Web
Mining and Learning Language in Logic.
The department has been involved in numerous projects that deal with
the compilation of language resources, mainly for Slovene in a
multilingual setting. Whenever possible, we make the results publicly
available.
- Downloadable corpora for HLT research:
- MULTEXT-East Version 3:
East and Central European multilingual corpus and lexical resources
- IJS-ELAN Version 2:
Slovene-English parallel corpus, 1 million words
- SVEZ-IJS Version 1:
Slovene-English parallel corpus of EU legal texts, 10 million words
- SDT, the Slovene Dependency Treebank:
Syntactically annotated corpus of Slovene, 30,000 words
- Web services:
- Digital publishing:
- Dictionaries:
The main people at the department who are involved in various areas of HLT
are
Slovene projects:
EU projects:
- PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning (Network of Excellence)
- SEKT - Semantically-Enabled Knowledge Technologies
- ALVIS - Superpeer Semantic Search Engine
Bilateral projects:
- Slovenia - Serbia and Montenegro (2004-2005)
The development of language resources for Slovene-Serbian machine translation
- Slovenia - Macedonia (2005-2006)
Gathering, Annotation and Analysis of Macedonian/Slovenian Language Resources
Old projects
Slovene projects:
EU projects:
- CONCEDE:
Consortium for Central European Dictionary Encoding
- ILP2:
Inductive Logic Programming II
- ELAN:
European Language Activity Network
- MULTEXT-EAST:
Multilingual Text & Corpora for Eastern and Central European
Languages
- TELRI:
Trans-European Language Resources Infrastructure
- The Fourth Conference on Language Technologies
October 13-14, 2004, Jožef Stefan Institute, Ljubljana.
(on-line proceedings)
- The Third Conference on Language Technologies
October 14-15, 2002, Jožef Stefan Institute, Ljubljana.
(on-line proceedings)
- The Second Conference on Language Technologies
October 17-18, 2000, Cankarjev dom, Ljubljana.
(on-line proceedings)
- The First Conference on Language Technologies for the Slovene Language
October 6-7, 1998, Cankarjev dom, Ljubljana.
(on-line proceedings)
- EAMT 2000:
European Association for Machine Translation Workshop
10-12 May 2000, Austrotel, Ljubljana.
- 5th TELRI Seminar:
Corpus Linguistics: How to Extract Meaning from Corpora
22-24 September 2000, Arts Faculty, Ljubljana.
- Machine Learning in Text Data Analysis,
a Workshop of the Sixteenth International Conference on Machine Learning
June 30 1999, Bled, Slovenia.
- Language Technologies - Multilingual Aspects,
a Workshop of the 32nd Annual Meeting of the Societas Linguistica Europaea
July 8-11 1999, Arts Faculty, Ljubljana
- ICML'99:
16th Int. Conference on Machine Learning
30 June 1999, Bled
- ICML'99 Workshop
on Machine Learning in Text Data Analysis
- Learning Language in Logic (LLL) Workshop
30 June 1999, Bled
Research into natural language processing has been carried out at the
Jožef Stefan Institute since the 1970s, at the Dept. for Computer
Systems, E4. The head of the E4 Laboratory for Natural Language
was Dr. Peter Tancig. In 1995, the Lab was merged with the Artificial
Intelligence Laboratory into the
Dept. of Intelligent Systems, E8.
The members of the former lab were for a while known as the Language
and Speech (Technologies) Group and cooperated in the project
RR(S)J:
Computational Understanding of (the Slovene) Language.
But many of the students and
researchers left to lead different lives, while members of the ex-AI
lab became involved in various aspects of processing natural
language; the boundary between the 'natural language' and 'artificial
intelligence' members of the department thus became rather blurry.
Listed below are former members of and students at the NL Laboratory:
- Sandi Kodric, now in big business
- Gorazd Bozic and Matija Grabnar, now at
Arnes
- Andrej Bekes, now at the
Dept. for Asian and African studies,
University of Ljubljana
- Dusan Peterc, now at
Arahne
- Simon Weilguny, now in Koper
- Jelena Meznaric, now in Paris
- Sanja Bezjak, now at Pliva
- Agata Saje, now at Krka
- Marijan Miletic, now at
Artinian
- Milan Stamenkovic, now in Stuttgart?
- Peter Tancig, now in the
Society of
Slovenian Researchers
In 2004 the Department split into two new departments:
E8, the
Dept. of Knowledge Technologies
and E9, the
Dept. of Intelligent Systems.
HLT activities continue in both departments, although this page only
documents the work at the
Dept. of Knowledge Technologies.
Page last updated 2006-05-18,
et