Tomaž Erjavec
I work at the Dept. of Knowledge Technologies
at the Jožef Stefan Institute
in Ljubljana, Slovenia.
New:

- language technologies for Slovene
- development of textual corpora and other language
resources
- standardisation of text encoding
- machine learning methods for natural language
- computational morphology
- complex digital editions
See also the IJS Natural Language
Server.
International
- EU FP7 INF 211983 MONDILEX (2007-2009)
- Conceptual Modelling of Networking of Centres for High-Quality
Research in Slavic Lexicography and Their Digital Resources
- SEE-ERA.NET ICT 10503 RP (2007-2008)
- Building Language Resources and Translation Models for Machine Translation focused on South Slavic and Balkan Languages
- EU CLARIN (2006--)
- Research infrastructure for language resources
- Bilateral Slovene-Serbian project (2004-2005)
- Development of Slovene and Serbian Language Resources for
Machine Translation
- GENIA
JSPS Research for the Future program (2002)
- Automatic extraction of information from biomedical texts
(local page)
- CONCEDE
Copernicus Joint Project (1998-2000):
- Consortium for Central European Dictionary Encoding
(local page)
- TELRI
Copernicus Concerted Action (1999-2001, 1995-1997):
- Trans-European Language Resources Infrastructure II
(local page)
- ELAN
MLIS EU Project (1998-1999):
- European Language Activity Network
- MULTEXT-EAST
Copernicus Joint Project COP 106 (1996-1997):
- Multilingual Texts and Corpora for Eastern and Central
European Languages
- LLL
Informal SIG
- Learning Language in Logic
(local page)
- ILD
UK SALT project (1994-1996)
- The Integrated Language Database
(RA, Centre for Cognitive Science, 1994)
National
- Knowledge Technologies (2004-2009)
Ministry of Education, Science and Sport Research Project
- Cover financing for IJS Department of Knowledge Technologies
Mini projects (work in progress):
- Ministry of Education, Science and Sport Applied Project (2004-2006)
- Digital Critical Editions of Slovene Literature
- Slovene Cross-Ministry Targeted Research Projects: (2006-2008)
- VoiceTRAN - a speech-to-speech communicator
- Slovene Cross-Ministry Targeted Research Projects: (2005-2006)
- Oblikovanje slovenskega korpusnega omrežja
(Compilation of the Slovene Corpus Network)
- Izdelava virov in sistema za simultano prevajanje slovenščina-angleščina
(Producing resources and system for simultaneous translation Slovene-English)
- Ministry of Information Society Project (2001)
- Localisation of Open Source Spell Checkers
ispell and aspell
- MZT L2-0461-0106 (1998-2001)
- Development of Digital Publishing with Distance Learning Support
(project leader)
- MZT T2-0409 (1998-2000)
- Speech Corpora and Tools for the Slovenian Language
- FIDA (1996-1999)
- Corpus of the Slovene Language
(TEI/SGML consulting)
(local access)
- GNUsl (1995--)
- A GNU effort for the Slovene Language
(server maintenance, resource contribution)
(local access)
- RR(S)J
Slovene Ministry of Science & Technology funded project (1993-1996)
- Računalniško razumevanje (slovenskega) jezika
(Computational Understanding of (Slovene) Language)
(researcher)
Note: The preparation of the COBISS bibliography takes some time!
On-line publications in Slovene are available
here;
selected English publications with on-line versions
are the following:
- Tomaž Erjavec:
TEI and Microsoft: a marriage made in....
In Digital Historical Corpora - Architecture, Annotation, and Retrieval.
Dagstuhl Seminar Proceedings 06491, 2007.
- Tomaž Erjavec, Sarossy Bence:
Morphosyntactic Tagging of Slovene Legal Language.
Informatica,
30, pp. 483-488, 2006.
- Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat,
Tomaž Erjavec, Dan Tufis, Daniel Varga.
The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages.
In Proceedings of the Fifth International Conference on Language Resources and
Evaluation, LREC'06,
ELRA, Paris, 2006.
- Tomaž Erjavec.
The English-Slovene ACQUIS corpus.
In Proceedings of the Fifth International Conference on Language Resources and
Evaluation, LREC'06,
ELRA, Paris, 2006.
- Sašo Džeroski, Tomaž Erjavec,
Nina Ledinek, Petr Pajas, Zdeněk Žabokrtský, Andreja Žele.
Towards a Slovene Dependency Treebank.
In Proceedings of the Fifth International Conference on Language Resources and
Evaluation, LREC'06,
ELRA, Paris, 2006.
- Tomaž Erjavec, Darja Fišer.
Building Slovene Wordnet.
In Proceedings of the Fifth International Conference on Language Resources and
Evaluation, LREC'06,
ELRA, Paris, 2006.
- Tomaž Erjavec, Camelia Ignat, Bruno Pouliquen, Ralf Steinberger.
Massive multi-lingual corpus
compilation: Acquis Communautaire and totale.
In Proceedings of the 2nd Language & Technology Conference, April 21-23, 2005, Poznan, Poland. 2005, pp. 32-36.
- Jerneja Žganec Gros, France Mihelič, Tomaž Erjavec, Špela Vintar.
The VoiceTRAN speech-to-speech communicator.
In 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005. Text, speech and dialogue: proceedings, (Lecture notes in computer science, Lecture notes in artificial intelligence, 3658). Berlin: Springer, 2005, pp. 379-384.
- Tomaž Erjavec, Matija Ogrin.
Digitalisation of literary heritage using open standards.
In Paul Cunningham, Miriam Cunningham (eds.).
Innovation and knowledge economy: issues, applications, case studies,
(Information and communication technologies and the
knowledge economy). Amsterdam [etc.]: IOS Press, 2005,
pp. 999-1006.
- Tomaž Erjavec.
MULTEXT-East Version 3:
Multilingual Morphosyntactic Specifications, Lexicons and Corpora.
In: Proc. of the Fourth Intl. Conf. on
Language Resources and Evaluation,
LREC'04,
pp. 1535-1538,
ELRA, Paris, 2004.
[cf. also http://nl.ijs.si/ME/V3/]
- Syd Bauman, Alejandro Bia,
Lou Burnard, Tomaž Erjavec, Christine Ruotolo and Susan Schreibman:
Migrating Language Resources from SGML to XML:
the Text Encoding Initiative Recommendations.
In: Proc. of the Fourth Intl. Conf. on
Language Resources and Evaluation,
LREC'04,
pp. 139-142,
ELRA, Paris, 2004.
[cf. also http://www.tei-c.org.uk/Activities/MI/]
- Tomaž Erjavec and Sašo Džeroski:
Machine Learning of Morphosyntactic Structure:
Lemmatising Unknown Slovene Words.
Applied Artificial Intelligence
18(1), pp. 17-40, 2004.
- Tomaž Erjavec, Roger Evans, Nancy Ide, Adam Kilgarriff:
From Machine Readable Dictionaries to Lexical Databases: the Concede Experience.
In the Proceedings of the 7th International Conference on
Computational Lexicography, COMPLEX'03, Budapest, 2003.
- Tomaž Erjavec:
Compiling and Using the
IJS-ELAN Parallel Corpus.
Informatica,
26(3), pp. 299-307, 2002.
- Sašo Džeroski, Tomaž Erjavec and Jakub Zavrel:
Morphosyntactic Tagging of
Slovene: Evaluating PoS Taggers and Tagsets.
Second International Conference on Language Resources
and Evaluation, LREC'00, pp. 1099-1104, 2000.
(available also in PDF)
- James Cussens, Sašo Džeroski, Tomaž Erjavec:
Morphosyntactic
Tagging of Slovene using Progol.
Proceedings of the Ninth International Workshop on Inductive Logic
Programming, ILP-99; volume 1634 of Lecture Notes
in Artificial Intelligence; Springer-Verlag, pp. 68-79.
- Suresh Manandhar, Sašo Džeroski, Tomaž Erjavec (1998):
Learning Multilingual Morphology with CLOG. In
David Page (ed):
Inductive Logic Programming, 8th International Conference, ILP'98,
Proceedings.
Lecture Notes in Artificial Intelligence 1446, Springer,
pp. 135-144.
An estimate of citations as provided by the NEC ResearchIndex is
here.
Entry in SICRIS
Education
- [1997] PhD in Computer Science, University of Ljubljana.
Title of thesis:
Treatments of Slovene Verb Morphology in Inheritance Models
- [1992] MSc in
Cognitive Science,
University of Edinburgh.
- [1990] MSc in Computer Science, University of Ljubljana.
- [1984] BSc in Computer Science and Electrical Engineering,
University of Ljubljana.
- Editorial board member:
- Programme Committee Chair of ESSLLI 2007,
the 19th European Summer School in Logic, Language and Information
- President of
SDJT, the
Slovenian Language Technologies Society
(1998-2005)
- Council member of the
Text Encoding Initiative Consortium
(2001-2003)
- Member of the
TEI Task Force on SGML to XML Migration
(2002-2003)
- Advisory board member of
EACL,
the European Chapter of the Association for Computational Linguistics
(1998-2002)
- Conference organisation:
- Conference IS-LTC 2008:
Sixth Language Technologies Conference
October 16-17, 2008, Jožef Stefan Institute, Ljubljana.
- Conference IS-LTC 2006:
Fifth Slovenian and First International
Language Technologies Conference
October 9-10 2006, Jožef Stefan Institute, Ljubljana.
- Conference JEZIKOVNE
TEHNOLOGIJE (Language Technologies)
October 13-14 2004, Jožef Stefan Institute, Ljubljana.
- Conference "Znanstvene izdaje v elektronskem mediju:
večdisciplinarno posvetovanje" (Scientific Editions in the Electronic
Medium: a Multidisciplinary Conference).
Ljubljana: ZRC SAZU, June 2, 2004
- Workshop on Morphological Processing of
Slavic Languages, a workshop of the
EACL 2003
April 13 2003, Budapest.
- Conference JEZIKOVNE
TEHNOLOGIJE (Language Technologies)
October 14-15 2002, Jožef Stefan Institute, Ljubljana.
- Conference JEZIKOVNE
TEHNOLOGIJE (Language Technologies)
October 17-18 2000, Cankarjev dom, Ljubljana.
- 5th TELRI Seminar:
Corpus Linguistics: How to Extract Meaning from Corpora
September 22-24 2000, Arts Faculty, Ljubljana.
- EAMT 2000:
European Association for Machine
Translation Workshop
May 10-12 2000, Austrotel, Ljubljana.
- Language Technologies - Multilingual Aspects, Workshop in
the scope of the 32nd annual meeting of the Societas Linguistica
Europaea
July 8-11 1999, Arts Faculty, Ljubljana
- Conference Language
Technologies for the Slovene Language
6-7 October 1998, Ljubljana
- TELRI
"Workshop
on Alignment and Exploitation of Texts"
February 1-2 1997, Ljubljana.
- Journal reviewing:
Computational Linguistics,
Literary and Linguistic Computing,
Journal on Research in Language and Computation,
Slovene journals "Informatica" and "Uporabna informatika",
Croatian journal "Suvremena Lingvistika"
- Program committee member of:
- 2010
-
- LREC 2010:
Seventh international conference on Language Resources and Evaluation
19-21 May 2010, Valletta, Malta
- LREC Workshop:
Exploitation of Multilingual Resources and Tools for
Central and (South-) Eastern European Languages
23rd May 2010,
- 2009
-
- LTC 2009:
4th Language & Technology Conference
Poznan, Poland, November 6-8, 2009.
- EPIA 2009:
Fourteenth Portuguese Conference on Artificial Intelligence
University of Aveiro, October 12-15, 2009.
- Linguistic Processing Pipelines
Potsdam, 29.09.2009.
- LAW III:
The Third Linguistic Annotation Workshop.
Held in conjunction with ACL-IJCNLP 2009
Singapore, August 6-7, 2009.
- BSNLP 2009:
Workshop on Balto-Slavonic Natural Language Processing
(organised in the scope of the International Joint Conference
Intelligent Information Systems)
Cracow, June 15th, 2009
- 2008
-
- FASSBL 2008:
The Sixth International Conference
Formal Approaches to South Slavic and Balkan Languages
Dubrovnik, September 25-28, 2008
- PoS-3:
3rd International Conference "Perspectives on Slavistics"
Hamburg, August 28-31, 2008
- TeachCL-08:
The Third Workshop on Issues in Teaching Computational Linguistics.
Held in conjunction with ACL 2008: HLT
Columbus, Ohio, June 19th and 20th, 2008
- Sixth international
conference on Language Resources and
Evaluation, LREC 2008
28-30 May 2008, Marrakech, Morocco
-
- LAW II:
The 2nd Linguistic Annotation Workshop.
26-27 May 2008.
- Fourth Global WordNet Conference
January 22-25, 2008, Szeged, Hungary
- 2007
-
- 45th Annual Meeting of the Association for Computational Linguistics
June 23-30, 2007, Prague, Czech Republic
- TLT 2007:
6th International Workshop on Treebanks and Linguistic Theories
December 7-8, 2007, Bergen, Norway
[on-line proceedings]
- Lexis and Grammar Conference 2007, The 26th International Conference on Lexis and Grammar
October 2-6, 2007, Bonifacio, Corsica
- 3rd Language & Technology Conference:
Human Language Technologies as a Challenge for Computer Science and Linguistics
October 5-7, 2007, Poznan, Poland
- 6th International Conference
Recent Advances in Natural Language Processing, RANLP-2007
27-29 September 2007, Borovets, Bulgaria
- 2006
-
- TLT 2006:
5th International Workshop on Treebanks and Linguistic Theories
December 1-2, 2006, Prague, Czech Republic
- 9th INTEX/NooJ Workshop
June 1-3, 2006, Belgrade, Serbia
- LREC Workshop on Merging and Layering Linguistic Annotations
May 23, 2006, Genoa, Italy
- EACL 2006:
The 11th Meeting of the European Association of Computational Linguistics
April 3-7, 2006, Trento, Italy
- EACL 2006 Workshop on
Multi-dimensional Markup in Natural Language Processing: 5th
Workshop on NLP and XML (NLPXML-2006)
April 4, 2006, Trento, Italy
- 2005
-
- TLT 2005,
Fourth Workshop on Treebanks and Linguistic Theories
Barcelona, 9-10 December 2005
- 6th International Workshop on
Linguistically Interpreted Corpora (LINC-2005)
IJCNLP05, Jeju Island, 15 October 2005
- Language and
Speech Infrastructure for Information Access in the Balkan
Countries
September 25, 2005, Borovets, Bulgaria
- The 4th
International Workshop on Learning Language in Logic (LLL05)
August 7, 2005, Bonn, Germany.
- Corpus
Linguistics 2005
July 14-17, 2005, Birmingham, U.K.
- ACL 2005 Workshop on
Effective Tools and Methodologies for Teaching Natural Language
Processing and Computational Linguistics
June 25, 2005, Ann Arbor, MI.
- The 2nd Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics
April 21-23, 2005, Poznan, Poland
- 2004
-
- 2003
-
- The Second Workshop
on Treebanks and Linguistic Theories (TLT 2003)
November 14-15, 2003, Växjö, Sweden
- Workshop on Information
Extraction for Slavonic and other Central and Eastern European
Languages
8-9 September 2003 Borovets, Bulgaria
- 4th International Workshop on
Linguistically INterpreted Corpora, a workshop of the
EACL 2003
April 2003, Budapest.
- EACL 2003 Workshop on Language Technology and the Semantic Web
(NLPXML'03)
April 2003, Budapest.
- EACL 2003 Workshop on
Morphological Processing of
Slavic Languages
April 2003, Budapest.
- Corpus
Linguistics 2003
Lancaster University (UK), 28 March - 1 April 2003
- Workshop on Shallow Processing of Large Corpora
Lancaster University (UK), 27 March, 2003
- 2002
-
- TLT 2002:
Workshop on Treebanks and Linguistic Theories 2002
September 20-21, 2002, Sozopol, Bulgaria
- NLPXML 2002:
2nd Workshop on NLP and XML
September 1, 2002, Taipei, Taiwan
- COLING 2002:
19th International Conference on Computational Linguistics
August 24 - September 1, 2002, Taipei, Taiwan
- PRICAI 2002:
The Seventh Pacific Rim International Conference on Artificial Intelligence
August 18-22, Tokyo
- 2001
-
- LLL 2001:
3rd Learning Language in Logic Workshop, ILP'01
September 8-9, 2001, Strasbourg
- ACL/EACL 2001:
39th Annual Meeting of the Association for Computational Linguistics
July 6 - 11 2001, Toulouse
- COMPLEX 2001:
6th Conference on Computational Lexicography
June 28 - 30 2001, Birmingham
- 2000
-
- EMNLP/VLC 2000:
Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora
October 7-8, 2000, Hong Kong
- ACL Workshop:
Comparing Corpora
October 7-8, 2000, Hong Kong
- LLL 2000: 2nd Learning Language in Logic Workshop
September 13-14, 2000, Lisbon
- LINC-2000:
Workshop on Linguistically Interpreted Corpora.
August 6, 2000, Luxembourg
- CLIN 2000:
Eleventh CLIN Meeting (Computational Linguistics in the Netherlands)
November 3, 2000, Tilburg
- 1999
-
- Information Society '99
October 12-14, 1999 Ljubljana
- Fourth ESSLLI Student Session
August 9-20, 1999, Utrecht
- Learning
Language in Logic (LLL) Workshop, ICML'99
June 30, 1999, Bled, Slovenia
- Workshop
on Machine Learning in Text Data Analysis, ICML'99
June 30, 1999, Bled, Slovenia.
- ILP'99: Ninth International Workshop on Inductive Logic
Programming
June 24-27, 1999, Bled, Slovenia.
- LINC-99: Workshop on
Linguistically Interpreted Corpora, EACL'99
June 12, 1999, Bergen
- EACL'99: Ninth Conference of the European Chapter of the Association
for Computational Linguistics
June 8-12, 1999, Bergen
- President of Technical Committee on Terminology
at the
Slovenian Institute for Standardization
(cf. ISO TC37 SC4).
- Member of Societies
ACL,
ACL SIGNLL,
ACL SIGPHON,
SDJT,
SLAIS

Page last updated 2010-02-15,
et