New Media and Language Technologies

Lecturers
Course timetable and materials
February 28 2007, 11:15 - 15:00, JSIPS
- Tomaž Erjavec: Introduction to LT
[PDF] /1 hr/
- Tomaž Erjavec: Processing words
[PDF] /1 hr/
- Jerneja Žganec Gros: Speech technologies
- Introduction [PDF]
- How to build a text-to-speech system [PDF]
- Speech technology applications [PDF]
/2 hrs/
March 21 2007, 11:15 - 15:00, JSIPS
- Tomaž Erjavec: Corpus linguistics
[HTML slides], [HTML handout] /2 hrs/
- Sašo Džeroski: Language Resources and Machine Learning
[PPT, PDF] /2 hrs/
Assessment
- Oral exam, covering the lectures and two parts from the
Jurafsky & Martin book - see below;
- Practical work, accompanied by a report (3,000 words)
describing the problem, the approach taken to solving it,
related work, and the evaluation of the results.
Each student should first discuss the manner of taking the exam
and the topics covered with Tomaž Erjavec.
Literature list
- The main textbook for the course, which should be studied for the oral exam, is:
Daniel Jurafsky, James H. Martin.
Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics and Speech
Recognition. Prentice-Hall, 2000.
Contents:
I. Words,
II. Syntax,
III. Semantics,
IV. Pragmatics,
V. Multilingual Processing.
- All slides accompanying the lectures are available on the Web (links
next to the lectures above)
- The following papers are supplementary reading for the course topics:
- A Machine Learning Approach to Automatic
Functor Assignment in the Prague Dependency Treebank.
Zdeněk Žabokrtský, Petr Sgall, Sašo Džeroski.
In Proceedings of the Third International Conference on Language Resources
and Evaluation, LREC'02.
- Machine Learning of Morphosyntactic Structure:
Lemmatising Unknown Slovene Words.
Tomaž Erjavec and Sašo Džeroski.
Applied Artificial Intelligence, 18(1), pp. 17-40, 2004.
- Slovenian Text-to-Speech Synthesis for Speech
User Interfaces.
Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešič, Mario Žganec, Stanislav Gruden.
Proceedings of the Third World Enformatika Conference, WEC 2005.
- The VoiceTRAN Speech-to-Speech
Communicator. Jerneja Žganec Gros, France Mihelič, Tomaž
Erjavec, and Špela Vintar. Proceedings of the 8th International
Conference on Text, Speech and Dialogue, TSD 2005. (Lecture notes in
computer science, Lecture notes in artificial intelligence,
3658. Berlin: Springer)
- Digitisation of Literary
Heritage Using Open Standards.
Tomaž Erjavec, Matija Ogrin. In Proceedings of eChallenges 2005,
19 - 21 October 2005, Ljubljana.
- MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications,
Lexicons and Corpora.
Tomaž Erjavec.
In Proceedings of the Fourth International Conference on Language Resources
and Evaluation, LREC'04.
- The following books are also available:
Available datasets:
Last updated 2007-02-27, et