New Media and Language Technologies

Part of "New Media and eScience" MSc Programme
Jožef Stefan International Postgraduate School
Winter 2006 / Spring 2007

URL http://nl.ijs.si/et/teach/jsi06-hlt/

Lecturers

Tomaž Erjavec
Sašo Džeroski
Jerneja Žganec Gros

Course timetable and materials

February 28 2007, 11:15 - 15:00, JSIPS

Tomaž Erjavec: Introduction to LT [PDF] /1 hr/
Tomaž Erjavec: Processing words [PDF] /1 hr/
Jerneja Žganec Gros: Speech technologies
- Introduction [PDF]
- How to build a text-to-speech system [PDF]
- Speech technology applications [PDF]
/2 hrs/

March 21 2007, 11:15 - 15:00, JSIPS

Tomaž Erjavec: Corpus linguistics [HTML slides], [HTML handout] /2 hrs/
Sašo Džeroski: Language Resources and Machine Learning [PPT, PDF] /2 hrs/

Assessment

Oral exam, covering the lectures and two parts from the Jurafsky & Martin book - see below;
Practical work, accompanied by a report (3,000 words) describing the problem, the approach taken to solve it, related work, and the evaluation of the results.

Each student should first discuss the manner of taking the exam and the topics covered with Tomaž Erjavec.

Literature list

The main textbook for the course, which should be studied for the oral exam, is:
Daniel Jurafsky, James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice-Hall, 2000.
Contents: I. Words, II. Syntax, III. Semantics, IV. Pragmatics, V. Multilingual Processing.
All slides accompanying the lectures are available on the Web (links next to the lectures above).
The following papers are supplementary reading for the course topics:
- A Machine Learning Approach to Automatic Functor Assignment in the Prague Dependency Treebank. Zdeněk Žabokrtský, Petr Sgall, Sašo Džeroski. In Proceedings of the Third International Conference on Language Resources and Evaluation, LREC'02.
- Machine Learning of Morphosyntactic Structure: Lemmatising Unknown Slovene Words. Tomaž Erjavec and Sašo Džeroski. Applied Artificial Intelligence, 18(1), pp. 17-40, 2004.
- Slovenian Text-to-Speech Synthesis for Speech User Interfaces. Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešič, Mario Žganec, Stanislav Gruden. Proceedings of the Third World Enformatika Conference, WEC 2005.
- The VoiceTRAN Speech-to-Speech Communicator. Jerneja Žganec Gros, France Mihelič, Tomaž Erjavec, and Špela Vintar. Proceedings of the 8th International Conference on Text, Speech and Dialogue, TSD 2005. (Lecture notes in computer science, Lecture notes in artificial intelligence, 3658. Berlin: Springer)
- Digitisation of Literary Heritage Using Open Standards. Tomaž Erjavec, Matija Ogrin. In Proceedings of eChallenges 2005, 19 - 21 October 2005, Ljubljana.
- MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. Tomaž Erjavec. In Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC'04.
The following books are also available:
- The Oxford Handbook of Computational Linguistics. Ruslan Mitkov (ed.) Oxford University Press, 2003. [Sample]
- Foundations of Statistical Natural Language Processing. Christopher D. Manning, Hinrich Schütze. MIT Press, 1999.
- ... and many other books and papers in the JSI library (room S10)

Available datasets:

MULTEXT-East (Slovene) corpus and lexicon
IJS-ELAN corpus
SVEZ-IJS corpus
Slovene Dependency Treebank
Slovene WordNet

Last updated 2007-02-27, et
