New Media and Language Technologies

Lecturers
Course timetable and materials
December 3, 2008, 15:15 - 19:00, MPŠ
Tomaž Erjavec:
- Introduction to LT [PPT] [PDF]
- Processing words [PPT] [PDF]
December 10, 2008, 15:15 - 19:00, MPŠ
- Tomaž Erjavec: Computer corpora and morphosyntactic tagging [PPT] [PDF]
- Nina Ledinek: Syntactic analysis and the Slovene Dependency Treebank [PPT] [PDF]
A sample from the SDT is available here.
- Darja Fišer: Semantic lexica and the Slovene WordNet [PDF]
March 4, 2009, 15:15 - 19:00, MPŠ
Sašo Džeroski:
- Language Resources and Machine Learning [PPT, PDF]
April 1, 2009, 11:15 - 15:00, E8 Orange room (note the changed date and location!)
- Jerneja Žganec Gros, Speech Technologies: Introduction, Applications, Text-to-Speech
April 8, 2009, 15:15 - 19:00, MPŠ
- Presentation and discussion of seminar work by students
Assessment
Seminar work, consisting of an experiment (to be determined in
consultation with the lecturer) and an accompanying 3,000-word report
describing the problem, the approach taken to solve it, related work,
and the evaluation of the results.
Exam dates:
- 13.5.2009, 17.00-19.00
- 10.6.2009, 16.00-18.00
- 24.6.2009, 15.00-17.00
Suggestions for seminar topics
- Train and test the Brill tagger on the JOS corpus.
- Make an analysis of the JOS treebank (an example is here) and try to
train and test the MALT parser on it.
- Use the Slovene WordNet for various tasks.
Literature list
- The main textbook for the field is:
Daniel Jurafsky, James H. Martin. Speech and Language Processing:
An Introduction to Natural Language Processing, Computational Linguistics
and Speech Recognition. Prentice-Hall, 2000.
Contents: I. Words, II. Syntax, III. Semantics, IV. Pragmatics,
V. Multilingual Processing.
- All slides accompanying the lectures are available on the Web (see the
links next to each lecture above).
- Supplementary reading for the course topics consists of the following papers:
- A Machine Learning Approach to Automatic Functor Assignment in the
Prague Dependency Treebank.
Zdeněk Žabokrtský, Petr Sgall, Sašo Džeroski.
In Proceedings of the Third International Conference on Language Resources
and Evaluation, LREC'02.
- Machine Learning of Morphosyntactic Structure:
Lemmatising Unknown Slovene Words.
Tomaž Erjavec and Sašo Džeroski.
Applied Artificial Intelligence, 18(1), pp. 17-40, 2004.
- Slovenian Text-to-Speech Synthesis for Speech User Interfaces.
Jerneja Žganec Gros, Aleš Mihelič, Nikola Pavešič, Mario Žganec, Stanislav Gruden.
In Proceedings of the Third World Enformatika Conference, WEC 2005.
- The VoiceTRAN Speech-to-Speech Communicator.
Jerneja Žganec Gros, France Mihelič, Tomaž Erjavec, Špela Vintar.
In Proceedings of the 8th International Conference on Text, Speech and
Dialogue, TSD 2005. Lecture Notes in Computer Science / Lecture Notes in
Artificial Intelligence 3658. Berlin: Springer.
- Digitisation of Literary Heritage Using Open Standards.
Tomaž Erjavec, Matija Ogrin.
In Proceedings of eChallenges 2005, 19-21 October 2005, Ljubljana.
- MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications,
Lexicons and Corpora.
Tomaž Erjavec.
In Proceedings of the Fourth International Conference on Language Resources
and Evaluation, LREC'04.
- The following books are also available:
Available datasets:
Last updated 2009-04-08,
et