New Media and Language Technologies

Part of "New Media and eScience" MSc Programme
Jožef Stefan International Postgraduate School
Winter 2008 / Spring 2009

URL http://nl.ijs.si/et/teach/mps08-hlt/


Lecturers

Course timetable and materials

December 3, 2008, 15:15 - 19:00, MPŠ

Tomaž Erjavec:
  1. Introduction to LT [PPT] [PDF]
  2. Processing words [PPT] [PDF]

December 10, 2008, 15:15 - 19:00, MPŠ

  1. Tomaž Erjavec: Computer corpora and Morphosyntactic tagging [PPT] [PDF]
  2. Nina Ledinek: Syntactic analysis and the Slovene Dependency Treebank [PPT] [PDF]
    A sample from the SDT is available here.
  3. Darja Fišer: Semantic lexica and Slovene WordNet [PDF]

March 4, 2009, 15:15 - 19:00, MPŠ

Sašo Džeroski:
  1. Language Resources and Machine Learning [PPT] [PDF]

April 1, 2009, 11:15 - 15:00, E8 Orange room (Note: changed date and location!)

  1. Jerneja Žganec Gros: Speech Technologies
    Introduction, Applications, Text-to-Speech

April 8, 2009, 15:15 - 19:00, MPŠ

  1. Presentation and discussion of seminar work by students

Assessment

Seminar work, consisting of an experiment (to be determined in consultation with the lecturer) and an accompanying report (3,000 words) that describes the problem, the approach taken to solving it, related work, and the evaluation of the results.

Exam dates:

Suggestions for seminar topics

  1. Train and test the Brill tagger on the JOS corpus
  2. Make an analysis of the JOS treebank (an example is here) and try to train and test the MALT parser on it.
  3. Use the Slovene WordNet for various tasks.

Literature list

  1. The main textbook for the field is:
    Daniel Jurafsky, James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice-Hall, 2000.
    Contents: I. Words, II. Syntax, III. Semantics, IV. Pragmatics, V. Multilingual Processing.
  2. All slides accompanying the lectures are available on the Web (links next to the lectures above)
  3. Supplementary reading for the course topics consists of the following papers:
  4. The following books are also available:

Available datasets:

Last updated 2009-04-08, et