Annotation of Language Resources

Tomaž Erjavec
Anne-Marie Mineur

One-Week Introductory Course at ESSLLI 2002
August 5-9, 2002, Trento

Here you will find the materials for the one-week introductory course given at ESSLLI 2002. They consist of the Course Reader and the slides for the course's five lectures, given here in a relatively compact HTML form:
Lecture I: The XML Recommendation
This lecture introduces the Extensible Markup Language, XML, and discusses the motivation for its development, its history, and its building blocks (elements, attributes and entities) and how they fit together.
Lecture II: XML-Related Recommendations
This lecture discusses developments related to XML, in particular XML Schemas, XML Namespaces, XPath, and the XML transformation language, XSLT.
Lecture III: Annotation Software
This lecture presents mostly XML-related software, e.g. parsers, transformation engines and editors. Other linguistic annotation software is also mentioned, in particular several corpus workbenches and statistical tool packages. The lecture concludes with a case study on annotating the GENIA Corpus with LTG tools.
Lecture IV: TEI and other Language Encoding Recommendations
This lecture presents the XML-based Text Encoding Initiative (TEI) Guidelines and other language encoding recommendations. We present the history, organisation and architecture of the TEI and illustrate it with various applications. We also discuss some language engineering standards that came about as a result of EU projects (EAGLES, ISLE, (X)CES) and finish with a few lexicon exchange initiatives, namely MARTIF, TMX and OLIF.
Lecture V: Metadata
The last part of the course deals with a set of XML-based initiatives for encoding (language) metadata, namely the Open Archives Initiative in combination with Dublin Core, the Open Language Archives Community (OLAC), the OLAC-based Language Typology Resource Center, and the Typological Database Project.
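As a small taste of the material in Lectures I and II, the sketch below uses Python's standard-library xml.etree.ElementTree to show an element, its attributes, an internal entity and a predefined entity (Lecture I), plus a path query and a namespaced element (Lecture II). The sample document, its element names and the namespace URI are hypothetical, made up for illustration only. Note that ElementTree implements just a limited subset of XPath and no XSLT at all. A fuller library such as lxml would be needed for those.

```python
import xml.etree.ElementTree as ET

# Lecture I building blocks (hypothetical sample document):
# elements (<course>, <title>, ...), attributes (year, place, n),
# an internal entity declared in the DTD subset (&ess;) and a predefined one (&amp;).
DOC = """<!DOCTYPE course [<!ENTITY ess "ESSLLI 2002">]>
<course year="2002" place="Trento">
  <title>Annotation of Language Resources</title>
  <venue>&ess;</venue>
  <lecture n="1"><topic>XML &amp; its building blocks</topic></lecture>
  <lecture n="2"><topic>XPath</topic><topic>XSLT</topic></lecture>
</course>"""

root = ET.fromstring(DOC)
print(root.tag)                 # course
print(root.get("place"))        # Trento
print(root.find("venue").text)  # ESSLLI 2002 (entity expanded by the parser)

# Lecture II: ElementTree's limited XPath subset, with an attribute predicate.
topics = [t.text for t in root.findall("./lecture[@n='2']/topic")]
print(topics)                   # ['XPath', 'XSLT']

# Lecture II: XML Namespaces; the URI below is a hypothetical placeholder.
NS_DOC = '<c:course xmlns:c="http://example.org/course"><c:title>XML</c:title></c:course>'
ns_root = ET.fromstring(NS_DOC)
print(ns_root.tag)              # {http://example.org/course}course
ns = {"c": "http://example.org/course"}
print(ns_root.find("c:title", ns).text)  # XML
```

The parser resolves entity references before the application sees the text, and namespaced names are reported in ElementTree's `{URI}localname` form regardless of the prefix used in the document.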

