Presentation at Tsujii Lab, University of Tokyo

Tomaž Erjavec

Dept. of Intelligent Systems
Jozef Stefan Institute
Ljubljana
Slovenia

Tuesday, 22 January 2002

This presentation is available at http://nl.ijs.si/et/talks/tsujiilab/

Overview:






Slovenia


IJS AI and NLP


My work and history

Education:

International projects:
ILD UK SALT project (1994-1996)
The Integrated Language Database
(RA, Centre for Cognitive Science, 1994)
MULTEXT-EAST Copernicus Joint Project COP 106 (1996-1997):
Multilingual Texts and Corpora for Eastern and Central European Languages
ELAN MLIS EU Project (1998-1999):
European Language Activity Network
TELRI Copernicus Concerted Action (1999-2001, 1995-1997):
Trans-European Language Resources Infrastructure II
CONCEDE Copernicus Joint Project (1998-2000):
Consortium for Central European Dictionary Encoding
Slovene projects:
Ministry of Information Society Project (2001)
Localisation of Open Source Spell Checkers ispell and aspell
(collaborator)
MZT L2-0461-0106 (1998-2001)
Development of Digital Publishing with Distance Learning Support
(project leader)
MZT T2-0409 (1998-2000)
Speech Corpora and Tools for the Slovenian Language
FIDA (1996-1999)
Reference corpus of the Slovene Language
(TEI/SGML consulting)
GNUsl (1995-)
A GNU effort for the Slovene Language
(server maintenance, resource contribution)
Summer school teaching:

Functions:

Research interests

Before 1995 (Edinburgh, PhD):

Since (EU projects):

Work in Tokyo

Work will concentrate on the GENIA corpus and resources.

TEI encoding

Developing a version of GPML and XLiNo that is compatible with TEI guidelines.

A preliminary SGML prolog:

<!DOCTYPE TEI.2  SYSTEM "tei2.dtd"  [
  <!ENTITY % TEI.prose "INCLUDE">
  <!ENTITY % TEI.dictionaries "INCLUDE">
  <!ENTITY % TEI.terminology "INCLUDE">
  <!ENTITY % TEI.general "INCLUDE">
  <!ENTITY % TEI.linking "INCLUDE">
  <!ENTITY % TEI.analysis "INCLUDE">
  <!ENTITY % TEI.fs "INCLUDE">
  <!ENTITY % TEI.corpus "INCLUDE">
]>

Multiple Hierarchies

Design of multiple hierarchies for GENIA annotation. This is a hot topic in the XML world; see e.g. the paper "Implementing Concurrent Markup in XML". One possibility is stand-off markup, as advocated in e.g. CES, which can be implemented using XLink.
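To make the stand-off idea concrete, here is a minimal sketch in Python. It is an illustration only: the element and attribute names (w, term, line, from, to) and the sample sentence are hypothetical and are not taken from GENIA, CES or TEI, and plain token identifiers stand in for XLink/XPointer references. The base text is tokenised once; each annotation layer lives in its own hierarchy and points into the base, so a term may cross a line break without any conflict.

```python
import xml.etree.ElementTree as ET

# Base layer: tokens with identifiers (hypothetical names, invented sample).
BASE = """<text>
  <w id="w1">IL-2</w>
  <w id="w2">gene</w>
  <w id="w3">expression</w>
  <w id="w4">requires</w>
  <w id="w5">NF-kappa</w>
  <w id="w6">B</w>
</text>"""

# Logical layer: terms, given as token ranges.
TERMS = """<terms>
  <term from="w1" to="w2" type="DNA"/>
  <term from="w5" to="w6" type="protein"/>
</terms>"""

# Physical layer: lines. The second term crosses the line break,
# so the two layers could not share one inline tree.
LINES = """<lines>
  <line from="w1" to="w5"/>
  <line from="w6" to="w6"/>
</lines>"""


def resolve(base_xml, layer_xml):
    """Return (tag, other attributes, covered text) for each span in a layer."""
    tokens = ET.fromstring(base_xml).findall("w")
    order = {w.get("id"): i for i, w in enumerate(tokens)}
    spans = []
    for span in ET.fromstring(layer_xml):
        start, end = order[span.get("from")], order[span.get("to")]
        words = [w.text for w in tokens[start:end + 1]]
        extra = {k: v for k, v in span.attrib.items() if k not in ("from", "to")}
        spans.append((span.tag, extra, " ".join(words)))
    return spans


term_spans = resolve(BASE, TERMS)
line_spans = resolve(BASE, LINES)
```

Because every layer is resolved against the same base, further hierarchies (for instance syntactic chunks) can be added as extra layer documents without touching the text or the existing annotations.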

Transformations

Using XSLT (with XPath and XPointer) to implement various corpus renderings (visualisations).
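The kind of rendering meant here can be sketched as follows. Note the substitution: the Python standard library has no XSLT processor, so this sketch uses ElementTree to perform the sort of mapping an XSLT template would express, namely turning annotated terms into HTML spans that a stylesheet can colour by class. The input element names and the sample sentence are hypothetical, and only flat (non-nested) term markup is handled.

```python
import xml.etree.ElementTree as ET

# Invented sample sentence with inline term annotation (hypothetical names).
SAMPLE = (
    '<s><term type="protein">NF-kappa B</term> activates the '
    '<term type="DNA">IL-2 gene</term>.</s>'
)


def render_html(sentence_xml):
    """Map each term element to an HTML span whose class is the term type."""
    src = ET.fromstring(sentence_xml)
    out = ET.Element("p")
    out.text = src.text          # text before the first term, if any
    for child in src:
        span = ET.SubElement(out, "span", {"class": child.get("type", "term")})
        span.text = child.text   # the term itself
        span.tail = child.tail   # running text that follows the term
    return ET.tostring(out, encoding="unicode")


html = render_html(SAMPLE)
```

In XSLT the same mapping would be a single template matching the term element; the Python version is given only so that the example can be run as is.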

Machine learning

Work on inductive logic programming (ILP) or literature-based discovery on MEDLINE abstracts, together with Sašo Džeroski.

Also use of other information sources connected to MEDLINE, i.e. MeSH and UMLS.
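As an illustration of what literature-based discovery can look like, the sketch below implements Swanson's "ABC" co-occurrence model on toy data. This is an assumption: the talk does not say which method is planned, and the ABC model is chosen only because it is a well-known baseline. The record identifiers are hypothetical, and the index terms are informal labels rather than actual MeSH headings; they echo Swanson's classic fish oil and Raynaud example.

```python
from collections import defaultdict

# Toy records: hypothetical abstract ids mapped to informal index terms.
RECORDS = {
    "a1": {"fish oil", "blood viscosity"},
    "a2": {"fish oil", "platelet aggregation"},
    "a3": {"blood viscosity", "Raynaud disease"},
    "a4": {"platelet aggregation", "Raynaud disease"},
    "a5": {"fish oil", "triglycerides"},
}


def cooccurring(records):
    """Map each term to the set of terms it shares at least one record with."""
    neighbours = defaultdict(set)
    for terms in records.values():
        for term in terms:
            neighbours[term] |= terms - {term}
    return neighbours


def abc_candidates(records, a):
    """Find terms C never indexed together with A but linked via some B."""
    nb = cooccurring(records)
    found = defaultdict(set)
    for b in nb[a]:
        for c in nb[b]:
            if c != a and c not in nb[a]:
                found[c].add(b)   # remember which B terms bridge A and C
    return dict(found)


candidates = abc_candidates(RECORDS, "fish oil")
```

On real data the candidate list would need ranking and filtering, for instance by restricting B and C terms to chosen MeSH categories or UMLS semantic types; the sketch leaves that out.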


Tomaž Erjavec, 2002-01-22