An Overview of Public Domain Language Engineering Generic Tools
This page accompanies the talk given at the TELRI "Language Resources for Language Technology" Seminar.
You can also get the slides for the talk in PostScript format.
A draft version of the paper, with the same title but somewhat
different content from the talk, is now also available.
Comments welcome!
Follow links to tools, tool repositories and information
pages mentioned in the talk:
- Corpus annotation standards
- SGML tools
- Corpus annotation tools
- Computational linguistic tools
- SGML is an ISO standard, which codifies (descriptions of) document structure;
- HyTime is an ISO standard for hypermedia, utilising SGML;
- TEI is a set of proposals for encoding textual data;
- HTML is the markup language of the World Wide Web.
Extensive SGML archives are in ftp.ifi.uio.no/pub/SGML and ftp.gmd.de/gmd/sgml
A good guide to SGML tools is "The Whirlwind Guide to SGML Tools and Vendors",
from which most of the following taxonomy has been shamelessly plundered:
- Segmenter: one is to be available from the MULTEXT(-EAST) project;
- Taggers (Part-of-Speech disambiguation):
- Church & Gale aligner (to be available from the MULTEXT(-EAST) project);
Tools and information on them are available from:
And here are some more interesting CL tools:
And some more Computational Linguistics on the Internet.
Tomaz Erjavec, IJS
Last updated 14. Jan. 1996