MULTEXT-East:
Multilingual Text Tools and Corpora for Central and Eastern
European Languages. EU Copernicus Project COP106
Concede:
Consortium for Central European Dictionary Encoding.
EU Copernicus Project PL96-1142
The Concede project aimed to develop a unified dictionary encoding
schema. The experiments were carried out on lexical tokens extracted
from the multilingual corpus of Orwell's "1984" developed within the
MULTEXT-East project. Headwords were extracted from various frequency
intervals and from all word categories (POS), so that different kinds
of encoding problems would be revealed. The MULTEXT-East corpus was
significantly improved for the purposes of the Concede project.