Morphosyntactic specifications (the theoretical part and conversion of the MSD index to
the format required by the specifications), IPIC-to-MTE tag correspondence tables for
the converter.
Preparation of the list of IPIC tags and their usage statistics for conversion
(the MSD index within the morphosyntactic specifications), the conversion code, extraction
of the lexicon from the IPIC, and recalculation of the statistics to fit the MTE tagset.
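Recalculating tag statistics for a new tagset can be pictured roughly as follows. This is only a minimal sketch, assuming the correspondence table is a plain many-to-one dictionary; the tag names, the table `IPIC_TO_MTE` and the function `recalculate_counts` are hypothetical placeholders and are not taken from the actual conversion code or data.

```python
from collections import Counter

# Hypothetical correspondence table (source tag -> target tag).
# The entries are placeholders, not real IPIC or MTE tags.
IPIC_TO_MTE = {
    "ipic-tag-a": "mte-tag-x",
    "ipic-tag-b": "mte-tag-x",  # two source tags may collapse into one target tag
    "ipic-tag-c": "mte-tag-y",
}


def recalculate_counts(ipic_counts, table):
    """Re-aggregate per-tag frequencies under the target tagset.

    Returns the counts keyed by target tag, plus the counts of source
    tags that have no entry in the table, so that gaps stay visible.
    """
    mte_counts = Counter()
    unmapped = Counter()
    for tag, n in ipic_counts.items():
        if tag in table:
            mte_counts[table[tag]] += n
        else:
            unmapped[tag] += n
    return mte_counts, unmapped


# Made-up frequencies, for illustration only.
ipic_counts = Counter(
    {"ipic-tag-a": 10, "ipic-tag-b": 5, "ipic-tag-c": 2, "ipic-tag-d": 1}
)
mte_counts, unmapped = recalculate_counts(ipic_counts, IPIC_TO_MTE)
```

Keeping the unmapped tags in a separate counter, rather than dropping them silently, makes it easy to see which source tags still lack a correspondence.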
The text was tagged with TaKIPI (http://nlp.ipipan.waw.pl/TaKIPI/), a tagger
developed specifically for Polish that uses the IPIC (IIS PAS Corpus: http://korpus.pl)
tagset and is based on the Morfeusz Morphosyntactic Analyzer for Polish (http://nlp.ipipan.waw.pl/~wolinski/morfeusz/). The tag converter was then
used to recode the output into the MTE-style format. To meet the main MTE requirements, the
converter has to supply a more detailed description of some parts of speech, regroup
the parts of speech, and handle considerable differences in word segmentation principles. A detailed
description of the correspondences between tags can be found at http://www.domeczek.pl/~natko/papers/MTE-pl_Ljub.pdf. The
conversion method has been implemented in Python; the code and the data
are available online at http://domeczek.pl/~polukr/mte-conv/.
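The core idea of recoding a colon-separated IPIC-style tag into a positional MTE-style MSD can be sketched as below. This is an illustration under stated assumptions and not the published converter. The value mappings, the assumed order of MSD positions and the function `convert_noun` are hypothetical, and the real correspondences are those in the tables referenced above.

```python
# Illustrative value mappings only; they are assumptions made for this
# sketch and are NOT copied from the published correspondence tables.
NUMBER = {"sg": "s", "pl": "p"}
CASE = {"nom": "n", "gen": "g", "dat": "d", "acc": "a"}
GENDER = {"m1": "m", "m2": "m", "m3": "m", "f": "f", "n": "n"}


def convert_noun(ipic_tag):
    """Convert a noun tag such as 'subst:sg:nom:f' to a positional MSD.

    Assumed MSD layout for this sketch: category, type, gender, number, case.
    """
    parts = ipic_tag.split(":")
    if len(parts) != 4 or parts[0] != "subst":
        raise ValueError(f"not a noun tag this sketch can handle: {ipic_tag!r}")
    _, number, case, gender = parts
    # The source tag carries no noun-type information, so the sketch
    # hard-codes 'c' (common) as a placeholder for that position.
    noun_type = "c"
    return "N" + noun_type + GENDER[gender] + NUMBER[number] + CASE[case]


example = convert_noun("subst:sg:nom:f")
```

A full converter would need one such rule per part of speech, plus separate handling for tokens that the two tagsets segment differently; neither is covered here.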