Current encoding practices use widely different naming conventions for
corpus tags. Different sets of labels can be found even for the same
language: for example, SUBSMS, SBMS, NCMS, Nms, etc. can all represent
"Common noun, masc. sing." in different systems for the very same
language.
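
As an illustration only, the divergence described above can be sketched as a lookup from system-specific labels to one shared feature description. The feature names (pos, type, gender, number) and the function describe are hypothetical choices made for this sketch; they are not part of any tagset proposed here.

```python
from typing import Dict, Optional

# Shared description for "Common noun, masc. sing.".
# The attribute names below are assumptions made for this sketch.
COMMON_NOUN_MASC_SING: Dict[str, str] = {
    "pos": "noun",
    "type": "common",
    "gender": "masculine",
    "number": "singular",
}

# The four labels cited in the text, each taken from a different system
# for the same language, all map to the same description.
TAG_TO_FEATURES: Dict[str, Dict[str, str]] = {
    tag: COMMON_NOUN_MASC_SING for tag in ("SUBSMS", "SBMS", "NCMS", "Nms")
}


def describe(tag: str) -> Optional[Dict[str, str]]:
    """Return the shared feature description for a tag, or None if unknown."""
    return TAG_TO_FEATURES.get(tag)


if __name__ == "__main__":
    for tag in ("SUBSMS", "SBMS", "NCMS", "Nms"):
        print(tag, describe(tag))
```

A table of this kind only records that the labels are equivalent; it does not say which label a harmonized convention should prefer.
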
It has been found, as already mentioned, that corpus tags are strongly
dependent on the tool and on the language. Therefore, each language will
have its own set, based on different considerations. However, it was
considered helpful to suggest some naming conventions for the sake of
harmonization. The following is an attempt made by the French partners
to give simple general guidelines for achieving a coherent naming
convention within the project.
This is not a formal system, and it may lead to ambiguities. Arriving
at a final set of tags requires thorough testing, since only
experimentation will show how a given set behaves. In addition,
decisions about the need for and usefulness of special devices for
automatic conversion are expected to have some impact on the concrete
tags chosen for a language. Thus, the tags proposed by each group for
the time being must be considered tentative until the end of the
experimentation phase.