The novels have their markup normalised: a) structure
annotation with div and p (attributes xml:id and type); b)
segmentation annotation with s (attribute xml:id); c) tokenisation
annotation with w and c (attribute type); d) linguistic
annotation with the w attributes lemma and ana.
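The annotation layers listed above can be read with any XML parser. The following is a minimal sketch in Python, assuming a small invented fragment: the sample text, the xml:id and type values, the ana values, and the helper name `sentences` are all illustrative and are not taken from the corpus. The sketch also assumes the elements carry no default namespace, which may not hold for the distributed files.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so xml:id is exposed under this qualified name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment; identifiers, type values and ana values are invented.
SAMPLE = """
<div xml:id="d1" type="chapter">
  <p xml:id="d1.p1" type="prose">
    <s xml:id="d1.p1.s1">
      <w lemma="it" ana="#Pp3ns">It</w>
      <w lemma="be" ana="#Vmis3s">was</w>
      <w lemma="cold" ana="#Afp">cold</w>
      <c type="final">.</c>
    </s>
  </p>
</div>
""".strip()


def sentences(root):
    """Yield (sentence id, tokens) for every s element under root.

    Each token is a (tag, text, attributes) triple for a w or c child.
    """
    for s in root.iter("s"):
        tokens = [
            (tok.tag, tok.text, dict(tok.attrib))
            for tok in s
            if tok.tag in ("w", "c")
        ]
        yield s.get(XML_ID), tokens


root = ET.fromstring(SAMPLE)
result = list(sentences(root))

for sent_id, tokens in result:
    print(sent_id, [text for _tag, text, _attrs in tokens])
```

Keeping the tag alongside each token makes it easy to separate word tokens (w), which carry lemma and ana, from the c tokens, which do not.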
The linguistic interpretation of the text consists of
marking up the word tokens with their context-disambiguated lemma and
MULTEXT-East morphosyntactic description. The various texts have
undergone various amounts of validation, so error-rates between
The MULTEXT-East morphosyntactic descriptions (MSDs) follow
the revised MULTEXT-East/Mondilex common tables of lexical
specifications. The lexical MSDs have been converted to an fslib,
a feature-structure library, while their decomposition into features
is given in an flib, a feature library.
The words in the texts have their MSD encoded as the value
of the ana (#IDREF) attribute. This attribute refers to an fs, which, in
turn, refers via its #IDREFS feats attribute to the f elements that define it.
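The two-step reference chain described above (w/@ana to fs, then fs/@feats to f) can be followed with a simple identifier index. This is only a sketch under stated assumptions: the sample document, its container element names, the identifiers, and the way each f stores its value (a name attribute plus a symbol child) are invented for illustration, and the helper names `build_index`, `local_id` and `features_of` are hypothetical. Because the pointers may or may not be written with a leading "#", the sketch accepts both forms.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample; element layout and all identifiers are assumptions.
SAMPLE = """
<doc>
  <fLib>
    <f xml:id="f.cat.N" name="CATEGORY"><symbol value="Noun"/></f>
    <f xml:id="f.type.c" name="Type"><symbol value="common"/></f>
    <f xml:id="f.num.s" name="Number"><symbol value="singular"/></f>
  </fLib>
  <fsLib>
    <fs xml:id="Ncns" feats="#f.cat.N #f.type.c #f.num.s"/>
  </fsLib>
  <s xml:id="s1"><w lemma="house" ana="#Ncns">house</w></s>
</doc>
""".strip()


def build_index(root):
    """Map every xml:id in the document to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}


def local_id(pointer):
    """Turn a pointer such as '#Ncns' (or a bare 'Ncns') into an identifier."""
    return pointer.lstrip("#")


def features_of(word, index):
    """Resolve w/@ana to its fs, then fs/@feats to the referenced f elements."""
    fs = index[local_id(word.get("ana"))]
    resolved = {}
    for pointer in fs.get("feats", "").split():
        f = index[local_id(pointer)]
        symbol = f.find("symbol")  # assumed value encoding
        resolved[f.get("name")] = symbol.get("value") if symbol is not None else None
    return resolved


root = ET.fromstring(SAMPLE)
index = build_index(root)
word = root.find(".//w")
features = features_of(word, index)
print(word.text, word.get("lemma"), features)
```

Building the index once keeps each lookup constant-time, which matters when every word token in a novel has to be resolved against the same libraries.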