Dan Tufis & al * Corpora and Corpus-Based Morpho-Lexical Processing
When lemma homography is taken into account, the MSD-ambiguity figures are those shown in Table 1.6.
Ambiguity level | 1 | 2 | 3 | 4 |
Number of lemmas | 31218 | 1927 | 107 | 7 |
The following two tables in this section show the distribution of the MSDs over the MSD-ambiguity classes and of POSes over POS-ambiguity classes, respectively. Table 1.7 reads as follows: a given POS-msd (an MSD belonging to a given part of speech) appears in k MSD-ambiguity classes. For instance, verbal MSDs (V-msd) appear in 490 out of the total 981 MSD-ambiguity classes, while nominal MSDs (N-msd) appear in 445 MSD-ambiguity classes.
POS-msd | N-msd | V-msd | A-msd | P-msd | D-msd | M-msd | R-msd | T-msd | S-msd | C-msd | Q-msd | I-msd | Y-msd | X-msd |
No. of MSD-amb. classes | 445 | 490 | 334 | 131 | 93 | 73 | 102 | 15 | 22 | 18 | 7 | 19 | 34 | 9 |
If one considers only the part of speech and the POS-ambiguity classes, the corresponding distribution is shown in Table 1.8. Out of the total number of 90 POS-ambiguity classes, 34 contain the verb (V), 30 contain the noun (N) and so on.
POS | N | V | A | P | D | M | R | T | S | C | Q | I | Y | X |
No. of POS-amb. classes | 30 | 34 | 18 | 28 | 12 | 14 | 28 | 10 | 16 | 11 | 6 | 11 | 16 | 9 |
Comparing the figures in Table 1.7 and Table 1.8, one may draw some interesting conclusions. For instance, among the word-forms with more than one MSD (63411, i.e. 18.22% of the total number of word-forms), an ambiguous word-form will have one or more verbal readings in almost 50% of the cases; if only the part of speech is considered, an ambiguous word-form will have a verb interpretation in almost 38% of the cases. Table 1.9 summarises this comparison for all parts of speech.
% | N | V | A | P | D | M | R | T | S | C | Q | I | Y | X |
MSD | 45 | 50 | 34 | 13 | 9 | 7 | 10 | 2 | 2 | 2 | 0.7 | 2 | 3 | 0.9 |
POS | 33 | 38 | 20 | 31 | 13 | 16 | 31 | 11 | 18 | 12 | 7 | 12 | 18 | 10 |
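The percentages in Table 1.9 can be reproduced from the counts in Table 1.7 and Table 1.8. The following is a minimal sketch, assuming that each entry is simply the number of ambiguity classes containing a given part of speech divided by the total number of classes (981 MSD-ambiguity classes, 90 POS-ambiguity classes) and then rounded. The variable and function names are illustrative and do not come from the original text.

```python
# Counts copied from Table 1.7 (MSD-ambiguity classes) and
# Table 1.8 (POS-ambiguity classes), in the same column order.
POS_TAGS = ["N", "V", "A", "P", "D", "M", "R", "T", "S", "C", "Q", "I", "Y", "X"]
MSD_CLASS_COUNTS = [445, 490, 334, 131, 93, 73, 102, 15, 22, 18, 7, 19, 34, 9]
POS_CLASS_COUNTS = [30, 34, 18, 28, 12, 14, 28, 10, 16, 11, 6, 11, 16, 9]

# Totals quoted in the running text.
TOTAL_MSD_CLASSES = 981
TOTAL_POS_CLASSES = 90


def class_share(counts, total):
    """Return, per POS tag, the percentage of ambiguity classes containing it.

    Assumption: a plain ratio of class counts, not weighted by how many
    word-forms fall into each class.
    """
    return {tag: 100.0 * n / total for tag, n in zip(POS_TAGS, counts)}


msd_share = class_share(MSD_CLASS_COUNTS, TOTAL_MSD_CLASSES)
pos_share = class_share(POS_CLASS_COUNTS, TOTAL_POS_CLASSES)

# Print both rows side by side, one POS per line.
for tag in POS_TAGS:
    print(f"{tag}: MSD {msd_share[tag]:5.1f}%   POS {pos_share[tag]:5.1f}%")
```

Under this assumption the verb column comes out at roughly 49.9% (MSD) and 37.8% (POS), in line with the "almost 50%" and "almost 38%" quoted above.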