Task Leader:
PISA-ILC - Nicoletta Calzolari and Monica Monachini
March 1995
The work carried out in this task aims at formulating harmonized
specifications
and at proposing
a notation for the lexica and the tagsets,
to be contributed by each language group
involved in the MULTEXT Project.
MULTEXT's general aim is to develop tools for
corpus annotation which contribute to the standardization of this kind
of work in an
academic and an industrial environment. These tools will
be provided with resources from six different languages to ensure their
validity. Resources used
to feed the tools are, among others, lexical lists
for the six
languages, containing the necessary information to run the tools.
Tools that will use lexica are mainly those which perfom
morphological analysis and generation, and lexical lookup tools. MULTEXT
proposes to deliver a morphological tool together with
basic
morphological rules and a number of base form entries, duly
coded with respect to the rules. The morphological tool is intended
to expand
these base forms into word-form lists, with corresponding
morphosyntactic information. These word-forms will,
in turn, be used for the tagger,
providing that a correspondence between the morphosyntactic
information and the tags to be used by the tagger is defined. The
morphological tool must guarantee extensibility of the MULTEXT
tools, as it is thought to be used by end-users to enlarge lexical
material treated by the tools. It is also expected that a
morphological analysis will be able
to perform a ``guess" on at least the
category of
unknown words and, where possible, on morphosyntactic features.
Within MULTEXT, therefore,
``lexical list" refers to a list of forms
with related information: both to base-form lexica, coded in
such a way as to feed
the morphological tool, and to the word-form lexica, containing
relevant information for corpus annotation purposes.
At the first workpackage coordinators' meeting held
in Paris, and as also reported in D1.6.1. (September 1994), it was
agreed that in
view of the urgent need for lexical lists for the creation of
the tools, lexical lists of word-forms
in a particular format could be supplied
already in the first phase,
meanwhile leaving for the second phase the development of
base-form
morphological lexica, input for the morphological tool. These word-form
lexical
lists were generated from the resources
already available at
the different sites. Further work will be done in order to ensure the
complete mappability between the results of the morphological tool
and the formalism proposed for lexical lists.
The present report
is mainly devoted to the definition of the information associated with
the word-form lists, from now on referred to as
``lexical descriptions".
We provide here
the notation to be used in the lists corresponding to each
language to describe a given word-form. Major effort has been devoted
to ensure compatibility between the three different types of
information to be associated with a given word: morphological
information, morphosyntactic lexical description and TAG label.
The present report is divided into four sections:
Classification of lexical items relies on the old tradition of Greek and Latin grammar. What is normally referred to as ``Parts-of-Speech" distinction for different words is well-known to be a crucial task, but not accurate or universal. Lyons (1981, p.109), for instance, warns the reader about this:
| ``It is important to realize, however, that the traditional list of ten or so parts of speech is very heteregeneous in composition and reflects, in many of the details of the definitions that accompany it, specific features of the grammatical structure of Greek and Latin that are far from being universal. Furthermore, the definitions themselves are often logically defective. Some of them are circular; and most of them combine inflectional, syntactic and semantic criteria which yield conflicting results when they are applied to a wide range of particular instances in several languages. ... Like most of the definitions in traditional grammar, they rely heavily upon the good sense and tolerance of those who apply and interpret them." |
These difficulties in classifying word classes have been the concern
of many linguists and greatly affect computational applications,
as one cannot expect from machines the sort of ``good sense and
tolerance" asked for in
applying current classifications. On the other
hand, the tools MULTEXT is going to develop will be used by humans
sharing similar linguistic backgrounds.
It is, therefore, imperative that MULTEXT makes these tools user-frendly.
It should not be forgotten that
the output of corpus annotation as its main goal, as
well as the internal codification used for this purpose, should be
easily
understandable by the expected end-users of its products.
MULTEXT tools will be associated with
data for demonstration and validation purposes, but,
being public domain tools, one should expect that being allowed to
use them for experimentation, end-users will incorporate
their own classes and
distinctions. It must be ensured that users can
take supplied data as
guidelines to show the functionalities and behaviour of the tools
(the MULTEXT-EAST project evidences the importance of this
consideration).
With this aim, MULTEXT proposes to address classification problems
by joining forces with the EAGLES initiative
(MULTEXT T.A. 1993, p.10) which proposes to address them by
highlighting ``the area of common ground and some aspects of
discrepancy between the different systems for classifying
morphological units, in order to provide, after testing with respect
to all EC languages, the possibility of elaborating common consensual
guidelines for morphosyntactic encoding in lexica and corpora"
(``Synopsis and Comparison of Morphosyntactic Phenomena encoded in
Lexicons and in Corpora. A Common Proposal and Applications to
European Languages",
Monachini and Calzolari, Oct. 1994, p.12).
In EAGLES, a bottom-up procedure, looking at existing practices in a large number of lexical and textual projects world-wide (both in lexical specifications and in corpus tagsets), has been followed, thus allowing to highlight the large core of commonalities between lexical and textual large projects with respect to the morphosyntactic phenomena described. The procedure adopted within EAGLES was, in fact:
Thus, the EAGLES proposal (in the already mentioned
EAGLES reports ``Synopsis and Comparison
of Morphosyntactic Phenomena encoded in Lexicons and in Corpora" and
the ``Morphosyntactic Annotation",
Leech and Wilson, 1994) - which is also at the
basis of, or is mappable to,
the lexical and corpus specifications of the LRE projects
DELIS, RENOS, CRATER and MECOLB, MLAP project PAROLE, and the French
project GRACE -
has been the starting point of Task 1.6 within
MULTEXT.
The partners have been asked to:
| NOUN | Type | Gender | Number | Case | Count | Definitness | Inflection |
| L0 | N O U N | ||||||
| com | m | sg | nom | ||||
| L | prop | f | pl | gen | |||
| 1 | n | dat | |||||
| acc | |||||||
| L | cou | ||||||
| 2 | mass | ||||||
| It c | It n | Gr voc | Da def | Da/Ge weak | |||
| L | Du f(m) | Gr ind | Da indf | Da/Ge strg | |||
| 2 | Du cont | Da unmk | Da/Ge mix | ||||
| b | Sp trns | ||||||
| Sp notr | |||||||
Reports on the evaluation of the EAGLES specifications have been
contributed by the partners involved in this MULTEXT task, and
comments, suggestions and critical remarks are being taken into
account in the EAGLES proposal
which is being accordingly revised. MULTEXT Task 1.6. can be
seen as the largest contribution, together with DELIS, to the testing,
refining and revising of the EAGLES proposal. An example of this
interaction was the major revision of the EAGLES proposal which
affected the
Pronoun/Determiner category proposed, now split into two different
categories.
Experience shows that the process of consensus building is a slow process, because of the different interests to be adjusted. Considerations coming from ``re-usability" of existing material, as well as from theoretical and application-oriented arguments, have been raised in discussions under this task and should also be taken into account when evaluating its progress and results. Leading ideas to reach final decisions have been described above and will be examined in detail in the following subsections. They can be summarized by the statement of the MULTEXT strong committment to standardization and harmonization of lexical encoding initiatives, now active in Europe with the aim of sharing public domain resources.
After discussion on several issues, the MULTEXT partners agreed on the
necessity of differentiating corpus tags used for the PoS
disambiguator, or tagger, from the information which a lexicon can
offer. This is because
the former is an application-oriented representation of the
information described by the latter and depends very much on the tool
used. This decision was also in accordance with the orientations
given by the EAGLES Lexicon and Corpus working groups (see Monachini
and Calzolari, 1994).
Thus the terminology adopted in MULTEXT reflects this separation:
Hence, it was agreed that two different objects will be produced for each
language:
a. a lexicon where morphosyntactic features for each word form are
encoded with fine granularity, as close as possible to the
recommended EAGLES level-1.
b. a set of tags for the purpose of automatic disambiguation. In
practical terms these tags are to reflect broader categories on the
basis of the
limitations of a statistical tool. This set will be defined
and refined upon experimentations with the tagger tool.
In Task 1.6, it was decided - in accordance to the Technical Annex -
to begin taking EAGLES recommendations as
input for deciding on the basic morphosyntactic information to be
associated with the word-forms contained in the lexicon.
The MULTEXT application of EAGLES
recommendations needed to ensure that the information contained in the
so-called
Level-1 were significant for most of the languages to be
treated. Thus the work under this task may also be considered as
a concrete
validation of EAGLES work on electronic lexica. The
underlying aim of EAGLES is concerned with the re-usability of
electronic lexica, and, following this general tendency, MULTEXT lexical
descriptions
also had to be (as far as possible) independent from the
application, aiming at a general description of each language and
containing a basic set of shared information.
Also, for
the sake of ``re-usability" of the lexical material supplied, it was
judged that the lexical information to be encoded should be as detailed
as possible. Thus fine-granularity of the information would allow
other users to rearrange categories, when necessary, without much
difficulty.
The actual corpus tags we will be using will depend on at least the following:
We can fix (1), but
(2) is highly dependent on the tool.
That is why we concentrated on (1) in the first phase.
The corpus tags will be developed for each language with a specific
application in mind, i.e. that of producing a corpus tagged for
part-of-speech
(and possibly other morphosyntactic information) by means of
automatic disambiguation. The set of corpus tags will, very likely, be
revised many times during the course of the project, in order to find
an optimal set for each language.
It would be ideal to tag a corpus with the lexical descriptions
themselves for each word. However, it is well known that this is well
beyond the capabilities of the state-of-the-art tagging techniques.
Corpus tags are, therefore, to be seen as kinds of underspecified
lexical tags. There are two reasons why we may want underspecified
corpus tags:
1. Experience shows that some distinctions are difficult to get right
with a high accuracy.
For example, in some languages, the disambiguation between indicative
present and subjunctive present in a corpus is extremely difficult
to achieve by
automatic means. If some verbs have different forms for the indicative
and the subjunctive (e.g. Fr. venir: indic. = viens, subj. = vienne;
It. indic. = vieni, subj. = venga), many have the same form (e.g. Fr.
manger: indic., subj. and imper. = mange; It. indic. and subj. = ami).
In this latter case, disambiguation can only be achieved with very
complex
parsing of sentences.
Therefore, lexical entries will contain the following detailed and granular information associated with the word-forms
mange (manger) Main verb Indicative present, 1st person sing.
mange (manger) Main verb Indicative present, 3rd person sing.
mange (manger) Main verb Subjunctive present, 1st person sing.
mange (manger) Main verb Subjunctive present, 3rd person sing.
mange (manger) Main verb Imperative present, 2nd person sing.
ami (amare) Main verb Indicative present, 2nd person sing
ami (amare) Main verb Subjunctive present, 1st person sing.
ami (amare) Main verb Subjunctive present, 2nd person sing.
ami (amare) Main verb Subjunctive present, 3rd person sing.
wheras corpus tags will provide broader categories, collapsing several
lexical descriptions.
2. In order to train the tagger, we need statistical tables (based on
co-occurrences of tags). If we have a large tagset, we need a very
large corpus to train the disambiguator, in order to observe rare
co-occurrences. For example, in the proposal for French (see below),
there are 249 different lexical descriptions, but only 74 collapsed
corpus tags. Experience (Church, Penn Treebank, IBM France, etc.)
shows that the tagset should be under 100. Actually the Penn Treebank
collapsed many tags compared to the original Brown corpus,
and got better
results.
Two other observations are of relevance as regards the relation
between lexical specifications and corpus tags.
(a) Sometimes tagging classes are in reality different from lexical
descriptions. For example,
classes for punctuation are needed, certain types of
semantic or pragmatic or lexical information can be present in the
tags (e.g. the days of the week).
(b) Furthermore, the ``collapsing" decisions in TAGS are language
dependent, therefore it is not possible to have completely identical
tagsets across languages. To illustrate, we can give as an example
the
differences related to person differentiations in verbal morphology.
In Spanish, first and third person of different tenses have the same spelling:
Yo/El cantaba (Imperfect)
Yo/El cantari'a (Conditional)
Yo/El cante (present of subjunctive)
Taking into account that the subject in Spanish is not obligatory, and that the tagger cannot know if the preceeding NP is in fact the subject of the verb, there is no way to discriminate between the two forms. Hence a conflating tag is recommended, marked for instance as ``non-second-singular" form or as ``first- third singular". Also French has homographs for different verbal persons, but these are the first and the second person of some tenses:
Je/Tu viens
Je/Tu e'tais
The French tag cannot
be the same as the Spanish one, but it could be
``non-third-singular" or ``first-second-singular". Moreover,
having two different tags in French for the homograph could be
justified, due to the obligatory presence of a lexical subject, as the
tagger will be able to disambiguate among them due to the presence of
a pronoun in a near
context of most of their occurences.
For some languages (e.g. French, English and Italian) a lot of past experience and empirical evidence exists, which can be used to choose a reasonable initial tagset, that can be seen as preliminary and which can be refined later on in the project. For example, for English, the Penn tagset or the BNC are very good candidates. For French, the IBM tagset is a very good start (the French proposal presented in the following is very close to it). For Italian the tagset based on the DMI (Calzolari et al. 1983) is also a good starting point. These tagsets are the result of years of trial-and-error adjustments, and it seems reasonable not to ignore them. All of these tagsets are, moreover, compatible with the EAGLES proposal, i.e. mappable to it.
As stated in the Introduction, MULTEXT will supply a
morphological tool which will need some information on lemmas in
order to
produce the entire set of associated word-forms. A list of
word-forms will
constitute another lexicon which, as referred in the Technical Annex,
constitutes a value in itself. Hence, the lexica supplied by MULTEXT
are of two types:
(a) word-forms, containing
The information and notation of the lemma dictionary is closely related to the morphological tool used and also on the rules implemented within the tool. Due to the fact, mentioned already in the Introduction, that the availability of word-form lists was considered of priority for corpus annotation tool development, we first concentrated on the definition of the word-form lists following EAGLES recommendations for the morphosyntactic annotation to be encoded, as explained in the preceding section. It was possible to define a representation of morphosyntactic information for these word-form lists independent from a morphological tool, in such a way as to ensure that lemma dictionaries and the output of morphological modules (the ones produced for MULTEXT or others) be compatible and easily mappable to such lists. Following current practices for NLP, the notation used should represent information in attribute/value formalisms (as was done also in EAGLES) and should also be self-informative for human inspection and understanding. Considerations concerning the desirability that these descriptions are able to provide information about language-specific characteristics, where also taken into account. Following these ideas, a notation format was suggested whose main characteristics are:
These characteristics make the proposed lexical description notation (see section 3.1 for more details) synonymous with attribute/value pairs used in current unification formalisms. The next sections introduce such formalism and the information to be encoded.
The notation format proposed
to represent lexical descriptions consists of
linear strings of characters representing the morphosyntactic information to
be associated with word-forms. The string is constructed following the
philosophy of the Intermediate Format proposed in the EAGLES Corpus proposal
(Leech and Wilson, 1994), i.e. of having agreed symbols in predefined and
fixed positions: the positions of a string of characters are numbered 0,
1,2, etc. in the following way:
a. the agreed character at position 0 encodes part-of-speech;
b. each character at position 1, 2, n, encodes the value of one attribute
(person, gender, number, etc.);
c. if an attribute does not apply, the corresponding position in the string
contains a special marker, in our case `-' (hyphen).
Example: Ncms- (noun,common,masculine,singular,nocase)
This notation adopts the EAGLES Intermediate Format with a small
revision: the Intermediate Format encodes information by
means of digits, while in MULTEXT characters of a mnemonic nature
are preferred.
It is worth noting here that this representation is proposed for
word-form lists which will be used for a specific application, i.e.
corpus annotation. We have
foreseen these lexical descriptions as containing a full description of
lexical items. As
noted above, the sets of tags, to be used properly for automatic
corpus annotation tools, are expected to contain less information.
These lexical descriptions can be seen as notational variants of the feature-based notation in the form of attribute-value pairs. In fact, the string notation proposed, e.g.
Ex.: Ncms- (noun,common,masculine,singular,nocase)is completely synonymous to a feature-structure representation:
Ex.: {cat=noun, type=common, gender=masculine, number=singular, case=none}
or
{cat=noun, type=common, gender=masculine, number=singular}
The above feature structures are often also represented as follows:
+- -+
| Cat: Noun |
| Type: common |
| Gender: masculine |
| Number: singular |
+- -+
Formal characteristics relevant for our applications have been
kept.
Use of
position in the string to
encode attributes makes no restrictions on the set of
characters to be used as values. It could then be inferred that, if we
wanted to keep the formal characteristic of order independent notation,
we would have to make sure that the characters meant to represent
attribute-values
are not ambiguous. As attributes and values are linked by
positional criteria, the need of a special marker for void
attribute-value pairs is evident if we want to keep descriptions
coherent. Thus, the ``Ncms-"
style can be viewed as a short-hand notation
convenient for some users and straightforwardly mappable to the
information used in unification-based attribute-value pairs
formalisms.
When comparing MULTEXT lexical description representation format with other notations one must keep in mind that they are intended to describe word-forms, and are used in very large lexical lists which contain word-forms. It seems to us relevant to comment on this point because, although it can be justified (and we will do so below) that the same formal operations can be declared in both styles, there is little evidence for justifying the need of operations such as negation and disjunction of features and values when applying them to tagged word-forms as a result of corpus annotation.
a. not
applicable given a particular combination of attributes/values, i.e.
although the attribute applies to the category in a given language, it
does not apply to a particular subclass of the category.
b. not applicable to a particular lexical item, although the attribute
applies to the rest of its paradigm.
Example: in the description of pronouns, for personal pronouns the grammatical person is to be encoded, but for demonstrative pronouns it is avoided; in this case '-' is applied following (a). On the other hand, gender cannot be informative for some personal pronouns, but it is still relevant for other personal pronouns; the application of `-' follows (b):
Pd-ms "Este" Pronoun, demonstrative, masculine, singular. Pp1-s "Yo" Pronoun, personal, first, singular. Pp1mp "Nosotros" Pronoun, personal, first, masculine, plural. Pp1fp "Nosotras" Pronoun, personal, first, feminine, plural.Their uses are clearly not equivalent, but there would only be meaningful differences would occur in highly typed theories of lexical description. For illustrating this point let'us have the following type system for pronouns:
TYPES SUBTYPES ATTRIBUTES VALUES
Pronoun gender masculine
feminine
number singular
plural
Demonstrative
Personal person 1
2
3
For this system, gender and number attributes belong to the set of
features which describe all pronouns. Person will only belong to the
set of features which describe personal pronouns - in addition to
gender and to number. Applied to this type system,
case (a) would
mean that the attribute-value pair does not
belong to the set of features which describe a subtype, while (b)
would mean indeterminacy of a given word-form (which could be
expressed as a disjunction of all the values for the particular
attribute or leaving a void for
the value, being open to unification;
this choice mainly depends on the purpose of the description, e.g.
syntactic parsing).
|phon este|
|cat |gender masc||
| 'dem'|number sing||
|phon yo |
|cat |gender [] ||
| 'pers'|number sing||
| |person 1 ||
In simpler flat type systems where distinctions are made only for the
generic type ``pronoun", both cases a. and b. will be
treated by unification mechanisms in the same way.
From the conversion point of view, we have to be concerned with the
output of the MULTEXT morphological tool, as it will be the source of
word-form lexical lists. The
Mmorph tool does not incorporate a highly
hierarchical typing system and thus no problems are expected in
converting Mmorph output into lexical descriptions of the proposed
format, if desired. The
results from applying the Mmorph tool
will probably (it strongly depends on implementation
strategies) be the following:
1. a non present attribute in the description attached to the
word-form;
2. a disjunction expression, i.e. {gender=masc
fem};
3. encoded as a third possible value, i.e. {gender=none}.
The simplest case for converting would be the third one, as then automatic non-intelligent conversion is possible. In the first two cases the conversion routine will have to make some inferences on type declarations. It is also expected, that when converting from other lexical sources, special conversion routines will have to be used. As seen above, the conversion from ``Ncms" lexical description notation into other unification based format will only be difficult if the target formalism is a highly typed system. If this is not the case, the presence of the ``not-applicable" marker will have to be converted into a special value or into nothing, leaving it open. For conversion into highly typed system it might be useful to have cases (a) and (b) marked by different characters, in order to guide an intelligent conversion routine to the desired results.
The tags (see the examples below) used to exemplify issues and
problems to be dealt with in the mapping between lexical descriptions
and corpus tags, come from the tagsets proposed in the
language-specific applications of four of the MULTEXT partners.
These tagsets (containing dfferences among them, because
constructed on the basis of tagging practices already
used by the partners) should be considered as a preliminary proposal
to be discussed for harmozation
and refined after experimentations on the MULTEXT
tagger.
Mapping of these lexical descriptions into corpus tags has also
been taken into
account. It is also considered desirable to see whether under-informative
corpus tags can be directly mappable to the lexical descriptions each one
subsumes.
Decisions about corpus tags are language dependent. The information to
be encoded depends on the ability of a given tool to disambiguate
between different potential lexical descriptions for a
given word-form. We have already mentioned the key concepts to be
applied for defining sets of corpus tags in the preceding sections.
Therefore one can first assume that the mapping from lexical
descriptions onto corpus tags can be done with conversion tables which
relate two different items: corpus tags and lexical descriptions. These
tables are likely to be modified many times in the course of the
project, based on experimentation with the disambiguation tool.
An example of such mappings is:
Lex.spec. TAG Definition Pp1msa- P1S Personal pronoun, first person, masc. sing. accusative Px1msa- P1S Reflexive pronoun, first person, masc. sing. accusative Pp1fsa- P1S Personal pronoun, first person, fem. sing. accusative Px1fsa- P1S Reflexive pronoun, first person, fem. sing. accusative Pp1msd- P1S Personal pronoun, first person, masc. sing. dative Px1msd- P1S Reflexive pronoun, first person, masc. sing, dative Pp1fsd- P1S Personal pronoun, first person, fem. sing., dative Px1fsd- P1S Reflexive pronoun, first person, fem. sing., dativeAll these lexical descriptions correspond to the Spanish form ``me". For this word-form the tags P1S - which conflates all the possible lexical descriptions - has been decided on the basis of the assumption that an automatic tool would have disambiguation problems in assigning the correct analysis among all the lexical descriptions. The correct analyis of this word-form would require syntactic analysis.
The mapping from the lexical descriptions to the corpus tags should be
applicative, that is, ``each lexical description should map to one and
only one corpus tag, while it is not possible to do
the reverse" due to the
limitations of current tagging
techniques. The situation where corpus tags
are more precise than a lexical description (i.e. one lexical tag
corresponds to more than one corpus tag) should be, in principle,
avoided.
In order to avoid
redundancy in the conversion tables and to make
tag optimization work easier,
it has been proposed to study the
possibility of having intermediate representations which prepare
the
conflation of information and which facilitate automatic mapping from
lexical descriptions onto tags. This intermediate internal notation
makes use of ``regular expressions" which incorporate operators in
order to sum up the information referred by different lexical
descriptions and conflated in a given tag.
For the example given above,
the resulting regular expression may incorporate two operators:
``match any" (.), ``list" ([])
- other possible operators proposed are
``disjunction"
and negation
.
P[px]1.s[ad]- P1SHowever, the application of such regular expressions is still being studied as its use conveys some requirements on the conflation of lexical descriptions and on the construction of corpus tags. An example will illustrate the issues to be taken into account. For Spanish, first and third person of some tenses are homographs. This can be taken into account when conflating information:
Verbal paradigm regular exp. TAG cantaba, comi'a, veni'a Vmii[13]s- VMIIS cantari'a, comeri'a, vendri'a Vmcs[13]s- VMCSS cante, coma, venga Vmsp[13]s- VMSPS cantara, comiera, viniera Vmsi[13]s- VMSIS
For Italian, the conflation of information on homographs also in the verbal paradigm may cause problems to the applicative principle mentioned above:
Verbal paradigm lex.descr. regular exp. TAG
premiate Vmip2p- Vm([ims]p2p-)|(ps-pf) VMP2IMCPP
Vmmp2p-
Vmsp2p-
Vmps-pf
leggete Vmip2p- Vm[im]p2p- VMP2IMP
Vmmp2p-
leggiate Vmsp2p- Vmsp2p- VP2CP
lette Vmps-pf Vmps-ps VFPPR
As can be seen, if we use tags such as the ones above
which are based on the principle ``one graphical form - one tag",
there is a violation of the applicative principle,
i.e. the same lexical description will correspond to two different
tags, because of different conflation clusters.
In general,
it is observed
that the use of operators in regular expressions
results in a form of marking the information which is not
going to be expressed in the corpus tag. Thus, tags would have to contain
less information than the regular expression and hence than the lexical
description.
Another issue to be considered is the following. Having tags with little lexical information, as in the following French example, may lead to another problematic issue in cases where such regular expressions are also used in helping to recover all possible lexical information from a given ``under-specified" corpus tag. The mapping from the regular expression onto lexical descriptions will also have to take into account the word-form in order to reject possible descriptions which do not correspond to the tagged word-forms. Below are some examples from the proposed verbal tags and regular expressions:
TAG Regular expression Lexical descriptions Possible word-forms
VM1P Vm[iscm][pifs]1p-- Vmip1p-- venons
Vmii1p-- venions
Vmif1p-- viendrons
... ....
Let us consider that
the word ``venons" is tagged as ``VM1P". If we want to know which are the
lexical description to which the tag can be referring to, the explosion of
the information contained in the regular expression will also
give lexical
descriptions which do not correspond to the word ``venons", but to other
words. Regular expressions can only map a given tag for a word-form into
all possible lexical descriptions for such a word-form if
the information conflated only
reflects ambiguities due to homography. Only
with this criterion for defining tags, all
the possible lexical descriptions subsumed by the corpus tag and
expressed in the regular expressions will be true of a given
tagged word.
If the criterion for conflating information is limited to homograph ambiguities, we see - as in the following example - that all possible lexical descriptions expanded from the regular expression are true of a given word-form.
TAG Regular expression Lexical descriptions Possible word-forms
VSXICP Vm(sp.s)|(ip2s)- Vmip2s- ami
Vmsp1s- ami
Vmsp2s- ami
Vmsp3s- ami
As mentioned in the section ``Comparison of Attributes/values
used by languages", the application of the proposed operators in
regular expressions for avoiding redundancy, in some cases,
is not needed if lexical
expressions already encode the possibility of having,
for a given word-form,
more than one possible lexical description. This is the case with
the proposed values ``common" for gender, ``invariant" for number (in
Italian), or ``object" for case (in French pronouns).
Almost all the languages treated in MULTEXT have nouns, adjectives, determiners (among others) which have the same word-form both for feminine and masculine agreement. The Italian group has proposed a value for gender named ``common" which avoids having to write two different entries with the same word-form, but with different lexical descriptions. In fact, this use of a special value advances the possible use of proposed operators in the regular expression.
word-form lexical description regular expression TAG insegnante Nccs- Nccs- NNScould also be expressed as:
word-form lexical description regular expression TAG
insegnante Ncms- Nc[mf]s- or Nc.s- NNS
Ncfs- or Nc(m|f)s
The need, as well as the
consequences, for the mapping between
lexical descriptions and
corpus tags, of the regular expressions must still be regulated. It
should be noted that regular expressions can be regarded
as a convenient way to map the lexical descriptions to the corpus
tags since, in many cases, the information in the lexicon is more
precise than the information we can/want to have in the corpus tag
set. Such a mapping still seems very interesting because there are
many corpus tag systems, even for the same language, which makes it
extremely difficult to relate the one to the other. Regular
expressions could act as a common reference for the different systems
to make comparison easy. Besides, regular expressions could make
translations between the lexical description and corpus tags
easier and
enable the automatic generation of conversion tables.
The categories listed below with the relevant attributes and values are
based on EAGLES documents and are the results of a first testing based
on a proposal made by Veronis et al. 1994 for lexical specifications in
MULTEXT.
As it has already been mentioned in the section ``Background
considerations" that propose features for describing lexical items
of different languages aiming at defining a set which can be
said ``common" for all of them is a complex task. The underlying
philosophy for this task has then be to lead different groups into a
pragmatic solution where the concept of an "harmonized" set of features
could be reached.
The groups have first worked out
their lexical descriptions taking as input
EAGLES and Veronis et al. (1994) documents. The very general criterion
was to encode those proposed features which were considered relevant for
the language in question. Therefore MULTEXT also followed EAGLES
bottom-up methodology in trying to define extensively the features
``used" in the lexical descriptions for each group language,
as this procedure will
make evident the features commonly used. After this phase, whose result
can now be seen in the section ``Comparison of attribute/values used by
the groups", a new phase is envisaged as to accomodate language-specific
considerations into a general model to be used by MULTEXT. This
accomodation must take into account extensibility to other languages and
also application motivated arguments, as well as internal coherence.
For this new phase more specific criteria would be desirable with
respect the addition of new features to the EAGLES Level-1 set. The
aimed result is a ``harmonized" set of features which properly describe
lexical items of the different languages.
Following the general aim of the project, these harmonized
specifications - and the related resources - will contribute to the
standarization of the corpus annotation work. They are supposed to
serve as a user oriented additional characteristic of our tool package
in the sense that end-users will have a common ground for inspecting
and understanding the resources and tool results independently
to a
large extent of the language. This common set of features will also be
a common ground to perform comparisons of different annotation tool
results, because, as mentioned in the previous section, the existence
of many lexical description systems is causing nowadays a problem for
comparing results.
Therefore the categories and features listed below are the
common reference for the work done by the
different groups. Further discussion on this first proposal is to be
found in the section ``Comparison of the attributes/values used by the
groups" which is in turn to define criteria for changing this first
proposal.
Tables of categories
=============== ====
Part-of-Speech Code
=============== ====
Noun N
Verb V
Adjective A
Pronoun P
Determiner D (for those who do not have a separate category
Article T for Articles, these are included in Determiner)
Adverb R
Adposition S
Conjunction C
Numeral M
Interjection I
Unique U
Residual X
Abbreviation Y
=============== ====
Each character at positions 1, 2, etc. encodes the value of one
attribute (person, gender, number, etc.), according to the tables
given below.
2.2.2 Attribute/value tables
----------------------------
Abbreviations used:
P Position (starts with 0 for encoding PoS values)
ATT Attribute name
VAL Value
C Code
1. Nouns (N)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type common c
proper p
- -------------- -------------- -
2 Gender masculine m
feminine f
neuter n
- -------------- -------------- -
3 Number singular s
plural p
- -------------- -------------- -
4 Case nominative n
genitive g
dative d
accusative a
= ============== ============== =
2. Verbs (V)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type main m
auxiliary a
modal o
- -------------- -------------- -
2 Mood/VForm indicative i
subjunctive s
imperative m
conditional c
infinitive n
participle p
gerund g
supine s
base b
- -------------- -------------- -
3 Tense present p
imperfect i
future f
past s
- -------------- -------------- -
4 Person first 1
second 2
third 3
- -------------- -------------- -
5 Number singular s
plural p
- -------------- -------------- -
6 Gender masculine m
feminine f
neuter n
= ============== ============== =
3. Adjectives (A)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type qualificative f
ordinal o
cardinal c
indefinite i
possessive s
- -------------- -------------- -
2 Degree positive p
comparative c
superlative s
- -------------- -------------- -
3 Gender masculine m
feminine f
neuter n
- -------------- -------------- -
4 Number singular s
plural p
- -------------- -------------- -
5 Case nominative n
genitive g
dative d
accusative a
= ============== ============== =
4. Pronouns (P)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type personal p
demonstrative d
indefinite i
possessive s
interrogative t
relative r
exclamative e
reflexive x
reciprocal l
- -------------- -------------- -
2 Person first 1
second 2
third 3
- -------------- -------------- -
3 Gender masculine m
feminine f
neuter n
- -------------- -------------- -
4 Number singular s
plural p
- -------------- -------------- -
5 Case nominative n
genitive g
dative d
accusative a
oblique o
object j
- -------------- -------------- -
6 Possessor singular s
plural p
= ============== ============== =
5. Determiners (D)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type demonstrative d
indefinite i
possessive s
interrogative t
- -------------- -------------- -
2 Person first 1
second 2
third 3
- -------------- -------------- -
3 Gender masculine m
feminine f
neuter n
- -------------- -------------- -
4 Number singular s
plural p
- -------------- -------------- -
5 Case nominative n
genitive g
dative d
accusative a
oblique o
- -------------- -------------- -
6 Possessor singular s
plural p
= ============== ============== =
6. Articles (T)
= ============ =============== =
P ATT VAL C
= ============ =============== =
1 Type definite d
indefinite i
------------- ---------------- -
2 Gender masculine m
feminine f
neuter n
------------- ---------------- -
3 Number singular s
plural p
------------- ----------------- -
4 Case nominative n
genitive g
dative d
accusative a
= ============ ================ =
7. Adverbs (R)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type general g
particle p
- -------------- -------------- -
2 Degree positive p
comparative c
superlative s
= ============== ============== =
8. Adpositions (S)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type preposition p
postposition t
circumposition c
- -------------- -------------- -
2 Formation simple s
compound c
= ============== ============== =
9. Conjunctions (C)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type coordinating c
subordinating s
= ============== ============== =
10. Numerals (M)
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type cardinal c
ordinal o
- -------------- -------------- -
2 Gender masculine m
feminine f
neuter n
- -------------- -------------- -
3 Number singular s
plural p
- -------------- -------------- -
5 Case nominative n
genitive g
dative d
accusative a
= ============== ============== =
11. Interjections (I)
12. Unique membership class (U)
13. Residual (X)
14. Abbreviations (Y)
The following tables reflect the attributes and values used for
lexical description in MULTEXT. They take into account input supplied
by different groups which the reader can find further detailed in
specific language application annexes (see section 5).
It is worth noting that the
tables reflect both
features used by five groups using lexical descriptions
as proposed in previous versions of this document and also
features for Dutch
resulting from morphological generation with the
``mmorph" tool. The
comparison is certainly
of help in order to
have a clear picture of the level of consensus
reached with respect to harmonization when elaborating lexical lists.
It has already been mentioned in previous sections that the project is
working towards defining criteria for the application of EAGLES
guidelines on standardization of lexical resources for easy
re-usability.
Looking at the tables below some very general issues arise with
respect to the application criteria of the general tables to the
particular languages and the interpretation of the general guidelines.
Until now groups have been working on the assumption that they must
encode recommended Level-1 EAGLES features if they are relevant
to their languages. The possibility of adding new
values and new attribute/value pairs was also foreseen
if recommended features
were not enough to describe lexical items with
fine-granularity of
lexical descriptions. It was also
found useful in view of
supplying lexical material
to be used by other tools than the MULTEXT ones. This
openness has led to a number of incoherencies with respect to
application criteria which we summarize
in the points below. A decision with respect to general criteria for
application must be reached in the
next phase. Hence, here the
issues concerning harmonization which arise from comparing application
sections follow.
There is an unbalanced treatment of features considered as ``general"
for the different categories. The presence of a particular attribute
seems to be mainly justified for two reasons:
- representative in most of the studied languages;
- linguistic tradition.
We see in the comparative tables that a particular language is allowed
to add a new
attribute because of its relevance for the lexical description of a
given category (i.e.
when the language items belonging to that category are
inflected or marked with respect to it). The most evident case is the
proposal made in order to encode Possessor-gender (among other
features) for Pronouns and Determiners. It is obvious that this
feature cannot be used by languages which do not have different forms
regarding this particular distinction.
On the other hand, note that ``case" as a feature recommended as
``general" for describing Nouns, Adjectives, Pronouns, Determiners,
Articles and Numerals, is in fact used only for Pronouns by most of
the languages, and only German can apply it for the rest of
the categories.
What we mean by ``unbalanced treatment" has to do with the fact that
features being used by just a few languages, or even just one, receive
different treatment when considering them ``general" or ``language
specific".
Also arising from the possibility of adding language specific
attributes and values where relevant for a given description, the
procedure followed in this task has shown that it has not
been easy to reach a consensus in order to harmonize a number of
specific features
and values considered by a given language.
One of
cases of such proliferation of features is seen, in fact,
when considering the
comparative table for Determiners. One of the groups suggests having
language specific types to refer to ``definite article" and
``indefinite article", while other groups prefer to have a general
type
``article" and other attributes, i.e. Quantification or Definiteness to
encode this distinction at a lower level. The particular
features suggested by the groups which, in our opinion, could be adapted
to the EAGLES model will be discussed during the next phase.
Because of this openness with respect to
adding attibutes and values, we
would like to point out the case for the values ``common" and
``invariant" added by the Italian descriptions to all nominal inflected
categories for the attributes ``gender" and ``number" respectively
(where a disjunction of values could be used instead). It
is a fact that most of the languages could easily adopt this value for
the forms which are identical for masculine/feminine, singular/plural
agreement features, but this issue has certainly to be clarified and
further discussed.
Probably a decision with respect to the ``fine
granularity" of lexical descriptions should be devised. In fact there
is another example in another category of the same strategy, that
is to conflate in a new value for a given attribute a homography
which causes explosion of entries. The
French group has suggested conflating
accusative/dative values for pronominal case into ``object" as
a generic value. The new division proposed would also apply to other
romance languages but it might compromise the ``fine granularity"
tendency the project aimed at for lexical descriptions.
There can also be observed a certain confrontation of two different
traditions when some
groups propose to add a new attribute to characterise an
element while others propose to add a new value to label a new class
under a general,
already available attribute such as ``type". To add a new
attribute would correspond to the unification based grammar practice,
and a label for a class would correspond to the so called ``taxonomic"
theories. We see an example of this confrontation in the proposal of
having an attribute/value ``wh" for marking relative particles in
different categories: pronouns, determiners, adverbs. EAGLES level-1
seems to prefer separating relatives with a different value for the
attribute ``type" of pronouns and determiners. Marking as an additional
feature the relative characteristics of a given pronominal would help
for instance to specifically characterise items such as the
English ``whose"
or Spanish ``cuyo, cuya, cuyos, cuyas" which are normally described as
Possessive relative pronouns. Under the current classification a decision
must be made either
to put them under the Possessive or the Relative value of
the attribute ``type". It has also to be mentioned that no special
treatment can be made for relative adverbs which are not taken as a
separate class under Adverb type in the EAGLES proposal. Thus, from the
comparison made, it is worth mentioning that a new
attribute ``wh" for adverbs or, as suggested by the German group, a new
value for interrogative - and also for relatives -
adverbs should be devised.
As we have seen, the
EAGLES recommendations lack in some cases the desired
fine-grained distinctions which groups working in MULTEXT consider
desirable for our applications. Another example of this case is raised
by German and English. The groups dealing with these languages - and it
could also be applied to the rest of the
languages - have suggested a
specific value for comparative conjunctions. This addition seems
reasonable under the argument that it is an important feature with
respect to distributional criteria and can be of great importance for
tagging purposes. Again, some guidelines must be defined for considering
the addition of features not contained in EAGLES level-1, but it is
worth noting that several of the features added for language specific
reasons could be considered as applicable to the rest of the
languages.
We recommend a new round of discussions
on the new features suggested in
specific language applications to see whether they can be of use in
our concrete application and applicable to the rest of the
languages. Once
this discussion has led to conclusions, the approved features must be
included in the general model. Besides linguistic considerations,
having an agreed set of general features is of great concern for the
chosen notation style in lexical descriptions.
There must be regulations with
respect to the encoding of language specific attributes by the other
groups or on the ways of differentiating them from those of the general
model. This is especially relevant if general conversion routines are to
be developed. And because of theroretical coherence, the treatment
given to these features must take into account the above mentioned
``unbalanced treatment of features". Some other doubts remain in
connection with theoretical coherence and the applied nature of the
lexica to be supplied. We would only mention one of them to
illustrate the kind of issues which must be taken into account in the
next phase. It has to do with agreement features of person, number and
gender. Are they to be encoded with respect to grammatical agreement
or with respect to semantic differentiations. As it is now, following
EAGLES recommendations, it seems as if only semantic considerations
are taken into account, i.e. Possessive-person
of determiners is taken as
the ``possessor person" for most of the languages which in fact does
not trigger agreement.
A decision must be taken with respect to these cases and more specific
guidelines must be established for further development of lexical
descriptions. It seems from the comparison made that the general
criteria ``relevant for your language" is not enough. New guidelines
must also take the application side into account.
Comparison tables
Abbreviations used:
P = Position (starts with 0 for encoding PoS values)
ATT = Attribute name
VAL = Value
C = Code
x = value marked by a given group (any character other than x
means that a given 'language group' codes, in their
application, the relevant value with that character, not using
the agreed one).
The column of characters is left empty in correspondence of
language specific attributes/values of
Dutch: they are attested, in fact, among the set of attributes
and values for Dutch implementation of Mmorph, where they are
not represented by means of single codes.
1. Nouns (N)
Features used by the groups IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type common c x x x x x x
proper p x x x x x x
- -------------- -------------- -
2 Gender masculine m x x x x x
feminine f x x x x x
neuter n x x
l-s. common c x
l-s. De x
l-s. Het x
l-s. None x
- -------------- -------------- -
3 Number singular s x x x x x x
plural p x x x x x x
l-s. invariant n x
- -------------- -------------- -
4 Case nominative n x
genitive g x
dative d x
accusative a x
= ============== ============== =
5 Sem-gender M x
F x
N x
- -------------- -------------- -
2. Verbs (V)
Features used by the groups
IT GE SP FR DU EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type main m x x x x x v
auxiliary a x x x x x x
modal o x x m
l-s. copula x
l-s. impersonal x
- -------------- -------------- -
2 Mood/VForm indicative i x x x x
subjunctive s x x x x
imperative m x x x x
conditional c x x x
infinitive n x x x x x
participle p x x x x
gerund g x x
supine s
base b x
l-s. inf. + particle u x
l-s. ImPart x
l-s. Past participle
l-s. Present participle
l-s. PerfPart x
l-s. Fin x
- -------------- -------------- -
3 Tense present p x x x x x x
imperfect i x x x x
future f x x x
past s x x x x x
- -------------- -------------- -
4 Person first 1 x x x x x x
second 2 x x x x x x
third 3 x x x x x x
- -------------- -------------- -
5 Number singular s x x x x x x
plural p x x x x x x
- -------------- -------------- -
6 Gender masculine m x x x
feminine f x x x
neuter n
l-s. common c x
= ============== ============== =
7 Clitic l-s. no n x x
yes y x x
- -------------- -------------- -
8 Clitic l-s. both t x
accusa a x
dative d x
- -------------- -------------- -
3. Adjectives (A)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type qualificative f x x x
ordinal o x x
cardinal c x x
indefinite i x
possessive s x x x
l-s. part1 1 x
part2 2 x
- -------------- -------------- -
2 Degree positive p x x x x x x
comparative c x x x x x x
superlative s x x x x x
- -------------- -------------- -
3 Gender masculine m x x x x
feminine f x x x x
neuter n x
l-s. common c x
- -------------- -------------- -
4 Number singular s x x x x
plural p x x x x
l-spc. invariant n x
- -------------- -------------- -
5 Case nominative n x
genitive g x
dative d x
accusative a x
= ============== ============== =
6 Position l-spc. attributive a x
predicative p x
- -------------- -------------- -
4. Pronouns (P)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type personal p x x x x x x
demonstrative d x x x x x
indefinite i x x x x
possessive s x x x x
interrogative t x x x x x
relative r x x x x x
exclamative e x
reflexive x x x x
reciprocal l x
l-s. general g x
l-s. quantificational x
- -------------- -------------- -
2 Person first 1 x x x x x x
second 2 x x x x x x
third 3 x x x x x x
- -------------- -------------- -
3 Gender masculine m x x x x
feminine f x x x x
neuter n x x x
l-s. common c x
- -------------- -------------- -
4 Number singular s x x x x x x
plural p x x x x x x
l-s. invariant n x
- -------------- -------------- -
5 Case nominative n x x x
genitive g x
dative d x x
accusative a x x
oblique o x x
object j x
l-s. 1 x
l-s. 4 x
- -------------- -------------- -
6 Possessor singular s x x x
plural p x x x
= ============== ============== =
7 Wh Not-wh n x
Relative r x
Int q x
- -------------- -------------- -
8 Poss-person First 1 x
Second 2 x
Third 3 x
- -------------- -------------- -
9 Poss-gender Masculine m x
Femenine f x
Neuter n x
- -------------- -------------- -
10Sem-gender M x
F x
N x
- -------------- -------------- -
5. Determiners (D)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type demonstrative d x x x x x
indefinite i x x x x
possessive s x x x x x x
interrogative t x x x x
exclamative e x
relative r x
article a x x
l-s. Def-article t x
l-s. Indef-article a x
l-s. General g x
l-s. quantificational x
- -------------- -------------- -
2 Person first 1 x x x x
second 2 x x x x
third 3 x x x x
- -------------- -------------- -
3 Gender masculine m x x x x x
feminine f x x x x x
neuter n x x x
l-s common c x
- -------------- -------------- -
4 Number singular s x x x x x x
plural p x x x x x x
l-s invariant n x
- -------------- -------------- -
5 Case nominative n x
genitive g x
dative d x
accusative a x
oblique o
- -------------- -------------- -
6 Possessor singular s x x
plural p x x
= ============== ============== =
7 Quantif./or definite d x x
Defness indefinite i x x
- -------------- -------------- -
8 Wh Not-wh n x
Relative r x
Int/Ecl q x
- -------------- -------------- -
9 Poss-person First 1 x
Second 2 x
Third 3 x
- -------------- -------------- -
10 Poss-gender Masculine m x
Feminine f x
Neuter n x
- ------------ --------------- -
6. Articles (T)
Features used by the groups
IT DE ES FR NL EN
= ============ =============== =
P ATT VAL C
= ============ =============== =
1 Type definite d x x x
indefinite i x x x
------------- ---------------- -
2 Gender masculine m x x x
feminine f x x x
neuter n x x x
l-s. common c x
------------- ---------------- -
3 Number singular s x x x
plural p x x x
------------- ----------------- -
4 Case nominative n x
genitive g x
dative d x
accusative a x
= ============ ================ =
7. Adverbs (R)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type general g x x
particle p x x
l-s. degree d x
l-s. interrogative i x
l-s. conjunction c x
l-s. modal m x
l-s. pronom p x
l-s. temporal t x
l-s. place l x
- -------------- -------------- -
2 Degree positive p x x x x x
comparative c x x x x
superlative s x x x x
l-s. negative n x
= ============== ============== ==
3 Function mod x
spe x
- -------------- -------------- --
4 Wh-ness interrogative q x
relative r x
no n x
- -------------- -------------- --
8. Adpositions (S)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type preposition p x x x x x x
postposition t x x x
circumposition c x
l-s. part1 a x
l-s. part2 z x
- -------------- -------------- -
2 Formation simple s x x x
compound c x x
= ============== ============== =
3 Gender masculine m x
femenine f x
common c x
- -------------- -------------- -
4 Number singular s x
plural p x
- -------------- -------------- -
9. Conjunctions (C)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type coordinating c x x x x x x
subordinating s x x x x x x
l-spc. compar v x x
l-spc. infinitive i x
l-spc. part1 a x
l-spc. part2 z x
= ============== ============== =
2 ctype finite f x
that t x
subjunctive s x
- -------------- -------------- -
3 coord-posit. initial i x
non-initial n x
- -------------- -------------- -
10. Numerals (M)
Features used by the groups
IT DE ES FR NL EN
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type cardinal c x x x x
ordinal o x x x
- -------------- -------------- -
2 Gender masculine m x x x
feminine f x x x
neuter n
- -------------- -------------- -
3 Number singular s x x x
plural p x x x
- -------------- -------------- -
5 Case nominative n
genitive g
dative d
accusative a
= ============== ============== =
Categories used by the groups
IT DE ES FR NL EN
11. Interjections (I) x x x x x
12. Unique membership class (U)
13. Residual (X) x x x
14. Particle (Q) x
15. Punctuation (F) x x
16. Abreviations (Y) x
Current enconding practices use widely different naming conventions for
corpus tags. We can find different sets of labels also for the same
language - for example SUBSMS, SBMS, NCMS, Nms, etc. can represent
"Common noun, masc. sing." in different systems for the very same
language.
It has been found, as already mentioned, that corpus tags are strongly
committed to the tool and to the language. Therefore, each language will
have its own set based on different considerations. However it was
considered helpful to suggest some naming conventions for the sake of
harmonization. The following is an attempt done by the French partners
to give simple general guidelines for achieving a coherent naming
convention within the project.
This is not a formal system, and may lead to ambiguities. In order to
have a final set of tags a thourough testing must be performed as
experimentation is going to show the behaviour of a given set. Also
considerations coming from the decision taken with respect to the need
and usefulness of special devices for automatic conversion are expected
to have some impact in the concrete tags given for a language. Thus,
the tags proposed by each group for the time being must be considered
temptative until the end of the experimentation phase.
In this section the MULTEXT set of lexicon
specifications is applied to
Italian (Calzolari and Monachini 1994).
The language-specific values added for Italian are highligthed with
the code `l-spec'.
Furthermore, a preliminary tagset for Italian is proposed. This is based on the tagset used by our tagger, but also takes into account the criteria expressed above for the construction of the tagset, and the results of a first cycle of experimentations on the MULTEXT tagger.
A table containing the the translation of the tag into the regular expression and its definition is presented, i.e.
TAG Reg.expr. Definition
NMS Ncms- Common noun, masc.sing.
A table displaying the mapping between lexicon specifications and corpus
tags is provided, along with an examplification.
5.1.1 Nouns (N)
---------------
5.1.1.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type common libro c
proper Gianni p
------------ ----------- ----------- ----
Gender masculine uomo m
feminine donna f
l-spec common insegnante c
------------ ----------- ----------- ----
Number singular uomini s
plural donne p
l-spec invariant attivita' n
------------ ----------- ----------- ----
Case (n.a.) (n.a.) -
============ =========== =========== ====
5.1.1.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
NMS Ncms- Common noun, masc. sing.
NMP Ncmp- Common noun, masc. plur.
NMN Ncmn- Common noun, masc. invar.
NFS Ncfs- Common noun, fem. sing.
NFP Ncfp- Common noun, fem. plur.
NFN Ncfn- Common noun, fem. invar.
NNS Nccs- Common noun, comm. sing.
NNP Nccp- Common noun, comm. plur.
NNN Nccn- Common noun, comm. invar.
NP Np..- Proper noun
======= ================== ====================================
5.1.1.3 Combinations
========= ======= =============================================
Lexicon Corpus Example
========= ======= =============================================
Ncms- NMS libro
Ncmp- NMP libri
Ncmn- NMN re, caffe' (il/i)
Ncfs- NFS casa
Ncfp- NFP case
Ncfn- NFN attivita' (la/le)
Nccs- NNS insegnante (un/una)
Nccp- NNP insegnanti (gli/le)
Nccn- NNN sosia (il/la, i/le)
Np..- NP Mario, Maria, Borboni
========= ======= =============================================
5.1.1.4 Some obsevations for the corpus tagset
The idea of the French group to tag Proper Nouns simply with NP
(collapsing the information on gender and number) seems the best
solution.
5.1.2 Verb (V)
--------------
5.1.2.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type main amare m
auxiliary avere a
------------ ----------- ----------- ----
Mood/VForm indicative amo i
subjunctive ami s
imperative ama m
conditional amerei c
infinitive amare n
participle amato p
gerund amando g
------------ ----------- ----------- ----
Tense present amo p
imperfect amavo i
future amero' f
past amai s
------------ ----------- ----------- ----
Person first amo 1
second ami 2
third ama 3
------------ ----------- ----------- ----
Number singular amo s
plural amiamo p
------------ ----------- ----------- ----
Gender masculine amato m
feminine amata f
l-spec common amante c
============ =========== =========== ====
5.1.2.2 Corpus
========= ======================= ======================================
Tag Regular Expression Definition
========= ======================= ======================================
VAS1IP Vaip1s- Aux. Verb, 1st pers.sing., pres.indic.
VAS2IP Vaip2s- Aux. Verb, 2nd pers.sing., pres.indic.
VAS3IP Vaip3s- Aux. Verb, 3rd pers.sing., pres.indic.
VAP2IP Vaip2p- Aux. Verb, 2nd pers.plur., pres.indic.
VAP3IP Vaip3p- Aux. Verb, 3rd pers.plur., pres.indic.
VAP1ICP Va[is]p1p- Aux. Verb, 1stpers.plur.,pres.indic/cong
VAY^2IP Vaip(1s|3p)- Aux. Verb, 1st sing./3rd plur., pres.indic
VAS1II Vaii1s- Aux. Verb, 1st pers.sing., impf.indic.
VAS2II Vaii2s- Aux. Verb, 2nd pers.sing., impf.indic.
VAS3II Vaii3s- Aux. Verb, 3rd pers.sing., impf.indic.
VAP1II Vaii1p- Aux. Verb, 1st pers.plur., impf.indic.
VAP2II Vaii2p- Aux. Verb, 2nd pers.plur., impf.indic.
VAP3II Vaii3p- Aux. Verb, 3rd pers.plur., impf.indic.
VAS1IF Vaif1s- Aux. Verb, 1st pers.sing., fut. indic.
VAS2IF Vaif2s- Aux. Verb, 2nd pers.sing., fut. indic.
VAS3IF Vaif3s- Aux. Verb, 3rd pers.sing., fut. indic.
VAP1IF Vaif1p- Aux. Verb, 1st pers.plur., fut. indic.
VAP2IF Vaif2p- Aux. Verb, 2nd pers.plur., fut. indic.
VAP3IF Vaif3p- Aux. Verb, 3rd pers.plur., fut. indic.
VAS1IR Vais1s- Aux. Verb, 1st pers.sing., past indic.
VAS2IR Vais2s- Aux. Verb, 2nd pers.sing., past indic.
VAS3IR Vais3s- Aux. Verb, 3rd pers.sing., past indic.
VAP1IR Vais1p- Aux. Verb, 1st pers.plur., past indic.
VAP3IR Vais3p- Aux. Verb, 3rd pers.plur., past indic.
VAP2ICR Va(is)|(si)2p- Aux. Verb, 2nd p.pl., past indic./pres.cong
VASXCP Vacp.s- Aux. Verb, 1/2/3 p. sing., pres.subj.
VAP2CMP Va[sm]p2p- Aux. Verb, 2nd pers.plur., pres.subj./imper.
VAP3CP Vasp3p- Aux. Verb, 3rd pers.plur., pres.subj.
VAS^3CI Vasi^3s- Aux. Verb, 1/2 pers.sing., impf.subj.
VAS3CI Vasi3s- Aux. Verb, 3rd pers.sing., impf.subj.
VAP1CI Vasi1p- Aux. Verb, 1st pers.plur., impf.subj.
VAP3CI Vasi3p- Aux. Verb, 3rd pers.plur., impf.subj.
VAS2MP Vamp2s- Aux. Verb, 2nd pers.sing., pres.impr.
VAS2MPE Vamp2s-y Aux. Verb, 2nd pers.sing., pres.impr. + clit.
VAP2MPE Vamp2p-y Aux. Verb, 2nd pers.plur., pres.impr. + clit.
VAS1DP Vacp1s- Aux. Verb, 1st pers.sing., pres.cond.
VAS2DP Vacp2s- Aux. Verb, 2nd pers.sing., pres.cond.
VAS3DP Vacp3s- Aux. Verb, 3rd pers.sing., pres.cond.
VAP1DP Vacp1p- Aux. Verb, 1st pers.plur., pres.cond.
VAP2DP Vacp2p- Aux. Verb, 2nd pers.plur., pres.cond.
VAP3DP Vacp3p- Aux. Verb, 3rd pers.plur., pres.cond.
VAF Vanp--- Aux. Verb, infinitive
VAFE Vanp--cy Aux. Verb, infinitive + clitic
VANSPP Vapp-sc Aux. Verb, comm.sing., pres.part.
VANPPP Vapp-pc Aux. Verb, comm.plur., pres.part.
VAMSPR Vaps-sm Aux. Verb, masc.sing., past part.
VAMPPR Vaps-pm Aux. Verb, masc.plur., past part.
VAFSPR Vaps-sf Aux. Verb, femm.sing., past part.
VAFPPR Vaps-pf Aux. Verb, femm.plur., past part.
VAMSPRE Vaps-smy Aux. Verb, masc.sing., past part. + clitic
VAMPPRE Vaps-pmy Aux. Verb, masc.plur., past part. + clitic
VAFSPRE Vaps-sfy Aux. Verb, femm.sing., past part. + clitic
VAFPPRE Vaps-pfy Aux. Verb, femm.plur., past part. + clitic
VAG Vagp--- Aux. Verb, gerund
VAGE Vagp---y Aux. Verb, gerund + clitic
VS1IP Vmip1s- Main Verb, 1st pers.sing., pres.indic
VS3IP Vmip3s- Main Verb, 3rd pers.sing., pres.indic
VP3IP Vmip3p- Main Verb, 3rd pers.plur., pres.indic
VP1ICP Vm[is]p1p Main Verb,1stpers.plur.,pres.indic/cong
VP2IMPP Vm([im]p2p-)|(ps-pf) M.V., 2nd pl., pres.indic/imper|pstprt f.pl.
VP2IMP Vm([im]p2p)- Main Verb, 2nd pl., pres.indic/imper
VSXICP Vm(sp.s)|(ip2s)- M.V., 1/2/3 sg.,pres.subj.|2ndsg. pres.indic.
VS^1IMP Vm[im]^1s- Main Verb, not 1stsg.,pres.indic./imper.
VS2IMP Vm[im]p2s- Main Verb, 2nd sg., pres.indic/imper
VP2IMCPP Vm([ims]p2p-)|(ps-pf) M.V., 2pl., pr.ind/imp/sub|pst.prt f.pl.
VS1II Vmii1s- Main Verb, 1st pers.sing., impf.indic.
VS2II Vmii2s- Main Verb, 2nd pers.sing., impf.indic.
VS3II Vmii3s- Main Verb, 3rd pers.sing., impf.indic.
VP1II Vmii1p- Main Verb, 1st pers.plur., impf.indic.
VP2II Vmii2p- Main Verb, 2nd pers.plur., impf.indic.
VP3II Vmii3p- Main Verb, 3rd pers.plur., impf.indic.
VS1IF Vmif1s- Main Verb, 1st pers.sing., fut. indic.
VS2IF Vmif2s- Main Verb, 2nd pers.sing., fut. indic.
VS3IF Vmif3s- Main Verb, 3rd pers.sing., fut. indic.
VP1IF Vmif1p- Main Verb, 1st pers.plur., fut. indic.
VP2IF Vmif2p- Main Verb, 2nd pers.plur., fut. indic.
VP3IF Vmif3p- Main Verb, 3rd pers.plur., fut. indic.
VS1IR Vmis1s- Main Verb, 1st pers.sing., past indic.
VS2IR Vmis2s- Main Verb, 2nd pers.sing., past indic.
VS3IR Vmis3s- Main Verb, 3rd pers.sing., past indic.
VP1IR Vmis1p- Main Verb, 1st pers.plur., past indic.
VP3IR Vmis3p- Main Verb, 3rd pers.plur., past indic.
VP2ICR Vm(is)|(si)2p- Main Verb, 2nd p.pl., past indic./pres.subj.
VP2CP Vmsp2p- Main Verb, 2nd pers.plur., pres.subj. amiate
VP3CP Vmsp3p- Main Verb, 3rd pers.plur., pres.subj. amino
VSXCP Vmcp.s- Main Verb, 1/2/3 p. sing., pres.subj.
VS^3CI Vmsi^3s- Main Verb, 1/2 pers.sing., impf.subj.
VS3CI Vmsi3s- Main Verb, 3rd pers.sing., impf.subj.
VP1CI Vmsi1p- Main Verb, 1st pers.plur., impf.subj.
VP3CI Vmsi3p- Main Verb, 3rd pers.plur., impf.subj.
VS2MPE Vmmp2s-y Main Verb, 2nd pers.sing., pres.impr. + clit.
VP2MPE Vmmp2p-y Main Verb, 2nd pers.plur., pres.impr. + clit.
VS1DP Vmcp1s- Main Verb, 1st pers.sing., pres.cond.
VS2DP Vmcp2s- Main Verb, 2nd pers.sing., pres.cond.
VS3DP Vmcp3s- Main Verb, 3rd pers.sing., pres.cond.
VP1DP Vmcp1p- Main Verb, 1st pers.plur., pres.cond.
VP2DP Vmcp2p- Main Verb, 2nd pers.plur., pres.cond.
VP3DP Vmcp3p- Main Verb, 3rd pers.plur., pres.cond.
VF Vmnp--- Main Verb, infinitive
VFE Vmnp---y Main Verb, infinitive + clitic
VNSPP Vmpp-sc Main Verb, comm.sing., pres.part.
VNPPP Vmpp-pc Main Verb, comm.plur., pres.part.
VMSPR Vmps-sm Main Verb, masc.sing., past part.
VMPPR Vmps-pm Main Verb, masc.plur., past part.
VFSPR Vmps-sf Main Verb, femm.sing., past part.
VFPPR Vmps-pf Main Verb, femm.plur., past part.
VMSPRE Vmps-smy Main Verb, masc.sing., past part. +c
VMPPRE Vmps-pmy Main Verb, masc.plur., past part. +c
VFSPRE Vmps-sfy Main Verb, femm.sing., past part. +c
VFPPRE Vmps-pfy Main Verb, femm.plur., past part. +c
VG Vmgp--- Main Verb, gerund
VGE Vmgp---y Main Verb, gerund + clitic
-------------------- more collapsed tagset -----------------------
VA1P Va[iscm][pifs]1p-- Aux. verb, 1st person plur.
VA1S Va[iscm][pifs]1s-- Aux. verb, 1st person sing.
VA2P Va[iscm][pifs]2p-- Aux. verb, 2nd person plur.
VA2S Va[iscm][pifs]2s-- Aux. verb, 2nd person sing.
VA3P Va[iscm][pifs]3p-- Aux. verb, 3rd person plur.
VA3S Va[iscm][pifs]3s-- Aux. verb, 3rd person sing.
VAFPPS Vaps-pf- Aux. verb, fem. plur., past part.
VAFSPS Vaps-sf- Aux. verb, fem. sing., past part.
VAMPPS Vaps-pm- Aux. verb, masc. plur., past part.
VAMSPS Vaps-sm- Aux. verb, masc. sing., past part.
VAN Vanp---- Aux. verb, infinitive
VAFE Vanp---- Aux. Verb, infinitive + enclitic
VAG Vagp---- Aux. Verb, gerund
VAGE Vagp---- Aux. Verb, gerund + enclitic
VAPP Vapp-..- Aux. verb, pres. participle
V1P Vm[iscm][pifs]1p-- Main Verb, 1st person plur.
V1S Vm[iscm][pifs]1s-- Main Verb, 1st person sing.
V2P Vm[iscm][pifs]2p-- Main Verb, 2nd person plur.
V2S Vm[iscm][pifs]2s-- Main Verb, 2nd person sing.
V3P Vm[iscm][pifs]3p-- Main Verb, 3rd person plur.
V3S Vm[iscm][pifs]3s-- Main Verb, 3rd person sing.
VFPPS Vmps-pf- Main Verb, fem. plur., past part.
VFSPS Vmps-sf- Main Verb, fem. sing., past part.
VMPPS Vmps-pm- Main Verb, masc. plur., past part.
VMSPS Vmps-sm- Main Verb, masc. plur., past part.
VF Vmnp---- Main Verb, infinitive
VFE Vmnp----y Main Verb, infinitive + enclitic
VG Vmgp---- Main Verb, gerund
VGE Vmgp----y Main Verb, gerund + enclitic
VPP Vmpp-..- Main Verb, pres. participle
--------------------- more collapsed end ----------------------
====== =================== ===================================
5.1.2.3 Combinations
============ ======== =============================================
Lexicon Corpus Example
============ ======== =============================================
Vaip1s- +++ VAS1IP ho
Vaip2s- VAS2IP hai, sei
Vaip3s- VAS3IP ha, e'
Vaip2p- VAP2IP avete, siete
Vaip3p- +++ VAP3IP hanno
Vaip1p- VAP1ICP abbiamo, siamo
Vaip1s- +++ VAY^2IP sono
Vaip3p- +++ VAY^2IP sono
Vaii1s- VAS1II avevo, ero
Vaii2s- VAS2II avevi, eri
Vaii3s- VAS3II aveva, era
Vaii1p- VAP1II avevamo, eravamo
Vaii2p- VAP2II avevate, eravate
Vaii3p- VAP3II avevano, erano
Vaif1s- VAS1IF avro', saro'
Vaif2s- VAS2IF avrai, sarai
Vaif3s- VAS3IF avra', sara'
Vaif1p- VAP1IF avremo, saremo
Vaif2p- VAP2IF avrete, sarete
Vaif3p- VAP3IF avranno, saranno
Vais1s- VAS1IR ebbi, fui
Vais2s- VAS2IR avesti, fosti
Vais3s- VAS3IR ebbe, fu
Vais1p- VAP1IR avemmo, fummo
Vais3p- VAP3IR ebbero, furono
Vais2s- VAP2ICR aveste, foste
Vasp1s- VASXCP abbia, sia
Vasp2s- VASXCP abbia, sia
Vasp3s- VASXCP abbia, sia
Vasp1p- VAP1ICP abbiamo, siamo
Vasp2p- VAP2CMP abbiate, siate
Vasp3p- VAP3CP abbiano, siano
Vasi1s- VAS^3CI avessi, fossi
Vasi2s- VAS^3CI avessi, fossi
Vasi3s- VAS3CI avesse, fosse
Vasi1p- VAP1CI avessimo, fossimo
Vasi2s- VAP2ICR aveste, foste
Vasi3p- VAP3CI avessero, fossero
Vamp2s- VAS2MP abbi, sii
Vamp2s-y VAS2MPE abbilo, siilo
Vamp2p- VAP2CMP abbiate, siate
Vamp2p-y VAP2MPE abbiatelo, siatelo
Vacp1s- VAS1DP avrei, sarei
Vacp2s- VAS2DP avresti, saresti
Vacp3s- VAS3DP avrebbe, sarebbe
Vacp1p- VAP1DP avremmo, saremmo
Vacp2p- VAP2DP avreste, sareste
Vacp3p- VAP3DP avrebbero, sarebbero
Vanp--- VAF avere, essere
Vanp--cy VAFE averlo, esserlo
Vapp-sc VANSPP avente, essente
Vapp-pc VANPPP aventi, essenti
Vaps-sm VAMSPR avuto, stato
Vaps-pm VAMPPR avuti, stati
Vaps-sf VAFSPR avuta, stata
Vaps-pf VAFPPR avute, state
Vaps-smy VAMSPRE avutolo
Vaps-pmy VAMPPRE avutili
Vaps-sfy VAFSPRE avutala
Vaps-pfy VAFPPRE avuteli
Vagp--- VAG avendo, essendo
Vagp---y VAGE avendolo, essendolo
Vmip1s- VS1IP amo, leggo, servo
Vmip2s- +++ VSXICP ami
Vmip2s- +++ VS2IMP leggi, servi
Vmip3s- --- VS^1IMP ama
Vmip3s- --- VS3IP legge, serve
Vmip1p- VP1ICP amiamo, leggiamo, serviamo
Vmip2p- *** VP2IMPP amate, servite
Vmip2p- *** VP2IMP leggete
Vmip2p- *** VP2IMCPP premiate
Vmip3p- VP3IP amano, leggono, servono
Vmii1s- VS1II amavo,
Vmii2s- VS2II amavi,
Vmii3s- VS3II amava
Vmii1p- VP1II amavano
Vmii2p- VP2II amavate
Vmii3p- VP3II amavano
Vmif1s- VS1IF amero'
Vmif2s- VS2IF amerai
Vmif3s- VS3IF amera'
Vmif1p- VP1IF ameremo
Vmif2p- VP2IF amerete
Vmif3p- VP3IF ameranno
Vmis1s- VS1IR amai
Vmis2s- VS2IR amasti
Vmis3s- VS3IR amo'
Vmis1p- VP1IR amammo
Vmis2p- VP2ICR amaste, leggeste, serviste
Vmis3p- VP3IR amarono
Vmsp1s- +++ VSXCP legga
Vmsp1s- +++ VSXICP ami
Vmsp2s- --- VSXCP legga
Vmsp2s- --- VSXICP ami
Vmsp3s- *** VSXCP legga
Vmsp3s- *** VSXICP ami
Vmsp1p- VP1ICP amiamo, leggiamo, serviamo
Vmsp2p- """ VP2CP amiate, leggiate, serviate
Vmsp2p- """ VP2ICMPP premiate
Vmsp3p- VP3CP amino, leggano, servano
Vmsi1s- VS^3CI amassi, leggessi, servissi
Vmsi2s- VS^3CI amassi, leggessi, servissi
Vmsi3s- VS3CI amasse, leggesse, servisse
Vmsi1p- VP1CI amassimo
Vmsi2p- VP2ICR amaste, leggeste, serviste
Vmsi3p- VP3CI amassero
Vmmp2s- +++ VS^1IMP ama
Vmmp2s- +++ VS2IMP leggi, servi
Vmmp2p- --- VP2IMPP amate
Vmmp2p- --- VP2IMP leggete, servite
Vmmp2p- --- VP2IMCPP premiate
Vmmp2s-y VS2MPe amalo, leggilo, servilo
Vmmp2p-y VP2MPe amatelo, leggetelo, servitelo
Vmcp1s- VS1DP amerei
Vmcp2s- VS2DP ameresti
Vmcp3s- VS3DP amarebbe
Vmcp1p- VP1DP ameremmo
Vmcp2p- VP2DP amereste
Vmcp3p- VP3DP amerebero
Vmnp--- VF amare
Vmnp---y VFE amarlo
Vmpp-sc VNSPP amante
Vmpp-pc VNPPP amanti
Vmps-sm VMSPR amato, letto, servito
Vmps-pm VMPPR amati, letti, serviti
Vmps-sf VFSPR amata, letta, servita
Vmps-pf +++ VP2IMCPP premiate
Vmps-pf +++ VP2IMPP amate, servite
Vmps-pf +++ VFPPR lette
Vmps-smy VMSPRE amatolo
Vmps-pmy VMPPRE amatili
Vmps-sfy VFSPRE amatala
Vmps-pfy VFPPRE amatele
Vmgp--- VG amando
Vmgp---y VGE amandolo
----------------------- more collapsed tagset ------------------
Vaip1s- VA1S ho
Vaip2s- VA2S hai
Vaip3s- VA3S ha
Vaip1p- VA1P abbiamo
Vaip2p- VA2P avete
Vaip3p- VA3P hanno
Vaii1s- VA1S avevo
Vaii2s- VA2S avevi
Vaii3s- VA3S aveva
Vaii1p- VA1P avevamo
Vaii2p- VA2P avevate
Vaii3p- VA3P avevano
Vaif1s- VA1S avro'
Vaif2s- VA2S avrai
Vaif3s- VA3S avra'
Vaif1p- VA1P avremo
Vaif2p- VA2P avrete
Vaif3p- VA3P avranno
Vais1s- VA1S ebbi
Vais2s- VA2S avesti
Vais3s- VA3S ebbe
Vais1p- VA1P avemmo
Vais2p- VA2P aveste
Vais3p- VA3P ebbero
Vasp1s- VA1S abbia
Vasp2s- VA2S abbia
Vasp3s- VA3S abbia
Vasp1p- VA1P abbiamo
Vasp2p- VA2P abbiate
Vasp3p- VA3P abbiano
Vasi1s- VA1S avessi
Vasi2s- VA2S avessi
Vasi3s- VA3S avesse
Vasi1p- VA1P avessimo
Vasi2p- VA2P aveste
Vasi3p- VA3P avessero
Vamp2s- VA2S abbi
Vamp2p- VA2P abbiate
Vacp1s- VA1S avrei
Vacp2s- VA2S avresti
Vacp3s- VA3S avrebbe
Vacp1p- VA1P avremmo
Vacp2p- VA2P avreste
Vacp3p- VA3P avrebbero
Vanp--- VAF avere
Vanp---y VAFE averlo
Va-cspp VANSPP avente
Va-cppp VANPPP aventi
Va-msps VAMSPR avuto
Va-mpps VAMPPR avuti
Va-fsps VAFSPR avuta
Va-fpps VAFPPR avute
Va-gp-- VAG avendo
Va-gp--y VAGE avendolo
Vmip1s- V1S amo
Vmip2s- V2S ami
Vmip3s- V3S ama
Vmip1p- V1P amiamo
Vmip2p- V2P amate
Vmip3p- V3P amano
Vmii1s- V1S amavo
Vmii2s- V2S amavi
Vmii3s- V3S amava
Vmii1p- V1P amavamo
Vmii2p- V2P amavate
Vmii3p- V3P amavano
Vmif1s- V1S amero'
Vmif2s- V2S amerai
Vmif3s- V3S amera'
Vmif1p- V1P ameremo
Vmif2p- V2P amerete
Vmif3p- V3P ameranno
Vmis1s- V1S amai
Vmis2s- V2S amasti
Vmis3s- V3S amo'
Vmis1p- V1P amammo
Vmis2p- V2P amaste
Vmis3p- V3P amarono
Vmsp1s- V1S ami
Vmsp2s- V2S ami
Vmsp3s- V3S ami
Vmsp1p- V1P amiamo
Vmsp2p- V2P amiate
Vmsp3p- V3P amino
Vmsi1s- V1S amassi
Vmsi2s- V2S amassi
Vmsi3s- V3S amasse
Vmsi1p- V1P amassimo
Vmsi2p- V2P amaste
Vmsi3p- V3P amassero
Vmmp2s- V2S ama
Vmmp2p- V2P amate
Vmcp1s- V1S amerei
Vmcp2s- V2S ameresti
Vmcp3s- V3S amerebbe
Vmcp1p- V1P ameremmo
Vmcp2p- V2P amereste
Vmcp3p- V3P amerebbero
Vmnp--- VF amare
Vmnp---y VFE amarlo
Vm-cspp VNSPP amante
Vm-cppp VNPPP amanti
Vm-msps VMSPR amato
Vm-mpps VMPPR amati
Vm-fsps VFSPR amata
Vm-fpps VFPPR amate
Vm-gp-- VG amando
Vm-gp--y VGE amandolo
========= ======= ========================================
5.1.2.4 Some observations for corpus tagset
An observation concerns the special marking for the auxiliaries: the
taggers are in general not able to disambiguate the cases in which the
auxiliaries are used as full verbs ("io ho un cane" , "i bambini sono
nel prato") from the cases when they are auxiliaries. The distinction
of the auxiliaries is used only in order to isolate 'avere' and
'essere' from the other verbs.
For verbs, two different sets of tags are proposed, the first more
fine-grained for more accurate distinctions and the latter more
coarse-grained, which follows the approach proposed by the French
group.
The collapsing proposed by the French group of Moods and Tenses, if
considered wrt to the performances of our tagger, appears restrictive:
for many unambiguous tenses and moods, the Italian tagger is able to
formulate the correct analysis (e.g. conditional, subjunctive
imperfect, indicative past etc.) and these distinctions are, in our
opinion, worth being maintained. It has to be noticed that the
ambiguities between verb forms depend also on different lexical verbs.
In Italian, the major ambiguities concerns the 2nd sing and plur of
the present indicative and imperative, ama-amate; leggi-leggete.
However, this is again not a general rule.
Another very common ambiguity is between the 2nd pers. of the
indicative and the 1st, 2nd, 3rd person of the present subjunctive.
Therefore not always it is possible to decide unambiguosly on the
person.
Some more frequent typical homographies in Italian are listed below:
VP1ICP amiamo
VP2IMP leggete
VP2IMPP amate
VP2IMCPP premiate
VP2ICR amaste
VS^3CI amassi
VSXCP legga
VAY^2IP sono
VSXICP ami
VS2IMP leggi
VS^1IMP ama
VS^1IMP ama
In the design of corpus tagsets for verbs careful attention should be
given to the enclitic phenomenon: at present our tagger is able to
recognize the presence of the clitics which is signalled by the
addition of the mark "+E" (plus clitic) to the regular verb tag.
5.1.3 Adjectives (A)
--------------------
5.1.3.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type - - -
------------------------------------ ----
Degree positive buono p
comparative migliore c
superlative buonissimo s
------------------------------------ ----
Gender masculine buono m
feminine buona f
l-spec common dolce c
------------ --- - ----- ----------- ----
Number singular buono s
plural buoni p
l-spec invariant pari n
------------ --- - ----- ----------- ----
Case (n.a.) (n.a.) -
============ =========== =========== ====
5.1.3.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
AFP A-.fp- Adjective fem. plur.
AFS A-.fs- Adjective fem. sing.
AFN A-.fn- Adjective fem. invar.
AMP A-.mp- Adjective masc. plur.
AMS A-.ms- Adjective masc. sing.
AMN A-.mn- Adjective masc. invar.
AMP A-.mp- Adjective comm. plur.
AMS A-.ms- Adjective comm. sing.
AMN A-.mn- Adjective comm. invar.
======= ================== ====================================
5.1.3.3 Combinations
========= ======= =============================================
Lexicon Corpus Example
========= ======= =============================================
A-pms- AMS vero
A-pmp- AMP veri
A-pmn- AMN oggetto (complemento/i oggetto: grammatical language)
A-pfs- AFS vera
A-pfp- AFP vere
A-pfn- AFN valore (clausola valore: juridical language)
A-pcs- ANS dolce (biscotto, torta)
A-pcp- ANP dolci (biscotti, dolci)
A-pcn- ANN pari (risultato/i, somma/e)
A-sms- AMS verissimo
A-smp- AMP verissimi
A-sfs- AFS verissima
A-sfp- AFP verissime
========= ======= =============================================
5.1.3.4 Observations
The comparative Degree applies only to a close set of adjectives (e.g.
maggiore, migliore, etc). All other adjectives form their comparatives
with "piu'" + adjective (e.g., piu' forte). Superlative is also an
analytical form (il piu' forte), but can be also synthetically formed:
grandissimo, massimo.
5.1.4. Pronouns
---------------
5.1.4.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type personal io p
demonstrat. quello d
indefinite chiunque i
possessive mio s
interrog. chi t
relative che r
exclamative quanto e
------------ ----------- ----------- ----
Person first io 1
second tu 2
third egli 3
------------ ----------- ----------- ----
Gender masculine questo m
feminine questa f
l-spec common io c
------------ ----------- ----------- ----
Number singular questo s
plural questi p
l-spec invariant che n
------------ ----------- ----------- ----
Case (n.a.) (n.a.) -
------------ ----------- ----------- ----
Possessor - - -
============ =========== =========== ====
5.1.4.2 Corpus
======= ========= ====================================
Tag Reg.Expr. Definition
======= ========= ====================================
PDMS Pd-ms-- Demonstrative pronoun masc.sing.
PDMP Pd-mp-- Demonstrative pronoun masc.plur.
PDFS Pd-fs-- Demonstrative pronoun femm.sing.
PDFP Pd-fp-- Demonstrative pronoun femm.plur.
PDNS Pd-cs-- Demonstrative pronoun comm.sing.
PDNP Pd-cp-- Demonstrative pronoun comm.plur.
PIMS Pi-ms-- Indefinite pronoun masc.sing.
PIMP Pi-mp-- Indefinite pronoun masc.plur.
PIFS Pi-fs-- Indefinite pronoun femm.sing.
PIFP Pi-fp-- Indefinite pronoun femm.plur.
PINS Pi-cs-- Indefinite pronoun comm.sing.
PINP Pi-cp-- Indefinite pronoun comm.plur.
PPMS Ps.ms-- Possessive pronoun, masc.sing.
PPMP Ps.mp-- Possessive pronoun, masc.plur.
PPFS Ps.fs-- Possessive pronoun, femm.sing.
PPFP Ps.fp-- Possessive pronoun, femm.plur.
PPNP Ps.cp-- Possessive pronoun, comm.plur.
PWNS P[tre]-cs-- Interr./Rel./Escl. pronoun, comm.sing.
PWNP P[tre]-cp-- Interr./Rel./Escl. pronoun, comm.plur.
PWNN P[tre]-cn-- Interr./Rel./Escl. pronoun, comm.plur.
PWMS P[tre]-ms-- Interr./Rel./Escl. pronoun, masc.sing.
PWMP P[tre]-mp-- Interr./Rel./Escl. pronoun, masc.plur.
PWFS P[tre]-fs-- Interr./Rel./Escl. pronoun, femm.sing.
PWFP P[tre]-fp-- Interr./Rel./Escl. pronoun, femm.plur.
PQNS1 Pp1cs-- Personal pronoun, 1st pers., comm.sing.
PQNS2 Pp2cs-- Personal pronoun, 2nd pers., comm.sing.
PQMS3 Pp3ms-- Personal pronoun, 3rd pers., masc.sing.
PQFS3 Pp3fs-- Personal pronoun, 3rd pers., femm.sing.
PQNN3 Pp3cn-- Personal pronoun, 3rd pers., comm.inv.
PQNP1 Pp1cp-- Personal pronoun, 1st pers., comm.plur.
PQNP2 Pp2cp-- Personal pronoun, 2nd pers., comm.plur.
PQNP3 Pp3cp-- Personal pronoun, 3rd pers., comm.plur.
PQMP3 Pp3mp-- Personal pronoun, 3rd pers., masc.plur.
PQFP3 Pp3fp-- Personal pronoun, 3rd pers., femm.plur.
--------- more collapsed ----------------
PFP P..fp--- Pronoun, fem. plur.
PFS P..fs--- Pronoun, fem. plur.
PMP P..mp--- Pronoun, masc. plur.
PMS P..ms--- Pronoun, masc. sing.
PNS P..cs--- Pronoun, comm. sing.
PNP P..cp--- Pronoun, comm. plur.
PNN P..cn--- Pronoun, comm. inv.
---------- more collapsed end -----------
====== ========== ==========================================
5.1.4.3 Combinations
========= ======= =============================================
Lexicon Corpus Example
========= ======= =============================================
Pd-ms-- PDMS quello, costui
Pd-mp-- PDMP quelli
Pd-fs-- PDFS quella
Pd-fp-- PDFP quelle
Pd-cs-- PDNS cio'
Pd-cp-- PDNP coloro
Pi-ms-- PIMS ognuno
Pi-mp-- PIMP alcuni
Pi-fs-- PIFS ognuna
Pi-fp-- PIFP alcune
Pi-cs-- PINS chiunque, tale
Pi-cp-- PINP tali
Ps1ms-- PPMS mio, nostro
Ps1mp-- PPMP miei
Ps1fs-- PPFS mia
Ps1fp-- PPFP mie
Ps2ms-- PPMS tuo, vostro
Ps2mp-- PPMP tuoi
Ps2fs-- PPFS tua
Ps2fp-- PPFP tue
Ps3ms-- PPMS suo
Ps3mp-- PPMP suoi
Ps3fs-- PPFS sua
Ps3fp-- PPFP sue
Ps3cp-- PPNP loro
Pt-cs-- PWNS chi? quale?
Pt-cp-- PWNP quali?
Pt-cn-- PWNN che?
Pt-ms-- PWMS quanto?
Pt-mp-- PWMP quanti?
Pt-fs-- PWFS quanta?
Pt-fp-- PWFP quante?
Pr-cn-- PWNN cui
Pr-ms-- PWMS quanto
Pr-mp-- PWMP quanti
Pr-fs-- PWFS quanta
Pr-fp-- PWFP quante
Pr-cs-- PWNS chi, quale
Pr-cp-- PWNP quali
Pe-ms-- PWMS quanto!
Pe-mp-- PWMP quanti!
Pe-fs-- PWFS quanta!
Pe-fp-- PWFP quante!
Pe-cs-- PWNS quale!
Pe-cp-- PWNP quali!
Pe-cn-- PWNN che!
Pp1cs-- PQNS1 io, me, mi
Pp2cs-- PQNS2 tu, te, ti,
Pp3ms-- PQMS3 egli, lui, esso, gli, lo
Pp3fs-- PQFS3 ella, lei, essa, le, la
Pp3cn-- PQNN3 si
Pp1cp-- PQNP1 noi, ci
Pp2cp-- PQNP2 voi, vi
Pp3cp-- PQNP3 loro,
Pp3mp-- PQMP3 essi, li
Pp3fp-- PQFP3 esse, le
--------------------- more collapsed -----------------------------
P..fp--- PFP mie, queste, quante etc.
P..fs--- PFS mia, questa, quanta etc.
P..mp--- PMP miei, questi, quanti etc.
P..ms--- PMS mio, questo, quanto etc.
P..cs--- PNS quale
P..cp--- PNP quali
P..cn--- PNN che, cui, altrui
-------------------- more collapsed end --------------------
==============================
5.1.4.4 Observations
For pronouns, the strategy of proposing two different tagsets, the one
more collapsed and the other more fine-grained is followed.
As far as the pronominal paradigm is concerned, Case is not encoded at
present in our DMI (Calzolari et al. 1983).
Personal pronouns are not lemmatized: 'gli' is not considered the
dative form of the base pronoun 'egli' (he), but constitutes a
separate entry.
The Italian pronominal paradigm is the following:
'forme toniche' (strong forms): subj (io, egli), compl (me, lui)
ama me / da' a me -- dir-obj/prep-obj --
(he loves me / he gives to me)
ama lui / da' a lui -- dir-obj/prep-obj --
(she loves him / she gives to him)
'forme atone' (weak forms): - compl (mi, gli/lo)
mi da' / mi ama -- ind-obj/dir-obj --
(he gives me / he loves me)
gli da' -- ind-obj --
(he gives him)
lo ama -- dir-obj --
(she loves him)
This paradigm can be mapped on the Case system proposed by the French
group, in the following way:
io, egli = subj = nom
mi/me = dir-obj/ind-obj/prep-obj = obj -] acc, dat, prep+obl
lui = dir-obj/prep-obj = obj -] acc, prep+obl
gli = ind-obj = dat
lo = dir-obj = acc
5.1.5 Determiners (Pronominal Adjectives) (D)
---------------------------------------------
5.1.5.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type demonstrat. questo d
indefinite ogni i
possessive mio s
interrogat. che t
exclamative quanto e this value has been added
relative quanto r this value has been added
------------ ----------- ----------- ----
Person first mio 1
second tuo 2
third suo 3
------------ ----------- ----------- ----
Gender masculine questo m
feminine questa f
l-spec common ogni c
------------ ----------- ----------- ----
Number singular quello s
plural quelli p
l-spec invariant altrui n
------------ ----------- ----------- ----
Case (n.a.) (n.a.) -
------------ ----------- ----------- ----
Possessor - - -
============ =========== =========== ====
5.1.5.2 Corpus
======= ============ ============================================
Tag Regular exp. Definition
======= ============ ============================================
DDNS Dd-ns-- Demonstrative pron.adj. comm.inv.
DDNP Dd-np-- Demonstrative pron.adj. comm.plur.
DDMS Dd-ms-- Demonstrative pron.adj. masc.sing.
DDMP Dd-mp-- Demonstrative pron.adj. masc.plur.
DDFS Dd-fs-- Demonstrative pron.adj. femm.sing.
DDFP Dd-fp-- Demonstrative pron.adj. femm.plur.
DIMS Di-ms-- Indefinite pron.adj. masc.sing.
DIMP Di-mp-- Indefinite pron.adj. masc.plur.
DIFS Di-fs-- Indefinite pron.adj. femm.sing.
DIFP Di-fp-- Indefinite pron.adj. femm.plur.
DINS Di-cs-- Indefinite pron.adj. comm.sing.
DINP Di-cp-- Indefinite pron.adj. comm.plur.
DPMS Ds.ms-- Possessive pron.adj., masc.sing.
DPMP Ds.mp-- Possessive pron.adj., masc.plur.
DPFS Ds.fs-- Possessive pron.adj., femm.sing.
DPFP Ds.fp-- Possessive pron.adj., femm.plur.
DPNN Ds-cn-- Possessive pron.adj., comm.inv.
DWNN D[tre]-cn-- Interr/Relat./escl. pron.adj., comm.inv.
DWMS D[tre]-ms-- Interr/Relat./escl. pron.adj., masc.sing.
DWMP D[tre]-mp-- Interr/Relat./escl. pron.adj., masc.plur.
DWFS D[tre]-fs-- Interr/Relat./escl. pron.adj., femm.sing.
DWFP D[tre]-fp-- Interr/Relat./escl. pron.adj., femm.plur.
DWNS D[tre]-cs-- Interr/Relat./escl. pron.adj., comm.sing.
DWNP D[tre]-cp-- Interr/Relat./escl. pron.adj., comm.plur.
--------- more collapsed ----------------
DFP D..fp--- Determiner, fem. plur.
DFS D..fs--- Determiner, fem. plur.
DMP D..mp--- Determiner, masc. plur.
DMS D..ms--- Determiner, masc. sing.
DNS D..cs--- Determiner, comm. sing.
DNP D..cp--- Determiner, comm. plur.
DNN D..cn--- Determiner, comm. inv.
--------- more collapsed end --------------
======= ============ ================================================
5.1.5.3 Combinations
========= ========== =================================
Lexicon Corpus Example
========= ========== =================================
Dd-cs-- DDNS tale
Dd-cp-- DDNP tali
Dd-ms-- DDMS quello
Dd-mp-- DDMP quelli
Dd-fs-- DDFS quella
Dd-fp-- DDFP quelle
Di-ms-- DIMS nessun
Di-mp-- DIMP alcuni
Di-fs-- DIFS nessuna
Di-fp-- DIFP alcune
Di-cs-- DINS ogni
Di-cp-- DINP quali
Ds1ms-- DPMS mio, nostro
Ds1mp-- DPMP miei
Ds1fs-- DPFS mia
Ds1fp-- DPFP mie
Ds2ms-- DPMS tuo, vostro
Ds2mp-- DPMP tuoi
Ds2fs-- DPFS tua
Ds2fp-- DPFP tue
Ds3ms-- DPMS suo
Ds3mp-- DPMP suoi
Ds3fs-- DPFS sua
Ds3fp-- DPFP sue
Ds-cn-- DPNN altrui
Dr-cn-- DWNN cui
Dr-ms-- DWMS quanto
Dr-mp-- DWMP quanti
Dr-fs-- DWFS quante
Dr-fp-- DWFP quanti
Dr-cs-- DWNS quale
Dr-cp-- DWNP quale
Dt-cn-- DWNN che
Dt-ms-- DWMS quanto
Dt-mp-- DWMP quanti
Dt-fs-- DWFS quante
Dt-fp-- DWFP quanti
Dt-cs-- DWNS quale
Dt-cp-- DWNP quale
De-cn-- DWNN che
De-cp-- DWNP quali
De-cs-- DWNS quale
De-ms-- DWMS quanto
De-mp-- DWMP quanti
De-fs-- DWFS quanta
De-fp-- DWFP quante
----------------------- more collapsed -----------------------------
D..fp--- DFP mie, queste, quante etc.
D..fs--- DFS mia, questa, quanta etc.
D..mp--- DMP miei, questi, quanti etc.
D..ms--- DMS mio, questo, quanto etc.
D..cs--- DNS quale
D..cp--- DNP quali
D..cn--- DNN altrui
----------------------- more collapsed end -----------------------
========= ========== =================================
5.1.5.4 Combinations
On the basis of the strategy adopted for Pronouns, also for Determiners
two tagsets are proposed.
5.1.6 Articles (T)
------------------
5.1.6.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type definite il d
indefinite un i
------------ ----------- ----------- ----
Gender masculine il m
feminine la f
l-spec common l' c
------------ ----------- ----------- ----
Number singular la s
plural le p
------------ ----------- ----------- ----
Case (n.a.) (n.a.) -
============ =========== =========== ====
5.1.6.2. Corpus
======== ========== ==========================================
Tag Reg.Expr. Definition
======== ========== ==========================================
RMS Tdms- Article, definite, masc.sing.
RMP Tdmp- Article, definite, masc.plur.
RFS Tdfs- Article, definite, femm.sing.
RFP Tdfp- Article, definite, femm.plur.
RNS Tdcs- Article, definite, comm.sing.
RIMS Tims- Article, indefinite, masc.sing.
RIFS Tifs- Article, indefinite, femm.sing.
======== ========== ==========================================
5.1.6.3. Combinations
========= ======== ==========================================
Lexicon Corpus Example
========= ======== ==========================================
Tdms- RMS il, lo
Tdmp- RMP i, gli
Tdfs- RFS la
Tdfp- RFP le
Tdcs- RNS l' (amico/a)
Tims- RIMS un, uno
Tifs- RIFS una, un'
================== ==========================================
5.1.7 Adverbs (R)
-----------------
5.1.7.1 Lexicon
============ ====== ===== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type - - -
------------ ----------- ----------- ----
Degree positive bene p
superlative benissimo s
============ =========== =========== ====
5.1.7.2 Corpus
======= ================== ===========================
Tag Regular Expression Definition
======= ================== ===========================
B R-p Adverb positive
BS R-s Adverb superaltive
======= ================== ===========================
5.1.7.3 Combinations
========= =========== ============================
Lexicon Corpus Example
========= =========== ============================
R-p B fortemente
R-s BS fortissimamente
========= =========== ============================
5.1.7.4. Observations
The feature Type is not encoded in the Italian lexicon.
5.1.8. Adposition (S)
---------------------
5.1.8.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type preposition di, a, da p
------------ ----------- ----------- ----
Formation simple di s
compound dello c
------------ ----------- ----------- ----
Gender masculine dello m This attribute and values
feminine alla f have been added
l-spec common dell' c
------------ ----------- ----------- ----
Number singular al s This attribute and values
plural ai p have been added
============ =========== =========== ====
5.1.8.2 Corpus
======= ================== =====================
Tag Regular Expression Definition
======= ================== =====================
E Sp- Preposition simple
EA Spc.. Preposition compound
======= ================== =====================
5.1.8.3 Combinations
========= ================ =======================
Lexicon Corpus Example
========= ================ =======================
Sp E di
Spcfs EA della
Spcfp EA delle
Spcms EA del, dello
Spcmp EA dei, degli
Spccn EA dell'
========= ================ =======================
5.1.8.4 Observations
The Italian policy for encoding fused prepositions foresees to attach
the morphological information of the article to the preposition tag.
5.1.9 Conjunctions (C)
----------------------
5.1.9.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type coordinat. e c
subordinat. perche' s
============ =========== =========== ====
5.1.9.2 Corpus
======= ================== =========================
Tag Regular Expression Definition
======= ================== =========================
CC Cc Coordinative conjunction
CS Cs Subordinative conjunction
======= ============================================
5.1.9.3 Combinations
========= =========== ============================
Lexicon Corpus Example
========= =========== ============================
Cc CC ma
Cs CS perche'
========= =========== ============================
5.1.10 Numerals (M)
-------------------
5.1.10.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type cardinal cento c
ordinal primo o
------------ ----------- ----------- ----
Gender masculine primo m
feminine prima f
------------ ----------- ----------- ----
Number singular secondo s
plural secondi p
------------ ----------- ----------- ----
Case (n.a.) (n.a.) -
============ =========== =========== ====
6.4.10.2 Corpus
======= ================== ============================
Tag Regular Expression Definition
======= ================== ============================
NMS M.ms- Numeral, masc.sing.
NFS M.fs- Numeral, femm.sing.
NMP M.mp- Numeral, masc.plur.
NFP M.fp- Numeral, femm.plur.
N Mc--- Numeral cardinal
======= ================================================
5.1.10.3 Combinations
========= ========= ===============================
Lexicon Corpus
========= ========= ===============================
M.ms- NMS primo
M.fs- NFS prima
M.mp- NMP primi
M.fp- NFP prime
Mc--- N zero, cento
========= ========= ===============================
5.1.11 Interjection (I)
-----------------------
5.1.11.1. Corpus
======= =========== =====================================
Tag Reg. Expr. Definition
======= =========== =====================================
I I Interjection
======= =========== =====================================
5.1.11.2. Combinations
======= =========== =====================================
Lexicon Corpus Example
======= =========== =====================================
I I oh
======= =========== =====================================
5.1.12 Unique membership class (U)
----------------------------------
None
5.1.13. Residual (X)
--------------------
5.1.13.2 Corpus
======= =================== ====================
Tag Regular Expression Definition
======= =================== ====================
NY ??? "Guessed" Noun
AY ??? "Guessed" Adjective
======= =================== ====================
5.1.13.3 Combinations
========= ========= ===============================
Lexicon Corpus Example
========= ========= ===============================
??? NY bit
??? AY computerizzato
========= ========= ===============================
5.1.13.4 Observations
At corpus level, we have the tag SY which is used to mark symbols,
letters, acronyms, foreign words, toponyms etc., in general unknown
words, for which a "guess" is provided.
5.1.13 Punctuation
========= ============================
Tag Example
========= ============================
punct .,;:?! etc.
========= ============================
The application of the MULTEXT encoding scheme to German has been
carried out by the German group (Steiner and Lemnitzer 1994).
It has been attempted to keep as close as possible to the conventions.
However, some deviations were unavoidable. This concerns:
a. The extension of value sets for some attributes
b. The addition of some minor classes, described in a separate section
(see Add on classes)
c. The additon or deletion of an attribute
We will try to justify the changes, or mark them as language-specific.
However, some features will be topics for further discussion.
5.2.1 Nouns (N)
---------------
5.2.1.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type common Buch c
proper Peter p
------------ ----------- ----------- ----
Gender masculine Mann m
feminine Frau f
neuter Kind n
------------ ---------- ----------- ----
Number singular Mann s
plural Frauen p
------------ ----------- ----------- ----
Case nominative Kind n
genitive Kindes g
dative Kinde d
accusative Kind a
============ =========== =========== ====
5.2.1.2 Corpus
======== ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
NCMSN Ncmsn Common noun, masc. sing., nominative
NCMSG Ncmsg Common noun, masc. sing., genitive
NCMSD Ncmsd Common noun, masc. sing., dative
NCMSA Ncmsa Common noun, masc. sing., accusative
NCMPN Ncmpn Common noun, masc. plur., nominative
NCMPG Ncmpg Common noun, masc. plur., genitive
NCMPD Ncmpd Common noun, masc. plur., dative
NCMPA Ncmpa Common noun, masc. plur., accusative
NCFSN Ncfsn Common noun, fem. sing., nominative
NCFSG Ncfsg Common noun, fem. sing., genitive
NCFSD Ncfsd Common noun, fem. sing., dative
NCFSA Ncfsa Common noun, fem. sing., accusative
NCFPN Ncfpn Common noun, fem. plur., nominative
NCFPG Ncfpg Common noun, fem. plur., genitive
NCFPD Ncfpd Common noun, fem. plur., dative
NCFPA Ncfpa Common noun, fem. plur., accusative
NCNSN Ncnsn Common noun, neut. sing., nominative
NCNSG Ncnsg Common noun, neut. sing., genitive
NCNSD Ncnsd Common noun, neut. sing., dative
NCNSA Ncnsa Common noun, neut. sing., accusative
NCNPN Ncnpn Common noun, neut. plur., nominative
NCNPG Ncnpg Common noun, neut. plur., genitive
NCNPD Ncnpd Common noun, neut. plur., dative
NCNPA Ncnpa Common noun, neut. plur., accusative
NPMSN Npmsn Proper noun, masc. sing., nominative
NPMSG Npmsg Proper noun, masc. sing., genitive
NPMSD Npmsd Proper noun, masc. sing., dative
NPMSA Npmsa Proper noun, masc. sing., accusative
NPMPN Npmpn Proper noun, masc. plur., nominative
NPMPG Npmpg Proper noun, masc. plur., genitive
NPMPD Npmpd Proper noun, masc. plur., dative
NPMPA Npmpa Proper noun, masc. plur., accusative
NPFSN Npfsn Proper noun, fem. sing., nominative
NPFSG Npfsg Proper noun, fem. sing., genitive
NPFSD Npfsd Proper noun, fem. sing., dative
NPFSA Npfsa Proper noun, fem. sing., accusative
NPFPN Npfpn Proper noun, fem. plur., nominative
NPFPG Npfpg Proper noun, fem. plur., genitive
NPFPD Npfpd Proper noun, fem. plur., dative
NPFPA Npfpa Proper noun, fem. plur., accusative
NPNSN Npnsn Proper noun, neut. sing., nominative
NPNSG Npnsg Proper noun, neut. sing., genitive
NPNSD Npnsd Proper noun, neut. sing., dative
NPNSA Npnsa Proper noun, neut. sing., accusative
NPNPN Npnpn Proper noun, neut. plur., nominative
NPNPG Npnpg Proper noun, neut. plur., genitive
NPNPD Npnpd Proper noun, neut. plur., dative
NPNPA Npnpa Proper noun, neut. plur., accusative
======= ================== ====================================
5.2.1.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Ncmsn NCMSN (der) Hund
Ncmsg NCMSG (des) Hundes
Ncmsd NCMSD (dem) Hunde
Ncmsa NCMSA (den) Hund
Ncmpn NCMPN (die) Hunde
Ncmpg NCMPG (der) Hunde
Ncmpd NCMPD (den) Hunden
Ncmpa NCMPA (die) Hunde
Ncfsn NCFSN (die) Frau
Ncfsn NCFSG (der) Frau
Ncfsd NCFSD (der) Frau
Ncfsa NCFSA (die) Frau
Ncfpn NCFPN (die) Frauen
Ncfpg NCFPG (der) Frauen
Ncfpd NCFPD (den) Frauen
Ncfpa NCFPA (die) Frauen
Ncnsn NCNSN (das) Kind
Ncnsg NCNSG (des) Kindes
Ncnsd NCNSD (dem) Kinde
Ncnsa NCNSA (das) Kind
Ncnpn NCNPN (die) Kinder
Ncnpg NCNPG (der) Kinder
Ncnpd NCNPD (den) Kindern
Ncnpa NCNPA (die) Kinder
Npmsn NPMSN Peter
Npmsg NPMSG Peters
Npmsd NPMSD Peter
Npmsa NPMSA Peter
Npmpn NPMPN Einsteins
Npmpg NPMPG Einsteins
Npmpd NPMPD Einsteins
Npmpa NPMPA Einsteins
Npfsn NPFSN Sabine
Npfsg NPFSG Sabines
Npfsd NPFSD Sabine
Npfsa NPFSA Sabine
Npfpn NPFPN Pyren"aen
Npfpg NPFPG Pyren"aen
Npfpd NPFPD Pyren"aen
Npfpa NPFPA Pyren"aen
Npnsn NPNSN Bayern
Npnsg NPNSG Bayerns
Npnsd NPNSD Bayern
Npnsa NPNSA Bayern
Npnpn NPNPN Bayerns
Npnpg NPNPG Bayerns
Npnpd NPNPD Bayerns
Npnpa NPNPA Bayerns
========= ======== ==============================================
5.2.2 Verbs (V)
---------------
5.2.2.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type main gehen m
modal sollen o this value has been added
auxiliary haben a
------------ ----------- ----------- ----
Mood indicative geht i
subjunctive gehe s
imperative geht m
infinitive gehen n
inf. with inc. this value has been added
particle wegzugehen u
participle gehend p
------------ ----------- ----------- ----
Tense present geht p
imperfect ging i
------------ ----------- ----------- ----
Person first bin 1
second bist 2
third ist 3
------------ ----------- ----------- ----
Number singular geht s
plural gehen p
------------ ----------- ----------- ----
Gender /// /// -
------------ ----------- ----------- ----
Clitic no hat n
yes hats y
============ =========== =========== ====
Notes:
a. Gender
There is no distinction in gender for the third person singular.
5.2.2.2. Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
VAIP1PN Vaip1p-n Aux. verb, 1st pers. pl. ind. pres., nonclitic
VAIP1PY Vaip1p-y Aux. verb, 1st pers. pl. ind. pres., clitic
VAII1PN Vaii1p-n Aux. verb, 1st pers. pl. ind. imp., nonclitic
VAII1PY Vaii1p-y Aux. verb, 1st pers. pl. ind. imp., clitic
VASP1PN Vasp1p-n Aux. verb, 1st pers. pl. subj. pres., nonclit
VASP1PY Vasp1p-y Aux. verb, 1st pers. pl. subj. pres., clitic
VASI1PN Vasi1p-n Aux. verb, 1st pers. pl. subj. imp., nonclitic
VASI1PY Vasi1p-y Aux. verb, 1st pers. pl. subj. imp., clitic
VAIP1SN Vaip1s-n Aux. verb, 1st pers. sg. ind. pres., nonclitic
VAIP1SY Vaip1s-y Aux. verb, 1st pers. sg. ind. pres., clitic
VAII1SN Vaii1s-n Aux. verb, 1st pers. sg. ind. imp., nonclitic
VAII1SY Vaii1s-y Aux. verb, 1st pers. sg. ind. imp., clitic
VASP1SN Vasp1s-n Aux. verb, 1st pers. sg. subj. pres., nonclit
VASP1SY Vasp1s-y Aux. verb, 1st pers. sg. subj. pres., clitic
VASI1SN Vasi1s-n Aux. verb, 1st pers. sg. subj. imp., nonclitic
VASI1SY Vasi1s-y Aux. verb, 1st pers. sg. subj. imp., clitic
VAIP2PN Vaip2p-n Aux. verb, 2nd pers. pl. ind. pres., nonclitic
VAIP2PY Vaip2p-y Aux. verb, 2nd pers. pl. ind. pres., clitic
VAII2PN Vaii2p-n Aux. verb, 2nd pers. pl. ind. imp., nonclitic
VAII2PY Vaii2p-y Aux. verb, 2nd pers. pl. ind. imp., clitic
VASP2PN Vasp2p-n Aux. verb, 2nd pers. pl. subj. pres., nonclit
VASP2PY Vasp2p-y Aux. verb, 2nd pers. pl. subj. pres., clitic
VASI2PN Vasi2p-n Aux. verb, 2nd pers. pl. subj. imp., nonclitic
VASI2PY Vasi2p-y Aux. verb, 2nd pers. pl. subj. imp., clitic
VAM2PN Vam-2p-n Aux. verb, 2nd pers. pl. imperative, nonclitic
VAM2PY Vam-2p-y Aux. verb, 2nd pers. pl. imperative, clitic
VAIP2SN Vaip2s-n Aux. verb, 2nd pers. sg. ind. pres., nonclitic
VAIP2SY Vaip2s-y Aux. verb, 2nd pers. sg. ind. pres., clitic
VAII2SN Vaii2s-n Aux. verb, 2nd pers. sg. ind. imp., nonclitic
VAII2SY Vaii2s-y Aux. verb, 2nd pers. sg. ind. imp., clitic
VASP2SN Vasp2s-n Aux. verb, 2nd pers. sg. subj. pres., nonclit
VASP2SY Vasp2s-y Aux. verb, 2nd pers. sg. subj. pres., clitic
VASI2SN Vasi2s-n Aux. verb, 2nd pers. sg. subj. imp., nonclitic
VASI2SY Vasi2s-y Aux. verb, 2nd pers. sg. subj. imp., clitic
VAM2SN Vam-2s-n Aux. verb, 2nd pers. sg. imperative, nonclitic
VAM2SY Vam-2s-y Aux. verb, 2nd pers. sg. imperative, clitic
VAIP3PN Vaip3p-n Aux. verb, 3rd pers. pl. ind. pres., nonclitic
VAIP3PY Vaip3p-y Aux. verb, 3rd pers. pl. ind. pres., clitic
VAII3PN Vaii3p-n Aux. verb, 3rd pers. pl. ind. imp., nonclitic
VAII3PY Vaii3p-y Aux. verb, 3rd pers. pl. ind. imp., clitic
VASP3PN Vasp3p-n Aux. verb, 3rd pers. pl. subj. pres., nonclitic
VASP3PY Vasp3p-y Aux. verb, 3rd pers. pl. subj. pres., clitic
VASI3PN Vasi3p-n Aux. verb, 3rd pers. pl. subj. imp., nonclitic
VASI3PY Vasi3p-y Aux. verb, 3rd pers. pl. subj. imp., clitic
VAIS3SN Vaip3s-n Aux. verb, 3rd pers. sg. ind. pres., nonclitic
VAIS3SY Vaip3s-y Aux. verb, 3rd pers. sg. ind. pres., clitic
VAII3SN Vaii3s-n Aux. verb, 3rd pers. sg. ind. imp., nonclitic
VAII3SY Vaii3s-y Aux. verb, 3rd pers. sg. ind. imp., clitic
VASP3SN Vasp3s-n Aux. verb, 3rd pers. sg. subj. pres., nonclit
VASP3SY Vasp3s-y Aux. verb, 3rd pers. sg. subj. pres., clitic
VASI3SN Vasi3s-n Aux. verb, 3rd pers. sg. subj. imp., nonclitic
VASI3SY Vasi3s-y Aux. verb, 3rd pers. sg. subj. imp., clitic
VAPS Vaps---- Aux. verb, past part.
VAN Van----- Aux. verb, infinitive
VAPP Vapp---- Aux. verb, pres. participle
VOIP1PN Voip1p-n Mod. verb, 1st pers. pl. ind. pres., nonclitic
VOIP1PY Voip1p-y Mod. verb, 1st pers. pl. ind. pres., clitic
VOII1PN Voii1p-n Mod. verb, 1st pers. pl. ind. imp., nonclitic
VOII1PY Voii1p-y Mod. verb, 1st pers. pl. ind. imp., clitic
VOSP1PN Vosp1p-n Mod. verb, 1st pers. pl. subj. pres., nonclit
VOSP1PY Vosp1p-y Mod. verb, 1st pers. pl. subj. pres., clitic
VOSI1PN Vosi1p-n Mod. verb, 1st pers. pl. subj. imp., nonclitic
VOSI1PY Vosi1p-y Mod. verb, 1st pers. pl. subj. imp., clitic
VOIP1SN Voip1s-n Mod. verb, 1st pers. sg. ind. pres., nonclitic
VOIP1SY Voip1s-y Mod. verb, 1st pers. sg. ind. pres., clitic
VOII1SN Voii1s-n Mod. verb, 1st pers. sg. ind. imp., nonclitic
VOII1SY Voii1s-y Mod. verb, 1st pers. sg. ind. imp., clitic
VOSP1SN Vosp1s-n Mod. verb, 1st pers. sg. subj. pres., nonclit
VOSP1SY Vosp1s-y Mod. verb, 1st pers. sg. subj. pres., clitic
VOSI1SN Vosi1s-n Mod. verb, 1st pers. sg. subj. imp., nonclitic
VOSI1SY Vosi1s-y Mod. verb, 1st pers. sg. subj. imp.,clitic
VOIP2PN Voip2p-n Mod. verb, 2nd pers. pl. ind. pres., nonclitic
VOIP2PY Voip2p-y Mod. verb, 2nd pers. pl. ind. pres., clitic
VOII2PN Voii2p-n Mod. verb, 2nd pers. pl. ind. imp., nonclitic
VOII2PY Voii2p-y Mod. verb, 2nd pers. pl. ind. imp., clitic
VOSP2PN Vosp2p-n Mod. verb, 2nd pers. pl. subj. pres., nonclit
VOSP2PY Vosp2p-y Mod. verb, 2nd pers. pl. subj. pres., clitic
VOSI2PN Vosi2p-n Mod. verb, 2nd pers. pl. subj. imp., nonclitic
VOSI2PY Vosi2p-y Mod. verb, 2nd pers. pl. subj. imp., clitic
VOIP2SN Voip2s-n Mod. verb, 2nd pers. sg. ind. pres., nonclitic
VOIP2SY Voip2s-y Mod. verb, 2nd pers. sg. ind. pres., clitic
VOII2SN Voii2s-n Mod. verb, 2nd pers. sg. ind. imp., nonclitic
VOII2SY Voii2s-y Mod. verb, 2nd pers. sg. ind. imp., clitic
VOSP2SN Vosp2s-n Mod. verb, 2nd pers. sg. subj. pres., nonclit
VOSP2SY Vosp2s-y Mod. verb, 2nd pers. sg. subj. pres., clitic
VOSI2SN Vosi2s-n Mod. verb, 2nd pers. sg. subj. imp., nonclitic
VOSI2SY Vosi2s-y Mod. verb, 2nd pers. sg. subj. imp., clitic
VOIP3PN Voip3p-n Mod. verb, 3rd pers. pl. ind. pres., nonclitic
VOIP3PY Voip3p-y Mod. verb, 3rd pers. pl. ind. pres., clitic
VOII3PN Voii3p-n Mod. verb, 3rd pers. pl. ind. imp., nonclitic
VOII3PY Voii3p-y Mod. verb, 3rd pers. pl. ind. imp., clitic
VOSP3PN Vosp3p-n Mod. verb, 3rd pers. pl. subj. pres., nonclit
VOSP3PY Vosp3p-y Mod. verb, 3rd pers. pl. subj. pres., clitic
VOSI3PN Vosi3p-n Mod. verb, 3rd pers. pl. subj. imp., nonclitic
VOSI3PY Vosi3p-y Mod. verb, 3rd pers. pl. subj. imp., clitic
VOIS3SN Voip3s-n Mod. verb, 3rd pers. sg. ind. pres., nonclitic
VOIS3SY Voip3s-y Mod. verb, 3rd pers. sg. ind. pres., clitic
VOII3SN Voii3s-n Mod. verb, 3rd pers. sg. ind. imp., nonclitic
VOII3SY Voii3s-y Mod. verb, 3rd pers. sg. ind. imp., clitic
VOSP3SN Vosp3s-n Mod. verb, 3rd pers. sg. subj. pres., nonclit
VOSP3SY Vosp3s-y Mod. verb, 3rd pers. sg. subj. pres., clitic
VOSI3SN Vosi3s-n Mod. verb, 3rd pers. sg. subj. imp., nonclitic
VOSI3SY Vosi3s-y Mod. verb, 3rd pers. sg. subj. imp., clitic
VOPS Vops---- Mod. verb, past part.
VON Von----- Mod. verb, infinitive
VOPP Vopp---- Mod. verb, pres. participle
VMIP1PN Vmip1p-n Main verb, 1st pers. pl. ind. pres., nonclitic
VMIP1PY Vmip1p-y Main verb, 1st pers. pl. ind. pres., clitic
VMII1PN Vmii1p-n Main verb, 1st pers. pl. ind. imp., nonclitic
VMII1PY Vmii1p-y Main verb, 1st pers. pl. ind. imp., clitic
VMSP1PN Vmsp1p-n Main verb, 1st pers. pl. subj. pres., nonclit
VMSP1PY Vmsp1p-y Main verb, 1st pers. pl. subj. pres., clitic
VMSI1PN Vmsi1p-n Main verb, 1st pers. pl. subj. imp., nonclitic
VMSI1PY Vmsi1p-y Main verb, 1st pers. pl. subj. imp., clitic
VMIP1SN Vmip1s-n Main verb, 1st pers. sg. ind. pres., nonclitic
VMIP1SY Vmip1s-y Main verb, 1st pers. sg. ind. pres., clitic
VMII1SN Vmii1s-n Main verb, 1st pers. sg. ind. imp., nonclitic
VMII1SY Vmii1s-y Main verb, 1st pers. sg. ind. imp., clitic
VMSP1SN Vmsp1s-n Main verb, 1st pers. sg. subj. pres., nonclit
VMSP1SY Vmsp1s-y Main verb, 1st pers. sg. subj. pres., clitic
VMSI1SN Vmsi1s-n Main verb, 1st pers. sg. subj. imp., nonclitic
VMSI1SY Vmsi1s-y Main verb, 1st pers. sg. subj. imp., clitic
VMIP2PN Vmip2p-n Main verb, 2nd pers. pl. ind. pres., nonclitic
VMIP2PY Vmip2p-y Main verb, 2nd pers. pl. ind. pres., clitic
VMII2PN Vmii2p-n Main verb, 2nd pers. pl. ind. imp., nonclitic
VMII2PY Vmii2p-y Main verb, 2nd pers. pl. ind. imp., clitic
VMSP2PN Vmsp2p-n Main verb, 2nd pers. pl. subj. pres., nonclit
VMSP2PY Vmsp2p-y Main verb, 2nd pers. pl. subj. pres., clitic
VMSI2PN Vmsi2p-n Main verb, 2nd pers. pl. subj. imp., nonclitic
VMSI2PY Vmsi2p-y Main verb, 2nd pers. pl. subj. imp., clitic
VMM2PN Vmm-2p-n Main verb, 2nd pers. pl. imperative, nonclitic
VMM2PY Vmm-2p-y Main verb, 2nd pers. pl. imperative, clitic
VMIP2SN Vmip2s-n Main verb, 2nd pers. sg. ind. pres., nonclitic
VMIP2SY Vmip2s-y Main verb, 2nd pers. sg. ind. pres., clitic
VMII2SN Vmii2s-n Main verb, 2nd pers. sg. ind. imp., nonclitic
VMII2SY Vmii2s-y Main verb, 2nd pers. sg. ind. imp., clitic
VMSP2SN Vmsp2s-n Main verb, 2nd pers. sg. subj. pres., nonclit
VMSP2SY Vmsp2s-y Main verb, 2nd pers. sg. subj. pres., clitic
VMSI2SN Vmsi2s-n Main verb, 2nd pers. sg. subj. imp., nonclitic
VMSI2SY Vmsi2s-y Main verb, 2nd pers. sg. subj. imp., clitic
VMM2SN Vmm-2s-n Main verb, 2nd pers. sg. imperative, nonclitic
VMM2SY Vmm-2s-y Main verb, 2nd pers. sg. imperative, clitic
VMIP3PN Vmip3p-n Main verb, 3rd pers. pl. ind. pres., nonclitic
VMIP3PY Vmip3p-y Main verb, 3rd pers. pl. ind. pres., clitic
VMII3PN Vmii3p-n Main verb, 3rd pers. pl. ind. imp., nonclitic
VMII3PY Vmii3p-y Main verb, 3rd pers. pl. ind. imp., clitic
VMSP3PN Vmsp3p-n Main verb, 3rd pers. pl. subj. pres., nonclit
VMSP3PY Vmsp3p-y Main verb, 3rd pers. pl. subj. pres., clitic
VMSI3PN Vmsi3p-n Main verb, 3rd pers. pl. subj. imp., nonclitic
VMSI3PY Vmsi3p-y Main verb, 3rd pers. pl. subj. imp., clitic
VMIS3SN Vmip3s-n Main verb, 3rd pers. sg. ind. pres., nonclitic
VMIS3SY Vmip3s-y Main verb, 3rd pers. sg. ind. pres., clitic
VMII3SN Vmii3s-n Main verb, 3rd pers. sg. ind. imp., nonclitic
VMII3SY Vmii3s-y Main verb, 3rd pers. sg. ind. imp., clitic
VMSP3SN Vmsp3s-n Main verb, 3rd pers. sg. subj. pres., nonclit
VMSP3SY Vmsp3s-y Main verb, 3rd pers. sg. subj. pres., clitic
VMSI3SN Vmsi3s-n Main verb, 3rd pers. sg. subj. imp., nonclitic
VMSI3SY Vmsi3s-y Main verb, 3rd pers. sg. subj. imp., clitic
VMPS Vmps---- Main verb, past part.
VMN Vmn----- Main verb, infinitive
VMU Vmu----- Main verb, infinitive with incorp. particle
VMPP Vmpp---- Main verb, pres. participle
======= ================== ====================================
5.2.2.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Vaip1p-n VAIP1PN sind
Vaip1p-y VAIP1PY sinds
Vaii1p-n VAII1PN waren
Vaii1p-y VAII1PY warens
Vasp1p-n VASP1PN seien
Vasp1p-y VASP1PY seiens
Vasi1p-n VASI1PN w"aren
Vasi1p-y VASI1PY w"arens
Vaip1s-n VAIP1SN bin
Vaip1s-y VAIP1SY bins
Vaii1s-n VAII1SN war
Vaii1s-y VAII1SY wars
Vasp1s-n VASP1SN sei
Vasp1s-y VASP1SY seis
Vasi1s-n VASI1SN w"are
Vasi1s-y VASI1SY w"ares
Vaip2p-n VAIP2PN seid
Vaip2p-y VAIP2PY seids
Vaii2p-n VAII2PN wart
Vaii2p-y VAII2PY warts
Vasp2p-n VASP2PN seiet
Vasp2p-y VASP2PY seiets
Vasi2p-n VASI2PN w"aret
Vasi2p-y VASI2PY w"arets
Vam-2p-n VAM2PN seid
Vam-2p-y VAM2PY seids
Vaip2s-n VAIP2SN bist
Vaip2s-y VAIP2SY bists
Vaii2s-n VAII2SN warst
Vaii2s-y VAII2SY warsts
Vasp2s-n VASP2SN seist
Vasp2s-y VASP2SY seists
Vasi2s-n VASI2SN w"arest
Vasi2s-y VASI2SY w"arests
Vam-2s-n VAM2SN sei
Vam-2s-y VAM2SY seis
Vaip3p-n VAIP3PN sind
Vaip3p-y VAIP3PY sinds
Vaii3p-n VAII3PN waren
Vaii3p-y VAII3PY warens
Vasp3p-n VASP3PN seien
Vasp3p-y VASP3PY seiens
Vasi3p-n VASI3PN w"aren
Vasi3p-y VASI3PY w"arens
Vaip3s-n VAIS3SN ist
Vaip3s-y VAIS3SY ists
Vaii3s-n VAII3SN war
Vaii3s-y VAII3SY wars
Vasp3s-n VASP3SN sei
Vasp3s-y VASP3SY seis
Vasi3s-n VASI3SN w"are
Vasi3s-y VASI3SY w"ares
Vaps---- VAPS gehabt
Van----- VAN haben
Vapp---- VAPP habend
Voip1p-n VOIP1PN sollen
Voip1p-y VOIP1PY sollens
Voii1p-n VOII1PN sollten
Voii1p-y VOII1PY solltens
Vosp1p-n VOSP1PN sollen
Vosp1p-y VOSP1PY sollens
Vosi1p-n VOSI1PN sollten
Vosi1p-y VOSI1PY solltens
Voip1s-n VOIP1SN soll
Voip1s-y VOIP1SY solls
Voii1s-n VOII1SN sollte
Voii1s-y VOII1SY solltes
Vosp1s-n VOSP1SN solle
Vosp1s-y VOSP1SY solles
Vosi1s-n VOSI1SN sollte
Vosi1s-y VOSI1SY solltes
Voip2p-n VOIP2PN sollt
Voip2p-y VOIP2PY sollts
Voii2p-n VOII2PN solltet
Voii2p-y VOII2PY solltets
Vosp2p-n VOSP2PN sollet
Vosp2p-y VOSP2PY sollets
Vosi2p-n VOSI2PN solltet
Vosi2p-y VOSI2PY solltets
Vom-2p-n VOM2PN sollt
Vom-2p-y VOM2PY sollts
Voip2s-n VOIP2SN sollst
Voip2s-y VOIP2SY sollsts
Voii2s-n VOII2SN solltest
Voii2s-y VOII2SY solltests
Vosp2s-n VOSP2SN sollest
Vosp2s-y VOSP2SY sollests
Vosi2s-n VOSI2SN solltest
Vosi2s-y VOSI2SY solltests
Vom-2s-n VOM2SN soll
Vom-2s-y VOM2SY solls
Voip3p-n VOIP3PN sollen
Voip3p-y VOIP3PY sollens
Voii3p-n VOII3PN sollten
Voii3p-y VOII3PY solltens
Vosp3p-n VOSP3PN sollen
Vosp3p-y VOSP3PY sollens
Vosi3p-n VOSI3PN sollten
Vosi3p-y VOSI3PY solltens
Voip3s-n VOIS3SN soll
Voip3s-y VOIS3SY solls
Voii3s-n VOII3SN sollte
Voii3s-y VOII3SY solltes
Vosp3s-n VOSP3SN solle
Vosp3s-y VOSP3SY solles
Vosi3s-n VOSI3SN sollte
Vosi3s-y VOSI3SY solltes
Vops---- VOPS gesollt
Von----- VON sollen
Vopp---- VOPP sollend
Vmip1p-n VMIP1PN schreiben
Vmip1p-y VMIP1PY schreibens
Vmii1p-n VMII1PN schrieben
Vmii1p-y VMII1PY schriebens
Vmsp1p-n VMSP1PN schreiben
Vmsp1p-y VMSP1PY schreibens
Vmsi1p-n VMSI1PN schrieben
Vmsi1p-y VMSI1PY schriebens
Vmip1s-n VMIP1SN schreibe
Vmip1s-y VMIP1SY schreibes
Vmii1s-n VMII1SN schrieb
Vmii1s-y VMII1SY schriebs
Vmsp1s-n VMSP1SN schreibe
Vmsp1s-y VMSP1SY schreibes
Vmsi1s-n VMSI1SN schriebe
Vmsi1s-y VMSI1SY schriebes
Vmip2p-n VMIP2PN schreibt
Vmip2p-y VMIP2PY schreibts
Vmii2p-n VMII2PN schriebt
Vmii2p-y VMII2PY schriebts
Vmsp2p-n VMSP2PN schreibet
Vmsp2p-y VMSP2PY schreibets
Vmsi2p-n VMSI2PN schriebet
Vmsi2p-y VMSI2PY schriebets
Vmm-2p-n VMM2PN schreibt
Vmm-2p-y VMM2PY schreibts
Vmip2s-n VMIP2SN schreibst
Vmip2s-y VMIP2SY schreibsts
Vmii2s-n VMII2SN schriebst
Vmii2s-y VMII2SY schriebsts
Vmsp2s-n VMSP2SN schreibest
Vmsp2s-y VMSP2SY schreibests
Vmsi2s-n VMSI2SN schriebest
Vmsi2s-y VMSI2SY schriebests
Vmm-2s-n VMM2SN schreib
Vmm-2s-y VMM2SY schreibs
Vmip3p-n VMIP3PN schreiben
Vmip3p-y VMIP3PY schreibens
Vmii3p-n VMII3PN schrieben
Vmii3p-y VMII3PY schriebens
Vmsp3p-n VMSP3PN schreiben
Vmsp3p-y VMSP3PY schreibens
Vmsi3p-n VMSI3PN schrieben
Vmsi3p-y VMSI3PY schriebens
Vmip3s-n VMIS3SN schreibt
Vmip3s-y VMIS3SY schreibts
Vmii3s-n VMII3SN schrieb
Vmii3s-y VMII3SY schriebs
Vmsp3s-n VMSP3SN schreibe
Vmsp3s-y VMSP3SY schreibes
Vmsi3s-n VMSI3SN schriebe
Vmsi3s-y VMSI3SY schriebes
Vm---ps- VMPS gegangen
Vm---n-- VMN schreiben
Vm---u-- VMU wegzuschreiben
Vm---pp- VMPP schreibend
========================== =============
Note:
Adding participles here would imply the addition of adjective
features. Morphologically at least, participles behave like
adjectives. A good - but not very elegant - solution would therefore
be to handle participles as a type of adjective.
Secondly there are ambigous cases, e.g.
Er ist ger"uhrt.
where ger"uhrt is a participle, but might be tagged either as a verb
or an adjective, depending on the context. This would be violating the
assumption for applicativeness. The best solution at hand is to treat
these forms as ambiguous with respect to membership in a word class
(adjective and verb, respectively). However, this is a mixture of
morphosyntactic with distributional criteria, and therefore
unsatisfactory.
5.2.3 Adjectives (A)
--------------------
5.2.3.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type qualificat. gut f
ordinal zweites o
cardinal zwei c this value has been added
possessive mein s
part1 lachende 1 this value has been added
part2 gesungene 2 this value has been added
------------ ----------- ----------- ----
Degree positive gut p
comparative besser c
superlative beste s
------------ ----------- ----------- ----
Gender masculine guter m
feminine gute f
neuter gutes n
------------ ----------- ----------- ----
Number singular guter s
plural gute p
------------ ----------- ----------- ----
Case nominative guter n
genitive guten g
dative guten d
accusative guten a
============ =========== =========== ====
Notes
a. We decided to include cardinal as well as ordinal numbers.
Therefore there is no special class for numerals.
b. Although we have doubts concerning the "possessive adjectives" and
would prefer to add the form
(Der Ball ist) mein
and
das ist meins
to the possessive Determiner or Pronouns because 'mein' originally is
a possessive pronoun, we recognize that the present definition of
these categories does not allow the addition of the value
'predicative'. This treatment is therefore a compromise.
c. German adjectives can be used in an attributive or a predicative
mode. Predicative adjectives are not marked for gender, case or number
with the exception of possessive and ordinal adjectives.
d. It would be necessary to specify the inflection type. The
inflection type reflects the place of an adjective in an NP (following
a determiner, an article, or none of these). Values are strong, mixed,
and weak. However, we have left this feature in this generic version.
e. part1 refers to adjectives that are derived from present
participles.
part2 refers to adjectives that are derived from past participles.
5.2.3.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
AMSN A..msn Adjective masc. sing. nominative
AMSG A..msg Adjective masc. sing. genitive
AMSD A..msd Adjective masc. sing. dative
AMSA A..msa Adjective masc. sing. accusative
AMPN A..mpn Adjective masc. plur. nominative
AMPG A..mpg Adjective masc. plur. genitive
AMPD A..mpd Adjective masc. plur. dative
AMPA A..mpa Adjective masc. plur. accusative
AFSN A..fsn Adjective fem. sing. nominative
AFSG A..fsg Adjective fem. sing. genitive
AFSD A..fsd Adjective fem. sing. dative
AFSA A..fsa Adjective fem. sing. accusative
AFPN A..fpn Adjective fem. plur. nominative
AFPG A..fpg Adjective fem. plur. genitive
AFPD A..fpd Adjective fem. plur. dative
AFPA A..fpa Adjective fem. plur. accusative
ANSN A..fsn Adjective neut. sing. nominative
ANSG A..fsg Adjective neut. sing. genitive
ANSD A..fsd Adjective neut. sing. dative
ANSA A..fsa Adjective neut. sing. accusative
ANPN A..fpn Adjective neut. plur. nominative
ANPG A..fpg Adjective neut. plur. genitive
ANPD A..fpd Adjective neut. plur. dative
ANPA A..fpa Adjective neut. plur. accusative
A A[q12][pc]--- Adjective, predic. without gender mark,comparable
AP A[cp]--- Adjective, predic. without g.m., not comparable
AP. A[op]p.-- Adjective, predic. with gender mark and num.mark
AS A[q12]s--- Adjective, predicative superlative
======= ================== ====================================
5.2.3.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Aqpmsn AMSN gute
Aqpmsg AMSG guten
Aqpmsd AMSD guten
Aqpmsa AMSA guten
Aqpmpn AMPN guten
Aqpmpg AMPG guten
Aqpmpd AMPD guten
Aqpmpa AMPA guten
Aqpfsn AFSN gute
Aqpfsg AFSG guten
Aqpfsd AFSD guten
Aqpfsa AFSA gute
Aqpfpn AFPN guten
Aqpfpg AFPG guten
Aqpfpd AFPD guten
Aqpfpa AFPA guten
Aqpnsn ANSN gute
Aqpnsg ANSG guten
Aqpnsd ANSD guten
Aqpnsa ANSA gute
Aqpnpn ANPN guten
Aqpnpg ANPG guten
Aqpnpd ANPD guten
Aqpnpa ANPA guten
Aqcmsn AMSN bessere
Aqcmsg AMSG besseren
Aqcmsd AMSD besseren
Aqcmsa AMSA besseren
Aqcmpn AMPN besseren
Aqcmpg AMPG besseren
Aqcmpd AMPD besseren
Aqcmpa AMPA besseren
Aqcfsn AFSN bessere
Aqcfsg AFSG besseren
Aqcfsd AFSD besseren
Aqcfsa AFSA bessere
Aqcfpn AFPN besseren
Aqcfpg AFPG besseren
Aqcfpd AFPD besseren
Aqcfpa AFPA besseren
Aqcnsn ANSN bessere
Aqcnsg ANSG besseren
Aqcnsd ANSD besseren
Aqcnsa ANSA bessere
Aqcnpn ANPN besseren
Aqcnpg ANPG besseren
Aqcnpd ANPD besseren
Aqcnpa ANPA besseren
Aqsmsn AMSN beste
Aqsmsg AMSG besten
Aqsmsd AMSD besten
Aqsmsa AMSA besten
Aqsmpn AMPN besten
Aqsmpg AMPG besten
Aqsmpd AMPD besten
Aqsmpa AMPA besten
Aqsfsn AFSN beste
Aqsfsg AFSG besten
Aqsfsd AFSD besten
Aqsfsa AFSA beste
Aqsfpn AFPN besten
Aqsfpg AFPG besten
Aqsfpd AFPD besten
Aqsfpa AFPA besten
Aqsnsn ANSN beste
Aqsnsg ANSG besten
Aqsnsd ANSD besten
Aqsnsa ANSA beste
Aqsnpn ANPN besten
Aqsnpg ANPG besten
Aqsnpd ANPD besten
Aqsnpa ANPA besten
Aopmsn AMSN zweite
Aopmsg AMSG zweiten
Aopmsd AMSD zweiten
Aopmsa AMSA zweiten
Aopmpn AMPN zweiten
Aopmpg AMPG zweiten
Aopmpd AMPD zweiten
Aopmpa AMPA zweiten
Aopfsn AFSN zweite
Aopfsg AFSG zweiten
Aopfsd AFSD zweiten
Aopfsa AFSA zweite
Aopfpn AFPN zweiten
Aopfpg AFPG zweiten
Aopfpd AFPD zweiten
Aopfpa AFPA zweiten
Aopnsn ANSN zweite
Aopnsg ANSG zweiten
Aopnsd ANSD zweiten
Aopnsa ANSA zweite
Aopnpn ANPN zweiten
Aopnpg ANPG zweiten
Aopnpd ANPD zweiten
Aopnpa ANPA zweiten
Acpmpn AMPN zwei
Acpmpg AMPG zwei
Acpmpd AMPD zwei
Acpmpa AMPA zwei
Acpfpn AFPN zwei
Acpfpg AFPG zwei
Acpfpd AFPD zwei
Acpfpa AFPA zwei
Acpnpn ANPN zwei
Acpnpg ANPG zwei
Acpnpd ANPD zwei
Acpnpa ANPA zwei
A1pmsn AMSN beruhigende
A1pmsg AMSG beruhigenden
A1pmsd AMSD beruhigenden
A1pmsa AMSA beruhigenden
A1pmpn AMPN beruhigenden
A1pmpg AMPG beruhigenden
A1pmpd AMPD beruhigenden
A1pmpa AMPA beruhigenden
A1pfsn AFSN beruhigende
A1pfsg AFSG beruhigenden
A1pfsd AFSD beruhigenden
A1pfsa AFSA beruhigende
A1pfpn AFPN beruhigenden
A1pfpg AFPG beruhigenden
A1pfpd AFPD beruhigenden
A1pfpa AFPA beruhigenden
A1pnsn ANSN beruhigende
A1pnsg ANSG beruhigenden
A1pnsd ANSD beruhigenden
A1pnsa ANSA beruhigende
A1pnpn ANPN beruhigenden
A1pnpg ANPG beruhigenden
A1pnpd ANPD beruhigenden
A1pnpa ANPA beruhigenden
A1cmsn AMSN beruhigendere
A1cmsg AMSG beruhigenderen
A1cmsd AMSD beruhigenderen
A1cmsa AMSA beruhigenderen
A1cmpn AMPN beruhigenderen
A1cmpg AMPG beruhigenderen
A1cmpd AMPD beruhigenderen
A1cmpa AMPA beruhigenderen
A1cfsn AFSN beruhigendere
A1cfsg AFSG beruhigenderen
A1cfsd AFSD beruhigenderen
A1cfsa AFSA beruhigendere
A1cfpn AFPN beruhigenderen
A1cfpg AFPG beruhigenderen
A1cfpd AFPD beruhigenderen
A1cfpa AFPA beruhigenderen
A1cnsn ANSN beruhigendere
A1cnsg ANSG beruhigenderen
A1cnsd ANSD beruhigenderen
A1cnsa ANSA beruhigendere
A1cnpn ANPN beruhigenderen
A1cnpg ANPG beruhigenderen
A1cnpd ANPD beruhigenderen
A1cnpa ANPA beruhigenderen
A1smsn AMSN beruhigendste
A1smsg AMSG beruhigendsten
A1smsd AMSD beruhigendsten
A1smsa AMSA beruhigendsten
A1smpn AMPN beruhigendsten
A1smpg AMPG beruhigendsten
A1smpd AMPD beruhigendsten
A1smpa AMPA beruhigendsten
A1sfsn AFSN beruhigendste
A1sfsg AFSG beruhigendsten
A1sfsd AFSD beruhigendsten
A1sfsa AFSA beruhigendste
A1sfpn AFPN beruhigendsten
A1sfpg AFPG beruhigendsten
A1sfpd AFPD beruhigendsten
A1sfpa AFPA beruhigendsten
A1snsn ANSN beruhigendste
A1snsg ANSG beruhigendsten
A1snsd ANSD beruhigendsten
A1snsa ANSA beruhigendste
A1snpn ANPN beruhigendsten
A1snpg ANPG beruhigendsten
A1snpd ANPD beruhigendsten
A1snpa ANPA beruhigendsten
A2pmsn AMSN geachtete
A2pmsg AMSG geachteten
A2pmsd AMSD geachteten
A2pmsa AMSA geachteten
A2pmpn AMPN geachteten
A2pmpg AMPG geachteten
A2pmpd AMPD geachteten
A2pmpa AMPA geachteten
A2pfsn AFSN geachtete
A2pfsg AFSG geachteten
A2pfsd AFSD geachteten
A2pfsa AFSA geachtete
A2pfpn AFPN geachteten
A2pfpg AFPG geachteten
A2pfpd AFPD geachteten
A2pfpa AFPA geachteten
A2pnsn ANSN geachtete
A2pnsg ANSG geachteten
A2pnsd ANSD geachteten
A2pnsa ANSA geachtete
A2pnpn ANPN geachteten
A2pnpg ANPG geachteten
A2pnpd ANPD geachteten
A2pnpa ANPA geachteten
A2cmsn AMSN geachtetere
A2cmsg AMSG geachteteren
A2cmsd AMSD geachteteren
A2cmsa AMSA geachteteren
A2cmpn AMPN geachteteren
A2cmpg AMPG geachteteren
A2cmpd AMPD geachteteren
A2cmpa AMPA geachteteren
A2cfsn AFSN geachtetere
A2cfsg AFSG geachteteren
A2cfsd AFSD geachteteren
A2cfsa AFSA geachtetere
A2cfpn AFPN geachteteren
A2cfpg AFPG geachteteren
A2cfpd AFPD geachteteren
A2cfpa AFPA geachteteren
A2cnsn ANSN geachtetere
A2cnsg ANSG geachteteren
A2cnsd ANSD geachteteren
A2cnsa ANSA geachtetere
A2cnpn ANPN geachteteren
A2cnpg ANPG geachteteren
A2cnpd ANPD geachteteren
A2cnpa ANPA geachteteren
A2smsn AMSN geachtetste
A2smsg AMSG geachtetsten
A2smsd AMSD geachtetsten
A2smsa AMSA geachtetsten
A2smpn AMPN geachtetsten
A2smpg AMPG geachtetsten
A2smpd AMPD geachtetsten
A2smpa AMPA geachtetsten
A2sfsn AFSN geachtetste
A2sfsg AFSG geachtetsten
A2sfsd AFSD geachtetsten
A2sfsa AFSA geachtetste
A2sfpn AFPN geachtetsten
A2sfpg AFPG geachtetsten
A2sfpd AFPD geachtetsten
A2sfpa AFPA geachtetsten
A2snsn ANSN geachtetste
A2snsg ANSG geachtetsten
A2snsd ANSD geachtetsten
A2snsa ANSA geachtetste
A2snpn ANPN geachtetsten
A2snpg ANPG geachtetsten
A2snpd ANPD geachtetsten
A2snpa ANPA geachtetsten
Aqp A gut
Aqc A besser
A1p A beruhigend
A1c A beruhigender
A2p A geachtet
A2c A geachteter
Acp ACP zwei
Asp AP mein
Aopms AP. zweiter
Aopfs AP. zweite
Aopns AP. zweites
Aop-p AP. zweite
Aspms AP. meiner
Aspfs AP. meine
Aspns AP. meines
Asp-p AP. meine
Aqs AS besten
A1s AS beruhigendsten
A2s AS geachtetesten
========= ======= =============================================
5.2.4 Pronouns (P)
------------------
5.2.4.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type personal ich p
demonstrat. dieser d
indefinite kein i
interrog. wer t
relative der r
reflexive sich x
------------ ----------- ----------- ----
Person first ich 1
second du 2
third es 3
------------ ----------- ----------- ----
Gender masculine dieser m
feminine diese f
neutre dieses n
------------ ----------- ----------- ----
Number singular dieser s
plural diese p
------------ ----------- ----------- ----
Case nominative dieser n
genitive dieses g
dative diesem d
accusative diesen a
------------ ----------- ----------- ----
Possessor - - -
============ =========== =========== ====
Notes
a. Possessive. In German there are no possessives in pronominal use,
so we do not need the attribute possessor here.
5.2.4.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
PP1SN Pp1-sn Personal pron., 1st pers. sing., nomin.
PP1SG Pp1-sg Personal pron., 1st pers. sing., gen.
P1SD P[px]1-sd Personal pron., 1st pers. sing., dat.
P1SA P[px]1-sa Personal pron., 1st pers. sing., acc.
PP2SN Pp2-sn Personal pron., 2nd pers. sing., nomin.
PP2SG Pp2-sg Personal pron., 2nd pers. sing., gen.
P2SD P[px]2-sd Personal pron., 2nd pers. sing., dat.
P2SN P[px]2-sa Personal pron., 2nd pers. sing., acc.
PP3MSN Pp3msn Personal pron., 3rd pers., masc., sing.,nomin.
PP3MSG Pp3msg Personal pron., 3rd pers., masc., sing., gen.
PP3MSD Pp3msd Personal pron., 3rd pers., masc., sing., dat.
PP3MSA Pp3msa Personal pron., 3rd pers., masc., sing., acc.
PP3FSN Pp3fsn Personal pron., 3rd pers., fem., sing., nomin.
PP3FSG Pp3fsg Personal pron., 3rd pers., fem., sing., gen.
PP3FSD Pp3fsd Personal pron., 3rd pers., fem., sing., dat.
PP3FSA Pp3fsa Personal pron., 3rd pers., fem., sing., acc.
PP3NSN Pp3nsn Personal pron., 3rd pers., neut., sing., nomin.
PP3NSG Pp3nsg Personal pron., 3rd pers., neut., sing., gen.
PP3NSD Pp3nsd Personal pron., 3rd pers., neut., sing., dat.
PP3NSA Pp3nsa Personal pron., 3rd pers., neut., sing., acc.
PP1PN Pp1-pn Personal pron., 1st pers. plur., nomin.
PP1PG Pp1-pg Personal pron., 1st pers. plur., gen.
P1PD P[px]1-pd Personal/Refl. pron., 1st pers. plur., dat.
P1PA P[px]1-pa Personal/Refl. pron., 1st pers. plur., acc.
PP2PN Pp2-pn Personal pron., 2nd pers. plur., nomin.
PP2PG Pp2-pg Personal pron., 2nd pers. plur., gen.
P2PD P[px]2-pd Personal/Refl. pron., 2nd pers. plur., dat.
P2PA P[px]2-pn Personal/Refl. pron., 2nd pers. plur., acc.
PP3PN Pp3-pn Personal pron., 3rd pers. plur., nomin.
PP3PG Pp3-pg Personal pron., 3rd pers. plur., gen.
PP3PD Pp3-pd Personal pron., 3rd pers. plur., dat.
PP3PA Pp3-pn Personal pron., 3rd pers. plur., acc.
PDMSN Pd-msn Dem. pronoun, masc., sing., nominative
PDMSG Pd-msg Dem. pronoun, masc. sing. genitive
PDMSD Pd-msd Dem. pronoun, masc. sing. dative
PDMSA Pd-msa Dem. pronoun, masc. sing. accusative
PDFSN Pd-fsn Dem. pronoun, fem. sing. nominativ
PDFSG Pd-fsg Dem. pronoun, fem. sing. genitive
PDFSD Pd-fsd Dem. pronoun, fem. sing. dative
PDFSA Pd-fsa Dem. pronoun, fem. sing. accusative
PDNSN Pd-nsn Dem. pronoun, neut. sing. nominativ
PDNSG Pd-nsg Dem. pronoun, neut. sing. genitive
PDNSD Pd-nsd Dem. pronoun, neut. sing. dative
PDNSA Pd-nsa Dem. pronoun, neut. sing. accusative
PDPN Pd--pn Dem. pronoun, plur. nominative
PDPG Pd--pg Dem. pronoun, plur. genitive
PDPD Pd--pd Dem. pronoun, plur. dative
PDPA Pd--pa Dem. pronoun, plur. accusative
PIMSN Pi-msn Indef. pronoun, masc. sing. nominative
PIMSG Pi-msg Indef. pronoun, masc. sing. genitive
PIMSD Pi-msd Indef. pronoun, masc. sing. dative
PIMSA Pi-msa Indef. pronoun, masc. sing. accusative
PIFSN Pi-fsn Indef. pronoun, fem. sing. nominativ
PIFSG Pi-fsg Indef. pronoun, fem. sing. genitive
PIFSD Pi-fsd Indef. pronoun, fem. sing. dative
PIFSA Pi-fsa Indef. pronoun, fem. sing. accusative
PINSN Pi-nsn Indef. pronoun, neut. sing. nominativ
PINSG Pi-nsg Indef. pronoun, neut. sing. genitive
PINSD Pi-nsd Indef. pronoun, neut. sing. dative
PINSA Pi-nsa Indef. pronoun, neut. sing. accusative
PIPN Pi--pn Indef. pronoun, plur. nominative
PIPG Pi--pg Indef. pronoun, plur. genitive
PIPD Pi--pd Indef. pronoun, plur. dative
PIPA Pi--pa Indef. pronoun, plur. accusative
PTN Pt--n Interrogative pronoun, nom.
PTG Pt--g Interrogative pronoun, gen.
PTD Pt--d Interrogative pronoun, dat.
PTA Pt--a Interrogative pronoun, acc.
PRMSN Pr-msn Rel. pronoun, masc. sing. nominative
PRMSG Pr-msg Rel. pronoun, masc. sing. genitive
PRMSD Pr-msd Rel. pronoun, masc. sing. dative
PRMSA Pr-msa Rel. pronoun, masc. sing. accusative
PRFSN Pr-fsn Rel. pronoun, fem. sing. nominativ
PRFSG Pr-fsg Rel. pronoun, fem. sing. genitive
PRFSD Pr-fsd Rel. pronoun, fem. sing. dative
PRFSA Pr-fsa Rel. pronoun, fem. sing. accusative
PRNSN Pr-nsn Rel. pronoun, neut. sing. nominativ
PRNSG Pr-nsg Rel. pronoun, neut. sing. genitive
PRNSD Pr-nsd Rel. pronoun, neut. sing. dative
PRNSA Pr-nsa Rel. pronoun, neut. sing. accusative
PRPN Pr--pn Rel. pronoun, plur. nominative
PRPG Pr--pg Rel. pronoun, plur. genitive
PRPD Pr--pd Rel. pronoun, plur. dative
PRPA Pr--pa Rel. pronoun, plur. accusative
PX3 Px3-.[da] Refl. pronoun, 3rd pers.
======= ================== ====================================
5.2.4.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Pp1-sn PP1SN ich
Pp1-sg PP1SG meiner
Pp1-sd P1SD mir
Pp1-sa P1SA mich
Px1-sd P1SD mir
Px1-sa P1SA mich
Pp2-sn PP2SN du
Pp2-sg PP2SG deiner
Pp2-sd P2SD dich
Pp2-sa P2SA dir
Px2-sd P2SD dich
Px2-sa P2SA dir
Pp3msn PP3MSN er
Pp3msg PP3MSG seiner
Pp3msd P3SD ihm
Pp3msa P3SA ihn
Pp3fsn PP3FSN sie
Pp3fsg PP3FSG ihrer
Pp3fsd P3SD sie
Pp3fsa P3SA sie
Pp3nsn PP3NSN es
Pp3nsg PP3NSG seiner
Pp3nsd P3SD ihm
Pp3nsa P3SA es
Pp1-pn PP1PN wir
Pp1-pg PP1PG unser
Pp1-pd P1PD uns
Pp1-pa P1PA uns
Px1-pd P1PD uns
Px1-pa P1PA uns
Pp2-pn PP2PN ihr
Pp2-pg PP2PG eurer
Pp2-pd P2PD euch
Pp2-pn P2PA euch
Px2-pd P2PD euch
Px2-pn P2PA euch
Pp3-pn PP3PN sie
Pp3-pg PP3PG ihrer
Pp3-pd PP3PD ihnen
Pp3-pn PP3PN sie
Pd-ms PDMSN dieser
Pd-msg PDMSG dieses
Pd-msd PDMSD diesem
Pd-msa PDMSA diesen
Pd-fsn PDFSN diese
Pd-fsg PDFSG dieser
Pd-fs PDFSD dieser
Pd-fsa PDFSA diese
Pd-nsn PDNSN dieses
Pd-nsg PDNSG dieses
Pd-nsd PDNSD diesem
Pd-nsa PDNSA dieses
Pd--pn PDPN diese
Pd--pg PDPG dieser
Pd--pd PDPD diesen
Pd--pa PDPA diese
Pi-msn PIMSN keiner
Pi-msg PIMSG keines
Pi-msd PIMSD keinem
Pi-msa PIMSA keinen
Pi-fsn PIFSN keine
Pi-fsg PIFSG keiner
Pi-fsd PIFSD keiner
Pi-fsa PIFSA keine
Pi-nsn PINSN keines
Pi-nsg PINSG keines
Pi-nsd PINSD keinem
Pi-nsa PINSA keines
Pi--pn PIPN keine
Pi--pg PIPG keiner
Pi--pd PIPD keinem
Pi--pa PIPA keinen
Pt--n PTN wer
Pt--g PTG wessen
Pt--d PTD wem
Pt--a PTA was
Pr-msn PRMSN der
Pr-msg PRMSG dessen
Pr-msd PRMSD dem
Pr-msa PRMSA den
Pr-fsn PRFSN die
Pr-fsg PRFSG deren
Pr-fsd PRFSD der
Pr-fsa PRFSA die
Pr-nsn PRNSN das
Pr-nsg PRNSG dessen
Pr-nsd PRNSD dem
Pr-nsa PRNSA das
Pr--pn PRPN die
Pr--pg PRPG deren
Pr--pd PRPD denen
Pr--pa PRPA die
Px3-sa P3SA sich
Px3-sd P3SD sich
========= ======= =============================================
5.2.5. Determiners (D)
----------------------
5.2.5.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type demonstrat. dieser d
indefinite kein i
possessive mein s
interrog. welche t
------------ ----------- ----------- ----
Person first mein 1
second dein 2
third sein 3
------------ ----------- ----------- ----
Gender masculine dieser m
feminine diese f
neutre dieses n
------------ ----------- ----------- ----
Number singular dieser s
plural diese p
------------ ----------- ----------- ----
Case nominative dieser n
genitive dieses g
dative diesem d
accusative diesen a
------------ ---------- ----------- ----
Possessor - - -
============= ============ =========== ====
Note:
We included "person" as a feature in the lexical list, but we doubt
whether this treatment is useful, at least seen from a German point of
view. We prefer to treat the six possessive determiners which are
derived from the personal pronouns as different lexemes.
Otherwise, the feature "Number" would have to be specified twice.
First, to mark the proper attributes of the basic pronoun (e.g. 1.
person plural for "unser") and second to mark the features of
agreement with the head of the NP ("unser Leben" vs. "unsere Kinder").
Furthermore, the person information is of semantic character for the
possessive determiner, in contrast to the underlying personal pronoun,
where the person information is also a feature of agreement with the
verb or VP of a sentence.
The attribute 'possessor' is not grammatically relevant.
5.2.5.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
DDMSN Dd-msn- Dem. determiner, masc., sing., nominative
DDMSG Dd-msg- Dem. determiner, masc. sing. genitive
DDMSD Dd-msd- Dem. determiner, masc. sing. dative
DDMSA Dd-msa- Dem. determiner, masc. sing. accusative
DDFSN Dd-fsn- Dem. determiner, fem. sing. nominativ
DDFSG Dd-fsg- Dem. determiner, fem. sing. genitive
DDFSD Dd-fsd- Dem. determiner, fem. sing. dative
DDFSA Dd-fsa- Dem. determiner, fem. sing. accusative
DDNSN Dd-nsn- Dem. determiner, neut. sing. nominativ
DDNSG Dd-nsg- Dem. determiner, neut. sing. genitive
DDNSD Dd-nsd- Dem. determiner, neut. sing. dative
DDNSA Dd-nsa- Dem. determiner, neut. sing. accusative
DDDN Dd--pn- Dem. determiner, plur. nominative
DDDG Dd--pg- Dem. determiner, plur. genitive
DDDD Dd--pd- Dem. determiner, plur. dative
DDDA Dd--pa- Dem. determiner, plur. accusative
DIMSN Di-msn- Indef. determiner, masc. sing. nominative
DIMSG Di-msg- Indef. determiner, masc. sing. genitive
DIMSD Di-msd- Indef. determiner, masc. sing. dative
DIMSA Di-msa- Indef. determiner, masc. sing. accusative
DIFSN Di-fsn- Indef. determiner, fem. sing. nominativ
DIFSG Di-fsg- Indef. determiner, fem. sing. genitive
DIFSD Di-fsd- Indef. determiner, fem. sing. dative
DIFSA Di-fsa- Indef. determiner, fem. sing. accusative
DINSN Di-nsn- Indef. determiner, neut. sing. nominativ
DINSG Di-nsg- Indef. determiner, neut. sing. genitive
DINSD Di-nsd- Indef. determiner, neut. sing. dative
DINSA Di-nsa- Indef. determiner, neut. sing. accusative
DIPN Di--pn- Indef. determiner, plur. nominative
DIPG Di--pg- Indef. determiner, plur. genitive
DIPD Di--pd- Indef. determiner, plur. dative
DIPA Di--pa- Indef. determiner, plur. accusative
DPMSN Dp-msn- Poss. determiner, masc. sing. nominative
DPMSG Dp-msg- Poss. determiner, masc. sing. genitive
DPMSD Dp-msd- Poss. determiner, masc. sing. dative
DPMSA Dp-msa- Poss. determiner, masc. sing. accusative
DPFSN Dp-fsn- Poss. determiner, fem. sing. nominativ
DPFSG Dp-fsg- Poss. determiner, fem. sing. genitive
DPFSD Dp-fsd- Poss. determiner, fem. sing. dative
DPFSA Dp-fsa- Poss. determiner, fem. sing. accusative
DPNSN Dp-nsn- Poss. determiner, neut. sing. nominativ
DPNSG Dp-nsg- Poss. determiner, neut. sing. genitive
DPNSD Dp-nsd- Poss. determiner, neut. sing. dative
DPNSA Dp-nsa- Poss. determiner, neut. sing. accusative
DPDN Dp--pn- Poss. determiner, plur. nominative
DPDG Dp--pg- Poss. determiner, plur. genitive
DPDD Dp--pd- Poss. determiner, plur. dative
DPDA Dp--pa- Poss. determiner, plur. accusative
DTMSN Dt-msn- Interrog. determiner, masc. sing. nominative
DTMSG Dt-msg- Interrog. determiner, masc. sing. genitive
DTMSD Dt-msd- Interrog. determiner, masc. sing. dative
DTMSA Dt-msa- Interrog. determiner, masc. sing. accusative
DTFSN Dt-fsn- Interrog. determiner, fem. sing. nominativ
DTFSG Dt-fsg- Interrog. determiner, fem. sing. genitive
DTFSD Dt-fsd- Interrog. determiner, fem. sing. dative
DTFSA Dt-fsa- Interrog. determiner, fem. sing. accusative
DTNSN Dt-nsn- Interrog. determiner, neut. sing. nominativ
DTNSG Dt-nsg- Interrog. determiner, neut. sing. genitive
DTNSD Dt-nsd- Interrog. determiner, neut. sing. dative
DTNSA Dt-nsa- Interrog. determiner, neut. sing. accusative
DTDN Dt--pn- Interrog. determiner, plur. nominative
DTDG Dt--pg- Interrog. determiner, plur. genitive
DTDD Dt--pd- Interrog. determiner, plur. dative
DTDA Dt--pa- Interrog. determiner, plur. accusative
5.2.5.3 Combinations
======= ======= =============================================
Lexique Corpus Example
======= ======= =============================================
Dd-msn- DDMSN dieser
Dd-msg- DDMSG dieses
Dd-msd- DDMSD diesem
Dd-msa- DDMSA diesen
Dd-fsn- DDFSN diese
Dd-fsg- DDFSG dieser
Dd-fs- DDFSD dieser
Dd-fsa- DDFSA diese
Dd-nsn- DDNSN dieses
Dd-nsg- DDNSG dieses
Dd-nsd- DDNSD diesem
Dd-nsa- DDNSA dieses
Dd--pn- DDPN diese
Dd--pg- DDPG dieser
Dd--pd- DDPD diesen
Dd--pa- DDPA diese
Di-msn- DIMSN keiner
Di-msg- DIMSG keines
Di-msd- DIMSD keinem
Di-msa- DIMSA keinen
Di-fsn- DIFSN keine
Di-fsg- DIFSG keiner
Di-fsd- DIFSD keiner
Di-fsa- DIFSA keine
Di-nsn- DINSN keines
Di-nsg- DINSG keines
Di-nsd- DINSD keinem
Di-nsa- DINSA keines
Di--pn- DIPN keine
Di--pg- DIPG keiner
Di--pd- DIPD keinen
Di--pa- DIPA keine
Dp-msn- DPMSN meiner
Dp-msg- DPMSG meines
Dp-msd- DPMSD meinem
Dp-msa- DPMSA meinen
Dp-fsn- DPFSN meine
Dp-fsg- DPFSG meiner
Dp-fsd- DPFSD meiner
Dp-fsa- DPFSA meine
Dp-nsn- DPNSN meines
Dp-nsg- DPNSG meines
Dp-nsd- DPNSD meinem
Dp-nsa- DPNSA meines
Dp--pn- DPPN meine
Dp--pg- DPPG meiner
Dp--pd- DPPD meinen
Dp--pa- DPPA meine
Dt-msn- DTMSN welcher
Dt-msg- DTMSG welches
Dt-msd- DTMSD welchem
Dt-msa- DTMSA welchen
Dt-fsn- DTFSN welche
Dt-fsg- DTFSG welcher
Dt-fsd- DTFSD welcher
Dt-fsa- DTFSA welche
Dt-nsn- DTNSN welches
Dt-nsg- DTNSG welches
Dt-nsd- DTNSD welchem
Dt-nsa- DTNSA welches
Dt--pn- DTPN welche
Dt--pg- DTPG welcher
Dt--pd- DTPD welchen
Dt--pa- DTPA welche
Note:
We deliberately treat the Articles as an extra class, but related to
the Determiners. The syntactic behaviour of Articles and Determiners
in NPs is different, as can be seen by the adjective in the following
examples:
Ein kleines Haus
Welches kleine Haus ?
This difference supports our attitude towards separating articles and
determiners generally.
5.2.6 Articles (T)
-------------------
5.2.6.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type definite der d
indefinite ein i
------------ ----------- ----------- ----
Gender masculine der m
feminine die f
neuter das n
------------ ---------- ----------- ----
Number singular ein s
plural die p
------------ ----------- ----------- ----
Case nominative der n
genitive dessen g
dative dem d
accusative den a
============ =========== =========== ====
5.2.6.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
TD Td..- definite article
TI Ti.s- indefinite article
======= ================== ====================================
5.2.6.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Tdmsn TD der
Tdmsg TD des
Tdmsd TD dem
Tdmsa TD den
Tdfsn TD die
Tdfsg TD der
Tdfsd TD der
Tdfsa TD die
Tdnsn TD das
Tdnsn TD des
Tdnsd TD dem
Tdnsa TD das
Tdpn TD die
Tdpg TD der
Tdpd TD den
Tdpa TD die
Timsn TI einer
Timsg TI eines
Timsd TI einem
Timsa TI einen
Tifsn TI eine
Tifsn TI einer
Tifsd TI einer
Tifsa TI eine
Tinpn TI ein
Tinpg TI eines
Tinpd TI einem
Tinpa TI ein
========= ======= =============================================
5.2.7 Adverbs (R)
5.2.7.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type general frischweg g
degree sogar d this value has been added
interrog worum i this value has been added
conjunction mithin c this value has been added
modal scheinbar m this value has been added
pronom so p this value has been added
temporal heute t this value has been added
place hier l this value has been added
------------ ----------- ----------- ----
Degree positive hoch p
comparative h"oher c
superlative h"ochst s
============ =========== =========== ====
5.2.7.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
RG R[gdcmtl]. General adverb
RP Rp- pronominal adverb
RI Ri- interrogative adverb
======= ================== ====================================
5.2.7.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Rgp RG frischweg
Rp RP so
Rgs RG h"ochst
Rgc RG h"oher
Ri RI worum
========= ======= =============================================
5.2.8 Adpositions (S)
----------------------
5.2.8.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type pre an p
post wegen t this value has been added
circum von - an c this value has been added
part1 von a this value has been added
part2 an z this value has been added
------------- ----------- ----------- ----
Formation clitic ans c
simple an s
============ =========== =========== ====
In German, most prepositions precede the NP. However, there is a good
deal of them which follow the NP ("entlang") or enclose it ("von" NP
"an"). This behaviour is unpredictable and must therefore be marked
lexically. This is done by increasing the values of the "Type"
attribute. This extension can be considered as language specific, as
long as there is no evidence from other languages which supports this
distinction. For practical reasons of text segmentation, the two parts
of a circumposition have to be distinguished by different tags.
The attribute "formation" allows us to deal with clitic contraction of
preposition and article of the following NP ("zum", "ans")
5.2.8.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
SPS Sps pre-position, simple
STS Sts post-position, simple
SPC Spc pre-position, clitic
SC Sas circumposition, partI, simple
SC Szs circumposition, partII, simple
======= ================== ====================================
5.2.8.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Sps SPS an
Sts STS wegen
Spc SPC ans
Sas SC von
Szs SC an
========= ======= =============================================
5.2.9. Conjunctions (C)
-----------------------
5.2.9.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type coordinat. oder c
subordinat. als s
compar als v this value has been added
infinitive um i this value has been added
part1 entweder a this value has been added
part2 oder z this value has been added
============ =========== =========== ====
Comparative conjunctions are restricted to constitutents as arguments.
Subordinate conjunctions introducing an infinite clause lead to an
infinite verb form in sentence final position. The infinitive particle
may be incorporated, as in "wegzugehen". These morphosyntactic
features lead to an extension of the feature-Type values, which may be
considered as language specific.
There are also a few complex conjunctions which appear at the
beginning of both phrases which are conjoined by it ("entweder -
oder"). This feature has to be marked lexically, again by increasing
the values of the "Type" attribute. For practical reason of text
segmentation, the two parts of a complex conjunction have to be
distinguished by different tags.
5.2.9.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
CC Cc Cooordinative cunjunction
CS Cs Subordinative cunjunction
CI Ci Subord. conjunctions introd. an infinit. clause
CV Cv Comparative Conjunction
CA Ca Conjunction Part I
CZ Cz Conjunction Part II
======= ================== ====================================
5.2.9.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Cc CC aber
Cs CS als
Ci CI um
Cv CV als
========= ======= =============================================
5.2.10 Numerals (M)
--------------------
We do not support the class "Numerals". Possible elements of this
class will be treated as nouns, adjectives, or adverbs.
5.2.11 Interjection (I)
-------------------------
5.2.11.1 Lexicon
========= ===========
Tag Example
========= ===========
I oh
========= ===========
5.2.11.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
I I Interjection
======= ================== ====================================
5.2.11.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
I I oh
========= ======= =============================================
5.2.12 Add on classes
------------------------
5.2.12.1 Particle (Q)
5.2.12.1.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type infinitive zu i
superlative am s
verbal pref. hinzu v
============ =========== =========== ====
5.2.12.1.2 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
QS Qs superlative particle
QI Qi infinitive particle
QV Qv verbal prefix
======= ================== ====================================
Superlative particles precede the superlative form of adjectives, if
used predicatively (er ist am gr"o"sten), and adverbs (applies also to
Dutch).
Infinitive particles are found in each of the Germanic languages to be
described (Danish, Dutch, English, German). This should be accounted
for somewhere in the generic set of classes.
Verbal prefixes are a particular German (and Dutch) phenomenon and can
be considered to be language specific.
5.2.12.1.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Qi QI zu
Qs QS am
Qv QV hinzu
========= ======= =============================================
5.2.12.2 Punctuation (F)
--------------------------
5.2.12.2.1 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
FE Fe sentence final
FI Fi sentence internal
FA Fa Quot mark / parenthesis initial
FZ Fz Quot mark / parenthesis final
FB Fb Hyphen, underscore, dash
======= ================== ====================================
The use of Tags for punctuation sign is obvious for the stochastic
modelling of utterances. However, it is not clear whether they should
be treated lexically. Therefore, we can describe our corpus tags, but
not a lexical treatment of these units.
5.2.12.3 Abbreviations (Y)
-----------------------------
5.2.12.3.1 Lexicon
========= ===========
Tag Example
========= ===========
Y bzw.
========= ===========
5.2.12.3.2 Corpus
Lexical abbreviations do not have a particular corresponding Corpus
Tag, but are tagged according to the morphosyntactic behaviour of the
form written out in full.
5.2.12.3.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Y RG bzw.
Y NCFSN AG
======= ========= =============================================
5.2.12.4 Others (X)
--------------------
Other particular corpus tags can correspond to the common class of
Residual.
5.2.12.4.1 Corpus
======= ================== ====================================
Tag Regular expression Definition
======= ================== ====================================
SYM X Symbols (%, ' etc.)
EQ X Formulae (5x + 3y)
======= ================== ====================================
The application of the MULTEXT morphosyntactic enconding for lexical
descriptions and corpus tags to Spanish has been performed by the
Spanish group (Bel and Aguilar 1994), and has been revised during
phase B.
The proposed set of TAGS is not definitive. As repeteadly mentioned,
we understand that this set will have to be refined depending on the
results of application. These tags are the result of a comparison of
two tagset sources:
a. information supplied by the tool SAC (a corpus analysis tool which
includes a PoS rule based tagger and a lemmatizer) in form of
attribute-value pairs.
b. tagsets proposed by the CRATER project.
Therefore these tags have to be taken as a starting point for refinement.
5.3.1 Nouns (N)
----------------
5.3.1.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type common libro c
proper Juan p
------------ ----------- ----------- ----
Gender masculine hombre m
feminine mujer f
------------ ----------- ----------- ----
Number singular hombre s
plural mujeres p
------------ ----------- ----------- ----
Case /// /// -
============ =========== =========== ====
5.3.1.2 Corpus TAGS: comments
Crater tags for nouns also include semantic information such as
"A"(anthroponymous) or "T"(toponymous) for proper nouns, aswell as
"LOC"(ative), "MEA(sure)", etc. for common nouns. We will drop these
values in order to get our proposal which intends to be like the one
recommended in EAG-L1.
5.3.1.3 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Ncms- NCMS hombre
Ncmp- NCMP hombres
Ncfs- NCFS mujer
Ncfp- NCFP mujeres
Npms- NPMS Juan
Npmp- NPMP Paris
Npfs- NPFS Ana
Npfp- NPFP Pirineos
========= ======= =============================================
5.3.1.4 Conversion tables
========= ============
Reg.exp TAG
========= ============
Ncmp- NCMP
Ncms- NCMS
Ncfp- NCFP
Ncfs- NCFS
Nc.p- NCP
Ncf.- NCF
Ncm.- NCM
Nc.s- NCS
Np NP
5.3.2 Verbs (V)
----------------
5.3.2.1. Lexicon
============ =========== ========== ====
Attribute Value Example Code
============ =========== =========== ====
Status main comer m
auxiliar haber a
modal poder o
------------ ----------- ----------- ----
Mood indicative viene i
subjunctive venga s
imperative ven m
conditional vendri'a c
infinitive venir n
participle venido p
gerund viniendo g
------------ ----------- ----------- ----
Tense present vengo p
imperfect veni'as i
future vendre' f
past vino s
------------ ----------- ----------- ----
Person first soy 1
second eres 2
third es 3
------------ ----------- ----------- ----
Number singular viene s
plural venimos p
------------ ----------- ----------- ----
Gender masculine cantado m
feminine cantada f
------------ ----------- ----------- ----
Clitic both darselo t
accusative darlo a
dative darle d
============ =========== =========== ====
5.3.2.2 Corpus TAGS: comments
a. CRATER tags as well as our inhouse tags classify verb types into:
main/ser/estar/haber/modal. "Ser", "estar", "haber" (normally
considered auxiliaries) are more informative with respect to the
different constructions such as: perfect tenses, active/pasive
distinctions and the disambiguation between adjectives and past
participles.
Following French suggestion we could pass this information into a
language specific attribute as Lexical Class.
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Lex. class ser ser s | obviously
estar estar e | language-specific
haber haber h |
============ =========== =========== ====
5.3.2.3 Combinations
=========== ======= =============================================
Lexique Corpus Example
============ ======= =============================================
Van----t VANT haberselo
Van----d VAND haberle
Van----a VANA haberlo
Van---- VAN haber
Vap--pf VAPPF habidas
Vap--sf VAPSF habida
Vap--pm VAPPM habidos
Vap--sm VAPSM habido
Vag----t VAGT habiindoselo
Vag----d VAGD habiindole
Vag----a VAGA habiindolo
Vag---- VAG habiendo
=========== ========== =========== ==========
Example Lexique lemma Corpus
=========== ========== =========== ==========
llamarmeles Vmn----t llamar VMNT
llamarmele Vmn----t llamar VMNT
llamarmelas Vmn----t llamar VMNT
llamarmela Vmn----t llamar VMNT
llamarmelos Vmn----t llamar VMNT
llamarmelo Vmn----t llamar VMNT
llamarme Vmn----d llamar VMND
llamarteles Vmn----t llamar VMNT
llamartele Vmn----t llamar VMNT
llamartelas Vmn----t llamar VMNT
llamartela Vmn----t llamar VMNT
llamartelos Vmn----t llamar VMNT
llamartelo Vmn----t llamar VMNT
llamarte Vmn----d llamar VMND
llamarseles Vmn----t llamar VMNT
llamarsele Vmn----t llamar VMNT
llamarselas Vmn----t llamar VMNT
llamarsela Vmn----t llamar VMNT
llamarselos Vmn----t llamar VMNT
llamarselo Vmn----t llamar VMNT
llamarse Vmn----d llamar VMND
llamarnosles Vmn----t llamar VMNT
llamarnosle Vmn----t llamar VMNT
llamarnoslas Vmn----t llamar VMNT
llamarnosla Vmn----t llamar VMNT
llamarnoslos Vmn----t llamar VMNT
llamarnoslo Vmn----t llamar VMNT
llamarnos Vmn----d llamar VMND
llamarosles Vmn----t llamar VMNT
llamarosle Vmn----t llamar VMNT
llamaroslas Vmn----t llamar VMNT
llamarosla Vmn----t llamar VMNT
llamaroslos Vmn----t llamar VMNT
llamaroslo Vmn----t llamar VMNT
llamaros Vmn----d llamar VMND
llamarme Vmn----a llamar VMNA
llamarte Vmn----a llamar VMNA
llamarles Vmn----a llamar VMNA
llamarle Vmn----a llamar VMNA
llamarlas Vmn----a llamar VMNA
llamarla Vmn----a llamar VMNA
llamarlos Vmn----a llamar VMNA
llamarlo Vmn----a llamar VMNA
llamarnos Vmn----a llamar VMNA
llamaros Vmn----a llamar VMNA
llamar Vmn----- llamar VMN
llamadas Vmp--pf- llamar VMPPF
llamada Vmp--sf- llamar VMPSF
llamados Vmp--pm- llamar VMPPM
llamado Vmp--sm- llamar VMPSM
llamandomeles Vmg----t llamar VMGT
llamandomele Vmg----t llamar VMGT
llamandomelas Vmg----t llamar VMGT
llamandomela Vmg----t llamar VMGT
llamandomelos Vmg----t llamar VMGT
llamandomelo Vmg----t llamar VMGT
llamandome Vmg----d llamar VMGD
llamandoteles Vmg----t llamar VMGT
llamandotele Vmg----t llamar VMGT
llamandotelas Vmg----t llamar VMGT
llamandotela Vmg----t llamar VMGT
llamandotelos Vmg----t llamar VMGT
llamandotelo Vmg----t llamar VMGT
llamandote Vmg----d llamar VMGD
llamandoseles Vmg----t llamar VMGT
llamandosele Vmg----t llamar VMGT
llamandoselas Vmg----t llamar VMGT
llamandosela Vmg----t llamar VMGT
llamandoselos Vmg----t llamar VMGT
llamandoselo Vmg----t llamar VMGT
llamandose Vmg----d llamar VMGD
llamandonosles Vmg----t llamar VMGT
llamandonosle Vmg----t llamar VMGT
llamandonoslas Vmg----t llamar VMGT
llamandonosla Vmg----t llamar VMGT
llamandonoslos Vmg----t llamar VMGT
llamandonoslo Vmg----t llamar VMGT
llamandonos Vmg----d llamar VMGD
llamandonos Vmg----d llamar VMGD
llamandoosles Vmg----t llamar VMGT
llamandoosle Vmg----t llamar VMGT
llamandooslas Vmg----t llamar VMGT
llamandoosla Vmg----t llamar VMGT
llamandooslos Vmg----t llamar VMGT
llamandooslo Vmg----t llamar VMGT
llamandoos Vmg----d llamar VMGD
llamandome Vmg----a llamar VMGA
llamandote Vmg----a llamar VMGA
llamandoles Vmg----a llamar VMGA
llamandole Vmg----a llamar VMGA
llamandolas Vmg----a llamar VMGA
llamandola Vmg----a llamar VMGA
llamandolos Vmg----a llamar VMGA
llamandolo Vmg----a llamar VMGA
llamandonos Vmg----a llamar VMGA
llamandonos Vmg----a llamar VMGA
llamandoos Vmg----a llamar VMGA
llamando Vmg----- llamar VMG
llamo Vmip1s- llamar VMIP1S
llamas Vmip2s- llamar VMIP2S
llama Vmip3s- llamar VMIP3S
llamais Vmip2p- llamar VMIP2P
llaman Vmip3p- llamar VMIP3P
llamamos Vmip1p- llamar VMIP1P
llame Vmsp[13]s- llamar VMSPS
llames Vmsp2s- llamar VMSP2S
llamemos Vmsp1p- llamar VMSP1P
llamiis Vmsp2p- llamar VMSP2P
llamen Vmsp3p- llamar VMSP3P
llami Vmif1s- llamar VMIF1S
llamaste Vmif2s- llamar VMIF2S
llams Vmif3s- llamar VMIF3S
llamasteis Vmif2p- llamar VMIF2P
llamaron Vmif3p- llamar VMIF3P
llamamos Vmif1p- llamar VMIF1P
llamaba Vmii[13]s- llamar VMIIS
llamabas Vmii2s- llamar VMII2S
llamabamos Vmii1p- llamar VMII1P
llamabais Vmii2p- llamar VMII2P
llamaban Vmii3p- llamar VMII3P
llamara Vmsi[13]s- llamar VMSIS
llamaras Vmsi2s- llamar VMSI2S
llamaramos Vmsi1p- llamar VMSI1P
llamarais Vmsi2p- llamar VMSI2P
llamaran Vmsi3p- llamar VMSI3P
llamase Vmsi[13]s- llamar VMSIS
llamases Vmsi2s- llamar VMSI2S
llamasemos Vmsi1p- llamar VMSI1P
llamaseis Vmsi2p- llamar VMSI2P
llamasen Vmsi3p- llamar VMSI3P
llamari Vmis1s- llamar VMIS1S
llamaras Vmis2s- llamar VMIS2S
llamara Vmis3s- llamar VMIS3S
llamaremos Vmis1p- llamar VMIS1P
llamariis Vmis2p- llamar VMIS2P
llamaran Vmis3p- llamar VMIS3P
llamarma Vmc-[13]s- llamar VMCS
llamarmas Vmc-2s- llamar VMC2S
llamarmamos Vmc-1p- llamar VMC1P
llamarmais Vmc-2p- llamar VMC2P
llamarman Vmc-3p- llamar VMC3P
5.3.2.4. Conversion tables ========= ========== Reg.expr. TAG ========= ========== Van----t VANT Van----d VAND Van----a VANA Van---- VAN Vap--pf VAPPF Vap--sf VAPSF Vap--pm VAPPM Vap--sm VAPSM Vag----t VAGT Vag----d VAGD Vag----a VAGA Vag---- VAG Vaip1s- VAIP1S Vaip2s- VAIP2S Vaip3s- VAIP3S Vaip2p- VAIP2P Vaip3p- VAIP3P Vaip1p- VAIP1P Vasp[13]s- VASPS Vasp2s- VASP2S Vasp1p- VASP1P Vasp2p- VASP2P Vasp3p- VASP3P Vais1s- VAIS1S Vais2s- VAIS2S Vais3s- VAIS3S Vais2p- VAIS2P Vais3p- VAIS3P Vais1p- VAIS1P Vaii[13]s- VAIIS Vaii2s- VAII2S Vaii1p- VAII1P Vaii2p- VAII2P Vaii3p- VAII3P Vasi[13]s- VASIS Vasi2s- VASI2S Vasi1p- VASI1P Vasi2p- VASI2P Vasi3p- VASI3P Vasi[13]s- VASIS Vasi2s- VASI2S Vasi1p- VASI1P Vasi2p- VASI2P Vasi3p- VASI3P Vaif1s- VAIF1S Vaif2s- VAIF2S Vaif3s- VAIF3S Vaif1p- VAIF1P Vaif2p- VAIF2P Vaif3p- VAIF3P Vac[13]s- VACS Vac2s- VAC2S Vac1p- VAC1P Vac2p- VAC2P Vac3p- VAC3P Von----t VONT Von----d VOND Von----a VONA Von---- VON Vop--pf VOPPF Vop--sf VOPSF Vop--pm VOPPM Vop--sm VOPSM Vog----t VOGT Vog----d VOGD Vog----a VOGA Vog---- VOG Voip1s- VOIP1S Voip2s- VOIP2S Voip3s- VOIP3S Voip2p- VOIP2P Voip3p- VOIP3P Voip1p- VOIP1P Vosp[13]s- VOSPS Vosp2s- VOSP2S Vosp1p- VOSP1P Vosp2p- VOSP2P Vosp3p- VOSP3P Vois1s- VOIS1S Vois2s- VOIS2S Vois3s- VOIS3S Vois2p- VOIS2P Vois3p- VOIS3P Vois1p- VOIS1P Voii[13]s- VOIIS Voii2s- VOII2S Voii1p- VOII1P Voii2p- VOII2P Voii3p- VOII3P Vosi[13]s- VOSIS Vosi2s- VOSI2S Vosi1p- VOSI1P Vosi2p- VOSI2P Vosi3p- VOSI3P Vosi[13]s- VOSIS Vosi2s- VOSI2S Vosi1p- VOSI1P Vosi2p- VOSI2P Vosi3p- VOSI3P Voif1s- VOIF1S Voif2s- VOIF2S Voif3s- VOIF3S Voif1p- VOIF1P Voif2p- VOIF2P Voif3p- VOIF3P Voc[13]s- VOCS Voc2s- VOC2P Voc1p- VOC1P Voc2p- VOC2P Voc3p- VOC3P Vmn----t VMNT Vmn----d VMND Vmn----a VMNA Vmn---- VMN Vmp--pf VMPPF Vmp--sf VMPSF Vmp--pm VMPPM Vmp--sm VMPSM Vmg----t VMGT Vmg----d VMGD Vmg----a VMGA Vmg---- VMG Vmip1s- VMIP1S Vmip2s- VMIP2S Vmip3s- VMIP3S Vmip2p- VMIP2P Vmip3p- VMIP3P Vmip1p- VMIP1P Vmsp[13]s- VMSPS Vmsp2s- VMSP2S Vmsp1p- VMSP1P Vmsp2p- VMSP2P Vmsp3p- VMSP3P Vmis1s- VMIS1S Vmis2s- VMIS2S Vmis3s- VMIS3S Vmis2p- VMIS2P Vmis3p- VMIS3P Vmis1p- VMIS1P Vmii[13]s- VMIIS Vmii2s- VMII2S Vmii1p- VMII1P Vmii2p- VMII2P Vmii3p- VMII3P Vmsi[13]s- VMSIS Vmsi2s- VMSI2S Vmsi1p- VMSI1P Vmsi2p- VMSI2P Vmsi3p- VMSI3P Vmsi[13]s- VMSIS Vmsi2s- VMSI2S Vmsi1p- VMSI1P Vmsi2p- VMSI2P Vmsi3p- VMSI3P Vmif1s- VMIF1S Vmif2s- VMIF2S Vmif3s- VMIF3S Vmif1p- VMIF1P Vmif2p- VMIF2P Vmif3p- VMIF3P Vmc[13]s- VMCS Vmc2s- VMC2S Vmc1p- VMC1P Vmc2p- VMC2P Vmc3p- VMC3P =========== ======= ========================================
5.3.3 Adjectives (A)
---------------------
5.3.3.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type qualificat. bueno f
possessive vuestro s
------------ ----------- ----------- ----
Degree positive bueno p
comparative mejor c
superlative bueni'simo s
------------ ----------- ----------- ----
Gender masculine bueno m
feminine buena f
------------ ----------- ----------- ----
Number singular bueno s
plural buenas p
------------ ----------- ----------- ----
Case /// /// -
============ =========== =========== ====
Comments.
Possesive adjectives codification is still being considered.
5.3.3.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Afpms- AMS bueno
Afpmp- AMP buenos
Afpfs- AFS buena
Afpfp- AFP buenas
Afc.s- AS (el/la) mejor
Afc.p- AP (los/las) mejores
Afsms- AMS interesanti'simo
Afsmp- AMP interesanti'simos
Afsfs- AFS interesanti'sima
Afsfp- AFP interesanti'simas
========= ======= =============================================
5.3.3.3. Conversion tables
=========== ==========
Reg.Expr. Corpus
====== =========
Afp.p- APP
Afp.s- APS
Afpfp- APFP
Afpfs- APFS
Afpmp- APMP
Afpms- APMS
Afsfp- ASFP
Afsfs- ASFS
Afsmp- ASMP
Afsms- ASMS
Afc.p- ACP
Afc.s- ACS
5.3.4 Pronouns (P)
-------------------
5.3.4.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type personal yo p
demonstrat. este d
indefinite alguno i
possessive (el) tuyo s
interrog. que' t
relative que r
reflexive se x
------------ ----------- ----------- ----
Person first yo 1
second tu' 2
third e'l 3
------------ ----------- ----------- ----
Gender masculine esto, el m
feminine esta, ella f
------------ ----------- ----------- ----
Number singular alguno s
plural algunos p
------------ ----------- ----------- ----
Case nominative el n
dative le d
accusative lo a
oblique mi',conmigo o
------------ ----------- ----------- ----
Possessor singular mi'o s
plural nuestro p
============ =========== =========== ====
5.3.4.2. Corpus TAGS: comments
a. CRATER TAGs make a distinction when proximal or remote deixis
exists. We do not consider these values.
b. "se", which can be both dative and accusative depending on the
existence of an 'external' accusative:
Ella se lava (She washes herself)
Ella se lava las manos (She washes her hands)
There is also an oblique reflexive pronoun "si'".
TAGs would be:
Px3..-a PX3SA se
Px3..-d PX3SO se
Px3.s-o PX3SO si'
c. Relative pronouns are a good example of rearranging of linguistic
information when comparing with French. For example, due to in Spanish
there exists a relative pronoun which is also a possessive one (cuyo,
cuya, cuyos, cuyas, eng. "whose") possessor values have been marked.
5.3.4.3 Combinations
=========== ========== =========== ==========
Example Lexique lemma Corpus
=========== ========== =========== ==========
yo Pp1-sn- yo PP1SN
tu' Pp2-sn- tu' PP2SN
ustedes Pp3-p[no]- usted PP3PN
usted Pp3-s[no]- usted PP3SN
il Pp3ms[no]- il PP3MS
ellas Pp3fp[no]- il PP3FP
ella Pp3fs[no]- il PP3FS
ellos Pp3mp[no]- il PP3MP
ello Pp3ms[no]- il PP3MS
nosotras Pp1fp[no]- nosotros PP1FP
nosotros Pp1mp[no]- nosotros PP1MP
vosotras Pp2fp[no]- vosotros PP2FP
vosotros Pp2mp[no]- vosotros PP2MP
conmigo Pp1-so- conmigo PP1SO
mi' Pp1-so- mi' PP1SO
contigo Pp2-so- contigo PP2SO
ti Pp2-so- ti' PP2SO
si' Pp3-so- si' PP3SO
mismas Px.fpo- mismo PXFPO
misma Px.fso- mismo PXFSO
mismos Px.mpo- mismo PXMPO
mismo Px.mso- mismo PXMSO
me P[px]1.s[ad]- me P1S
te P[px]2.s[ad]- te P2S
se P[pxl]3..[ad]-se P3
les Pp3.pd- le PP3PD
le Pp3. sd-le PP3SD
las Pp3fpa- lo PP3FPA
la Pp3fsa- lo PP3FSA
los Pp3mpa- lo PP3MPA
lo Pp3msa- lo PP3MSA
nos P[pxl]1.p[ad]-nos P1P
os P[pxl]2-p[ad]-os P2P
istas Pd-fp-- iste PDFP
ista Pd-fs-- iste PDFS
istos Pd-mp-- iste PDMP
isto Pd-ms-- iste PDMS
iste Pd-ms-- iste PDMS
estas Pd-fp-- este PDFP
esta Pd-fs-- este PDFS
estos Pd-mp-- este PDMP
esto Pd-ms-- este PDMS
este Pd-ms-- este PDMS
isas Pd-fp-- ise PDFP
isa Pd-fs-- ise PDFS
isos Pd-mp-- ise PDMP
iso Pd-ms-- ise PDMS
ise Pd-ms-- ise PDMS
esas Pd-fp-- ese PDFP
esa Pd-fs-- ese PDFS
esos Pd-mp-- ese PDMP
eso Pd-ms-- ese PDMS
ese Pd-ms-- ese PDMS
aquillas Pd-fp-- aquil PDFP
aquilla Pd-fs-- aquil PDFS
aquillos Pd-mp-- aquil PDMP
aquillo Pd-ms-- aquil PDMS
aquellas Pd-fp-- aquel PDFP
aquella Pd-fs-- aquel PDFS
aquellos Pd-mp-- aquel PDMP
aquello Pd-ms-- aquel PDMS
aquel Pd-ms-- aquel PDS
cuales Pr--p-- cual PRP
cual Pr--s-- cual PRS
cuales Pt--p-- cual PTP
cual Pt--s-- cual PTS
cuyas Pr-f.-p cuyo PRFP
cuya Pr-f.-s cuyo PRFS
cuyos Pr-m.-p cuyo PRMP
cuyo Pr-m.-s cuyo PRMS
quienes Pr--p-- quien PRP
quien Pr--s-- quien PRS
quiines Pt--p-- quiin PTP
quiin Pt -s-- quiin PTS
que Pr--.-- que PR
qui Pt--.-- qui PI
suyas Ps3fp-. suyo PS3FP
suya Ps3fs-. suyo PS3FS
suyos Ps3mp-. suyo PS3MP
suyo Ps3ms-. suyo PS3MS
tuyas Ps2fp-s tuyo PS2FPS
tuya Ps2fs-s tuyo PS2FSS
tuyos Ps2mp-s tuyo PS2MPS
tuyo Ps2ms-s tuyo PS2MSS
mmas Ps1fp-s mmo PS1FPS
mma Ps1fs-s mmo PS1FSS
mmos Ps1mp-s mmo PS1MPS
mmo Ps1ms-s mmo PS1MSS
nuestras Ps1fp-p nuestro PS1FPP
nuestra Ps1fs-p nuestro PS1FSP
nuestros Ps1mp-p nuestro PS1MPP
nuestro Ps1ms-p nuestro PS1MSP
vuestras Ps2fp-p vuestro PS2FPP
vuestra Ps2fs-p vuestro PS2FSP
vuestros Ps2mp-p vuestro PS2MPP
vuestro Ps2ms-p vuestro PS2MSP
algunas Pi-fp-- algzn PIFP
alguna Pi-fs-- algzn PIFS
algunos Pi-mp-- algzn PIMP
alguno Pi-ms-- algzn PIMS
ningunas Pi-fp-- ningzn PIFP
ninguna Pi-fs-- ningzn PIFS
ningunos Pi-mp-- ningzn PIMP
ninguno Pi-ms-- ningzn PIMS
ambos Pi-mp-- ambos PIMP
ambas Pi-fp-- ambas PIFP
muchas Pi-fp-- mucho PIFP
mucha Pi-fs-- mucho PIFS
muchos Pi-mp-- mucho PIMP
mucho Pi-ms-- mucho PIMS
nada Pi----- nada PI
nadie Pi----- nadie PI
otras Pi-fp-- otro PIFP
otra Pi-fs-- otro PIFS
otros Pi-mp-- otro PIMP
otro Pi-ms-- otro PIMS
pocas Pi-fp-- poco PIFP
poca Pi-fs-- poco PIFS
pocos Pi-mp-- poco PIMP
poco Pi-ms-- poco PIMS
todas Pi-fp-- todo PIFP
toda Pi-fs-- todo PIFS
todos Pi-mp-- todo PIMP
todo Pi-ms-- todo PIMS
varios Pi-mp-- varios PIMP
varias Pi-mp-- varias PIMP
5.3.5 Determiners (D)
---------------------
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type demonstrat. este d
indefinite cierto i
possessive mi s
interrog. que' t
------------ ----------- ----------- ----
Person first mi 1
second tu 2
third su 3
------------ ----------- ----------- ----
Gender masculine el m
feminine la f
------------ ----------- ----------- ----
Number singular el s
plural los p
------------ ----------- ----------- ----
Possessor singular mi s
plural nuestro p
============ =========== =========== ====
5.3.5.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Ds1.ss-- DS1SS mi (taza - libro)
Ds1.ps-- DS1PS mis (tazas - libros)
Ds1fsp-- DS1FSP nuestra (taza)
Ds1fpp-- DS1FPP nuestras (tazas)
Ds1msp-- DS1MSP nuestro (libro)
Ds1mpp-- DS1MPP nuestros (libros)
Ds2.ss-- DS2SS tu (taza -libro)
Ds2.ps-- DS2PS tus (tazas -libros)
Ds2fsp-- DS2FSP vuestra (taza)
Ds2fpp-- DS2FPP vuestras (tazas)
Ds2msp-- DS2MSP vuestro (libro)
Ds2mpp-- DS2MPP vuestros (libros)
Ds3.s.-- DS3S su (taza - libro)
Ds3.p.-- DS3P sus (tazas -libros)
Dd-fs--- DDFS esta, esa, aquella
Dd-ms--- DDMS este, ese, aquel
Dd-fp--- DDFP estas, esas, aquellas
Dd-mp--- DDMP estos, esos, aquellos
Di------ DI cada, cualquier
Di-fs--- DIFS alguna, ninguna, cierta
Di-ms--- DIMS algu'n, ningu'n, cierto
Di-fp--- DIFP algunas, ningunas, ciertas
Di-mp--- DIMP algunos, ningunos, ciertos
Dt-fs--- DTFS cua'nta
Dt-ms--- DTMS cua'nto
Dt-fp--- DTFP cua'ntas
Dt-mp--- DTMP cua'ntos
========= ======= =============================================
5.3.5.3. Conversion tables
========= ============
Reg.exp TAG
========= ============
Dd-fp-- DDFP
Dd-fs-- DDFS
Dd-mp-- DDMP
Dd-ms-- DDMS
Di----- DI
Di-.s-- DIS
Di-fp-- DIFP
Di-fs-- DIFS
Di-mp-- DIMP
Di-ms-- DIMS
Ds1.p-s DS1PS
Ds1.s-s DS1SS
Ds1fp-p DS1FPP
Ds1fs-p DS1FSP
Ds1mp-p DS1MPP
Ds1ms-p DS1MSP
Ds2.p-s DS2PS
Ds2.s-s DS2SS
Ds2fp-p DS2FPP
Ds2fs-p DS2FSP
Ds2mp-p DS2MPP
Ds2ms-p DS2MSP
Ds3.p-. DS3P
Ds3.s-. DS3S
Dt-fp-- DTFP
Dt-fs-- DTFS
Dt-mp-- DTMP
Dt-ms-- DTMS
5.3.6 Articles (T)
==================
5.3.6.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type definite el d
indefinite un i
------------ ----------- ----------- ----
Gender masculine el m
feminine la f
------------ ----------- ----------- ----
Number singular el s
plural los p
------------ ----------- ----------- ----
Case /// /// ///
============ =========== =========== ====
========= ============
Reg.exp TAG
========= ============
Tifp- TIFP
Tifs- TIFS
Timp- TIMP
Tims- TIMS
Tdms- TDMS
Tdfp- TDFP
Tdfs- TDFS
Tdmp- TDMP
5.3.7 Adverbs (R)
------------------
5.3.7.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type general muy g
particle no p
------------ ----------- ----------- ----
Degree positive muy p
comparative ma's c
superlative muchi'simo s
============ =========== =========== ====
5.3.7.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Rgp RG mucho
Rgc RG ma's
Rgn RG nunca
Rpn RP no
========= ======= =============================================
========= ============
Reg.exp TAG
========= ============
R R
Rg RG
Rgp RG
Rgc RG
5.3.8 Adpositions (S)
----------------------
5.3.8.1. Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type preposition en, de p
------------ ----------- ----------- ----
Formation compound y
simple n
============ =========== =========== ====
5.3.7.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Sp SP en
========= ======= =============================================
5.3.9 Conjunctions (C)
-----------------------
5.3.9.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type coordinat. y c
subordinat. que s
============ =========== =========== ====
5.3.8.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
Cc C pero, o, y
Cs C que
========= ======= =============================================
5.3.10 Numerals (M)
--------------------
5.3.10.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type cardinal dos c
ordinal segundo o
------------ ----------- ----------- ----
Gender masculine un m
feminine una f
------------ ----------- ----------- ----
Number singular un s
plural dos p
------------ ----------- ----------- ----
Case /// /// -
============ =========== =========== ====
5.3.9.2 Combinations
======== ======== ========= ==========
Example Lexique lema Corpus
======== ======== ========= ==========
primeras Mofp- primero MOFP
primera Mofs- primero MOFS
primeros Momp- primero MOMP
primero Moms- primero MOMS
uno Mcms- uno MCMS
una Mcfs- uno MCFS
dos Mc.p- dos MCP
doscientas Mcfp- doscientos MCFP
doscientos Mcmp- doscientos MCMP
======== ======== ========= ==========
5.3.11 Interjection (I)
------------------------
5.3.11.1 Lexicon
========= ===========
Tag Example
========= ===========
I eh
========= ===========
5.3.11.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
I I eh, ah, oh
========= ======= =============================================
5.3.12 Unique membership class (U)
-----------------------------------
None.
5.3.13 Residual (X)
--------------------
5.3.13.1 Lexicon
============ ==============
Tag Example
============ ==============
X symbols, etc.
============ ==============
5.3.12.2 Combinations
========= ======= =============================================
Lexique Corpus Example
========= ======= =============================================
X X symbols, etc.
========= ======= =============================================
In this section the proposed encoding is applied to French (Veronis et al. 1994).
5.4.1 Nouns (N)
----------------
5.4.1.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type common livre c
proper Jean p
------------ ----------- ----------- ----
Gender masculine homme m
feminine femme f
------------ ----------- ----------- ----
Number singular hommes s
plural femmes p
------------ ----------- ----------- ----
Case /// /// -
============ =========== =========== ====
5.4.1.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
NCMS Ncms- Common noun, masc. sing.
NCMP Ncmp- Common noun, masc. plur.
NCFS Ncfs- Common noun, fem. sing.
NCFP Ncfp- Common noun, fem. plur.
NPMS Npms- Proper noun, masc. sing.
NPMP Npmp- Proper noun, masc. plur.
NPFS Npfs- Proper noun, fem. sing.
NPFP Npfp- Proper noun, fem. plur.
================== ====================================
5.4.1.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Ncms- NCMS homme
Ncmp- NCMP hommes
Ncfs- NCFS femme
Ncfp- NCFP femmes
Npms- NPMS Jean
Npmp- NPMP Pays-bas
Npfs- NPFS Anne
Npfp- NPFP Pyrenees
====================================================
Note
It is not clear that all proper nouns should receive gender and number
information. In addition, even if this information exists, it might be
difficult to find it automatically in corpora for unknown proper
nouns. We must experiment to see if a single tag NP would be better.
5.4.2 Verbs (V)
----------------
5.4.2.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type main partir m
auxiliary avoir a
------------ ----------- ----------- ----
Mood/Vform indicative viens i
subjunctive vienne s
imperative viens m
conditional viendrais c
infinitive venir n
participle venu p
------------ ----------- ----------- ----
Tense present viens p
imperfect venais i
future viendrai f
past vins s
------------ ----------- ----------- ----
Person first suis 1
second es 2
third est 3
------------ ----------- ----------- ----
Number singular viens s
plural venons p
------------ ----------- ----------- ----
Gender masculine venu m
feminine venue f
------------ ----------- ----------- ----
Clitics /// /// -
============ =========== =========== ====
Notes
a. Conditional
Conditional is often considered as a tense rather than a mood.
Encoding decision may change on this.
b. Auxiliaries
It is possible that we need to discriminate between "avoir" and "e^tre"
auxiliaries. In that case, we could add a Lexical Class attribute:
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Lex. class e^tre e^tre e
avoir avoir a
Note : obviously language-specific
============ =========== =========== ====
c. Past participle
We decided to encode the past participle of the auxiliary verb
"e^tre", the copulative verbs (e.g. "sembler") and impersonal verbs
(e.g."falloir"), which do not agree in gender, with the value not
applicable (-) :
e'te' : Va---ps--, semble' : Vm---ps--
5.4.2.2 Corpus
====== =================== ============================
Tag Regular expression Definition
====== =================== ============================
VA1P Va[iscm][pifs]1p-- Aux. verb 1st person plur.
VA1S Va[iscm][pifs]1s-- Aux. verb 1st person sing.
VA2P Va[iscm][pifs]2p-- Aux. verb 2nd person plur.
VA2S Va[iscm][pifs]2s-- Aux. verb 2nd person sing.
VA3P Va[iscm][pifs]3p-- Aux. verb 3rd person plur.
VA3S Va[iscm][pifs]3s-- Aux. verb 3rd person sing.
VAPSPF Vaps-pf- Aux. verb past part. plur. fem.
VAPSSF Vaps-sf- Aux. verb past part. sing. fem.
VAPSPM Vaps-pm- Aux. verb past part. plur. masc.
VAPSSM Vaps-sm- Aux. verb past part. sing. masc.
VAPS Vaps---- Aux. verb past part.
VAPP Vapp---- Aux. verb pres. part.
VAN Van----- Aux. verb infinitive
VM1P Vm[iscm][pifs]1p-- Main.verb 1st person plur.
VM1S Vm[iscm][pifs]1s-- Main.verb 1st person sing.
VM2P Vm[iscm][pifs]2p-- Main.verb 2nd person plur.
VM2S Vm[iscm][pifs]2s-- Main.verb 2nd person sing.
VM3P Vm[iscm][pifs]3p-- Main.verb 3rd person plur.
VM3S Vm[iscm][pifs]3s-- Main.verb 3rd person sing.
VMPSPF Vmps-pf- Main.verb past part. plur. fem.
VMPSSF Vmps-sf- Main.verb past part. sing. fem.
VMPSPM Vmps-pm- Main.verb past part. plur. masc.
VMPSSM Vmps-sm- Main.verb past part. sing. masc.
VMPS Vmps---- Main.verb past part.
VMPP Vmpp---- Main.verb pres. part.
VMN Vmn----- Main.verb infinitive
================== ====================================
5.4.2.3 Combinations ======= ============================================= Lexique Corpus Example ======= ============================================= INFINITIVE Vmn----- VMN venir Van----- VAN e^tre, avoir PRESENT PARTICIPLE Vmpp---- VMPP venant Vapp---- VAPP e'tant, ayant PAST PARTICIPLE Vmps---- VM??PS semble' Vmps-pf- VMFPPS venues Vmps-sf- VMFSPS venue Vmps-pm- VMMPPS venus Vmps-sm- VMMSPS venu Vaps---- VA??PS e'te' Vaps-pf- VAFPPS eues Vaps-sf- VAFSPS eue Vaps-pm- VAMPPS eus Vaps-sm- VAMSPS eu INDICATIVE, PRESENT Vmip1s- VM1S viens Vmip2s- VM2S viens Vmip3s- VM3S vient Vmip1p- VM1P venons Vmip2p- VM2P venez Vmip3p- VM3P viennent Vaip1s- VA1S suis, ai aip2s- VA2S es, as Vaip3s- VA3S3 est, a Vaip1p- VA1P sommes, avons Vaip2p- VA2P e^tes, avez Vaip3p- VA3P sont, ont INDICATIVE, IMPERFECT Vaii1s- VA1S e'tais, avais Vaii2s- VA2S e'tais, avais Vaii3s- VA3S3 e'tait, avait Vaii1p- VA1P e'tions, avions Vaii2p- VA2P e'tiez, aviez Vaii3p- VA3P e'taient, avaient Vmii1s- VM1S venais Vmii2s- VM2S venais Vmii3s- VM3S venait Vmii1p- VM1P venions Vmii2p- VM2P veniez Vmii3p- VM3P venaient INDICATIVE, FUTURE Vaif1s- VA1S serai, aurai Vaif2s- VA2S seras, auras Vaif3s- VA3S3 sera, aura Vaif1p- VA1P serons, aurons Vaif2p- VA2P serez, aurez Vaif3p- VA3P seront, auront Vmif1s- VM1S viendrai Vmif2s- VM2SS viendras Vmif3s- VM3S3 viendra Vmif1p- VM1PS viendrons Vmif2p- VM2P viendrez Vmif3p- VM3PS viendront INDICATIVE, PERFECT Vais1s- VA1S fus, eus Vais2s- VA2S fus, eus Vais3s- VA3S3 fut, eut Vais1p- VA1P fu^mes, eu^mes Vais2p- VA2P fu^tes, eu^tes Vais3p- VA3P furent, eurent Vmis1s- VM1SS vins Vmis2s- VM2S vins Vmis3s- VM3S3 vint Vmis1p- VM1P vinmes Vmis2p- VM2P vintes Vmis3p- VM3P vinrent SUBJONCTIVE, PRESENT Vasp1s- VA1S sois, aie Vasp2s- VA2S sois, aies Vasp3s- VA3S soit, ait Vasp1p- VA1P soyons, ayons Vasp2p- VA2P soyez, ayez Vasp3p- VA3P soient, avaient, e'taient Vmsp1s- VM1S finisse Vmsp2s- VM2S finisse Vmsp3s- VM3S finisse Vmsp1p- VM1P finissions Vmsp2p- VM2P finissiez Vmsp3p- VM3P finissent SUBJONCTIVE, IMPERFECT Vasi1s- VA1S fusse, eusse Vasi2s- VA2S fusses, eusses Vasi3s- VA3S3 fu^t, eu^t Vasi1p- VA1P fussions, eussions Vasi2p- VA2P fussiez, eussiez Vasi3p- VA3P fussent, eussent Vmsi1s- VM1S finisse Vmsi2s- VM2S finisse Vmsi3s- VM3S finit Vmsi1p- VM1P finissions Vmsi2p- VM2P finissiez Vmsi3p- VM3P finissent CONDITIONAL, PRESENT Vacp1s- VA1S serais, aurais Vacp2s- VA2S serais, aurais Vacp3s- VA3S serait, aurait Vacp1p- VA1P serions, aurions Vacp2p- VA2P seriez, auriez Vacp3p- VA3P seraient, auraient Vmcp1s- VM1S viendrais Vmcp2s- VM2SS viendrais Vmcp3s- VM3SS viendrait Vmcp1p- VM1P viendrions Vmcp2p- VM2PS viendriez Vmcp3p- VM3P viendraient IMPERATIVE, PRESENT Vamp2s- VA2S sois, aie Vamp1p- VA1P soyons, ayons Vamp2p- VA2P soyez, ayez Vmmp2s- VM2S viens Vmmp1p- VM1P venons Vmmp2p- VM2P venez ====================================
5.4.3 Adjectives (A)
---------------------
5.4.3.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type qualificat. bon f
ordinal deuxie^me o
cardinal deux c
indefinite quelconque i
possessive mien s
------------ ----------- ----------- ----
Degree positive bon p
comparative meilleur c
------------ ----------- ----------- ----
Gender masculine bon m
feminine bonne f
------------ ----------- ----------- ----
Number singular bon s
plural bons p
------------ ----------- ----------- ----
Case /// /// -
Notes
a. Degree
We encode Degree for compatibility with other languages, but the
distinction positive/comparative applies only to two adjectives in
French: "bon" and "mauvais". All other adjectives form their
comparatives with "plus" + adjective (e.g., "plus grand"). Superlative
is also a compound form ("le" + comparative, e.g. "le plus grand").
b. Possessor
We could add attributes for person and number of possessor.
c. Cardinal
The use of this value in french is still being considered as it seems
perfectly redundant with the category numeral.
5.4.3.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
AFP A..fp- Adjective fem. plur.
AFS A..fs- Adjective fem. sing.
AMP A..mp- Adjective masc. plur.
AMS A..ms- Adjective masc. sing.
================== ====================================
5.4.3.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Afcfp- AFP meilleures
Afcfs- AFS meilleure
Afcmp- AMP meilleurs
Afcms- AMS meilleur
Afpfp- AFP bonnes
Afpfs- AFS bonne
Afpmp- AMP bons
Afpms- AMS bon
Ai-fp- AFP certaines, me^mes, quelconques
Ai-fs- AFS certane, me^me, quelconque
Ai-mp- AMP certain, me^mes, quelconques
Ai-ms- AMS certain, me^me, quelconque
Ac-fp- AFP deux
Ac-fs- AFS une
Ac-mp- AMP deux
Ac-ms- AMS un
Ao-fp- AFP premi'eres
Ao-fs- AFS premi'ere
Ao-mp- AMP premiers
Ao-ms- AMS premier
======= =============================================
Lexique Corpus Example
======= =============================================
As-fp- AFP leurs, miennes,tiennes,siennes, no^tres,
vo^tres
As-fs- AFS leur, mienne, tienne, sienne, no^tre,vo^tre
As-mp- AMP leurs, miens, tiens, siens, no^tres,
vo^tres
As-ms- AMS leur, mien, tien, sien, no^tre, vo^tre
========================================================
5.4.4 Pronouns (P)
-------------------
5.4.4.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type personal je p
demonstrat. celui d
indefinite certain i
possessive le_mien s
interrog. lequel t
relative quel r
------------ ----------- ----------- ----
Person first je 1
second tu 2
third il 3
------------ ----------- ----------- ----
Gender masculine cet,il m
feminine cette,elle f
neutre ce n
------------ ----------- ----------- ----
Number singular certain s
plural certains p
------------ ----------- ----------- ----
Case nominative il n
object le, lui j
oblique moi o
------------ ----------- ----------- ----
Possessor singular mon s
plural nos p
============ =========== =========== ====
Notes
a. Possessive
Possessive pronouns are compound forms only ("le mien"). The form
"mien" is an adjective (see note supra).
b. Case
The case system proposed by EAGLES (nominative, accusative, dative,
oblique, etc.) does not map readily to French personal pronouns. The
usual typology is the following:
subject je, tu, il, elle, nous, vous, ils, elles
object me, te, le, la, lui, se, nous, vous, les, leur,
se
other moi, toi, lui, elle, soi, nous, vous, eux,
elles, soi
The category "other" corresponds to reinforcement of subject or object
("Moi, je le dis"), attribute ("C'est moi"), etc.
We could use the following mapping:
Nominative --> subject
Accusative --> direct object
Dative --> indirect object
Oblique --> other
However, this solution splits "object" in "direct" and "indirect", and
this distintion is valid only for the 3rd person pronouns in French
(direct: le, la, les; indirect: lui, leur). Encoding this distinction
would duplicate all other forms (direct:me, te, etc.; indirect: me,
te, etc.).
We have therefore added one value to the case system proposed: the
value "Object".
c. New values exclamative, reflexive, reciprocal
The addition of those new values for the attribute type has not yet
been considered in French. It is clear that the value "exclamative"
would be more useful for the Determiner category (where it is merged
with the interrogative value). As for reflexive and reciprocal values,
they may be redundant with the codes using the case-value "j"
(object), applied to personal pronouns.
d. Agglutination
The presence of disjuncted lexical units among pronouns has led to
their lexicalisation for the sake of consistency of some paradigms :
for instance the paradigm "auquel", "auxquels" and "auxquelles" is
completed with the unit _laquelle", which is not completely
satisfactory.
5.4.4.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
PDFP Pd-fp-- Demonstrative pronoun fem. plur.
PDFS Pd-fs-- Demonstrative pronoun fem. plur.
PDMP Pd-mp-- Demonstrative pronoun masc. plur.
PDMS Pd-[mn]s-- Demonstrative pronoun masc. sing.
PNFP Pn-fp-- Indefinite pronoun fem. plur.
PNFS Pn-fs-- Indefinite pronoun fem. plur.
PNMP Pn-mp-- Indefinite pronoun masc. plur.
PNMS Pn-ms-- Indefinite pronoun masc. sing.
PP1SN Pp1-sn- Personal pron.,1st pers.sing.,
nomin.
================== ====================================
Tag Regular expression Definition
================== ====================================
PP2SN Pp2-sn- Personal pron.,2nd pers. sing., nomin.
PP3SN Pp3.sn- Personal pron.,3rd pers.sing., nomin.
PP1PN Pp1-sn- Personal pron.,1st pers. plur., nomin.
PP2PN Pp2-sn- Personal pron., 2nd pers. plur.,nomin.
PP3PN Pp3.sn- Personal pron., 3rd pers. plur.,nomin.
PPJ Pp...j- Personal pron., object
PPO Pp...o- Personal pron., oblique
PQFP P[rt]fp-- Interr. or relat. pronoun,fem. plur.
PQFS P[rt]fs-- Interr. or relat. pronoun, fem. plur.
PQMP P[rt]mp-- Interr. or relat. pronoun, masc. plur.
PQMS P[rt]ms-- Interr. or relat. pronoun, masc. sing.
PSFP Ps.fp.- Possessive pronoun, fem. plur.
PSFS Ps.fs.- Possessive pronoun, fem. plur.
PSMP Ps.mp.- Possessive pronoun, masc. plur.
PSMS Ps.ms.- Possessive pronoun, masc. sing.
================== ====================================
5.4.4.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Ps1fs-s PSFS la_mienne [mienne is not a pronoun]
Ps1fs-p PSFS la_no^tre
Ps1fp-s PSFP les_miennes
Ps1fp-p PSFP les_no^tres
Ps1ms-s PSMS le_mien
Ps1ms-p PSMS le_no^tre
Ps1mp-s PSMP les_miens
Ps1mp-p PSMP les_no^tres
Ps2fs-s PSFS la_tienne
Ps2fs-p PSFS la_vo^tre
Ps2fp-s PSFP les_tiennes
Ps2mp-p PSMP les_vo^tres
Ps2ms-s PSMS le_tien
Ps2ms-p PSMS le_vo^tre
Ps2mp-s PSMP les_tiens
Ps2fp-p PSFP les_vo^tres
Ps3fs-s PSFS la_sienne
Ps3fs-p PSFS la_leur
Ps3fp-s PSFP les_siennes
Ps3fp-p PSFP les_leurs
Ps3ms-s PSMS le_sien
Ps3ms-p PSMS le_leur
Ps3mp-s PSMP les_siens
Ps3mp-p PSMP les_leurs
Pp1-sn- PP1SN je
Pp2-sn- PP2SN tu
Pp3msn- PP3SN il, on
Pp3fsn- PP3SN elle
Pp1-pn- PP1PN nous
Pp2-pn- PP2PN vous
Pp3mpn- PP3PN ils
Pp3fpn- PP3PN elles
Pp1-sj- PPJ me (-moi after imperative)
Pp2-sj- PPJ te (-toi after imperative)
Pp3msj- PPJ le, se, lui
Pp3fsj- PPJ la, se, lui
Pp3n-j- PPJ en, y
Pp1-pj- PPJ nous
Pp2-pj- PPJ vous
Pp3mpj- PPJ les, se, leur
Pp3fpj- PPJ les, se, leur
Pp1-so- PPO moi
Pp2-so- PPO toi
Pp3mso- PPO lui, soi
Pp3fso- PPO elle, soi
Pp1-po- PPO nous
Pp2-po- PPO vous
Pp3mpo- PPO eux, soi
Pp3fpo- PPO elles, soi
Pd-fp-- PDFP celles, celles-ci, celles-la'
Pd-fs-- PDFS celle, celle-ci, celle-la'
Pd-mp-- PDMP ceux, ceux-ci, ceux-la'
Pd-ms-- PDMS celui, celui-ci, celui-la'
Pd-n--- PDMS ce, ceci, cela, ca
Pi-fp-- PNFP quelques-unes, certaines...
Pi-fs-- PNFS aucune, nulle, certaine...
Pi-mp-- PNMP quelques-uns, certains...
Pi-ms-- PNMS aucun, nul, quelqu'un, certain...
Pr-fp-- PQFP lesquelles,desquelles,auxquelles,qui,
que, quoi, dont,
Pr-fs-- PQFS laquelle, qui, que, quoi, dont, ou^
Pr-mp-- PQMP lesquels, desquels, auxquels, qui, que,
quoi, dont, ou^
Pr-ms-- PQMS lequel, duquel, auquel, qui, que, quoi,
dont, ou^
Pt----- PQ?? quoi
Pt-fp-- PQFP lesquelles, desquelles, auxquelles, qui,
que
Pt-fs-- PQFS laquelle, qui, que
Pt-mp-- PQMP lesquels, desquels, auxquels, qui
Pt-ms-- PQMS lequel, duquel, auquel, qui, que
============================================================
5.4.5 Determiners (D)
----------------------
5.4.5.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type article le a
demonstrat. ce d
indefinite certain i
possessive mon s
interrog. quel t
------------ ----------- ----------- ----
Person first ma 1
second ta 2
third sa 3
------------ ----------- ----------- ----
Gender masculine le m
feminine la f
------------ ----------- ----------- ----
Number singular le s
plural les p
------------ ----------- ----------- ----
Case /// /// -
------------ ----------- ----------- ----
Possessor singular mon s
plural nos p
------------ ----------- ----------- ----
Quantif. definite le d
indefinite un i
============ =========== =========== ====
5.4.5.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
DFP D..fp--. Determiner, fem. plur.
DFS D..fs--. Determiner, fem. plur.
DMP D..mp--. Determiner, masc. plur.
DMS D..ms--. Determiner, masc. sing.
================== ====================================
5.4.5.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Ds1fss-- DFS ma (tasse)
Ds1mss-- DMS mon (livre)
Ds1fps-- DFP mes (tasses)
Ds1mps-- DMP mes (livres)
Ds2fss-- DFS ta (tasse)
Ds2mss-- DMS ton (livre)
Ds2fps-- DFP tes (tasses)
Ds2mps-- DMP tes (livres)
Ds3fss-- DFS sa (tasse)
Ds3mss-- DMS son (livre)
Ds3fps-- DFP ses (tasses)
Ds3mps-- DMP ses (livres)
Ds1fsp-- DFS notre (tasse)
Ds1msp-- DMS notre (livre)
Ds1fpp-- DFP nos (tasses)
Ds1mpp-- DMP nos (livres)
Ds2fsp-- DFS votre (tasse)
Ds2msp-- DMS votre (livre)
Ds2fpp-- DFP vos (tasses)
Ds2mpp-- DMP vos (livres)
Ds3fsp-- DFS leur (tasse)
Ds3msp-- DMS leur (livre)
Ds3fpp-- DFP leurs (tasses)
Ds3mpp-- DMP leurs (livres)
Dd-fs--- DFS cette
Dd-ms--- DMS cet, ce
Dd-fp--- DFP ces
Dd-mp--- DMP ces
Di-fs--- DFS aucune, nulle, certaine, toute, chacune...
Di-ms--- DMS aucun, nul, certain, tout, chacun...
Di-fp--- DFP certaines, toutes...
Di-mp--- DMP certains, tous...
Dt-fs--- DFS quelle
Dt-ms--- DMS quel
Dt-fp--- DFP quelles
Dt-mp--- DMP quels
Da-fs--d DFS la
Da-ms--d DMS le
Da-fp--d DFP les
Da-mp--d DMP les
Da-fs--i DFS une
Da-ms--i DMS un
Da-fp--i DFP des
Da-mp--i DMP des
======= =============================================
5.4.6 Adverbs (R)
------------------
5.4.6.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type general fortement g
particle ne, pas p
------------ ----------- ----------- ----
Degree positive fortement p
comparative davantage c
negative ne, pas n
Note
We encode degree for compatibility with other languages, but as for
adjectives, the comparative feature is not very productive in French.
It applies only to "beaucoup" (comp.= "davantage"), "bien" (comp.=
"mieux"), "mal" (comp.= "pis") and "peu" (comp. = "moins"). The
comparative for other adverbs is marked by "plus" + adverb (e.g. "plus
fortement"). The superlative is usually marked by "le" + comparative
(e.g. le plus fortement).
5.4.6.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
R Rg.- General adverb
R-NE Rpn ne
R-PAS Rpn pas
================== ====================================
Note
It seems necessary in French to distinguish the two parts of the
negation ("ne ... pas"), because they play an important role in
disambiguation. However, this violates the applicative principle (the
categories in the corpus should be broader than the categories in the
lexicon). Here the same lexical category (Rpn) would split in two
corpus tags (R-NE, R-PAS). As a result, the regular expression Rpn
cannot be used to define the corpus tags unambiguously.
We could add an attribute "Lexical Class" to discriminate between the
two particles, as, if needed, for the distinction between the "e^tre"
and "avoir" auxiliaries (see note supra).
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Lex. class ne ne, n' n
pas pas, plus p
============ =========== =========== ====
However, the auxiliary distinction applied to many languages (English:
be/have; Italian: essere/avere, etc.), whereas the negation problem
seems specific to French. It seems therefore heavy to impose an
attribute "Lexical Class" for all languages in the Adverb category.
Another point of view would be to consider that, in fact, some lexical
subcategorization will be needed for one category or another in each
language, and add a "Lexical Class" attribute to all the
part-of-speech categories in a systematic way.
5.4.6.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Rgp R beaucoup
Rgc R davantage
Rpn R-NE ne
Rpn R-PAS pas, plus
======= =============================================
5.4.7 Adpositions (S)
----------------------
5.4.7.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type preposition en, de p
============ =========== =========== ====
5.4.7.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
SP Sp Preposition
================== ====================================
5.4.7.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Sp SP en
======= =============================================
5.4.8 Conjunctions (C)
-----------------------
5.4.8.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type coordinat. et c
subordinat. que s
============ =========== =========== ====
5.4.8.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
CC Cc Cooordinative conjunction
CS Cs Subordinative conjunction
================== ====================================
5.4.8.3 Combinations
======= =============================================
Lexique Corpus Example
====== =============================================
Cc CC mais, ou
Cs CS que
======= =============================================
5.4.9 Numerals (M)
-------------------
5.4.9.1 Lexicon
============ =========== =========== ====
Attribute Value Example Code
============ =========== =========== ====
Type cardinal deux c
------------ ----------- ----------- ----
Gender masculine un m
feminine une f
------------ ----------- ----------- ----
Number singular un s
plural deux p
------------ ----------- ----------- ----
Case /// /// -
============ =========== =========== ====
Note:
Ordinals are simple adjectives in French. They can never be
determiners (e.g. *premie're fois e'tait la bonne).
Traditionnal grammars usually distinguish un/article and un/numeral.
However, it is very difficult to find linguistic tests that enable to
discriminate between the two.
cf. J'ai vu un chat (article)
J'ai vu un chat et deux chiens (numeral?)
We will not keep this distinction in the corpus tags.
Only un/une have a gender. All other numerals are invariant in gender.
We will encode gender as not applicable rather than introduce a
systematic homography.
5.4.9.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
M Mc-s- Cardinal numeral, sing.
M Mc-p- Cardinal numeral, plur.
================== ====================================
5.4.9.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
Mcms- DMS un
Mcfs- DFS une
Mc-s- M zero
Mc-p- M deux, trois
======= =============================================
5.4.10 Interjection (I)
------------------------
5.4.10.1 Lexicon
========= ===========
Tag Example
========= ===========
I eh
========= ===========
5.4.10.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
I I Interjection
================== ====================================
5.4.10.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
I I eh
======= =============================================
5.4.11 Unique membership class (U)
-----------------------------------
None.
5.4.12 Residual (X)
--------------------
5.4.12.1 Lexicon
============ ==============
Tag c Example
============ ==============
X c symbols, etc.
============ ==============
5.4.12.2 Corpus
================== ====================================
Tag Regular expression Definition
================== ====================================
X X Residual
================== ====================================
5.4.12.3 Combinations
======= =============================================
Lexique Corpus Example
======= =============================================
X X symbols, etc.
======= =============================================
The application to English has been carried out by the MULTEXT Group
at ISSCO (ISSCO 1994).
Note that this application has been carried out on the basis of the
attributes and the values as presented in the preceding version of
this deliverable (MULTEXT WP1.6 A2 version).
Notation:
Trailing place-holders have been omitted.
It should be borne in mind that the need for place-holders is an artefact of the linear representation of lexical descriptions. A number of extensions have been made to the various proposals circulated. It is not clear how language-specific they are, but they represent phenomena that are plausibly relevant for various text-processing tasks.
5.5.1 Nouns (N)
---------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Common c
Proper p
- -------------- -------------- -
2 Number Singular s
Plural p
- -------------- -------------- -
3 Gender Masculine m
Feminine f
Neuter n
= ============== ============== =
Notes:
Case is not relevant for English.
Gender is probably unnecessary for most purposes. We can assume it may
be of interest in constructions with pronouns. Many nouns are (or may
be) unmarked for number: fish, sheep, aircraft.
Examples:
Ncsn house
Ncpn houses
Npsn Thames
Nppn Alps
Ncpf women
Ncsm man
Nc=n sheep
5.5.2 Verbs (V)
---------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Main v
Auxiliary a
Modal m
- -------------- -------------- -
2 Form Indicative i Form, not Mood
Imperative m
Subjunctive s
Base b base, not infinitive
Past Prt p Past Prt, not participle
Present Prt g Present Prt added (not gerund)
- -------------- -------------- -
3 Tense Present s
Past d
- -------------- -------------- -
4 Number Singular s
Plural p
- -------------- -------------- -
5 Person First 1
Second 2
Third 3
= ============== ============== =
Notes:
Voice is not lexical.
Attributes have been reordered to minimize sequence lengths (assuming
the proposal about trailing "-"s) - Tense only applies to Finite
verbs,
Number only to Present, and Person only to Singular. Modals have been
included as a distinct subcategory.
Finiteness as an attribute is redundant - predictable from verb-form
and past/present.
These tags do not attempt to represent distinctions found in the
various compound verb-forms. These are composed of a sequence of
auxiliary and non-finite verb as follows:
future will/shall + base
conditional would/should + base
passive be + past participle
perfect have + past participle
past perfect have-past + past participle
present continuous be + present participle
infinitive to + base
So there is no "aspect" attribute or "future" value, for example.
Examples:
go Vvm, Vvb, Vvs, Vvisp, Vviss1, Vviss2
goes Vviss3
going Vvg
gone Vvp
went Vvid
have Vab, Vas, Vaip, Vaiss1, Vaiss2
has Vaiss3
had Vap Vaid
having Vag
be Vab, Vam
am Vaiss1
are Vaiss2, Vaisp
is Vaiss3
was Vaids1, Vaids3
were Vaids2, Vaidp
been Vap
being Vag
will Vmi
would Vmi
5.5.3 Adjectives (A)
--------------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Degree Positive p
Comparative c
Superlative s
- -------------- -------------- -
2 Position Attributive a Position added
Predicative p
= ============== ============== =
Notes:
Gender, number and case irrelevant for English.
Attributive/predicative distinction reflects positional constraints -
some adjectives ('mere', 'utter', etc.) only appear in prenominal
position, while others ('awake', 'devoid', etc.) only appear in
predicative position. Most can appear in either.
Since many English comparatives & superlatives are formed with
more/most, "positive" cannot be interpreted as "neither comparative
nor superlative". See "Adverbs".
Examples:
big Ap
bigger Ac
biggest As
more peculiar Dscn+Ap
most remarkable Dssn+A
awake App
mere Apa
5.5.4 Pronoun (P)
-----------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Pron.-Type General g General added
Demonstrative d
Possessive s
Personal p
Reflexive x
- -------------- -------------- -
2 WH Not-WH n WH added
Relative r
Int q
- -------------- -------------- -
3 Number Singular s
Plural p
- -------------- -------------- -
4 Person First 1
Second 2
Third 3
- -------------- -------------- -
5 Gender Masculine m
Feminine f
Neuter n
- -------------- -------------- -
6 Case Nominative n
Accusative a
- -------------- -------------- -
7 Poss-Number Singular s
Plural p
- -------------- -------------- -
8 Poss-Person First 1 Poss-Person added
Second 2
Third 3
- -------------- -------------- -
9 Poss-Gender Masculine m Poss-Gender added
Feminine f
Neuter n
= ============== ============== =
Notes:
"General" pronouns are those which are not personal, possessive,
demonstrative or reflexive. The choice of these four categories is
based on distributional facts, though at a rather high level of
abstraction. They enter into anaphoric dependencies which are
signalled morphosyntactically and are therefore (in principle) more
amenable to automatic detection. Most general pronouns do not,
although they too sometimes encode number information.
"WH" attribute added to allow for combination of possessive and WH in
"whose".
Examples:
Pgn some, all, ...
Pgns each, something, nothing, everything, -body, ...
Pgnp both, ...
Pdns this, that
Pdnp these, those
Psn---s1 mine
Psn----2 yours
Psn---s3m his
Psn---s3f hers
Psn---s3n its
Psn---p1 ours
Psn---p3 theirs
Psq whose
Ppns1-n I
Ppn-2 you
Ppns3mn he
Ppns3fn she
Ppns3n it
Ppnp1-n we
Ppnp3-n they
Ppns1-a me
Ppns3ma him
Ppns3fa her
Ppnp1-a us
Ppnp3-a them
Prns1 myself
Prns2 yourself
Prns3m himself
Prns3f herself
Prns3n itself
Prnp1 ourselves
Prnp1 yourselves
Prnp1 themselves
Ppr which
Ppq which, what
5.5.5 Articles/Determiners (R)
------------------------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Def-article t
Indef-article a
Demonstrative d
Possessive s
General g General added
- -------------- -------------- -
2 WH Not-WH n WH added
Relative r
Int/Excl q
- -------------- -------------- -
3 Number Singular s
Plural p
- -------------- -------------- -
4 Person First 1
Second 2
Third 3
- -------------- -------------- -
5 Gender Masculine m
Feminine f
Neuter n
- -------------- -------------- -
6 Poss-Number Singular s
Plural p
- -------------- -------------- -
7 Poss-Person First 1 Poss-Person added
Second 2
Third 3
- -------------- -------------- -
8 Poss-Gender Masculine m Poss-Gender added
Feminine f
Neuter n
= ============== ============== =
Notes:
Case not relevant to English.
Definite and indefinite articles represented as values of "Type".
All of these have been marked as 3rd person. This is redundant, since
determiners are all 3rd person, if anything.
Examples:
Rtn-3 the
Rans3 a/an
Rdns3-ns this, that
Rdnp3-np these, those
Rsn-3-s1 my
Rsn-3--2 your
Rsn-3-s3m his
Rsn-3-s3f her
Rsn-3-s3n its
Rsn-3-p1 our
Rsn-3-p3 their
Rsr-3 whose
Rsq-3 whose
Rgr-3 which
Rgq-3 which, what,
Rgns3 each, ...
Rgnp3 all, both, certain, many, ...
Rgn-3 some, ...
5.5.6 Adverbs (D)
-----------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Degree Positive p
Comparative c
Superlative s
- -------------- -------------- -
2 Function Specifier s Function added
Modifier m
- -------------- -------------- -
3 WH Yes q WH added
No n
= ============== ============== =
Notes:
No distinction has been made between different types of "modifier"
adverbs ("sentence-modifying", "VP-modifying", etc.), since their
distributions overlap considerably.
Examples:
Dbsn so, too, very, as
Dbsq how
Dcsn more
Dssn most
Dbmn quickly, soon, here, then, now
Dcmn better, worse
Dsmn best, worst
Dbmq where, when, how, why
5.5.7 Adpositions (S)
---------------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Preposition e
Postposition o
= ============== ============== =
Notes:
Postpositions are rare in English.
"possessive" 's and ' might be considered postpositions, especially if
the alternative is to assign them to the unique membership class
(where by definition they would be unrelated).
Examples:
Se in, near, behind,...
So notwithstanding, ago
5.5.8 Conjunctions (C)
----------------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Coordinating c
Subordinating s
- -------------- -------------- -
2 Comp-Type Infinitive i Comp-Type added
Finite f
- -------------- -------------- -
3 Coord-Position Initial i Coord-Position added
Non-initial n
= ============== ============== =
Notes:
Subordinating conjunctions are often identical to prepositions
(before, after, since, ...).
"Comp-Type" encodes information about the complement of subordinating
conjunctions. Present-participle complements have not been allowed for
here; they consist of bare VPs and are often treated as nominal
constructions, the words that introduce them (by, after, etc.) being
classified as prepositions.
"Coord-Position" encodes the distinction between elements of a
discontinuous coordination. "Initial" conjunctions are those that
appear before the first conjunct, and "Non-initial" conjunctions are
those that appear elsewhere. There is a dependency between initial and
non-initial conjunctions ('both...and' but not 'either...and') which
is not expressed in these attributes.
Examples:
Ccn and, or, but
Cci either, neither, both
Csi for
Csf that, because, ...
5.5.9 Numerals (M)
------------------
= ============== ============== =
P ATT VAL C
= ============== ============== =
1 Type Cardinal c
Ordinal o
= ============== ============== =
Notes:
"Function" depends on syntactic context.
These have not been subsumed under adjectives, pronouns, determiners,
etc. because the internal structure of complex numerals is
idiosyncratic.
Examples:
Mc six
Mo sixth
5.6.1 Multext morphology formalism
The main reason for having morphology is to facilitate maintenance of
lexical lists, both during the project and afterwards. It follows that
the rule formalism for morphology itself should not be a source of
complexity. The designers of the Multext morphology tool have
therefore decided to use fairly common, well-known methods, avoiding
any adventurous modernism.
The tool has two parts: morphosyntax and morphographemics.
For morphosyntax, a user-friendly version of context-free grammar is
used. The friendlyness comes with the possibility to annotate rules
with features, and to set features to the same values through
variables.
For morpho-graphemics, a version of two-level morphology is used.
The system can be used to generate a word list from an input word
list, as well as to look words up in a given word list.
The full description and detail of the rule formalism are given in the
report on the Multext morphology tool (Report nr. 2.3.1B) and the
manuals accompanying the tool. Extensive exemplification can be found
in the report on Multext morphology resources, Report nr. 5.3.1B.
5.6.2 Dutch word classes
The Dutch word classification used for Multext approximates as closely
as possible the proposal in Deliverable A, section 1.6.1.
It can be summarized best in terms of (the relevant selection from)
the types and attributes used in the actual Dutch description (Report
5.3.1B, section on Dutch).
There are 10 word class types:
V : Vtype Vform Person Number Tense
N : Ntype Semgender Gender Number
A : Inflected Degree
Adp : AdpType
Det : DetType Number Gender Defness
Pron : PronType Number Defness Person Semgender Case
Adv : Nil
Num : Nil
Conj : ConjType
Interj : Nil
where:
Vtype : Main Aux Copula Impersonal
Tense : Pres Past
Person : 1 2 3
Number : Sg Pl
Vform : Inf ImPart PerfPart Fin
Ntype : Common Proper
Semgender : M F N
Gender : De Het none
Degree : Pos Compar Super
Inflected : 0 1
AdpType : Post Pre
DetType : Article Quantificational Possessive Demonstrative
Defness : Def Indef
PronType : Reciprocal Reflexive Personal Relative Demonstrative
Quantificational Interrogative
Case : 1 4
ConjType : Coord Subord
Dutch is not a morphologically rich language. A distinction that it
makes which is different from most other Multext languages is that
between `syntactic' and `semantic' gender. A good example is `meisje'
(girl). The syntactic gender is `het' (`het meisje' *`de meisje') but
the semantic gender is female (`het meisje(i) dacht dat
ze(i)/*hij(i)/??het een jongetje was'). It can be seen in the example
that articles agree with their nouns in syntactic gender and that
pronouns (usually) agree with their antecedents in semantic gender.
For the rest, the distinctions given differ from other languages
mainly in what Dutch does not express.
The distinction between pronouns and determiners has been implemented
as follows:
for any X that could be either a det or a pron:
if X distributes like NP, then X is a pronoun;
if X distributes like Det (i.e. NP-initial), then X is a determiner
It is hoped that this is a good starting point for tagging but this
remains to be seen. As a consequence, a word like `mijn' which is
often called a pronoun is analysed as a determiner here.
Decisions like this one on function words can easily be changed; e.g.
in the lexicon supplied for Dutch (Report nr. 5.4.1B), 59 words are
classified as pronouns and 27 as determiners.
Bel N. and A. Aguilar (1994): ``Proposal for Morphosyntactic encoding:
Application to Spanish", Barcelona.
Calzolari N., Ceccotti M.L., Roventini A. (1983): ``Documentazione sui
tre nastri contenenti il DMI", ILC Technical Report, Pisa.
Leech J. and A. Wilson (1993): ``Invitation Draft", EAGLES
Document, Lancaster.
Leech J. and A. Wilson (1994): ``Morphosyntactic Annotation",
EAGLES Interim Report,
Pisa.
Calzolari N. and M. Monachini (1994): ``MULTEXT morphosyntactic
encoding: Application to Italian", Pisa.
Heylen D. (1994): ``Eagles Tagset for Dutch", Utrecht.
ISSCO (1994): ``MULTEXT Morphosyntactic Encoding: Application to
English", ISSCO, Geneva.
Lyons J.
(1981): Language and Linguistics,
Cambridge University Press.
Monachini M. and N. Calzolari (1994): ``Synopsis and Comparison of
Morphosyntactic Phenomena encoded in Lexicons and in Corpora and
Application to European Languages, EAGLES document
EAG-LSG-T4.6/CSG-T3.2, Pisa.
MULTEXT (1993): ``MULTEXT Technical Annex", Aix-en-Provence.
MULTEXT WP1.6 Report A2 (1994): ``Common Specifications and Notation
for Lexicon Encoding", MULTEXT Report Milestone A2.
Steiner P.and L. Lemnitzer (1994): ``An adaptation of the proposal for
morphosyntactic encoding in MULTEXT for German", Muenster.
Veronis J. (1994): ``Intermediary tagset: proposal for revision"
Aix-en-provence.
Veronis J., L. Khouri and C. Meunier (1994): ``Proposal for
morphosyntactic encoding in MULTEXT", Aix-en-Provence.
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.71)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -html_version 3.2 -dir ../related/msd-multext -local_icons -split 3 -toc_depth 2 -up_title 'Main MSD directory' -up_url ../ -t 'MULTEXT Morphosyntactic Specifications' msd-multext
The translation was initiated by Tomaz Erjavec on 2004-07-07