Multext-East - Deliverable D1.2. Language-specific resources/Appendix 4 - May 96.

Appendix 4

ELU (partial) description of the Romanian dictionary encoding.

# Define morph

	<head sem pred>   = <form>
	<head sem voice>  = active/passive
	<head agr>        == VAgr
	<head agr num>    = singular/plural
	<head agr pers>   = 1/2/3
	<head agr gen>    = masculine/feminine
	<head tensed>     = no/yes
	<head prd>        = no/yes
	<head type>       = aux/main
	<bar>             = 0

type(X)<head type>      =X

rol(X)  <head sem nom> =X
function(X)  <head sem functional> =X


	<head agr num> = N
	<head agr pers> = P
	<head agr gen> = G

#paradigm verb_suf4
ar          n   {+a}{-b}    $nom_fem8
&abreve;r   n   {-a}{+b}    $nom_fem8

#paradigm verb1
-	!Verb !type(main)

# paradigm indic_mmcperf_1
asem                    v {+past} !my_Vagr(singular,1,_)
ase&scedil;i            v {+past} !my_Vagr(singular,2,_)
ase                     v {+past} !my_Vagr(singular,3,_)
aser&abreve;m           v {+past} !my_Vagr(plural,1,_)
aser&abreve;&tcedil;i   v {+past} !my_Vagr(plural,2,_)
aser&abreve;            v {+past} !my_Vagr(plural,3,_)

# Lexicon base

abroga * v/n/adj !pref(none)

# Lexicon vform

abrog       v   @abroga\base    !allcases   $verb1
abrog       n   @abroga\base    !rol(denom/patient) $verb_suf4
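To make the interplay between a lexicon stem and a suffix paradigm concrete, the following Python sketch attaches the endings of indic_mmcperf_1 to the stem abrog and pairs each form with the agreement values passed to my_Vagr. It is an illustration only: plain concatenation, the `expand` function and the use of Unicode characters in place of the SGML entities are assumptions, and the sketch does not model ELU unification, the {+past} annotation or the link between verb1 and this paradigm, none of which the fragment above spells out.

```python
# Illustrative sketch only; this is not the ELU word-form generator.
# Endings and agreement values are copied from paradigm indic_mmcperf_1,
# with the SGML entities &abreve;, &scedil;, &tcedil; written as Unicode.
PARADIGM_INDIC_MMCPERF_1 = [
    ("asem", "singular", 1),
    ("ase\u015fi", "singular", 2),
    ("ase", "singular", 3),
    ("aser\u0103m", "plural", 1),
    ("aser\u0103\u0163i", "plural", 2),
    ("aser\u0103", "plural", 3),
]


def expand(stem, paradigm):
    """Concatenate the stem with each ending (assumed: no stem alternation)."""
    return [
        # gen is None because the paradigm leaves gender unspecified ("_").
        {"form": stem + ending, "num": num, "pers": pers, "gen": None}
        for ending, num, pers in paradigm
    ]


forms = expand("abrog", PARADIGM_INDIC_MMCPERF_1)
for entry in forms:
    print(entry["form"], entry["num"], entry["pers"])
```
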

We ran the word-form generator (part of mac-ELU) on this lexicon and obtained almost 1,300,000 wordforms.

The information displayed by the word-form generator for one inflected form is shown below:


bar = 0
cat = adj/n
form = merge
head : agr : (NomAgr)
             pers = 3
             num = plural
             gen = masculine
             case = dative/genitive/vocative
       encl = yes
       hum = person
       intensify = none
       pos = after/before
       prefix = none
       sem : nom = actor
             pred = merge
       type = common

Because the mac-ELU implementation of Romanian morphology covers not only inflectional morphology but also regular derivation (see the example above), mac-ELU contains fewer lemmas than MULTEXT.

The information generated as above was automatically translated into the corresponding MULTEXT entries: features not included in MULTEXT were discarded and, for derivatives, the lemma was replaced by the inflectional lemma.

From the representation above, the translator generated the MULTEXT entries:

merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Ncmpony
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Ncmpoyy
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Ncmpvny
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Ncmpvyy
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Afpmpony
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Afpmpoyy
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Afpmpvny
merg&abreve;re&tcedil;ilor	merg&abreve;re&tcedil;	Afpmpvyy
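
The tags in the third column are positional: each character encodes one attribute. The Python sketch below decodes them under the assumption that, after the category letter, Romanian noun tags carry Type, Gender, Number, Case, Definiteness and Clitic, and adjective tags carry Type, Degree, Gender, Number, Case, Definiteness and Clitic. This attribute order and the `decode_msd` helper are not given in this appendix and should be checked against the MULTEXT-East morphosyntactic specifications.

```python
# Attribute order per category letter (an assumption; see the text above).
POSITIONS = {
    "N": ["type", "gender", "number", "case", "definiteness", "clitic"],
    "A": ["type", "degree", "gender", "number", "case", "definiteness", "clitic"],
}


def decode_msd(msd):
    """Pair each character after the category letter with its attribute name."""
    category, codes = msd[0], msd[1:]
    names = POSITIONS[category]
    if len(codes) != len(names):
        raise ValueError(
            f"{msd}: expected {len(names)} attribute codes, got {len(codes)}"
        )
    return {"category": category, **dict(zip(names, codes))}


print(decode_msd("Ncmpony"))
print(decode_msd("Afpmpvyy"))
```

A tag with too few or too many codes raises `ValueError`, so the sketch also works as a crude well-formedness check.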

Because the "clitic" attribute generates a large amount of redundancy in most grammatical categories, we would like to modify our dictionary so that the attribute is used explicitly only where it is responsible for a graphemic modification in the spelling of a wordform. Using a "don't care" value ("-") in all other cases would significantly reduce the number of entries useful for corpus work (by almost 35%).
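
The proposed reduction can be pictured as follows. In this Python sketch, entries that share wordform, lemma and every tag position except the final one are merged into a single entry ending in "-" whenever the final value varies within the group, which is taken as a sign that the attribute does not affect the spelling. The sample entries, the assumption that "clitic" is the last tag position, and the `collapse_clitic` helper are illustrative only; the merging criterion is one possible reading of the proposal, not a description of the planned procedure.

```python
from collections import defaultdict


def collapse_clitic(entries):
    """Merge entries that differ only in the last (assumed clitic) position."""
    groups = defaultdict(set)
    for wordform, lemma, msd in entries:
        groups[(wordform, lemma, msd[:-1])].add(msd[-1])
    reduced = []
    for (wordform, lemma, prefix), clitics in groups.items():
        # Several clitic values for one spelling: replace them by "don't care".
        last = "-" if len(clitics) > 1 else next(iter(clitics))
        reduced.append((wordform, lemma, prefix + last))
    return reduced


# Hypothetical sample: wordforms and tags are placeholders, not dictionary data.
sample = [
    ("w1", "l1", "Ncmpoyn"),
    ("w1", "l1", "Ncmpoyy"),
    ("w2", "l1", "Ncmpvyy"),
]
print(collapse_clitic(sample))
```
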

The mac-ELU system is fully functionally equivalent to the SUN-OS ELU implementation, and both systems are jointly distributed by ISSCO-Geneve and RACAI-Bucharest.


Copyright © Centre National de la Recherche Scientifique, 1996.