COP project 106 MULTEXT-East Fiction, Romanian
Contributors: Dan Tufis and Stefan Bruda (RACAI), Lidia Diaconu, Calin Diaconu (ICI)
The Romanian MULTEXT-East fiction corpus consists of three novels by Mihai Radulescu: ``Flacari sub cruce'', ``Obreja'' and ``Testament între înger si diavol''. The first two novels were published by the ``Ramida'' Publishing House and the last one by the ``PIKA'' Publishing House. The sources also include introductions by the author or by H.H. Teodosie Snagoveanu.
The digital source used as the basis for the encoding was provided by the ``Ramida'' Publishing House and the ``PIKA'' Publishing House, with the author's written permission.
The Romanian site obtained a license agreement allowing the use of the three novels for the purposes of the MULTEXT-East project, signed by the author, Mihai Radulescu.
As computed by the Unix program wc, the Romanian fiction corpus has 160405 words (39389, 39521 and 81495 for the individual novels).
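The reported total can be cross-checked against the per-novel figures. The sketch below is only an illustration: the helper name wc_words is hypothetical, and it approximates wc word counting by splitting on whitespace, which on a tagged file would also count the markup tokens.

```python
# Per-novel word counts as reported in the text.
counts = [39389, 39521, 81495]
total = sum(counts)


def wc_words(text: str) -> int:
    """Approximate `wc -w`: count whitespace-delimited tokens.

    Hypothetical helper for illustration; tags in an SGML file would be
    counted as tokens too, exactly as wc would count them.
    """
    return len(text.split())


print(total)
print(wc_words("De ce m-a părăsit Dumnezeu?"))
```

The sum of the three per-novel counts agrees with the stated corpus total of 160405.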
The corpus body consists of a number of <div type=part> elements, each of which starts with a <head> giving its title. Each part may be divided into <div type=chapter> elements, which also begin with a <head> .
The <div type=chapter> and <div type=part> elements have an n attribute, giving the chapter or part number, and an id attribute, whose value is prefixed with the name of the novel, e.g. <div id="obreja.2.1" type="chapter"> .
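The structure of the id values can be illustrated with a small parser. This is a sketch under an assumption inferred from the two identifiers shown in this document ("obreja.2.1" for a chapter and "obreja.1" for a part): that an id has the form novel.part or novel.part.chapter. The names DivId and parse_div_id are hypothetical.

```python
from typing import NamedTuple, Optional


class DivId(NamedTuple):
    novel: str              # prefix identifying the novel, e.g. "obreja"
    part: int               # part number
    chapter: Optional[int]  # chapter number, None for a part-level div


def parse_div_id(value: str) -> DivId:
    """Split an id such as "obreja.2.1" into its components.

    Assumes the layout novel.part[.chapter]; other layouts are rejected.
    """
    fields = value.split(".")
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected id layout: {value!r}")
    chapter = int(fields[2]) if len(fields) == 3 else None
    return DivId(fields[0], int(fields[1]), chapter)


print(parse_div_id("obreja.2.1"))
print(parse_div_id("obreja.1"))
```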
The text is segmented into paragraphs, with the <head> , <quote> and <poem> elements marked up at the paragraph level.
Sub-paragraph tagging consists of <hi> and <q> . Direct speech has been marked up with <q> even where there is no typographical marking to that effect in the printed text.
Rendering information, given as the CES-conformant two-letter value of the rend attribute, has been included with the appropriate tags and, for mdash and capitalisation, retained in the tag content.
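As an illustration of how the <q> markup and its rend attribute might be used, the sketch below pulls direct-speech spans out of a tagged paragraph. It is a rough approach, not the project's tooling: it uses a regular expression rather than an SGML parser, assumes unquoted attribute values as in the corpus sample, and does not handle nested <q> elements. The sample string is abridged from the corpus example, and the function name direct_speech is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches <q> or <q rend=VALUE>, then the shortest content up to </q>.
Q_ELEMENT = re.compile(r"<q(?:\s+rend=(\w+))?\s*>(.*?)</q>",
                       re.DOTALL | re.IGNORECASE)


def direct_speech(sgml: str) -> List[Tuple[Optional[str], str]]:
    """Return (rend value or None, text) for each non-nested <q> element."""
    return [(m.group(1), m.group(2).strip()) for m in Q_ELEMENT.finditer(sgml)]


sample = ("<P> <q rend=dblq> De ce m-a părăsit Dumnezeu? </q> "
          "<q rend=dblq> Auzi-mă, Doamne! </q> </P>")

for rend, text in direct_speech(sample):
    print(rend, text)
```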
The tag usage for the three novels is the following:
<tagsdecl>
<tagusage gi=body occurs=1></tagusage>
<tagusage gi=div occurs=18></tagusage>
<tagusage gi=head occurs=20></tagusage>
<tagusage gi=hi occurs=110></tagusage>
<tagusage gi=l occurs=47></tagusage>
<tagusage gi=p occurs=817></tagusage>
<tagusage gi=poem occurs=5></tagusage>
<tagusage gi=q occurs=564></tagusage>
<tagusage gi=text occurs=1></tagusage>
</tagsdecl>
<tagsdecl>
<tagusage gi=body occurs=1></tagusage>
<tagusage gi=div occurs=17></tagusage>
<tagusage gi=head occurs=32></tagusage>
<tagusage gi=hi occurs=368></tagusage>
<tagusage gi=l occurs=4></tagusage>
<tagusage gi=p occurs=563></tagusage>
<tagusage gi=poem occurs=1></tagusage>
<tagusage gi=q occurs=821></tagusage>
<tagusage gi=text occurs=1></tagusage>
<tagusage gi=foreign occurs=1></tagusage>
</tagsdecl>
<tagsdecl>
<tagusage gi=body occurs=1></tagusage>
<tagusage gi=div occurs=45></tagusage>
<tagusage gi=head occurs=84></tagusage>
<tagusage gi=hi occurs=338></tagusage>
<tagusage gi=l occurs=171></tagusage>
<tagusage gi=p occurs=1464></tagusage>
<tagusage gi=poem occurs=12></tagusage>
<tagusage gi=q occurs=808></tagusage>
<tagusage gi=quote occurs=341></tagusage>
<tagusage gi=text occurs=1></tagusage>
</tagsdecl>
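The three declarations above can be combined into corpus-wide totals. The following is a minimal sketch, assuming the unquoted gi=... occurs=... attribute syntax shown above; the function name tag_counts is hypothetical, and for brevity only the p and q entries of each declaration are reproduced as input.

```python
import re
from collections import Counter

# Matches one <tagusage gi=NAME occurs=N> start tag (unquoted attributes).
TAGUSAGE = re.compile(r"<tagusage\s+gi=(\w+)\s+occurs=(\d+)\s*>")


def tag_counts(tagsdecl: str) -> Counter:
    """Map each element name in a <tagsdecl> to its occurrence count."""
    return Counter({gi: int(n) for gi, n in TAGUSAGE.findall(tagsdecl)})


# Only the p and q entries of the three declarations, for brevity.
decls = [
    "<tagusage gi=p occurs=817></tagusage> <tagusage gi=q occurs=564></tagusage>",
    "<tagusage gi=p occurs=563></tagusage> <tagusage gi=q occurs=821></tagusage>",
    "<tagusage gi=p occurs=1464></tagusage> <tagusage gi=q occurs=808></tagusage>",
]

totals = sum((tag_counts(d) for d in decls), Counter())
print(totals["p"], totals["q"])
```

Feeding the full text of each declaration to tag_counts in the same way would give totals for every element.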
Example from the corpus:
<div id="obreja.1" type="part"> <head> AUZI-MĂ, DOAMNE! </head> <head> -Cuvânt către cititor- </head> <P> Una dintre întrebările sfâşietoare ce apar adesea pe buzele nefericiţilor care trec prin încercări mult prea grele pentru puterile lor este: <q rend=dblq> De ce m-a părăsit Dumnezeu? </q> Însoţită de strigăte nedumerite şi disperate, speranţa în intervenţia divină rămâne cu atât mai mare: <q rend=dblq> Unde eşti, Doamne?! </q> Sau, reluând cuvintele psalmistului: <q rend=dblq> Auzi-mă, Doamne! </q> </P> <P> Omul are nevoie permanentă de Părintele său. Îi este greu să înţeleagă întârzierea ajutorului dumnezeiesc lămurit, ori că el se manifestă nevăzut şi nerecunoscut întru întărirea puterilor dăruite celui îndurerat , pentru ca acesta să-şi poată răbda chinul. </P> <P> Una dintre misiunile Teologiei este să facă de înţeles Divinitatea. Alta - mult mai apropiată de orizonturile înguste ale făpturilor umane ce suntem şi de aceea mai importantă pentru biata noastră neputinţă -, altă misiune a Teologiei este să-i explice suferitorului de ce este lăsat de Dumnezeu să se chinuiască, să-i explice lui Iov de ce i s-a îngăduit satanei să-l pună la încercare. </P> <P> . . .
The original versions of the three novels, which were the basis for the encoding, were ASCII files exported from an uncommon text editor (ChiWriter) with a relatively transparent encoding: paragraph boundaries were marked, and a few useful formatting codes were included. Due to the lack of printed versions, no highlighting markup has been provided, except for the "rend=dblq" marking found in the text. The text also contained a number of typographical errors, which were also present in the printed version. We corrected these errors in the fiction corpus.