
Bulgarian

  COP project 106 MULTEXT-East ``1984'', Bulgarian

Contributors: Institute of Mathematics, Bulgarian Academy of Sciences

Description of the Corpus

The body of the Bulgarian ``1984'' corpus was prepared (typed in and tagged) at the Institute of Mathematics, BAS, specifically for the MULTEXT-East project; the Bulgarian translation of ``1984'' may therefore be used for the purposes of the project.

Structure of the Corpus

The Bulgarian ``1984'' corpus body consists of three <div type=part> elements and one <div type=appendix> . Each part is further subdivided into a number of <div type=chapter> elements. In the Bulgarian version, the chapter numbers are given in the <div> .

The <div> elements have the n attribute, which gives the sequence number of the <div> at its level, and the id attribute, whose value consists of the prefix ORWbg followed by the part and chapter numbers separated by periods, e.g. <div type=chapter n=2 id=ORWbg.1.2> .
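
As a rough illustration of the identifier scheme described above, the following Python sketch builds an id value and an opening <div> tag from a sequence of division numbers. The function names make_div_id and make_div_tag and their parameters are hypothetical and not part of any corpus tooling; only the ORWbg prefix, the period-separated layout and the unquoted attribute style are taken from the description.

```python
def make_div_id(numbers, prefix="ORWbg"):
    """Join the prefix and the division numbers with periods.

    `numbers` lists the n values from the outermost <div> inwards,
    e.g. (1, 2) for chapter 2 of part 1. Hypothetical helper.
    """
    return ".".join([prefix] + [str(n) for n in numbers])


def make_div_tag(div_type, numbers, prefix="ORWbg"):
    """Render an opening <div> tag in the unquoted SGML style of the corpus."""
    n = numbers[-1]  # the n attribute is the number at the div's own level
    return "<div type={} n={} id={}>".format(
        div_type, n, make_div_id(numbers, prefix)
    )


print(make_div_tag("chapter", (1, 2)))
```

Under these assumptions, the call reproduces the tag quoted in the text for the second chapter of the first part.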

The text is segmented into paragraphs, with the <head> , <quote> , <note> , <poem> and <title> elements marked up at the paragraph level.

Sub-paragraph tagging consists of <hi> , <q> , <abbr> , <date> , <foreign> , <l> , <num> , <ptr> and <name> . The text was tagged and validated by hand; the tagged words are therefore considered correct. All names are tagged. Some name tags contain the type attribute (person, org, place), in accordance with CES.
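
To show how the name tagging might be consumed, here is a minimal Python sketch that groups the content of <name> elements by their type attribute. It uses a regular expression rather than an SGML parser, assumes that <name> elements are not nested, and accepts the unquoted attribute values seen in the corpus; the function name names_by_type and the "untyped" label for names without a type are assumptions made for this illustration.

```python
import re
from collections import defaultdict

# Matches <name> or <name type=...>; the type value may be quoted or not,
# and whitespace (including a line break) may separate tag name and attribute.
NAME_RE = re.compile(
    r'<name(?:\s+type\s*=\s*"?(?P<type>\w+)"?)?\s*>(?P<text>.*?)</name>',
    re.DOTALL,
)


def names_by_type(sgml):
    """Return a dict mapping each name type to the list of tagged names."""
    groups = defaultdict(list)
    for match in NAME_RE.finditer(sgml):
        # Names without a type attribute are collected under "untyped".
        key = match.group("type") or "untyped"
        groups[key].append(match.group("text").strip())
    return dict(groups)


fragment = (
    "<name type=place>Okieaniya</name> vinagi ie voyuvala "
    "s <name type=place>Iztaziya</name>. "
    "<name \ntype=person>Dzhouns</name>"
)
print(names_by_type(fragment))
```

A real SGML parser driven by the corpus DTD would be more robust; the regular expression is only meant to make the structure of the tagging concrete.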

Rendering information, given as the CES-conformant two-letter value of the rend attribute, has in most cases been included with the appropriate tags.

The following is an example from the Bulgarian ``1984'' corpus. In the original, the characters are encoded as SGML ISO Cyrillic entities (e.g. &ocy;); to make the sample more readable, they were automatically transliterated into Latin (e.g. &ocy; becomes o).

<p>
<q rend="centered CA" type=slogan>
<name type=person>Bog</name> ie vlast</q>
</p>
<p>
Priiemashie vsichko. Minaloto bie promienlivo. Minaloto nikoga nie ie bilo 
promienyano.<name 
type=place>Okieaniya</name>
vinagi ie voyuvala 
s <name type=place>Iztaziya</name>.
<name type=person>Dzhouns</name>, <name 
type=person>Aaronson</name> i
<name type=person>Rharddhardrford</name> 
byakha izvhardrshili priesthardplieniyata,
v koito gi obvinyavakha.
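
The transliteration visible in the sample above appears consistent with simply dropping the cy suffix from each ISO Cyrillic entity name, which would also explain sequences such as "hard" (from &hardcy;) inside words. This rule is inferred from the sample and is not a documented procedure of the project; the following Python sketch, with the hypothetical function name transliterate, shows the idea.

```python
import re

# An ISO Cyrillic entity such as &ocy; or &hardcy;:
# an ampersand, letters, the "cy" suffix, and a semicolon.
CYR_ENTITY_RE = re.compile(r"&([A-Za-z]+)cy;")


def transliterate(text):
    """Replace each Cyrillic entity by its name without the 'cy' suffix."""
    return CYR_ENTITY_RE.sub(lambda match: match.group(1), text)


encoded = "&Bcy;&ocy;&gcy; &iecy; &vcy;&lcy;&acy;&scy;&tcy;"
print(transliterate(encoded))
```

Entities that do not end in cy, and all markup, pass through unchanged, so the sketch can be applied to a tagged paragraph as a whole.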

Markup Process

The printed edition of the Bulgarian translation of ``1984'' was taken as the basis for the encoding. The text was typed in by Lydia Sinapova, Ludmila Dimitrova and Kiril Simov. It was then proofread and marked up to CES1 conformance. In the process, a number of typographical errors were discovered in the printed edition.






Tomaz Erjavec
Sat May 18 20:25:31 MDT 1996