COP project 106 MULTEXT-East ``1984'', Bulgarian
Contributors: Institute of Mathematics, Bulgarian Academy of Sciences
The body of the Bulgarian version of the ``1984'' corpus was prepared (typed in and tagged) at the Institute of Mathematics, BAS, specifically for the MULTEXT-East project; the Bulgarian translation of ``1984'' may therefore be used for the purposes of that project.
The Bulgarian ``1984'' corpus body consists of three <div type=part> elements and one <div type=appendix>. Each part is further subdivided into a number of <div type=chapter> elements. In the Bulgarian version, the chapter numbers are given in the <div>.
The <div> elements have the n attribute, giving the sequence number of the <div> at its level, and the id attribute, whose value consists of the prefix ORWbg followed by the chapter and section numbers separated by periods, e.g. <div type=chapter n=2 id=ORWbg.1.2>.
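As an illustration of the id scheme just described, the following minimal Python sketch builds and parses such values. The function names make_div_id and parse_div_id are hypothetical, and the pattern assumes only what the example shows: the prefix ORWbg followed by one or more period-separated numbers.

```python
import re
from typing import Tuple

# Assumed shape of an id value: "ORWbg" plus one or more ".<number>" groups,
# as in the documented example ORWbg.1.2.
DIV_ID = re.compile(r"^ORWbg((?:\.\d+)+)$")


def make_div_id(*numbers: int) -> str:
    """Build an id such as ORWbg.1.2 from a sequence of level numbers."""
    if not numbers:
        raise ValueError("at least one level number is required")
    return "ORWbg" + "".join(f".{n}" for n in numbers)


def parse_div_id(value: str) -> Tuple[int, ...]:
    """Split an id value back into its level numbers."""
    match = DIV_ID.match(value)
    if match is None:
        raise ValueError(f"not a recognised div id: {value!r}")
    # group(1) looks like ".1.2"; drop the empty piece before the first period.
    return tuple(int(piece) for piece in match.group(1).split(".")[1:])
```

For the documented example, make_div_id(1, 2) yields ORWbg.1.2, and parse_div_id reverses it.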
The text is segmented into paragraphs, with the <head>, <quote>, <note>, <poem> and <title> elements marked up at the paragraph level.
Sub-paragraph tagging consists of <hi>, <q>, <abbr>, <date>, <foreign>, <l>, <num>, <ptr> and <name>. The text has been tagged by hand and hand-validated; therefore all the tagged words are correct. All names are tagged. Some name tags contain the type attribute (person, org, place), in accordance with CES.
Rendering information, given as the CES-conformant two-letter value of the rend attribute, has in most cases been included with the appropriate tags.
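To make the name tagging concrete, here is a rough Python sketch that groups <name> contents by their type attribute. It is an assumption-laden illustration, not part of the corpus tooling: the function names_by_type is hypothetical, and the regular expression handles only a <name> tag that has either no attributes or a single type attribute (quoted or unquoted), so tags carrying rend or other attributes would be skipped.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches <name>...</name> or <name type=person>...</name>; the attribute
# value may be unquoted, as in the corpus, or double-quoted.
NAME = re.compile(r'<name(?:\s+type="?(\w+)"?)?\s*>([^<]*)</name>')


def names_by_type(sgml: str) -> Dict[str, List[str]]:
    """Group the text of <name> elements by type; untyped names go under 'untyped'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name_type, content in NAME.findall(sgml):
        groups[name_type or "untyped"].append(content)
    return dict(groups)
```

Applied to a fragment such as `<name type=place>Okieaniya</name> ... <name type=person>Dzhouns</name>`, this collects Okieaniya under place and Dzhouns under person.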
The following is an example from the Bulgarian ``1984'' corpus. In the original, the characters are encoded as SGML ISO Cyrillic entities (e.g. &ocy;); to make the sample more readable, they were automatically transliterated into Latin (e.g. &ocy; becomes o).
<p> <q rend="centered CA" type=slogan> <name type=person>Bog</name> ie vlast</q> </p> <p> Priiemashie vsichko. Minaloto bie promienlivo. Minaloto nikoga nie ie bilo promienyano.<name type=place>Okieaniya</name> vinagi ie voyuvala s <name type=place>Iztaziya</name>. <name type=person>Dzhouns</name>, <name type=person>Aaronson</name> i <name type=person>Rharddhardrford</name> byakha izvhardrshili priesthardplieniyata, v koito gi obvinyavakha.
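The automatic transliteration used for the sample above is not documented here. As a hedged sketch only, replacing SGML ISO Cyrillic entity references (such as &ocy; for Cyrillic small o) with Latin strings could look like the following. The table TRANSLIT is hypothetical and deliberately partial: only the o mapping is stated in the text, and the other entries are assumptions added for illustration.

```python
import re

# Hypothetical, partial mapping from ISO Cyrillic entity names to Latin.
# Only "ocy" -> "o" comes from the text; the rest are illustrative assumptions.
TRANSLIT = {"Bcy": "B", "ocy": "o", "gcy": "g"}

# An SGML entity reference: ampersand, name, semicolon.
ENTITY = re.compile(r"&([A-Za-z0-9]+);")


def transliterate(text: str, table=TRANSLIT) -> str:
    """Replace known entity references; leave unknown ones untouched."""
    return ENTITY.sub(lambda m: table.get(m.group(1), m.group(0)), text)
```

Leaving unknown references untouched, rather than dropping them, keeps any gap in the table visible in the output.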
The printed edition of the Bulgarian translation of ``1984'' was taken as the basis for the encoding. The text was typed in by Lydia Sinapova, Ludmila Dimitrova and Kiril Simov. It was proofread and marked up to CES1 conformance. In the process, a number of typographical errors were discovered in the printed edition.