TEI Header

§file description
§title statement
id = mten-bg.title
Multext-East cesDoc corpus: Newspapers, Bulgarian
§statement of responsibility
§name Lydia Sinapova
§responsibility Typing in excerpts from Capital and Continent; excerpting, paragraph-level tagging, and some sub-paragraph-level tagging of the electronic and typed-in texts.
§statement of responsibility
§name Lydia Sinapova
§responsibility Modified Newspaper corpus markup down to sub-paragraph level to conform to CES V4.0
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent measure 96538. WordCount represents the number of words in this text, exclusive of tags and header information. Microsoft Word 6.0 was used to count words. ByteCount reflects the approximate size of the file containing the doctype and cesDoc element, including all text, tags, and header information. The file with Cyrillic characters represented by SGML entities is approximately 5 times larger than the originally tagged Cyrillic file.
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Institute of Mathematics, Bulgarian Academy of Sciences
§address Acad G. Bonchev st. bl.8 1113 Sofia, Bulgaria
§address eAddress: mult@ling.math.acad.bg
when = 2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Newspapers, Bulgarian
§statement of responsibility
name Lydia Sinapova
responsibility Typing in excerpts from Capital and Continent; excerpting, paragraph-level tagging, and some sub-paragraph-level tagging of the electronic and typed-in texts.
§statement of responsibility
name Lydia Sinapova
responsibility Modified Newspaper corpus markup down to sub-paragraph level to conform to CES V4.0
§edition statement

MTE Final Release

§publication statement
§distributor Institute of Mathematics, Bulgarian Academy of Sciences
§address Acad G. Bonchev st. bl.8 1113 Sofia, Bulgaria
§address eAddress: mult@ling.math.acad.bg

Available for research purposes upon receipt of signed agreement

when = 1997-10-01
October 1, 1997
§source description
§structured bibliographic citation
monographic level
title Capital (Bulgarian) April 29 - May 5, 1996
publisher AII OOD
date 1996-05-28
publication place Sofia, Bulgaria
monographic level
title Continent (Bulgarian) 1995, January 15
publisher Publishing House "MEGAPRESS" AD
date 1995-01-15
publication place Sofia, Bulgaria
§fully-structured bibliographic citation
title statement
title Selected articles from Pari Daily, in electronic form
statement of responsibility
name Tsvetan Petrov - vice editor
responsibility The electronic texts of the excerpts from "Pari" were prepared by the journalists for internal use only and were kindly provided by Mr. Tsvetan Petrov for the MTE project in DOS Word 5 format. Not all of the published articles were included in the electronic files.
publication statement
distributor PARI Daily
address 1000 Sofia, "Tsarigradsko shosse" blvd 47

The electronic texts are property of their authors and are not distributed

date May 02, May 03 1996
source description
structured bibliographic citation
monographic level
title Pari (Bulgarian) May 02, 1996
title Pari (Bulgarian) May 03, 1996
publisher "RUBICON" - Izdatelsko-targovski kompleks PARI OOD
date May 02, May 03 1996
publication place Sofia, Bulgaria
§fully-structured bibliographic citation
title statement
title Selected articles from Standart Daily, in electronic form
statement of responsibility
name Kiril Simov
responsibility The electronic texts of the excerpts from "Standart" were prepared by the journalists for internal use only. They were provided for the MTE project in DOS Word 5 format by Kiril Simov.
publication statement
distributor "Standart news" AD
address 1303 Sofia, Antim I, 53

The electronic texts are property of their authors and are not distributed

date February, May, 1995
source description
structured bibliographic citation
monographic level
title Standart Daily
publisher "Standart news" AD
date February, May 1995
publication place Sofia, Bulgaria
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration

Corpus Encoding Standard, Version 4.3 CES LEVEL: 1

§correction principles

form = std

No quotation marks are preserved in the text. Rendition attribute values on Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. Two rendition shortcuts are used: 'rend=mdash' stands for 'rend="PRE mdash POST mdash"', and 'rend=dblq' stands for 'rend="PRE ldquo POST rdquo"'. 'rend="PRE mdash"' (or "PRE ldquo") is used when the quoted dialogue ends with the paragraph (there is no other typographical distinction). 'rend="POST mdash"' (or "POST rdquo") is used when the beginning of the quoted dialogue has no typographical distinction (except ordinary punctuation). No default rendition is used.
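
The rendition convention described above can be illustrated with a small sketch. This is not part of the corpus tooling: the names `SHORTCUTS` and `parse_rend` are hypothetical, and the code assumes that a rend value is either one of the two shortcuts or a whitespace-separated sequence of PRE/POST pairs followed by an entity name.

```python
# Hypothetical helper, not part of the MULTEXT-East distribution.
# Expands the two shortcuts named in the header and splits a rend
# value into its opening (PRE) and closing (POST) entity names.
SHORTCUTS = {
    "mdash": "PRE mdash POST mdash",
    "dblq": "PRE ldquo POST rdquo",
}


def parse_rend(value):
    """Return (pre, post) entity names; None marks an absent side."""
    expanded = SHORTCUTS.get(value, value)
    tokens = expanded.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"unpaired token in rend value: {value!r}")
    result = {"PRE": None, "POST": None}
    # Tokens come in (position, entity) pairs, e.g. "PRE ldquo".
    for position, entity in zip(tokens[0::2], tokens[1::2]):
        if position not in result:
            raise ValueError(f"unknown position {position!r} in {value!r}")
        result[position] = entity
    return result["PRE"], result["POST"]
```

Under these assumptions, `parse_rend("dblq")` yields both an opening and a closing entity, while `parse_rend("PRE mdash")` yields only an opening one, matching the case where the quoted dialogue ends with the paragraph.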


Marked up to the level of paragraph: P, SP, QUOTE, NOTE, CAPTION, LIST, FIGURE, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, TITLE, DATE, TIME, MENTIONED, DISTINCT, FOREIGN, ABBR.


No end-of-line hyphenation present.

§tagging declaration
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 1295
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = caption occurs = 143
§tag usage
gi = byline occurs = 302
§tag usage
gi = date occurs = 395
§tag usage
gi = dateline occurs = 29
§tag usage
gi = distinct occurs = 6
§tag usage
gi = div occurs = 560
text division
§tag usage
gi = docAuthor occurs = 105
§tag usage
gi = figDesc occurs = 48
§tag usage
gi = figure occurs = 48
§tag usage
gi = foreign occurs = 9
§tag usage
gi = head occurs = 500
§tag usage
gi = hi occurs = 376
§tag usage
gi = item occurs = 50
§tag usage
gi = label occurs = 2
§tag usage
gi = list occurs = 11
§tag usage
gi = measure occurs = 18
§tag usage
gi = mentioned occurs = 10
§tag usage
gi = name occurs = 4967
§tag usage
gi = note occurs = 11
§tag usage
gi = num occurs = 1555
§tag usage
gi = opener occurs = 29
§tag usage
gi = p occurs = 1440
§tag usage
gi = ptr occurs = 23
§tag usage
gi = q occurs = 228
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 1
§tag usage
gi = ref occurs = 7
§tag usage
gi = s occurs = 155
§tag usage
gi = sp occurs = 80
§tag usage
gi = speaker occurs = 6
§tag usage
gi = text occurs = 1
§tag usage
gi = term occurs = 4
§tag usage
gi = time occurs = 13
§tag usage
gi = title occurs = 254
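
As a minimal sketch of how the tag usage entries above could be read into a lookup table, assuming each entry appears in this rendered form as a line containing "gi = NAME occurs = COUNT" (the function name `parse_tag_usage` and the sample list are illustrative only):

```python
import re

# Matches the rendered form of a tag usage entry, e.g. "gi = abbr occurs = 1295".
TAG_USAGE = re.compile(r"gi = (\S+) occurs = (\d+)")


def parse_tag_usage(lines):
    """Map each element name (gi) to its occurrence count."""
    counts = {}
    for line in lines:
        match = TAG_USAGE.search(line)
        if match:  # description lines such as "text body" are skipped
            counts[match.group(1)] = int(match.group(2))
    return counts


# A few entries copied from the list above.
sample = [
    "gi = abbr occurs = 1295",
    "gi = body occurs = 1",
    "text body",
    "gi = name occurs = 4967",
]
counts = parse_tag_usage(sample)
```

A table like this makes it easy to check, for example, that `body` and `text` each occur exactly once.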
§text-profile description
§date 1996-03-20
§text classification
§category reference
target = news
§revision description
§change 1996-10-25, Lydia Sinapova: Replaced Q tags with MENTIONED tags where appropriate
§change 1996-10-25, Lydia Sinapova: Linked broken Q tags with "prev" and "next" attributes
§change 1996-10-25, Lydia Sinapova: Text set off within the main text to serve as an "in-between" title has been tagged with CAPTION if it consists of whole sentences. Otherwise HI is used.
§change 1996-10-25, Lydia Sinapova: All occurrences of "..." have been replaced with the ISO_8879:1986 Publishing entity "hellip"
§change 1996-10-25, Lydia Sinapova: All occurrences of "%" have been replaced with the ISO_8879:1986 Publishing entity "percnt"
§change 1996-10-25, Lydia Sinapova: All occurrences of the paragraph character have been replaced with the ISO_8879:1986 Publishing entity "sect"
§change 1996-01-28, Lydia Sinapova: Linked broken Q tags with "prev" and "next" attributes
§change 1996-01-28, Lydia Sinapova: Text set off within the main text to serve as an "in-between" title has been tagged with CAPTION, whereby broken sentences are linked with "prev" and "next"
§change 1996-01-28, Lydia Sinapova: Inserted an ID attribute on the P tag in articles with sentences broken by CAPTION, for linking purposes
§change 1997-03-20, Tomaz Erjavec, IJS: Normalisation of corpus component CESHEADER elements: CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
§change 1997-03-20, Tomaz Erjavec, IJS: ISO LANGUAGEs implemented as marked section PUBLIC ent
§change 1997-03-20, Tomaz Erjavec, IJS: Language (WSDs) implemented as PUBLIC entities
§change 1997-03-27, Tomaz Erjavec, IJS: Substituted IGCY entity with JCY (80 occurrences)
§change 1997-09-25, Tomaž Erjavec: Changed editionStmt, byteCount, pubDate, Availability to final form
§change 2004-05-10, Tomaž Erjavec: Converted to TEI P4, prepared for MTE V3
§change 2010-05-09, Tomaž Erjavec: Conversion to MULTEXT-East TEI P5.
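
The character-to-entity substitutions recorded in the 1996-10-25 changes can be sketched as follows. This is an illustration only, not the procedure actually used: the name `to_entities` is hypothetical, and the code assumes that the "paragraph character" is the section sign, since that is the character the "sect" entity denotes.

```python
# Illustrative only; the original conversion procedure is not documented here.
REPLACEMENTS = [
    ("...", "&hellip;"),   # ellipsis written as three full stops
    ("%", "&percnt;"),     # percent sign
    ("\u00a7", "&sect;"),  # assumption: "paragraph character" means the section sign
]


def to_entities(text):
    """Replace each literal character sequence with its SGML entity reference."""
    for literal, entity in REPLACEMENTS:
        text = text.replace(literal, entity)
    return text
```

None of the entity references contains a character that a later rule would match, so the order of the rules does not affect the result.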