TEI Header

§file description
§title statement
§title
id = mten-sl.title
Multext-East cesDoc corpus: Newspapers, Slovene
§statement of responsibility
§name Tomaž Erjavec, LST group, Dept. for Intelligent Systems, Jožef Stefan Institute
§responsibility CES1 conformance.
§statement of responsibility
§name Miro Romih, Amebis d.o.o.
§responsibility Up-translation from diskette format, typographical error correction.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent
§measure
type = words
101749
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Dept. of Knowledge Technologies, Jožef Stefan Institute
§address Jamova 39, Ljubljana, Slovenia
§address eAddress: tomaz.erjavec@ijs.si
§address eAddress: http://nl.ijs.si/ME
§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Newspapers, Slovene
§statement of responsibility
name Tomaž Erjavec, LST group, Dept. for Intelligent Systems, Jožef Stefan Institute
responsibility CES1 conformance.
§statement of responsibility
name Miro Romih, Amebis d.o.o.
responsibility Up-translation from diskette format, typographical error correction.
§edition statement

MTE Final Release

§publication statement
§distributor Dept. of Knowledge Technologies, Jožef Stefan Institute
§address Jamova 39, Ljubljana, Slovenia
§address eAddress: tomaz.erjavec@ijs.si
§address eAddress: http://nl.ijs.si/ME
§availability

Available for research purposes upon receipt of signed agreement

§date
when = 1997-10-01
October 1, 1997
§source description
§fully-structured bibliographic citation
title statement
title Original digital form of the 'Dnevnik' articles: editor's diskettes with idiosyncratic markup
statement of responsibility
name The 'Dnevnik' Daily
responsibility Collected and edited the texts from authors
publication statement
distributor The 'Dnevnik' Daily
address Ljubljana, Slovenia
availability

date Unknown
source description
structured bibliographic citation
monographic level
title 45 articles from the 'Dnevnik' Daily
imprint
publisher Dnevnik
date 8--10 1995
publication place Ljubljana, Slovenia
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration
§normalization

Corpus Encoding Standard, Version 4.2. CES LEVEL: 1

§correction principles

The OCR'ed text of the novel has been automatically spell-checked.

§quotation
form = std

No rendition attribute values on Q. 'Top level' Qs are in '"', inner Qs in "'".

§hyphenation

All text semi-automatically dehyphenated; errors possible where the two parts of the word are both words

§segmentation

Each article proper is in a DIV type="article".
The text of the article is in a DIV type="articletext".
The sections of articletext, usually with HEADER, are in DIV type="articlepart".
After articletext come figures (DIV type="figure") and frames (DIV type="frame").
Marked up to the level of paragraph, plus marking of particular sub-paragraph elements:
Q
DATE: only for the approximate date of article publication
NAME: only where they were typographically marked in the original

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = byline occurs = 54
byline
§tag usage
gi = date occurs = 45
date
§tag usage
gi = div occurs = 396
text division
§tag usage
gi = docAuthor occurs = 55
docAuthor
§tag usage
gi = figDesc occurs = 67
figDesc
§tag usage
gi = figure occurs = 67
figure
§tag usage
gi = head occurs = 379
heading
§tag usage
gi = name occurs = 83
name
§tag usage
gi = opener occurs = 45
opener
§tag usage
gi = p occurs = 1204
paragraph
§tag usage
gi = q occurs = 881
separated from the surrounding text with quotation marks
§tag usage
gi = text occurs = 1
text
§text-profile description
§creation 1996-05-07<date> The CES1 Slovene Newspaper corpus comes into being
§text classification
§category reference
target = news
§revision description
§change 1996-05-06<date>Amebis d.o.o.<name> Corrected spelling mistakes that could be caught with spelling checker; Up-translated to almost-CES
§change 1996-05-07<date>Tomaž Erjavec, IJS<name>Made header
§change 1996-05-07<date>Tomaž Erjavec, IJS<name>Glued the articles received from Amebis together
§change 1996-05-07<date>Tomaž Erjavec, IJS<name>Fixed some mistakes in Amebis encoding
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>Revised header CES version numbers and made doctype PUBLIC
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>Word segmentation shows some more typos, e.g. '0Nekateri', '4O'; corrected these silently.
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>Converted from ISO-2 to SGML ents
§change 1996-10-30<date>Tomaž Erjavec, IJS<name>Found all Č and č were switched - corrected
§change 1996-10-30<date>Tomaž Erjavec, IJS<name>Revised header CES version and packed for IM3
§change 1997-03-20<date>Tomaž Erjavec, IJS<name>Normalisation of corpus component CESHEADER elements: CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
§change 1997-03-20<date>Tomaž Erjavec, IJS<name>ISO LANGUAGEs implemented as marked section PUBLIC ent
§change 1997-03-20<date>Tomaž Erjavec, IJS<name>Language (WSDs) implemented as PUBLIC entities
§change 1997-09-25<date>Tomaž Erjavec<name>Changed editionStmt, Extent, pubDate, Availability to final form
§change 2004-05-10<date>Tomaž Erjavec<name>Converted to TEI P4, prepared for MTE V3
§change 2010-05-09<date>Tomaž Erjavec<name>Conversion to MULTEXT-East TEI P5.