TEI Header

§file description
§title statement
§title
id = mteo-cs.title
Multext-East cesDoc corpus: Nineteen Eighty-Four, Czech
§statement of responsibility
§name Vladimír Petkevič
§responsibility Checked and modified markup for correctness down to the subparagraph level
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Added tagging of sentences in paragraphs using MtSgml and Czech resources.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent
§measure
type = words
80317
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic (ÚTKL FFUK)
§address Celetná 13, Prague, Czech Republic
§address eAddress: vladimir.petkevic@ff.cuni.cz
§address eAddress: ucnk.ff.cuni.cz directory: pub/corpora/ME
§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Nineteen Eighty-Four, Czech
§statement of responsibility
§name Vladimír Petkevič
§responsibility Checked and modified markup for correctness down to the subparagraph level
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Added tagging of sentences in paragraphs using MtSgml and Czech resources.
§edition statement

MTE Final Release

§publication statement
§distributor Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic (ÚTKL FFUK)
§address Celetná 13, Prague, Czech Republic
§address eAddress: vladimir.petkevic@ff.cuni.cz
§address eAddress: ucnk.ff.cuni.cz directory: pub/corpora/ME
§availability

Available for research purposes upon receipt of a signed agreement.

§date
when = 1997-10-01
October 1, 1997
§source description
§fully-structured bibliographic citation
title statement
title Electronic form of 1984 by George Orwell in Czech, obtained via OCR
statement of responsibility
name Vladimír Petkevič, Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic (ÚTKL FFUK)
responsibility OCR'ed the novel
publication statement
distributor Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic (ÚTKL FFUK)
address Celetná 13, Praha 1, Czech Republic
availability

Available for research purposes only.

date
when = 1996-05-01
May 1, 1996
source description
structured bibliographic citation
monographic level
title 1984
author George Orwell
imprint
date 1991
publisher Naše vojsko
publication place Prague, Czech Republic
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration
§normalization

Corpus Encoding Standard, Version 4.0; CES level 1

§correction principles

The OCR'ed text of the novel has been automatically spell-checked.

§hyphenation

The text contains no hyphens.

§segmentation

Two levels of DIV are used: the first denotes the parts, and the second denotes the chapters within parts. The text is marked up down to the subparagraph level according to the CES canonical markup of the English version.

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 23
abbreviation
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = date occurs = 39
date
§tag usage
gi = div occurs = 28
text division
§tag usage
gi = foreign occurs = 91
foreign
§tag usage
gi = head occurs = 1
heading
§tag usage
gi = hi occurs = 75
highlighted
§tag usage
gi = item occurs = 4
item
§tag usage
gi = l occurs = 33
verse line
§tag usage
gi = list occurs = 1
list
§tag usage
gi = mentioned occurs = 244
mentioned
§tag usage
gi = name occurs = 2181
name
§tag usage
gi = note occurs = 2
note
§tag usage
gi = num occurs = 48
number
§tag usage
gi = p occurs = 1285
paragraph
§tag usage
gi = lg occurs = 11
line group
§tag usage
gi = ptr occurs = 1
pointer
§tag usage
gi = q occurs = 2208
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 36
quotation
§tag usage
gi = s occurs = 6714
s-unit
§tag usage
gi = term occurs = 2
term
§tag usage
gi = text occurs = 1
text
§tag usage
gi = title occurs = 45
title
§text-profile description
§creation
§date 1996-04-18
§language usage
§language
ident = cs-cl
Czech colloquial
§language
ident = ns-cs
Newspeak Czech
§language
ident = ns-jg-cs
Newspeak official jargon Czech
§text classification
§category reference
target = orwl
§revision description
§change 1996-05-03, Vladimír Petkevič, ÚTKL FFUK: Corrected the header so that it better corresponds to CES recommendations
§change 1996-05-03, Vladimír Petkevič, ÚTKL FFUK: Fixed n and id values in DIVs
§change 1996-05-03, Vladimír Petkevič, ÚTKL FFUK: The mdash entity is now used only for sentential punctuation
§change 1996-10-21, Vladimír Petkevič, ÚTKL FFUK: Marked up down to the subparagraph level according to the CES canonical markup of the English version
§change 1996-10-21, Vladimír Petkevič, ÚTKL FFUK: Corrected the header to meet the requirements of building a corpus that contains all corpus components as one SGML document
§change 1997-02-24, Vladimír Petkevič: Changed IDs and the PREV and NEXT attributes that previously used "1984cs" to "Ocs"
§change 1997-02-24, Vladimír Petkevič: Converted words and sentences in capital letters to lowercase
§change 1997-02-24, Vladimír Petkevič: Corrected broken quotes
§change 1997-02-24, Vladimír Petkevič: Removed some redundant rendition information
§change 1997-02-24, Vladimír Petkevič: Corrected and updated the corpus according to the changes specified in mte1984-en.ces.V1.1.CHANGES
§change 1997-02-24, Vladimír Petkevič: Improved text readability
§change 1997-02-24, Vladimír Petkevič: Fixed some typos in the text
§change 1997-02-24, Vladimír Petkevič: Updated BYTECOUNT and WORDCOUNT
§change 1997-03-20, Tomaž Erjavec, IJS: Normalisation of corpus component CESHEADER elements: CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
§change 1997-03-20, Tomaž Erjavec, IJS: ISO LANGUAGEs implemented as marked section PUBLIC ent
§change 1997-03-20, Tomaž Erjavec, IJS: Language (WSDs) implemented as PUBLIC entities
§change 1997-03-20, Tomaž Erjavec, IJS: Newspeak LANGUSAGE/LANGUAGE IDs now ns-xx for lang xx
§change 1997-03-20, Tomaž Erjavec, IJS: Every QUOTE in 1984 now contains at least one P
§change 1997-04-02, Greg Priest-Dorman: Inserted S tags at the locations given by MtSeg
§change 1997-04-02, Greg Priest-Dorman: Inserted Q and HI tags where necessary as a result of S tag insertion
§change 1997-04-02, Greg Priest-Dorman: Updated TAGUSAGE
§change 1997-05-12, Vladimír Petkevič: Corrected some minor errors caused by incorrect MtSeg segmentation
§change 1997-05-12, Vladimír Petkevič: Corrected some typos revealed by segmentation
§change 1997-05-12, Vladimír Petkevič: Adjusted some paragraphs to conform to the English canonical version for the sake of sentence alignment
§change 1997-05-12, Vladimír Petkevič: Adjusted TAGUSAGE, WORDCOUNT and BYTECOUNT information
§change 1997-06-18, Vladimír Petkevič: Corrected two typos
§change 1997-07-24, Vladimír Petkevič: Added two words
§change 1997-09-25, Tomaž Erjavec: Changed editionStmt, Extent, pubDate and Availability to their final form
§change 1997-09-26, Vladimír Petkevič: Corrected several typos
§change 1998-12-03, Vladimír Petkevič: Corrected two typos
§change 2004-05-10, Tomaž Erjavec: Converted to TEI P4 and prepared for MTE V3
§change 2010-05-09, Tomaž Erjavec: Conversion to MULTEXT-East TEI P5