TEI Header

§file description
§title statement
§title
id = mteo-ro.title
Multext-East cesDoc corpus: Nineteen Eighty-Four, Romanian
§statement of responsibility
§name Dan Tufiş Center for Artificial Intelligence NLP division Romanian Academy
§responsibility Overall editorship.
§statement of responsibility
§name Ştefan Bruda Center for Artificial Intelligence NLP division Romanian Academy
§responsibility Error correction and CES1 conformance.
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Added tagging of sentences in paragraphs using MtSgml and Romanian resources.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent 118093<measure> wordCount, computed by treating clitics as distinct words and a multi-word compound as a single word. This count was computed on the segmented document with word mark-up. If the counting ignores clitics and compounds, the wordCount would be 98074; that count was produced by the following command sequence:
sed -e '1,/<\/ces[Hh]eader>/d' < ces-file | sed -e 's/<[^<].*>//g' | sed -e 's/<.*$//g' | sed -e 's/^.*>//g' | wc -w
bytecount - disk space occupied by the full SGML text
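The command sequence quoted in the extent statement can be read more easily one stage per line. The following is only a sketch of that same pipeline: the function name `count_words` is hypothetical and does not appear in the original header, and the comments describe what each `sed` expression does literally, not what the authors intended. Note that stage 2 is greedy, so any text lying between the first and last tag on a line is removed along with the tags; the resulting count therefore depends on how the input file is laid out.

```shell
# count_words FILE
# Hypothetical wrapper around the pipeline quoted in the header.
count_words() {
    # 1. Delete line 1 through the line that closes the CES header.
    sed -e '1,/<\/ces[Hh]eader>/d' < "$1" |
    # 2. Delete from the first "<" that is followed by a non-"<" character
    #    through the last ">" on the line (greedy match).
    sed -e 's/<[^<].*>//g' |
    # 3. Delete a tag that is opened but not closed on the same line.
    sed -e 's/<.*$//g' |
    # 4. Delete the tail of a tag that was opened on an earlier line.
    sed -e 's/^.*>//g' |
    # 5. Count the remaining whitespace-separated tokens.
    wc -w
}
```

It would be invoked as `count_words ces-file`, where `ces-file` is the placeholder name used in the header.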
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Romanian Academy, Centre for Artificial Intelligence
§address 13, 13 Septembrie Str., Bucharest, Romania
§address eAddress: tufis@valhalla.racai.ro
§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Nineteen Eighty-Four, Romanian
§statement of responsibility
name Dan Tufiş Center for Artificial Intelligence NLP division Romanian Academy
responsibility Overall editorship.
§statement of responsibility
name Ştefan Bruda Center for Artificial Intelligence NLP division Romanian Academy
responsibility Error correction and CES1 conformance.
§statement of responsibility
name Greg Priest-Dorman
responsibility Added tagging of sentences in paragraphs using MtSgml and Romanian resources.
§edition statement

MTE Final Release

§publication statement
§distributor Romanian Academy, Centre for Artificial Intelligence
§address 13, 13 Septembrie Str., Bucharest, Romania
§address eAddress: tufis@valhalla.racai.ro
§availability

Available for research purposes upon receipt of signed agreement

§date
when = 1997-10-01
October 1, 1997
§source description
§structured bibliographic citation
monographic level
title O mie nouă sute optzeci şi patru
author George Orwell
imprint
date 1991
publisher Editura Univers
publication place Bucharest
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration
§normalization

Corpus Encoding Standard, Version 4.0 CES LEVEL: 1

§segmentation

Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: HI, Q, FOREIGN, NAME

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = name occurs = 2159
name
§tag usage
gi = title occurs = 1
title
§tag usage
gi = div occurs = 28
text division
§tag usage
gi = text occurs = 1
text
§tag usage
gi = foreign occurs = 429
foreign
§tag usage
gi = l occurs = 26
verse line
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = quote occurs = 23
quotation
§tag usage
gi = item occurs = 4
item
§tag usage
gi = p occurs = 1335
paragraph
§tag usage
gi = num occurs = 3
number
§tag usage
gi = lg occurs = 7
line group
§tag usage
gi = hi occurs = 413
highlighted
§tag usage
gi = q occurs = 2137
separated from the surrounding text with quotation marks
§tag usage
gi = head occurs = 28
heading
§tag usage
gi = s occurs = 6487
s-unit
§tag usage
gi = note occurs = 3
note
§tag usage
gi = abbr occurs = 3
abbreviation
§tag usage
gi = list occurs = 1
list
§tag usage
gi = date occurs = 7
date
§text-profile description
§creation
§date 1996-05-06
§language usage
§language
ident = ns-ro
Nouvorbă
§text classification
§category reference
target = orwl
§revision description
§change 97-06-30<date>Dan Tufiş<name> Corrected several typos and added missing punctuation (mainly commas). The Bytecount and Wordcount were updated.
§change 97-06-23<date>Dan Tufiş<name> Deleted empty Ss and Qs; inserted missing Ss.
§change 97-06-19<date>Ştefan Bruda<name> Corrected some typos; eliminated the blanks before punctuation marks and between markup and words.
§change 97-05-16<date>Ştefan Bruda<name> Made some changes to the paragraph structure for better alignment with the English version; added a new paragraph which was not translated from English; updated tagusage.
§change 97-04-3<date>Dan Tufiş<name> Eliminated spaces around punctuation, corrected some mark-up
§change 97-03-6<date>Dan Tufiş<name> Added some lines overlooked when keyboarded. Corrected some typos.
§change 97-02-18<date>Dan Tufiş<name> Corrected extent section of the header
§change 96-11-5<date>Ştefan Bruda<name> Corrected the header, so it better corresponds to CES recommendations
§change 96-11-5<date>Georgiana Rotariu<name> Added name tags
§change 96-5-6<date>Ştefan Bruda<name> Corrected the header, so it better corresponds to CES recommendations
§change 96-5-6<date>Ştefan Bruda<name> Added div tags
§change 95-12-10<date>Ştefan Bruda<name> Marked-up to CES1 compliance
§change 1997-04-02<date>Greg Priest-Dorman<name> inserted S tags in the locations given by MtSeg
§change 1997-04-02<date>Greg Priest-Dorman<name> inserted Q and HI tags where necessary as a result of S tag insertion
§change 1997-04-02<date>Greg Priest-Dorman<name> updated and sorted TAGUSAGE
§change 1997-04-11<date>Ştefan Bruda<name> added "dummy" Ps inside QUOTEs for alignment purposes; such paragraphs have the value "DUMMY" for the rend attribute.
§change 1997-04-11<date>Ştefan Bruda<name> updated TAGUSAGE
§change 1997-04-13<date>Greg Priest-Dorman<name> segmented newly added Ps with MtSeg
§change 1997-04-13<date>Greg Priest-Dorman<name> inserted S tags in the locations given by MtSeg
§change 1997-04-13<date>Greg Priest-Dorman<name> changed header to comply with Tomaz's header style
§change 1997-04-13<date>Greg Priest-Dorman<name> changed lang="latin" to lang="la"
§change 1997-04-13<date>Greg Priest-Dorman<name> removed rend="DUMMY" from Ps
§change 1997-04-13<date>Greg Priest-Dorman<name> removed QUOTE /QUOTE pairs and moved QUOTE rend to P where appropriate
§change 1997-04-13<date>Greg Priest-Dorman<name> updated TAGUSAGE
§change 1997-04-13<date>Greg Priest-Dorman<name> removed blank lines
§change 1997-09-25<date>Tomaž Erjavec<name> Changed editionStmt, byteCount, pubDate, Availability to final form
§change 97-10-03<date>Vasile Pătraşcu<name> Corrected several typos and added missing punctuation (mainly commas). The Tagusage, Bytecount and Wordcount were updated. The entities counted as words are those identified by the segmenter, that is, words, clitics, compounds (counted as one unit, irrespective of the number of constituents), punctuation, and numbers.
§change 2004-05-10<date>Tomaž Erjavec<name> Converted to TEI P4, prepared for MTE V3
§change 2010-05-09<date>Tomaž Erjavec<name>Conversion to MULTEXT-East TEI P5.