TEI Header

§file description
§title statement
§title Multext-East cesAna: Nineteen Eighty-Four, Romanian
§statement of responsibility
§name Dan Tufiş, RACAI
§responsibility Overall Responsibility
§statement of responsibility
§name Tomaž Erjavec, IJS
§responsibility Conversion to TEI
§edition statement
§edition MULTEXT-East, Version 4
§publication statement
§distributor Centre for Artificial Intelligence, NLP division, Romanian Academy
§address 13, 13 Septembrie Str.,
Bucharest 5, 74311
Romania
§address tufis@valhalla.racai.ro
http://nl.ijs.si/ME/
§availability

Available for research purposes upon receipt of signed agreement.

§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East cesAna: Nineteen Eighty-Four, Romanian
§statement of responsibility
name Dan Tufiş
responsibility Overall Responsibility
§statement of responsibility
name Ana-Maria Barbu
responsibility Hand-tagging the whole book
§statement of responsibility
name Vasile Pătraşcu
responsibility Conversion to cesAna DTD
§edition statement
§edition MULTEXT-East Final Release
§publication statement
§distributor Center for Advanced Research in Machine Learning, Natural Language Processing and Conceptual Modelling
§address Casa Academiei, 13,
13 Septembrie,
Bucharest 5, 74311
Romania
§address tufis@valhalla.racai.ro
http://nl.ijs.si/ME/
§availability

Available for research purposes upon receipt of signed agreement.

§date
when = 1998-01-01
January 1st, 1998
§source description
§structured bibliographic citation
monographic level
title O mie nouă sute optzeci şi patru
author George Orwell
author Translator: Mihnea Gafiţa
imprint
date 1991
publisher Editura Univers
publication place Bucharest
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

Concede: Consortium for Central European Dictionary Encoding. EU Copernicus Project PL96-1142

The Concede project aimed to develop a unified dictionary encoding schema. Its experiments used lexical tokens extracted from the multilingual corpus of Orwell's "1984" developed within the MULTEXT-East project. Headwords were extracted across various frequency intervals and all word categories (POS), so that different kinds of encoding problems would be revealed. The MULTEXT-East corpus was significantly improved for the purposes of the Concede project.

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = text occurs = 1
text
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = div occurs = 29
text division
§tag usage
gi = p occurs = 1346
paragraph
§tag usage
gi = s occurs = 6520
s-unit
§tag usage
gi = w occurs = 101772
word
§tag usage
gi = c occurs = 16556
character
§tag usage
gi = head occurs = 4
heading
§tag usage
gi = back occurs = 1
back matter
§tag usage
gi = docAuthor occurs = 1
document author
§tag usage
gi = docDate occurs = 1
document date
§tag usage
gi = f occurs = 204
feature
§tag usage
gi = fLib occurs = 1
feature library
§tag usage
gi = fs occurs = 616
feature structure
§tag usage
gi = fvLib occurs = 1
feature-value library
§tag usage
gi = item occurs = 3
item
§tag usage
gi = label occurs = 3
label
§tag usage
gi = list occurs = 1
list
§tag usage
gi = ref occurs = 1
reference
§tag usage
gi = symbol occurs = 204
symbolic value
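
The tag usage entries above pair an element name (gi) with a declared count (occurs). A minimal sketch of how such a declaration could be checked against a document, assuming the header is serialized as TEI P5 XML with tagUsage elements carrying gi and occurs attributes, and assuming the counts refer to elements inside the text element. The function name usage_mismatches and the toy SAMPLE document are hypothetical and are not part of the corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace named in the tagging declaration.
TEI_NS = "http://www.tei-c.org/ns/1.0"
Q = "{%s}" % TEI_NS  # qualified-name prefix used by ElementTree

# Toy document (made up for illustration, not taken from the corpus).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <tagsDecl>
        <namespace name="http://www.tei-c.org/ns/1.0">
          <tagUsage gi="text" occurs="1">text</tagUsage>
          <tagUsage gi="body" occurs="1">text body</tagUsage>
          <tagUsage gi="s" occurs="1">s-unit</tagUsage>
          <tagUsage gi="w" occurs="3">word</tagUsage>
          <tagUsage gi="c" occurs="1">character</tagUsage>
        </namespace>
      </tagsDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><s><w>w1</w> <w>w2</w> <w>w3</w><c>.</c></s></body></text>
</TEI>"""


def usage_mismatches(xml_text):
    """Return {gi: (declared, actual)} for every declared count that differs
    from the number of matching elements found inside <text>.

    Assumption: counts cover the <text> element and its descendants only.
    """
    root = ET.fromstring(xml_text)

    # Declared counts, read from the gi/occurs attributes of each tagUsage.
    declared = {
        tu.get("gi"): int(tu.get("occurs"))
        for tu in root.iter(Q + "tagUsage")
    }

    # Actual counts: every TEI-namespace element in <text>, by local name.
    actual = Counter()
    text = root.find(".//" + Q + "text")
    if text is not None:
        for elem in text.iter():  # iter() includes <text> itself
            if elem.tag.startswith(Q):
                actual[elem.tag[len(Q):]] += 1

    return {
        gi: (n, actual.get(gi, 0))
        for gi, n in declared.items()
        if actual.get(gi, 0) != n
    }


if __name__ == "__main__":
    print(usage_mismatches(SAMPLE))
```

An empty result means every declared count matches. Elements that occur in the text but have no tag usage entry are not reported by this sketch.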
§revision description
§change 2004-07-01, Tomaž Erjavec: Corrected tagging errors: Rgp instead of R (3x)
§change 2004-05-10, Tomaž Erjavec: From BETA to FINAL
§change 2004-03-05, Tomaž Erjavec: Minor fixes.
§change 2003-02-11, Tomaž Erjavec: Conversion to TEI P4 XML
§change 2001-03-19, Tomaž Erjavec: Conversion to TEI, modified Header
§change 2010-05-09, Tomaž Erjavec: Conversion to MULTEXT-East TEI P5.