TEI Header

§file description
§title statement
§title Multext-East cesAna: Nineteen Eighty-Four, Polish
§statement of responsibility
§name Natalia Kotsyba
§responsibility Overall Responsibility
§responsibility Morphosyntactic specifications (the theoretical part and converting the MSD index to the format demanded by the specs), tag correspondence tables from the IPIC to the MTE format for the converter.
§statement of responsibility
§name Adam Radziszewski
§responsibility Preparing a list of tags and statistics of their usage from the IPIC for conversion (MSD index within the morphosyntactic specifications), conversion code, extracting the lexicon from the IPIC and recalculating statistics to fit the MTE tagset.
§statement of responsibility
§name Ivan Derzhanski
§responsibility Morphosyntactic specifications -- categories, values, attributes, editing notes
§statement of responsibility
§name Tomaž Erjavec, JSI
§responsibility MTE TEI P5 conversion and conformance.
§edition statement
§edition MULTEXT-East, Version 4
§publication statement
§distributor Institute for Interdisciplinary Studies, „Artes Liberales” Warsaw University, Warsaw, Poland
§address Krakowskie Przedmieście 26/28
00-046 Warszawa
Poland
§address natalia@ibi.uw.edu.pl
§availability

Freely available for non-commercial use provided that this Header is included in its entirety with any copy distributed.

§date
when = 2010-05-04
2010-05-04
§source description
§structured bibliographic citation
§monographic level
§title 1984
§author George Orwell
§author Translator: Tomasz Mirkowicz
§imprint
date 1993
publisher Da Capo
publication place Warsaw, Poland
§encoding description
§project description

EU Capacities Project GA 211938 "MondiLex"

§editorial practice declaration
§interpretation

The text was tagged with TaKIPI (http://nlp.ipipan.waw.pl/TaKIPI/), a tagger developed specifically for Polish that uses the IPIC (IIS PAS Corpus: http://korpus.pl) tagset and is based on the Morfeusz Morphosyntactic Analyzer for Polish (http://nlp.ipipan.waw.pl/~wolinski/morfeusz/). A tag converter was then used to recode the annotation into MTE-style format. To meet the main MTE requirements, the converter gives a more detailed description of some parts of speech, groups parts of speech differently, and applies considerably different word segmentation principles. A detailed description of the correspondences between tags can be found at http://www.domeczek.pl/~natko/papers/MTE-pl_Ljub.pdf. The conversion method is implemented in Python; the code and the data are available online at http://domeczek.pl/~polukr/mte-conv/.

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = text occurs = 1
text
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = div occurs = 27
text division
§tag usage
gi = p occurs = 1401
paragraph
§tag usage
gi = s occurs = 6666
s-unit
§tag usage
gi = w occurs = 79772
word
§tag usage
gi = c occurs = 17641
character
§tag usage
gi = back occurs = 1
back matter
§tag usage
gi = docAuthor occurs = 1
document author
§tag usage
gi = docDate occurs = 1
document date
§tag usage
gi = f occurs = 169
feature
§tag usage
gi = fLib occurs = 1
feature library
§tag usage
gi = fs occurs = 1324
feature structure
§tag usage
gi = fvLib occurs = 1
feature-value library
§tag usage
gi = head occurs = 2
heading
§tag usage
gi = item occurs = 3
item
§tag usage
gi = label occurs = 3
label
§tag usage
gi = list occurs = 1
list
§tag usage
gi = ref occurs = 1
reference
§tag usage
gi = symbol occurs = 169
symbolic value
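
The occurrence counts in the tag usage entries above could be recomputed from a document with a short script. The following is a minimal sketch in Python, not the procedure used to produce this header: the function name count_tag_usage and the SAMPLE fragment (with placeholder tokens) are hypothetical, and only the namespace URI is taken from the namespace declaration above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace as given in the tagging declaration of this header.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def count_tag_usage(xml_text):
    """Count how often each TEI element (by local name) occurs in xml_text."""
    root = ET.fromstring(xml_text)
    prefix = "{" + TEI_NS + "}"
    counts = Counter()
    for element in root.iter():  # includes the root element itself
        tag = element.tag
        # Skip anything that is not an element in the TEI namespace.
        if isinstance(tag, str) and tag.startswith(prefix):
            counts[tag[len(prefix):]] += 1
    return counts


# Hypothetical fragment with placeholder tokens, for illustration only.
SAMPLE = (
    '<text xmlns="http://www.tei-c.org/ns/1.0"><body><div><p><s>'
    "<w>word1</w> <w>word2</w><c>,</c> <w>word3</w> <w>word4</w><c>.</c>"
    "</s></p></div></body></text>"
)

if __name__ == "__main__":
    for name, occurs in sorted(count_tag_usage(SAMPLE).items()):
        print(f"gi = {name} occurs = {occurs}")
```

Run on a full document, the printed lines would follow the same "gi = ... occurs = ..." pattern as the entries above, which makes the declared counts easy to compare against the actual markup.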
§revision description
§change 2010-04-23<date>Natalia Kotsyba<name>, Tomaž Erjavec<name> Changes to a couple of MSDs; final for Version 4.
§change 2010-02-28<date>Natalia Kotsyba<name> Some orthographic and annotation mistakes were corrected, and part and chapter numbers were introduced.
§change 2009-11-04<date>Tomaž Erjavec<name> Draft P5 version.
§change 2009-09-07<date>Natalia Kotsyba<name> Text of novel in TEI P4.