TEI Header

§file description
§title statement
id = mteo-sl.title
Multext-East cesDoc corpus: Nineteen Eighty-Four, Slovene
§statement of responsibility
§name Tomaž Erjavec
§responsibility Error correction and CES1 conformance.
§statement of responsibility
§name Olga Vuković
§responsibility Up-translation of the ECI version to CES1 V2.0 conformance; proofreading of the text, using the printed edition as the reference
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Added tagging of sentences in paragraphs using MtSgml and Slovene resources.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent 91619<measure> WordCount represents the number of words in this text exclusive of tags and header information. ByteCount reflects the size of the file containing the doctype and cesDoc element including all text, tags and header information.
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Dept. of Knowledge Technologies, Jožef Stefan Institute
§address Jamova 39, SI-1000 Ljubljana, Slovenia
§address eAddress: tomaz.erjavec@ijs.si
§address eAddress: http://nl.ijs.si/ME
when = 2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Nineteen Eighty-Four, Slovene
§statement of responsibility
name Tomaž Erjavec
responsibility Error correction and CES1 conformance.
§statement of responsibility
name Olga Vuković
responsibility Up-translation of the ECI version to CES1 V2.0 conformance; proofreading of the text, using the printed edition as the reference
§statement of responsibility
name Greg Priest-Dorman
responsibility Added tagging of sentences in paragraphs using MtSgml and Slovene resources.
§edition statement

MTE Final Release

§publication statement
§distributor Dept. of Knowledge Technologies, Jožef Stefan Institute
§address Jamova 39, SI-1000 Ljubljana, Slovenia
§address eAddress: tomaz.erjavec@ijs.si
§address eAddress: http://nl.ijs.si/ME

Available for research purposes upon receipt of signed agreement

when = 1997-10-01
October 1, 1997
§source description
§fully-structured bibliographic citation
title statement
title The European Corpus Initiative Multilingual Corpus 1: 1984 by George Orwell (Slovene)
statement of responsibility
name Association for Computational Linguistics
responsibility Converted from OTA's DTD to ECI DTD
publication statement
distributor ACL
address ACL

Available for research purposes upon receipt of signed agreement

date 1994
source description
fully-structured bibliographic citation
title statement
title Orwell's 1984: electronic edition
statement of responsibility
name Oxford Text Archive
responsibility The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental and African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced (other than the English text, which was converted to an SGML-like encoding by John Price-Wilkin, and subsequently automatically converted to conform to the OTA's DTD by myself and Alan Morrison). The other languages were converted to TEI-conformant SGML by the ECI project in 1993. --LB, Nov 1992
edition statement

Public Domain TEI edition prepared at the Oxford Text Archive

publication statement
distributor Oxford Text Archive
address Oxford University Computing Service 13 Banbury Road Oxford OX2 6NN UK archive@ox.ac.uk

Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed

date 19 Nov 1992
source description
structured bibliographic citation
monographic level
title 1984
author George Orwell
author Translator: Alenka Puhar
date 1983
publisher Knjižnica Kondor
publisher Mladinska knjiga
publication place Ljubljana
§encoding description
§project description

MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106

§editorial practice declaration

Corpus Encoding Standard, Version 4.3; CES LEVEL: 1

§correction principles

Typographical mistakes corrected

form = std

Rendition attribute values on HI, Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. The 'default' rendition of Q (PRE mdash) has not been included in Q.


All end-of-line hyphenation removed.


Marked up to the level of paragraph: P, QUOTE, LIST, POEM, plus marking of particular sub-paragraph elements: NAME, Q. Page breaks left in the document as comments.


No end-of-line hyphenation present in the ECI original.

§tagging declaration
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 26
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = date occurs = 33
§tag usage
gi = div occurs = 28
text division
§tag usage
gi = foreign occurs = 7
§tag usage
gi = head occurs = 29
§tag usage
gi = hi occurs = 242
§tag usage
gi = item occurs = 4
§tag usage
gi = l occurs = 34
verse line
§tag usage
gi = list occurs = 1
§tag usage
gi = name occurs = 1327
§tag usage
gi = note occurs = 1
§tag usage
gi = p occurs = 1288
§tag usage
gi = lg occurs = 10
line group
§tag usage
gi = ptr occurs = 1
§tag usage
gi = q occurs = 2260
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 35
§tag usage
gi = s occurs = 6689
§tag usage
gi = text occurs = 1
§tag usage
gi = title occurs = 10
§text-profile description
§date 1996-04-18
§language usage
ident = ns-sl
Newspeak Slovene
§text classification
§category reference
target = orwl
§revision description
§change 1996-04-18<date>Tomaž Erjavec, IJS<name>Marked-up to CES1 compliance
§change 1996-05-02<date>Tomaž Erjavec, IJS<name>Corrected the header to better correspond to CES recommendations
§change 1996-05-02<date>Tomaž Erjavec, IJS<name>Fixed n and id values in DIVs
§change 1996-05-02<date>Tomaž Erjavec, IJS<name>Corrected some untagged and mis-tagged NAMEs
§change 1996-05-02<date>Tomaž Erjavec, IJS<name>Changed the rend values in accordance with new CES
§change 1996-07-17<date>Tomaž Erjavec, IJS<name>New CES1 English version received; changing Slovene accordingly
§change 1996-07-17<date>Tomaž Erjavec, IJS<name>made header more similar to Eng one
§change 1996-07-17<date>Tomaž Erjavec, IJS<name>Part II, Chp 10 header fixed - is problematic, and Eng version doesn't have a chapter here, just an asterisk
§change 1996-07-17<date>Tomaž Erjavec, IJS<name>Changed approp. Qs into TITLEs, moved rend from L to POEM
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>Word segmentation of 1984 shows some more (segmentation) typos, e.g. 'nota,je', '0ceanija'; corrected these.
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>Made all part and chapter HEADs of the same form
§change 1996-08-08<date>Tomaž Erjavec, IJS<name>As CES now supports nested Qs, de-commented those.
§change 1996-10-08<date>Tomaž Erjavec, IJS<name>Some more names were NAME tagged
§change 1996-10-08<date>Tomaž Erjavec, IJS<name>"sv." was inconsistently capitalised in the book, and hence in the corpus; this was uniformly set to "Sv."
§change 1996-10-08<date>Tomaž Erjavec, IJS<name>"Sv." tagged as ABBR and left *inside* NAME (!?)
§change 1996-10-08<date>Tomaž Erjavec, IJS<name>Corrected "2+2=" into "2+2=5". Sounds bizarre.
§change 1996-10-30<date>Tomaž Erjavec, IJS<name>Prepared from IM3
§change 1996-12-07<date>Tomaž Erjavec, IJS<name>Two more typos in the book, first chapter corrected: "videti vse [poslopja] tri hkrati." (vsa); "da bi ga bilo moči takoj izbrisati." (moč); "židinja je sedelo" (sedela).
§change 1996-12-22<date>Tomaž Erjavec, IJS<name>More typos corrected
§change 1997-02-06<date>Tomaž Erjavec, IJS<name>Changed all '...' to hellip entity
§change 1997-02-06<date>Tomaž Erjavec, IJS<name>Deleted HI REND="IT" where it contained only other elements and moved REND="IT" into these elements
§change 1997-02-18<date>Tomaž Erjavec, IJS, Tanja, Renata<name>Started work on structure aligning with English version of 17/01/1997; a number of P and QUOTE elements added or deleted. Thus we lose the original book information, but it can be argued that the translation was simply wrong where it did not reflect the structure of the English original.
§change 1997-02-18<date>Tomaž Erjavec, IJS, Tanja, Renata<name>Destroying the sanctity of the translation! Alignment shows that the translation is missing the P containing: "The old man brightened suddenly." This has been inserted as P: "Starec se je nenadoma razveselil."
§change 1997-03-20<date>Tomaz Erjavec, IJS<name>Normalisation of corpus component CESHEADER elements: CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
§change 1997-03-20<date>Tomaz Erjavec, IJS<name>ISO LANGUAGEs implemented as marked section PUBLIC ent
§change 1997-03-20<date>Tomaz Erjavec, IJS<name>Language (WSDs) implemented as PUBLIC entities
§change 1997-03-20<date>Tomaz Erjavec, IJS<name>Newspeak LANGUSAGE/LANGUAGE IDs now ns-xx for lang xx
§change 1997-03-20<date>Tomaz Erjavec, IJS<name>Now every QUOTE in 1984 has at least one P
§change 1997-04-02<date>Greg Priest-Dorman<name>inserted S tags in the locations given by MtSeg
§change 1997-04-02<date>Greg Priest-Dorman<name>inserted Q and HI tags where necessary as a result of S tag insertion
§change 1997-04-02<date>Greg Priest-Dorman<name>updated TAGUSAGE
§change 1997-04-02<date>Greg Priest-Dorman<name>changed "sl1984" to "Osl"
§change 1997-05-17<date>Tomaz Erjavec, IJS<name>S element harmonisation with English markup
§change 1997-05-17<date>Tomaz Erjavec, IJS<name>The "Svinja!"/Swine! was marked as three Q and three S in Slovene, only one in English
§change 1997-05-18<date>Tomaz Erjavec, IJS<name>P ID=Osl.2.11.33 for some reason not S segmented; corrected by hand.
§change 1997-05-18<date>Tomaz Erjavec, IJS<name>In P ID=Osl.3.5.9 the Qs did not terminate Ss. Inserted two S here
§change 1997-05-18<date>Tomaz Erjavec, IJS<name>Manually re-IDed affected Ps
§change 1997-05-19<date>Tomaz Erjavec, IJS<name>More missegmentation fixed
§change 1997-06-10<date>Tomaz Erjavec, IJS<name>Tag normalisation (no RE, no dbl SP in tags)
§change 1997-06-19<date>Tomaz Erjavec, IJS<name>Corrected some more spelling mistakes
§change 1997-06-19<date>Tomaz Erjavec, IJS<name>Changed all caps words into lower case, and marked them as rend=CA (mtlex does not find them otherwise)
§change 1997-06-19<date>Tomaz Erjavec, IJS<name>Changed hellip ent back into '...' (mt tools cannot deal with hellip)
§change 1997-06-19<date>Tomaz Erjavec, IJS<name>Changed mdash ent to '-' in prefixes in appendix: (pred-, po-, nad-, pod- in Osl.4.8.3)
§change 1997-06-23<date>Tomaz Erjavec, IJS<name>Corrected two more typos in 1st Chp
§change 1997-07-09<date>Tomaz Erjavec, IJS<name>Final typos corrected; lexicon now covers all wordforms in text.
§change 1997-08-06<date>Tomaz Erjavec, IJS<name>deleted LABEL markup in LIST Osl.
§change 1997-08-06<date>Tomaz Erjavec, IJS<name>updated TAGUSAGE (no LABEL), BYTECOUNT
§change 1997-09-08<date>Tomaz Erjavec, IJS<name>Due to a feature of mtlex, lexicon did not cover all word-forms; a few more typos found and corrected.
§change 1997-09-25<date>Tomaž Erjavec<name>Changed editionStmt, byteCount, pubDate, Availability to final form
§change 1997-11-28<date>Tomaž Erjavec<name>Hand tagging by ZRC reveals more typos. Corrected.
§change 2004-05-10<date>Tomaž Erjavec<name>Converted to TEI P4, prepared for MTE V3
§change 2010-05-09<date>Tomaž Erjavec<name>Conversion to MULTEXT-East TEI P5.