TEI Header

§file description
§title statement
§title
id = mteo-en.title
Multext-East cesDoc corpus: Nineteen Eighty-Four, English
§statement of responsibility
§name Nancy Ide
§responsibility Modified ECI tags of first chapter to conform to CES. Added or modified some sub-paragraph level tagging.
§statement of responsibility
§name Tomaz Erjavec
§responsibility Modified full ECI Orwell to conform to CES V3.15
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Modified Tomaz Erjavec's full Orwell to conform to CES V3.21. Checked and modified markup for correctness down to the paragraph level.
§statement of responsibility
§name Greg Priest-Dorman
§responsibility Added tagging of sentences in paragraphs using MtSeg and English resources.
§statement of responsibility
§name Tomaž Erjavec
§responsibility Conversion to XML/TEI P5
§edition statement
§edition MULTEXT-East, Version 4
§extent 104302<measure> WordCount represents the number of words in this text exclusive of tags and header information. ByteCount reflects the approximate size of the file containing the doctype and cesDoc element including all text, tags and header information.
§publication statement
§address http://nl.ijs.si/ME/V4/
§distributor Vassar College Computer Science Department
§address 124 Raymond Avenue, Poughkeepsie, New York, USA 12604
§date
when = 2010-05-09
2010-05-09
§source description
§fully-structured bibliographic citation
§title statement
§title Multext-East CES1: Nineteen Eighty-Four, English
§statement of responsibility
name Nancy Ide
responsibility Modified ECI tags of first chapter to conform to CES. Added or modified some sub-paragraph level tagging.
§statement of responsibility
name Tomaz Erjavec
responsibility Modified full ECI Orwell to conform to CES V3.15
§statement of responsibility
name Greg Priest-Dorman
responsibility Modified Tomaz Erjavec's full Orwell to conform to CES V3.21. Checked and modified markup for correctness down to the paragraph level.
§statement of responsibility
name Greg Priest-Dorman
responsibility Added tagging of sentences in paragraphs using MtSeg and English resources.
§edition statement

MTE Final Release

§publication statement
§distributor Vassar College Computer Science Department
§address 124 Raymond Avenue, Poughkeepsie, New York, USA 12604
§availability

Available for research purposes upon receipt of signed agreement

§date
when = 1997-10-01
October 1st, 1997
§source description
§fully-structured bibliographic citation
title statement
title The European Corpus Initiative Multilingual Corpus 1: 1984 by George Orwell (English)
statement of responsibility
name Association for Computational Linguistics
responsibility Converted from OTA's DTD to ECI DTD
publication statement
distributor ACL
address ACL
availability

Available for research purposes upon receipt of signed agreement

date 1994
source description
fully-structured bibliographic citation
title statement
title Orwell's 1984: electronic edition
statement of responsibility
name Oxford Text Archive
responsibility The four versions of Orwell's 1984 in the OTA were all prepared by the OUCS KDEM service in 1985 for Dr David C Bennett of the School of Oriental and African Studies at London University. The texts here have not been encoded or proofread in any way since they were produced (other than the English text, which was converted to an SGML-like encoding by John Price-Wilkin, and subsequently automatically converted to conform to the OTA's DTD by myself and Alan Morrison. The other languages were converted to TEI-conformant SGML by the ECI project 1993.) -- LB, Nov 1992
edition statement

Public Domain TEI edition prepared at the Oxford Text Archive

publication statement
distributor Oxford Text Archive
address Oxford University Computing Service 13 Banbury Road Oxford OX2 6NN UK archive@ox.ac.uk
availability

Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed

date 19 Nov 1992
source description
structured bibliographic citation
monographic level
title Nineteen Eighty Four
imprint
date 1949; reprinted 1961
publisher New American Library
publication place New York
§encoding description
§project description

This English version of Orwell's 1984 is encoded conformant to level 1 specifications of the Corpus Encoding Standard for the MULTEXT-EAST project. The English is to serve as the base for the parallel corpus, which will include aligned versions of the text in Romanian, Bulgarian, Estonian, Slovenian, Czech, and Hungarian.

§editorial practice declaration
§normalization

Corpus Encoding Standard, Version 2.0. CES LEVEL: 1

§quotation
form = nonstd

Rendition attribute values on Q, QUOTE, MENTIONED and TERM tags are adapted from ISOpub and ISOnum standard entity set names when used. If the rend attribute is omitted in the markup, the rendition on the first set of Q, QUOTE, MENTIONED or TERM tags is "PRE lsquo POST rsquo", and the rendition on a Q, MENTIONED or TERM tag nested in a Q or QUOTE tag is "PRE ldquo POST rdquo".
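
The default-rendition rule above can be sketched in code. This is a minimal illustration, not part of the corpus tooling: the function name `effective_rend` and its arguments are hypothetical, and the handling of a QUOTE nested inside another Q or QUOTE is an assumption, since the header does not cover that case.

```python
from typing import Optional, Sequence

# Tags whose rendition the header describes (compared case-insensitively).
QUOTE_LIKE = {"q", "quote", "mentioned", "term"}

OUTER_DEFAULT = "PRE lsquo POST rsquo"   # first (outermost) level
NESTED_DEFAULT = "PRE ldquo POST rdquo"  # Q, MENTIONED or TERM inside Q or QUOTE


def effective_rend(tag: str, ancestors: Sequence[str],
                   rend: Optional[str] = None) -> Optional[str]:
    """Return the rendition that applies to `tag`.

    `ancestors` lists the names of the enclosing elements, outermost first.
    An explicit `rend` value always wins; defaults apply only when it is omitted.
    """
    tag = tag.lower()
    if tag not in QUOTE_LIKE or rend is not None:
        return rend
    nested = any(a.lower() in {"q", "quote"} for a in ancestors)
    # Assumption: the header names only Q, MENTIONED and TERM as nested tags,
    # so a nested QUOTE falls back to the outer default here.
    if nested and tag != "quote":
        return NESTED_DEFAULT
    return OUTER_DEFAULT


print(effective_rend("q", ["body", "p"]))
print(effective_rend("mentioned", ["body", "p", "q"]))
```

A renderer could call this once per element while walking the tree, passing the stack of open element names as `ancestors`.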

§segmentation

Marked up to the level of paragraph: P, QUOTE plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, ABBR, MENTIONED, DISTINCT, FOREIGN.

§hyphenation

No end-of-line hyphenation present in the ECI original.

§tagging declaration
§namespace
name = http://www.tei-c.org/ns/1.0
§tag usage
gi = abbr occurs = 38
abbreviation
§tag usage
gi = body occurs = 1
text body
§tag usage
gi = date occurs = 40
date
§tag usage
gi = distinct occurs = 1
distinct
§tag usage
gi = div occurs = 28
text division
§tag usage
gi = foreign occurs = 39
foreign
§tag usage
gi = head occurs = 1
heading
§tag usage
gi = hi occurs = 103
highlighted
§tag usage
gi = item occurs = 4
item
§tag usage
gi = l occurs = 32
verse line
§tag usage
gi = list occurs = 1
list
§tag usage
gi = mentioned occurs = 261
mentioned
§tag usage
gi = name occurs = 1744
name
§tag usage
gi = note occurs = 2
note
§tag usage
gi = num occurs = 52
number
§tag usage
gi = p occurs = 1286
paragraph
§tag usage
gi = lg occurs = 10
line group
§tag usage
gi = ptr occurs = 2
pointer
§tag usage
gi = q occurs = 2209
separated from the surrounding text with quotation marks
§tag usage
gi = quote occurs = 35
quotation
§tag usage
gi = s occurs = 6701
s-unit
§tag usage
gi = text occurs = 1
text
§tag usage
gi = title occurs = 46
title
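
The tag usage declarations above can be checked against a document mechanically. The following is a minimal sketch, assuming the TEI P5 element names `tagUsage` and `text` in the namespace declared above, and assuming the counts cover only elements inside `text` (the header does not state the counting scope). The inline sample is a toy document for illustration, not a valid TEI file and not the corpus itself; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Toy document for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <tagsDecl>
      <namespace name="http://www.tei-c.org/ns/1.0">
        <tagUsage gi="p" occurs="2">paragraph</tagUsage>
        <tagUsage gi="s" occurs="3">s-unit</tagUsage>
      </namespace>
    </tagsDecl>
  </teiHeader>
  <text><body>
    <p><s>One.</s> <s>Two.</s></p>
    <p><s>Three.</s></p>
  </body></text>
</TEI>"""


def declared_counts(root):
    """Map each declared gi to its declared number of occurrences."""
    query = f".//{{{TEI_NS}}}tagUsage"
    return {tu.get("gi"): int(tu.get("occurs")) for tu in root.iterfind(query)}


def actual_counts(root):
    """Count TEI-namespace elements inside <text>, including <text> itself."""
    text = root.find(f"{{{TEI_NS}}}text")
    counts = Counter()
    for el in text.iter():
        ns, _, local = el.tag[1:].partition("}")
        if ns == TEI_NS:
            counts[local] += 1
    return counts


def mismatches(root):
    """Return {gi: (declared, actual)} for every declaration that is off."""
    actual = actual_counts(root)
    return {gi: (n, actual.get(gi, 0))
            for gi, n in declared_counts(root).items()
            if actual.get(gi, 0) != n}


root = ET.fromstring(SAMPLE)
print(mismatches(root))  # an empty dict means all declared counts match
```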
§text-profile description
§language usage
§language
ident = ns
Newspeak
§language
ident = ns-jg
Newspeak official jargon
§language
ident = en-ck
British Cockney English
§text classification
§category reference
target = orwl
§revision description
§change 9/5/96<date>Tomaž Erjavec, IJS<name>Corrected the chapter 1 (esp header) to CES V2 conformance
§change 9/5/96<date>Tomaž Erjavec, IJS<name>corrected a number of original OCR typos with a spelling checker: I instead of l, rn instead of m
§change 9/5/96<date>Tomaž Erjavec, IJS<name>inserted Qs
§change 9/5/96<date>Tomaž Erjavec, IJS<name>inserted some missing apostrophes
§change 9/5/96<date>Tomaž Erjavec, IJS<name>changed '. . .' to '...', ' !' to '!', ' ?' to '?'
§change 9/5/96<date>Tomaž Erjavec, IJS<name>changed a number of GIs, as CES does not support ECI ones: EMPH to HI; MENTION to MENTIONED (and removed punctuation on single words therein); GLOSS to TERM (best I could come up with, without losing distinction)
§change 14/5/96<date>Tomaž Erjavec, IJS<name>Deleted apostrophes from chapter 2 and onwards
§change 14/5/96<date>Tomaž Erjavec, IJS<name>Changed some TERM into FOREIGN
§change 14/7/96<date>Greg Priest-Dorman<name>Changed dashes to entity mdash (not complete)
§change 14/7/96<date>Greg Priest-Dorman<name>Added additional q tags where appropriate
§change 14/7/96<date>Greg Priest-Dorman<name>Added quote tags
§change 14/7/96<date>Greg Priest-Dorman<name>Changed q tags to quote tags where appropriate
§change 14/7/96<date>Greg Priest-Dorman<name>All quotation marks replaced with markup
§change 14/7/96<date>Greg Priest-Dorman<name>Replaced q tags with mentioned tags where appropriate
§change 14/7/96<date>Greg Priest-Dorman<name>Standardized the markup of poems in the text
§change 14/7/96<date>Greg Priest-Dorman<name>Marked broken Q tags as such (linking of broken Q tags with next and prev attributes is not yet done)
§change 15/09/96<date>Greg Priest-Dorman<name>linked broken Q tags with "prev" and "next" attributes
§change 15/09/96<date>Greg Priest-Dorman<name>all occurrences of "..." and ". . ." have been replaced with the ISO_8879:1986 Publishing entity "hellip"
§change 15/09/96<date>Greg Priest-Dorman<name>changes of P and QUOTE tags since version .3 logged in file p.and.quote.changes, available on request
§change 15/09/96<date>Greg Priest-Dorman<name>names tagged with NAME as stated above in TAGUSAGE "gi=name"
§change 15/09/96<date>Greg Priest-Dorman<name>quoted text tagged as stated above in TAGUSAGE "gi=q" and TAGUSAGE "gi=quote"
§change 15/09/96<date>Greg Priest-Dorman<name>dates and numbers tagged as stated above in TAGUSAGE "gi=num" and TAGUSAGE "gi=date"
§change 15/09/96<date>Greg Priest-Dorman<name>abbreviations are tagged as stated above in TAGUSAGE "gi=abbr"
§change 15/09/96<date>Greg Priest-Dorman<name>OCR errors have been corrected when found, most noticeably, the "p" at the beginning of "Party" was usually incorrectly in lower case.
§change 15/09/96<date>Greg Priest-Dorman<name>"rend" if added has been checked against the 1949 Harcourt, Brace & World, Inc. edition of Nineteen Eighty-Four
§change 15/01/97<date>Greg Priest-Dorman<name>Changed IDs, PREV and NEXT attributes using "1984en" to "Oen"
§change 15/01/97<date>Greg Priest-Dorman<name>Fixed tagging error in Part 1 Chapter 4 QUOTE 2 (see mte1984-en.ces.V1.1.CHANGES) and reduced TAGUSAGE for P by 2
§change 15/01/97<date>Greg Priest-Dorman<name>fixed some typos in the header
§change 15/01/97<date>Greg Priest-Dorman<name>replaced any tab(^I) characters in the text (there was one)
§change 15/01/97<date>Greg Priest-Dorman<name>reformatted the text for readability and consistency
§change 15/01/97<date>Greg Priest-Dorman<name>updated BYTECOUNT
§change 03/03/97<date>Greg Priest-Dorman<name>Corrected markup: marked broken Qs part 1 chapter 8 paragraph 3 (pointed out by O. Csaba).
§change 03/03/97<date>Greg Priest-Dorman<name>Corrected markup: Part 1 chapter 4, in the list of Newspeak quotes from the Times, part of the last list item was not in the list; it is now (pointed out by T. Erjavec)
§change 03/03/97<date>Greg Priest-Dorman<name>corrected punctuation error: Part 1 chapter 4, on two occasions the Newspeak quote which ends "fullwise upsub antefiling" occurs. In the printed edition this is followed by a period, so I added the period.
§change 30/04/97<date>Greg Priest-Dorman<name>inserted S tags in the locations given by MtSeg
§change 30/04/97<date>Greg Priest-Dorman<name>inserted Q and HI tags where necessary as a result of S tag insertion
§change 12/05/97<date>Greg Priest-Dorman<name>Corrected several tagging errors pointed out by T. Erjavec and V. Petkevic
§change 12/05/97<date>Greg Priest-Dorman<name>modified header to comply with T. Erjavec's header style
§change 12/05/97<date>Greg Priest-Dorman<name>updated TAGUSAGE
§change 12/05/97<date>Greg Priest-Dorman<name>removed blank lines
§change 14/05/97<date>Greg Priest-Dorman<name>added Ss to two newspeak paragraphs to aid in alignment
§change 14/05/97<date>Greg Priest-Dorman<name>updated TAGUSAGE
§change 19/05/97<date>Greg Priest-Dorman<name>Corrected several tagging errors pointed out by T. Erjavec
§change 19/05/97<date>Greg Priest-Dorman<name>Corrected several typos in the text pointed out by T. Erjavec and V. Petkevic
§change 19/05/97<date>Greg Priest-Dorman<name>updated TAGUSAGE
§change 20/06/97<date>Greg Priest-Dorman<name>Corrected several tagging errors pointed out by Vladimir Petkevic where a sentence boundary was inserted 2 characters ahead of where it should have been.
§change 1997-12-14<date>Tomaz Erjavec<name>Corrected several errors in the header
§change 2004-05-10<date>Tomaž Erjavec<name>Converted to TEI P4, prepared for MTE V3
§change 2010-05-09<date>Tomaž Erjavec<name>Conversion to MULTEXT-East TEI P5.
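
The change entries above mix two date notations: older ones such as 9/5/96 and 15/09/96, and later ISO dates such as 1997-12-14. A minimal sketch of normalizing them to ISO 8601 follows. It assumes the slash form is day/month/two-digit-year, which entries such as 14/5/96 and 30/04/97 support since they cannot be month-first; the function name `normalize_change_date` is hypothetical.

```python
from datetime import datetime

# Accepted notations, tried in order. The slash form is assumed to be
# day/month/two-digit-year, as entries like 30/04/97 suggest.
FORMATS = ("%Y-%m-%d", "%d/%m/%y")


def normalize_change_date(raw: str) -> str:
    """Return the date in ISO 8601 form (YYYY-MM-DD)."""
    raw = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized change date: {raw!r}")


for raw in ("9/5/96", "15/09/96", "1997-12-14"):
    print(raw, "->", normalize_change_date(raw))
```

Two-digit years are resolved by Python's `%y` rule (69 to 99 map to 1969 to 1999), which fits the 1996 and 1997 entries in this list.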