This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.

Note that this HTML translation does not contain all the information from the original document.

Uses ISO 8859-1 (Latin-1) encoding.

CES header

Version: 4.1, Type: text, Language: en,
Creator: SIT, Status: update, Created: 1997-09-27, Updated: 1997-12-20

File Description

Title Statement
CES: Nineteen Eighty-Four, Russian
Paul Sokolovsky, Sergey Sryvkin (Proofreading, hyphenation deletion, formatting, inserting paragraph and sub-paragraph level tagging.)
MTE Final Release
76469 words, 2.2 MB
Note: WordCount represents the number of words in this text exclusive of tags and header information, counted before the markup process. ByteCount reflects the approximate size of the file containing the doctype and cesDoc element, including all text, tags and header information.
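The WordCount convention described in the note can be approximated on an already marked-up file. The following is a minimal sketch, not the procedure used for this text (the count above was taken before markup). It assumes the header is wrapped in a cesHeader element and that words are whitespace-separated; the function name count_words is hypothetical.

```python
import re


def count_words(ces_text: str) -> int:
    """Approximate WordCount: words outside tags and outside the header."""
    # Assumption: the header is a single <cesHeader>...</cesHeader> element.
    body = re.sub(
        r"<cesHeader\b.*?</cesHeader>",
        " ",
        ces_text,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Replace every remaining tag (and the doctype declaration) with a space.
    body = re.sub(r"<[^>]+>", " ", body)
    # Count whitespace-separated tokens; punctuation stays attached to words.
    return len(body.split())


sample = (
    "<cesDoc><cesHeader>header words here</cesHeader>"
    "<text><body><p>Война это мир.</p>"
    "<p><q>Свобода это рабство</q></p></body></text></cesDoc>"
)
print(count_words(sample))
```

Because entity references and punctuation are not treated specially here, the result may differ slightly from a count made on the plain text before markup.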
Publication Statement
Severodonetsk Institute of Technology, East-Ukraine State University
Sovetsky st., bl. 3a, Severodonetsk, Lugansk reg., Ukraine
Electronic address:
Available for research purposes upon receipt of signed agreement
Publication date:
January 1st, 1998
Source Description
Full Bibliography
Title Statement
Orwell's 1984, Russian: plaintext electronic edition
Maxim Moshkov's Library (Made the electronic edition available on the Internet)
Publication Statement
Maxim Moshkov's Library
Publication date:
Source Description
Structured Bibliography
Nineteen Eighty Four (Russian)
George Orwell
Translator: V. Golyshev
Publication date:

Encoding Description

Project Description:
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106. This text is a volunteer contribution to the project.
Editorial declaration:
Corpus Encoding Standard, Version 4.0
No quotation marks are preserved in the text. In keeping with the conventions of written Russian, only double quotes are used in the rendition ("PRE ldquo POST rdquo").
Marked up to the paragraph level: P, QUOTE, NOTE, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, TIME, MENTIONED, FOREIGN, ABBR.
No hyphenation marks are present in the text.
Tag declaration:
abbr = 11
All abbreviations are marked.
body = 1
date = 36
All dates which contain one or more digits (the characters 0-9) are marked, including dates specifying day/month/year and dates consisting only of a year. The attribute 'iso8601' is used consistently. If a phrase contained two dates, one written in digits and the other in words, the latter was marked up too, e.g. "in 1944 and forty-five". No attempt was made to identify or mark dates in other forms.
div = 28
foreign = 348
Some of the other mteO-??.ces files state that only words typographically highlighted in the text were marked. We instead mark up Newspeak words that, for some reason (mostly morphological), cannot be correct Russian. A typical example is the translation of 'telescreen'. We think the idea of Newspeak was carried into it wonderfully. Instead of literally translating "telescreen" as "теле-экран", the translator contracted 'е' and 'э' (an unusual phenomenon for Russian), resulting in "телекран". It sounds even more awful given that the stem "кран" has no semantic relation to the original "экран". So this word cannot be plain Russian: it is 100% Newspeak Russian!
head = 1
hi = 10
This applies to the rend attribute of other tags too. As our primary source for markup was an electronic plaintext version, no character-level typographical rendition except capitalization was present in the original. Paragraph-level rendition included only line breaking and centering. Although we consulted the printed book during the process, we decided not to add character-level rendition, because the book is a different version, and because CES advises that all rendition should be resolved into descriptive tags and that no rendition attributes whose sole purpose is to recreate the original appearance should be left. So only the CA and CE values of rend are used. As in the other mteO-?? files, capitalized text was decapitalized.
item = 4
l = 39
list = 1
mentioned = 216
CES and TEI give rather vague criteria for what to mark as mentioned. We tried to carry over the occurrences from Oen, though this may be inconsistent in places.
name = 2105
All names of people, places, organizations, products, and events are marked. Person names in the genitive are not marked. All names of countries and towns are marked with type=place. Names of rivers and oceans are also marked with type=place. Some other proper nouns (or noun groups) denoting places were marked as well, e.g. Golden Country and Chestnut Tree Café.
note = 2
Oddly, the electronic version had no notes, although they exist in the printed reference edition. We have reinserted them.
num = 12
Numbers are marked only if the corresponding number in the English version is marked too, so only some occurrences are marked.
poem = 10
ptr = 2
q = 2160
The Q tag is used to mark slogans and quoted dialogue. The attribute "broken=yes" is currently not inserted when no sentence terminating punctuation (either inside the Q itself or in the intervening text between two Qs) appears between two dialogue fragments by the same speaker.
quote = 30
QUOTE marks quotations from outside sources, including extensive quotations from Winston's diary and Goldstein's treatise.
ref = 1
s = 0
S tags have not yet been inserted.
text = 1
title = 44
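
The per-element counts in the tag declaration above can be recomputed from the marked-up text. The following is a minimal sketch under stated assumptions, not the tool used to produce this header: it counts start tags with a regular expression, folds element names to lower case, and does not handle SGML tag minimization. The function name count_tags and the sample fragment are hypothetical; the 'iso8601' and type=place attributes are the ones described above.

```python
import re
from collections import Counter


def count_tags(ces_text: str) -> Counter:
    """Count start tags per element name, case-insensitively."""
    # '<' followed by a letter matches start tags only; end tags ('</'),
    # declarations ('<!') and processing instructions ('<?') are skipped.
    names = re.findall(r"<([A-Za-z][A-Za-z0-9.\-]*)", ces_text)
    return Counter(name.lower() for name in names)


sample = (
    '<text><body><p><name type="place">Лондон</name>, '
    '<date iso8601="1984-04-04">4 апреля 1984</date></p>'
    "<p><q>...</q></p></body></text>"
)
counts = count_tags(sample)

# Print in the same "tag = count" layout as the declaration, sorted by name.
for tag in sorted(counts):
    print(f"{tag} = {counts[tag]}")
```

Elements declared with a count of 0, such as s, would simply be absent from the result, so a full declaration would need the list of expected element names as well.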

Revision Description

Date: 27 Sep 1997 (Team)
Date: 2 Oct 1997 (Team)
Date: 6 Oct 1997 (Team)
Date: 8 Oct 1997 (Sergey Sryvkin)
Date: 9 Oct 1997 (Sergey Sryvkin)
Date: 16 Nov 1997 (Paul Sokolovsky)
Date: 19 Nov 1997 (Sergey Sryvkin)
Date: 23 Nov 1997 (Sergey Sryvkin)
Date: 2 Dec 1997 (Sergey Sryvkin)
Date: 6 Dec 1997 (Sergey Sryvkin)
Date: 12 Dec 1997 (Sergey Sryvkin)
Date: 15 Dec 1997 (Paul Sokolovsky)
Date: 16 Dec 1997 (Paul Sokolovsky)
Date: 16 Dec 1997 (Paul Sokolovsky)
Date: 20 Dec 1997 (Tomaz Erjavec)

Meta-Made by et