This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document,
by Fred, using the ceshdr2html_tmap.fred translation map.
Note that this HTML translation does not contain all the information
from the original document.
Uses ISO 8859-1 (Latin-1) encoding.
CES header
Version: 4.1, Type: text, Language: en,
Creator: SIT, Status: update, Created: 1997-09-27, Updated: 1997-12-20
File Description
- Title Statement
- Title:
- CES: Nineteen Eighty-Four, Russian
- Responsibility
- Paul Sokolovsky, Sergey Sryvkin
(Proofreading, hyphenation deletion, formatting,
inserting paragraph and sub-paragraph level tagging.)
- Edition:
- MTE Final Release
- Extent:
- 76469 words, 2.2 MB
Note: WordCount represents the number of words in this text exclusive
of tags and header information, counted before the markup process.
ByteCount reflects the approximate
size of the file containing the doctype and cesDoc element including
all text, tags and header information.
- Publication Statement
- Distributor:
- Severodonetsk Institute of Technology, East-Ukraine State University
- Address:
- Sovetsky st., bl. 3a, Severodonetsk, Lugansk reg., Ukraine
- Electronic address:
- Paul.Sokolovsky@technologist.com
- Availability:
-
Available for research purposes upon receipt of signed agreement
- Publication date:
- January 1st, 1998
- Source Description
- Full Bibliography
- Title Statement
- Title:
- Orwell's 1984, Russian: plaintext electronic edition
- Responsibility
- Maxim Moshkov's Library
(Made the electronic edition available on the Internet)
- Publication Statement
- Distributor:
- Maxim Moshkov's Library
- Address:
-
http://www.moshkow.orc.ru/koi
http://www.alkar.net/moshkow/html-KOI
- Availability:
-
- Publication date:
- Unknown
- Source Description
- Structured Bibliography
- Monograph
- Title:
- Nineteen Eighty Four (Russian)
- Author:
- George Orwell
- Author:
- Translator: V. Golyshev
- Imprint
- Publication date:
- Unknown
- Publisher:
- Unknown
- Place:
- Unknown
Encoding Description
- Project Description:
-
MULTEXT-East:
Multilingual Text Tools and Corpora for Central and
Eastern European Languages.
EU Copernicus Project COP106
This text is a volunteer contribution to the project.
- Editorial declaration:
- Conformance:
-
Corpus Encoding Standard, Version 4.0
- Correction:
-
- Quotation:
- No quotation marks are preserved in the text.
Following the conventions of written Russian, only double quotes are used
in rendition ("PRE ldquo POST rdquo").
- Segmentation:
-
Marked up to the paragraph level:
P, QUOTE, NOTE, plus marking of sub-paragraph element Q.
Some marking of particular sub-paragraph elements:
NAME, DATE, TIME, MENTIONED, FOREIGN, ABBR.
- Hyphenation:
-
No hyphenation marks are present in the text.
- Tag declaration:
- abbr = 11
- All abbreviations are marked.
- body = 1
-
- date = 36
-
All dates which contain one or more digits (the characters 0-9) are
marked, including dates specifying day/month/year and dates consisting
only of a year. The attribute 'iso8601' is used consistently.
If a phrase contained two dates, one written in digits and the other in
words, the latter was marked up too, e.g. "in 1944 and forty-five".
No attempt was made to identify or mark dates in other forms.
- div = 28
-
- foreign = 348
-
Some mteO-??.ces files state that only words typographically highlighted in the
text were marked.
We instead mark up Newspeak words if, for some reason (mostly
morphological), they cannot be correct Russian.
A typical example is the translation of 'telescreen'. We think the idea of
Newspeak was carried over into it wonderfully. Instead of
literally translating "telescreen" as
"теле-экран", the translator
contracted 'е' & 'э' (an unusual phenomenon for Russian),
resulting in "телекран". It sounds
even more awful given that the stem "кран" has no
semantic relation to the original "экран". So this
word cannot be plain Russian: it is 100% Newspeak Russian!
- head = 1
-
- hi = 10
-
This applies to the rend attribute of other tags too. As our primary
source for markup was an electronic plaintext version, no
character-level typographical rendition except capitalization was
present in the original.
Paragraph-level rendition included only line breaking & centering. Though we
consulted the printed book during the process, we decided not to add
character-level rendition, because the book is a different version, and
because the CES guidelines say that all rendition should be resolved into
descriptive tags & no rendition attributes whose sole purpose is to recreate
the original appearance should be left.
So only the CA & CE values of rend are used. As in the other mteO-?? files,
capitalized text was decapitalized.
- item = 4
-
- l = 39
-
- list = 1
-
- mentioned = 216
-
CES and TEI give rather vague criteria for what to mark as mentioned. We
tried to carry over the occurrences from Oen, though in places the marking
may be inconsistent.
- name = 2105
-
All names of people, places, organizations,
products, and events, are marked.
Person names in the genitive are not marked.
All names of countries and towns are marked with type=place.
Names of rivers and oceans are marked too with type=place.
Some other proper nouns (or noun groups) denoting places were marked, e.g. Golden Country & Chestnut Tree Café
- note = 2
-
Strangely, the electronic version had no notes, though they exist in the
printed reference. We have reinserted them.
- num = 12
-
Numbers are marked only if the corresponding number in the English version is
marked too, so only some occurrences are marked.
- poem = 10
-
- ptr = 2
-
- q = 2160
-
The Q tag is used to mark slogans and quoted dialogue.
The attribute "broken=yes" is currently not inserted
when no sentence terminating punctuation
(either inside the Q itself or in the intervening text between two Qs)
appears between two dialogue fragments by the same speaker.
- quote = 30
-
QUOTE marks quotations from outside sources, including extensive
quotations from Winston's diary and Goldstein's treatise.
- ref = 1
-
- s = 0
-
S tags have not yet been inserted.
- text = 1
-
- title = 44
-
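The declared counts above can be checked against a marked-up text. The following is a minimal sketch, not the procedure used by the project: it assumes every element has an explicit opening tag (no SGML tag minimization), and the sample string and the helper names count_tags and compare_usage are purely illustrative.

```python
import re
from collections import Counter

# Opening tags only: "<" followed directly by a name, so "</p>" and
# "<!-- ... -->" are not matched.
OPEN_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)(?=[\s>/])")


def count_tags(markup: str) -> Counter:
    """Count opening tags, folding case (SGML element names are case-insensitive)."""
    return Counter(m.group(1).lower() for m in OPEN_TAG.finditer(markup))


def compare_usage(declared: dict, markup: str) -> dict:
    """Return {tag: (declared, found)} for every tag whose counts differ."""
    found = count_tags(markup)
    tags = set(declared) | set(found)
    return {
        t: (declared.get(t, 0), found.get(t, 0))
        for t in sorted(tags)
        if declared.get(t, 0) != found.get(t, 0)
    }


# Hypothetical fragment, not taken from the corpus.
sample = '<body><div><p><name type="place">Лондон</name></p><P>x</P></div></body>'
declared = {"body": 1, "div": 1, "p": 2, "name": 2}
mismatches = compare_usage(declared, sample)
```

Here mismatches reports only name, since the sample contains one NAME element where two were declared.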
Revision Description
- Date: 27 Sep 1997 (Team)
- Started -- Automatically marked some names
- Date: 2 Oct 1997 (Team)
-
- Date: 6 Oct 1997 (Team)
- Hand-marking most of first chapter: quotations, names, etc.
- Date: 8 Oct 1997 (Sergey Sryvkin)
- Completed first chapter: quotations, names, etc.
- Date: 9 Oct 1997 (Sergey Sryvkin)
- Checked and cleaned up the first chapter.
- Date: 16 Nov 1997 (Paul Sokolovsky)
- tagusage, entities encoding
- Date: 19 Nov 1997 (Sergey Sryvkin)
- Completed second chapter.
- Date: 23 Nov 1997 (Sergey Sryvkin)
- Completed third and fourth chapters of Part 1.
Quotation, mentioning made without attributes.
Marked up some similar phrases in the whole text.
- Date: 2 Dec 1997 (Sergey Sryvkin)
- Completed fifth, sixth and seventh chapters of Part 1.
Quotation, mentioning made without attributes.
Marked up almost all similar phrases in the whole text.
- Date: 6 Dec 1997 (Sergey Sryvkin)
- Completed Part 1 and Part 2.
Chapter 1 of Part 3 has been completed too.
Quotation, mentioning made without attributes.
Marked up all similar phrases in the whole text.
- Date: 12 Dec 1997 (Sergey Sryvkin)
- Completed the whole text.
Quotation, mentioning made without attributes.
- Date: 15 Dec 1997 (Paul Sokolovsky)
- Proofreading. Correcting typos and tagging.
Part 1.
- Date: 16 Dec 1997 (Paul Sokolovsky)
- Proofreading. Correcting typos and tagging. Part 2.
Linking broken q's.
- Date: 16 Dec 1997 (Paul Sokolovsky)
- Proofreading completed.
- "..." changed to entity "hellip".
- Missing footnotes inserted.
- Date: 20 Dec 1997 (Tomaz Erjavec)
- Changed PUBDATE in PUBLICATIONSTMT
- Changed AVAILABILITY
- Changed SOURCEDESC
- Changed some other minor things in the header
- Inserted BYTECOUNT
- Inserted missing TAGUSAGEs
- Inserted OCCURS in all TAGUSAGEs
- Changed ID prefix 'ORWru' to 'Oru'
- Inserted IDs for P, POEM, LIST, ITEM, L
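Two of the revisions listed above, replacing "..." with the entity "hellip" and changing the ID prefix 'ORWru' to 'Oru', are simple textual substitutions. The sketch below shows one way they could be done. It is an assumption about the mechanics, not the scripts actually used, and the function names and the id value in the usage lines are hypothetical.

```python
import re


def ellipsis_to_entity(text: str) -> str:
    """Replace each run of exactly three dots with the entity reference &hellip;."""
    return re.sub(r"(?<!\.)\.\.\.(?!\.)", "&hellip;", text)


def rename_id_prefix(markup: str, old: str = "ORWru", new: str = "Oru") -> str:
    """Rewrite the prefix of id attribute values, keeping the rest of each id."""
    # The attribute name is matched case-insensitively, the prefix exactly.
    pattern = re.compile(r'(\b(?i:id)\s*=\s*["\']?)' + re.escape(old))
    return pattern.sub(lambda m: m.group(1) + new, markup)


# Hypothetical inputs for illustration only.
line = ellipsis_to_entity("Он ждал... и молчал.")
tag = rename_id_prefix('<p id="ORWru.1.2">')
```

Only the id attribute itself is rewritten here. Attributes that point at ids, such as the targets of PTR or REF elements, would need the same treatment.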
Meta-Made by et