This document is an HTML 3.2 rendering of a Corpus Encoding Specification DTD document, produced in the scope of the MULTEXT-East project by Fred.
Note that this HTML translation does not contain all the information from the cesHeader.
CES header
Creator: HJK
Created: 1995-10-18
Updated: 1997-09-25
File Description
- Title Statement
- Title:
- Multext-East CES1: Nineteen Eighty-Four, Estonian
- Responsibility
- Viire Villandi
(entered and validated the text)
Heili Orav
(added CES tags)
Heiki-Jaan Kaalep
(supervised the work)
Heiki-Jaan Kaalep
(modified the tags and header for version 4)
Leho Paldre
(modified the tags and header for version 4.1)
Greg Priest-Dorman
(added tagging of sentences in paragraphs using MtSgml and Estonian resources)
Leho Paldre
(manually checked automatic tagging of sentences; corrected 48 typos)
Heiki-Jaan Kaalep
(corrected the final bytecount and wordcount)
- Edition:
- MTE Final Release
- Extent:
- 79334 words
1066273 bytes
Note:
WordCount represents the number of words in this
text exclusive of tags and header information.
ByteCount reflects the approximate size of the
file containing the doctype and cesDoc element
including all text, tags and header information.
- Publication Statement
- Distributor:
- TÜ arvutuslingvistika uurimisgrupp
- Address:
- Tiigi 78-232, Tartu, Estonia
- Electronic address:
- hkaalep@psych.ut.ee
- Availability:
- Freely available
- Publication date:
- October 1, 1997
- Source Description
- Structured Bibliography
- Monograph
- Title:
- 1984
- Author:
- George Orwell
- Responsibility
- Elias Treeman
(translator from English)
Loomingu Raamatukogu 1990 nr. 48-51
- Imprint
- Publisher:
- Perioodika
- Place:
- Tallinn
- Publication date:
- 1990
Encoding Description
- Project Description:
- MULTEXT-East:
Multilingual Text Tools and Corpora for Central and
Eastern European Languages.
EU Copernicus Project COP106
- Tag declaration:
- abbr = 73
- body = 1
- date = 18
- div = 28
- foreign = 93
- head = 5
- hi = 183
- item = 4
- l = 32
- list = 1
- mentioned = 44
- name = 2457
- note = 2
- num = 14
- p = 1289
- poem = 10
- ptr = 2
- q = 2192
Q tags with an attribute of type="MI" have been inserted
automatically.
- quote = 35
- s = 6658
S tags have been inserted automatically at the locations
(character offsets) provided by MtSeg version 1.3.1 using the
Estonian resource files, and then cleaned up by hand.
- text = 1
- title = 29
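A tag declaration of this kind can be regenerated by tallying element start tags in the encoded text. The following is a minimal sketch only, assuming the corpus file is available as a string of SGML; the function name `tag_usage` and the sample text are hypothetical and are not part of the MULTEXT-East tools.

```python
import re
from collections import Counter

# Matches the name of a start tag such as <s id="..."> or <q type="MI">.
# End tags (</s>), comments and declarations (<!...>) are not matched.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.\-]*)")


def tag_usage(sgml_text: str) -> Counter:
    """Count occurrences of each element start tag, case-insensitively."""
    return Counter(name.lower() for name in START_TAG.findall(sgml_text))


# Hypothetical sample, not taken from the corpus.
sample = (
    '<text><body><div><p><s>Tere.</s> <s><q type="MI">Jah</q>, '
    "<name>Winston</name>.</s></p></div></body></text>"
)

# Print the counts in the same "tag = count" layout as the list above.
for tag, count in sorted(tag_usage(sample).items()):
    print(f"{tag} = {count}")
```

A regular expression is used here rather than an XML parser because CES documents are SGML and need not be well-formed XML; a validating SGML parser would be the more robust choice for real data.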
Revision Description
- Date: 1996-10-31
Heiki-Jaan Kaalep, UT
- Changed the header to conform to the new CES version
- Date: 1997-02-19
Leho Paldre, UT
- Identified broken Qs, removed redundant HEADs,
checked MENTIONED tags; updated the header
- Date: 1997-03-20
Tomaž Erjavec, IJS
- Normalisation of corpus component CESHEADER elements:
CESHEADER, EDITIONSTMT, TITLESTMT/H.TITLE
- ISO LANGUAGEs implemented as marked section PUBLIC ent
- Language (WSDs) implemented as PUBLIC entities
- Newspeak LANGUSAGE/LANGUAGE IDs now ns-xx for lang xx
- Now every QUOTE in 1984 has at least one P
- Date: 1997-04-04
Greg Priest-Dorman
- inserted S tags in the locations given by MtSeg
- inserted Q and HI tags where necessary as a result of
S tag insertion
- updated TAGUSAGE
- Date: 1997-06-18
Leho Paldre, UT
- Checked S tagging manually; removed 2 redundant HEADs
- Corrected 48 typos manually; added 2.5 missing sentences.
- updated TAGUSAGE
- Date: 1997-08-06
Tomaž Erjavec
- Removed empty S Oet.2.8.1.3, Oet.2.10.7.10.1
and empty Q Oet.1.9.58.7.1, Oet.4.15.4.1
- normalised some HI, FOREIGN so that RE does not
occur in tag
- updated TAGUSAGE for Q and S, BYTECOUNT
- Date: 1997-08-18
Heiki-Jaan Kaalep
- Removed 2 S which contained random keystrokes
and nothing more
- normalised one HI so that RE does not
occur in tag
- updated TAGUSAGE for S, BYTECOUNT
- Date: 1997-09-25
Tomaž Erjavec
- Changed editionStmt, byteCount, pubDate
to final form