creator: | et | |
---|---|---|
status: | update | |
date: | 1996-10-31 (created) | 2004-05-10 (updated) |
Available for research purposes upon receipt of signed agreement.
See individual component source descriptions.
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. <http://nl.ijs.si/ME/>
Encoded to Corpus Encoding Standard Level 1 (CES1)
See individual component headers.
All quotation marks converted to q, with the original rendering recorded in the rend attribute. At times marked with other attributes (who, type). q sometimes occurs within s; TEI extended to accommodate. See also individual component headers.
Marked up to the level of paragraph: p, quote plus marking of sub-paragraph element q. Some marking of particular sub-paragraph elements, e.g. name, date, abbr, mentioned, distinct, foreign. See also individual component headers.
No end-of-line hyphenation present in texts.
The two-letter language codes follow ISO 639.
creator: | NMI | |
---|---|---|
status: | update | |
date: | 1995-05-10 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Public Domain TEI edition prepared at the Oxford Text Archive
Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed
This English version of Orwell's 1984 is encoded conformant to level 1 specifications of the Corpus Encoding Standard for the MULTEXT-EAST project. The English is to serve as the base for the parallel corpus, which will include aligned versions of the text in Romanian, Bulgarian, Estonian, Slovenian, Czech, and Hungarian.
Corpus Encoding Standard, Version 2.0 CES LEVEL: 1
Rendition attribute values on Q, QUOTE, MENTIONED and TERM tags are adapted from ISOpub and ISOnum standard entity set names when used. If the rend attribute is omitted in the markup, the rendition on the first set of Q, QUOTE, MENTIONED or TERM tags is "PRE lsquo POST rsquo" and the rendition on a Q, MENTIONED or TERM tag nested in a Q or QUOTE tag is "PRE ldquo POST rdquo".
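The default-rendition rule above can be sketched in Python. This is only an illustration under stated assumptions: the text is taken to be available as well-formed XML with case-insensitive tag names, the function name `effective_rend` is hypothetical, and a QUOTE nested inside another quotation (a case the note does not cover) is assumed to keep the outer default.

```python
import xml.etree.ElementTree as ET

QUOTE_LIKE = {"q", "quote", "mentioned", "term"}
NESTABLE = {"q", "mentioned", "term"}      # tags that get the nested default
OUTER_DEFAULT = "PRE lsquo POST rsquo"     # default at the first level
NESTED_DEFAULT = "PRE ldquo POST rdquo"    # default inside a Q or QUOTE


def effective_rend(root):
    """Return (tag, rend) pairs in document order, filling in omitted rend values."""
    results = []

    def walk(elem, inside_quote):
        tag = elem.tag.lower()
        if tag in QUOTE_LIKE:
            rend = elem.get("rend")
            if rend is None:
                # Assumption: a nested QUOTE keeps the outer default.
                nested = inside_quote and tag in NESTABLE
                rend = NESTED_DEFAULT if nested else OUTER_DEFAULT
            results.append((tag, rend))
        # Only Q and QUOTE open a context in which the nested default applies.
        child_inside = inside_quote or tag in {"q", "quote"}
        for child in elem:
            walk(child, child_inside)

    walk(root, False)
    return results
```

An explicit rend attribute always wins; the defaults are consulted only when the attribute is absent.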
Marked up to the level of paragraph: P, QUOTE plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, ABBR, MENTIONED, DISTINCT, FOREIGN.
No end-of-line hyphenation present in the ECI original.
creator: | Ştefan Bruda | |
---|---|---|
status: | update | |
date: | 1995-12-10 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: HI, Q, FOREIGN, NAME
creator: | ET | |
---|---|---|
status: | update | |
date: | 1996-04-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Public Domain TEI edition prepared at the Oxford Text Archive
Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Typographical mistakes corrected
Rendition attribute values on HI, Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. The 'default' rendition of Q (PRE mdash) has not been included in Q.
All end-of-line hyphenation removed.
Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: NAME, Q. Page breaks left in the document as comments.
No end-of-line hyphenation present in the ECI original.
creator: | VP | |
---|---|---|
status: | update | |
date: | 1996-04-20 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Available for research purposes only
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
The OCR'ed text of the novel has been automatically spell-checked.
The text contains no hyphens
Two levels of DIV are used: the first one denotes the PARTS, the second one denotes the CHAPTERS within PARTS. Marked up down to the subparagraph level according to the CES canonical markup of the English version.
creator: | LS | |
---|---|---|
status: | update | |
date: | 1996-06-05 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
No quotation marks are preserved in text. Rendition attribute values on Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. Two rendition short-cuts are used: 'rend=mdash' stands for 'rend="PRE mdash POST mdash"' and 'rend=dblq' stands for 'rend="PRE ldquo POST rdquo"'. 'rend="PRE mdash"' (or "PRE ldquo") is used when the quoted dialogue ends with the paragraph (there is no other typographical distinction). 'rend="POST mdash"' (or "POST rdquo") is used when there is no typographical distinction (except ordinary punctuation) for the beginning of the quoted dialogue. No default rendition is used.
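As a rough illustration of the short-cut convention, the following Python sketch expands a rend value into its opening and closing marks. The function name `parse_rend` and the error handling are assumptions made for the example; only the two short-cuts and the PRE/POST keywords come from the note above.

```python
# Short-cuts as described in the encoding note.
SHORTCUTS = {
    "mdash": "PRE mdash POST mdash",
    "dblq": "PRE ldquo POST rdquo",
}


def parse_rend(value):
    """Return (pre, post) entity names for a rend value; None where no mark is given."""
    tokens = SHORTCUTS.get(value, value).split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected keyword/entity pairs, got: {value!r}")
    marks = {"PRE": None, "POST": None}
    # Tokens alternate between a position keyword and an entity name.
    for position, entity in zip(tokens[0::2], tokens[1::2]):
        if position not in marks:
            raise ValueError(f"unknown position keyword: {position!r}")
        marks[position] = entity
    return marks["PRE"], marks["POST"]
```

For example, `parse_rend("mdash")` yields an opening and a closing dash, while `parse_rend("PRE ldquo")` yields only an opening mark, matching the case where the dialogue runs to the end of the paragraph.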
Marked up to the level of paragraph: P, QUOTE, POEM, NOTE, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, TIME, MENTIONED, FOREIGN, ABBR.
No end-of-line hyphenation present.
creator: | HJK | |
---|---|---|
status: | update | |
date: | 1995-10-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.1 CES LEVEL: 1
Marked up to the level of paragraph: P, QUOTE plus marking of sub-paragraph element Q, incl. broken Qs. Some marking of particular sub-paragraph elements. BODY, DIV, HEAD, ITEM, L, LIST, NOTE, P, POEM, PTR, QUOTE, TEXT are used so as to be in harmony with the English electronic version of 1984 for MULTEXT-EAST v. 4; the differences are due only to the differences between the English electronic and Estonian printed version. ABBR, DATE, FOREIGN, HI, MENTIONED, NAME, NUM, Q, TITLE are used sloppily.
No end-of-line hyphenation
creator: | CO | |
---|---|---|
status: | update | |
date: | 1996-04-22 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
The following errors in the Hungarian edition have been corrected:

p. 18. l. 9. első számú elsőszámú

p. 52. l. 14. nemlétező nem létező

p. 85. l. 9. reménytvesztve reményt vesztve

p. 128. l. 9. lovasszobor lovas szobor

p. 148. l. 17. éleselméjűnek éles elméjűnek

p. 212. l. 21. kell hogy kell, hogy

p. 233. l. 26. lelkitusa lelki tusa

p. 295. l. 34. kell hogy kell, hogy

p. 127. l. 32. - p. 128. l. 3.

A lány sietve befejezte az ebédet, és eltávozott. Winston még ott maradt, rágyújtott egy cigarettára. Többet nem beszéltek, s amennyire két, ugyanannál az asztalnál egymással szemközt ülő ember egyáltalán megteheti, nem is néztek egymásra.

Többet nem beszéltek, s amennyire két, ugyanannál az asztalnál egymással szemközt ülő ember egyáltalán megteheti, nem is néztek egymásra. A lány sietve befejezte az ebédet, és eltávozott. Winston még ott maradt, rágyújtott egy cigarettára.
creator: | SIT | |
---|---|---|
status: | update | |
date: | 1997-09-27 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106. This text is a volunteer contribution to the project.
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
No quotation marks are preserved in text. Due to the conventions of written Russian, only double quotes are used in rendition ("PRE ldquo POST rdquo").
Marked up to the paragraph level: P, QUOTE, NOTE, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, TIME, MENTIONED, FOREIGN, ABBR.
No hyphenation marks are present in text.
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-11-03 (created) | 2004-05-10 (updated) |
Version 3
TELRI Final Release
Available on receipt of signed agreement
TELRI
creator: | CK | |
---|---|---|
status: | update | |
date: | 1997-12-03 (created) | 2004-05-10 (updated) |
Version 3
TELRI Final Release
Available for research purposes upon receipt of signed agreement
Public Domain TEI edition prepared at the Oxford Text Archive
Freely available for non-commercial use provided that this header is included in its entirety with any copy distributed
TELRI
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Typographical mistakes corrected while preparing the electronic edition, though not systematically.
Rendition attribute values on HI, Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. The 'default' rendition of Q (PRE mdash) has not been included in Q.
All end-of-line hyphenation removed.
Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: NAME, Q. Page breaks left in the document as comments.
End-of-line hyphenation present in the OTA digital original.
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P, S
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P
creator: | ET | |
---|---|---|
status: | update | |
date: | 1996-04-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P, S
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P, S
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P, S
creator: | ET | |
---|---|---|
status: | update | |
date: | 1997-09-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
Marked up for DIV, P
creator: | SB | |
---|---|---|
status: | update | |
date: | 1996-01-15 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: HI, Q, FOREIGN
creator: | ET | |
---|---|---|
status: | update | |
date: | 1996-04-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
The OCR'ed text of the novel has been automatically spell-checked.
Rendition attribute values on Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. Spoken passages are marked by Q even where there are no typographical marks to denote them.
All text semi-automatically dehyphenated; errors possible where the two parts of the word are both words
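The header does not describe the dehyphenation procedure itself, so the following Python sketch is only one plausible reading of it: line-final hyphens are joined automatically, and cases where both halves are words in their own right are reported for manual review. The function name `dehyphenate` and the `lexicon` argument (a set of lower-case word forms) are hypothetical.

```python
def dehyphenate(lines, lexicon):
    """Join words broken by a line-final hyphen.

    Returns (new_lines, ambiguous), where ambiguous lists the (first, second)
    halves that are both words in the lexicon and so may be a genuine
    hyphenated compound rather than a line-break hyphen.
    """
    result = []
    ambiguous = []
    pending = None  # first half of a word broken at the end of the previous line
    for line in lines:
        words = line.split()
        if pending is not None and words:
            first, second = pending, words[0]
            if first.lower() in lexicon and second.lower() in lexicon:
                ambiguous.append((first, second))
            words[0] = first + second  # the joined word moves to this line
            pending = None
        if words and words[-1].endswith("-") and len(words[-1]) > 1:
            pending = words.pop()[:-1]
        result.append(" ".join(words))
    if pending is not None:
        # A hyphen at the very end of the input has nothing to join with.
        result[-1] = (result[-1] + " " + pending + "-").strip()
    return result, ambiguous
```

The sketch ignores punctuation attached to the second half, which a real lexicon lookup would need to strip first.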
Two levels of DIV are used: the first denotes the chapters, the second divisions which are marked by spacing in the original text. DIV type=chapter is usually followed by a HEAD and OPENER. Marked up to the level of paragraph plus marking of particular sub-paragraph elements: Q, ABBR.
creator: | VP | |
---|---|---|
status: | update | |
date: | 1996-04-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Electronic form available for non-profit purposes for: Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic ÚTKL FFUK
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
The text of the novel has been automatically spell-checked.
The text contains no hyphens
Three levels of DIV are used: the first one denotes the chapters, the second one the composers and the third one the operas. Marked up down to the paragraph level and to some subparagraph level elements. Sentences are not marked up.
creator: | LD | |
---|---|---|
status: | update | |
date: | 1996-05-14 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Available for internal use only by the publishing house
Available for internal use only by the publishers
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
Rendition attribute values on Q, QUOTE and MENTIONED tags are adapted from ISOpub and ISOnum standard entity set names when used. Two rendition short-cuts are used: 'rend=mdash' stands for 'rend="PRE mdash POST mdash"' and 'rend=dblq' stands for 'rend="PRE ldquo POST rdquo"'. 'rend="PRE mdash"' (or "PRE ldquo") is used when the quoted dialogue ends with the paragraph (there is no other typographical distinction). 'rend="POST mdash"' (or "POST rdquo") is used when there is no typographical distinction (except ordinary punctuation) for the beginning of the quoted dialogue. No default rendition is used.
Marked up to the level of paragraph: P, QUOTE, POEM, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, DATE, TIME, MENTIONED, FOREIGN, ABBR.
No end-of-line hyphenation present.
creator: | HJK | |
---|---|---|
status: | update | |
date: | 1995-10-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 2.0 CES LEVEL: 1
Marked up to the level of sentences
No end-of-line hyphenation
creator: | CO | |
---|---|---|
status: | update | |
date: | 1996-04-30 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Available for research purposes upon agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
creator: | SB | |
---|---|---|
status: | update | |
date: | 1995-04-29 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
Marked up to the level of paragraph: P, QUOTE, LIST, POEM plus marking of particular sub-paragraph elements: Q, HI
creator: | ET | |
---|---|---|
status: | update | |
date: | 1996-05-07 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.2 CES LEVEL: 1
The OCR'ed text of the novel has been automatically spell-checked.
No rendition attribute values on Q. 'Top level' Qs are in '"', inner Qs in "'".
All text semi-automatically dehyphenated; errors possible where the two parts of the word are both words
Each article proper is in a DIV type="article". The text of the article is in a DIV type="articletext". The sections of articletext, usually with a HEADER, are in DIV type="articlepart". After articletext come figures (DIV type="figure") and frames (DIV type="frame"). Marked up to the level of paragraph plus marking of particular sub-paragraph elements: Q; DATE, only for the date of approximate article publication; NAME, only where names were typographically marked in the original.
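The DIV hierarchy described above can be made concrete with a small Python sketch that lists the nesting depth and type of each DIV. The sample document is invented for illustration; in particular, placing the figure and frame divisions inside the article DIV (rather than beside it) is an assumption, as is the availability of the text as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the DIV types named in the note.
SAMPLE = """<body>
  <div type="article">
    <div type="articletext">
      <div type="articlepart"><head>Section</head><p>Text.</p></div>
      <div type="articlepart"><p>More text.</p></div>
    </div>
    <div type="figure"><p>Caption.</p></div>
    <div type="frame"><p>Boxed text.</p></div>
  </div>
</body>"""


def div_outline(root):
    """Return (depth, type) for every DIV reachable through DIV nesting, in document order."""
    outline = []

    def walk(elem, depth):
        for child in elem:
            if child.tag.lower() == "div":
                outline.append((depth, child.get("type")))
                walk(child, depth + 1)

    walk(root, 0)
    return outline
```

Calling `div_outline(ET.fromstring(SAMPLE))` gives the article at depth 0, its articletext, figure and frame at depth 1, and the articlepart sections at depth 2.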
creator: | VP | |
---|---|---|
status: | update | |
date: | 1996-05-10 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Electronic form available for non-profit purposes. It was made available for: Institute of Theoretical and Computational Linguistics, Faculty of Philosophy, Charles University, Czech Republic ÚTKL FFUK
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1
The texts contain no hyphens
One level of DIV is used: each DIV denotes a separate article. Marked up down to the paragraph level and to some subparagraph level elements. Sentences are not marked up.
creator: | LS | |
---|---|---|
status: | update | |
date: | 1996-05-14 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
The electronic texts are property of their authors and are not distributed
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.3 CES LEVEL: 1
No quotation marks are preserved in text. Rendition attribute values on Q and QUOTE tags are adapted from ISOpub and ISOnum standard entity set names. Two rendition short-cuts are used: 'rend=mdash' stands for 'rend="PRE mdash POST mdash"' and 'rend=dblq' stands for 'rend="PRE ldquo POST rdquo"'. 'rend="PRE mdash"' (or "PRE ldquo") is used when the quoted dialogue ends with the paragraph (there is no other typographical distinction). 'rend="POST mdash"' (or "POST rdquo") is used when there is no typographical distinction (except ordinary punctuation) for the beginning of the quoted dialogue. No default rendition is used.
Marked up to the level of paragraph: P, SP, QUOTE, NOTE, CAPTION, LIST, FIGURE, plus marking of sub-paragraph element Q. Some marking of particular sub-paragraph elements: NAME, TITLE, DATE, TIME, MENTIONED, DISTINCT, FOREIGN, ABBR.
No end-of-line hyphenation present.
creator: | HJK | |
---|---|---|
status: | update | |
date: | 1995-10-18 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Freely available
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 2.0 CES LEVEL: 1
Marked up to the level of sentences
No end-of-line hyphenation
creator: | CO | |
---|---|---|
status: | update | |
date: | 1996-04-20 (created) | 2004-05-10 (updated) |
Version 3
MTE Final Release
Available for research purposes upon receipt of signed agreement
Available for research purposes upon agreement
MULTEXT-East: Multilingual Text Tools and Corpora for Central and Eastern European Languages. EU Copernicus Project COP106
Corpus Encoding Standard, Version 4.0 CES LEVEL: 1