TEI Headers for
The IJS-ELAN Slovene/English Parallel Corpus
Version 2.0



TEI header

type: corpus
id: ijs-elan.H
creator: ET
status: update
created: 1999-04-14
updated: 2002-04-01

File Description

Title Statement:
Naslov:
Slovenskoangleški vzporedni korpus IJS-ELAN
Title:
The IJS-ELAN Slovene/English Parallel Corpus
Responsibility Statement:
Tomaž Erjavec, Institut Jožef Stefan, tomaz.erjavec@ijs.si
Urednik
Editor
Responsibility Statement:
Peter Holozan, Amebis d.o.o. <http://www.amebis.si/>
Leksikalne oznake
Lexical annotation
Responsibility Statement:
Špela Vintar, FF
Zagotovitev in poravnava: SPOR, ANX2, STRA, KMET, EKON, VADE, VINO
Acquisition and alignment: SPOR, ANX2, STRA, KMET, EKON, VADE, VINO
Responsibility Statement:
Roman Maurer, FMF
Prevod, zagotovitev in poravnava: LIGS, GNPO
Translation, acquisition and alignment: LIGS, GNPO
Responsibility Statement:
Andrej Skubic, FF
Zagotovitev in poravnava: KUCA, PARL, ECMR, EKOL
Acquisition and alignment: KUCA, PARL, ECMR, EKOL
Edition Statement:
Edition: Version 2.0
Extent: 59 MB; 1,092,012 words = 501,437 (sl) + 590,575 (en)
Publications Statement:
Distributor:
Naslov:
Odsek za inteligentne sisteme
Institut "Jožef Stefan"
Jamova 39
1000 Ljubljana
Address:
Dept. of Intelligent Systems,
Jožef Stefan Institute
Jamova 39
SI-1000 Ljubljana
Slovenia
Place of publication:
<http://nl.ijs.si/elan/>
Availability:

This parallel aligned corpus is freely available, provided that the sources described in this Header or in the Headers of its TEI.2 text elements are acknowledged.

Ta vzporedni poravnani korpus je prosto dostopen, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi ali v glavah njegovih TEI.2 besedil.

Source Description:

This corpus is composed of 15 texts:

Bibliography list:
  1. <usta.H> Constitution of the Republic of Slovenia
    Ustava Republike Slovenije
    Extent: 20 kW
  2. <kuca.H> Speeches by the President of Slovenia, M. Kučan
    Govori predsednika RS, M. Kučana
    Extent: 69 kW
  3. <parl.H> Functioning of the National Assembly
    Delovanje Državnega zbora
    Extent: 20 kW
  4. <ecmr.H> Slovenian Economic Mirror; 13 issues, 98/99
    Ekonomsko ogledalo; 13 številk 98/99
    Extent: 239 kW
  5. <ekol.H> National Environmental Protection Programme
    Nacionalni program varstva okolja
    Extent: 70 kW
  6. <spor.H> Europe Agreement
    Evropski sporazum
    Extent: 34 kW
  7. <anx2.H> Europe Agreement - Annex II
    Evropski sporazum - Priloga II
    Extent: 25 kW
  8. <stra.H> Slovenia's Strategy for Integration into EU
    Strategija Slovenije za vključevanje v EU
    Extent: 89 kW
  9. <kmet.H> Slovenia's programme for accession to EU - agriculture
    Državni program za prilagajanje zakonodaje - kmetijstvo
    Extent: 29 kW
  10. <ekon.H> Slovenia's programme for accession to EU - economy
    Državni program za prilagajanje zakonodaje - gospodarstvo
    Extent: 23 kW
  11. <vade.H> Vademecum by Lek
    Vademecum Lekove domače lekarne
    Extent: 24 kW
  12. <vino.H> EC Council Regulation No 3290/94 - agriculture
    Uredba sveta ES št. 3290/94 - kmetijstvo
    Extent: 69 kW
  13. <ligs.H> Linux Installation and Getting Started
    Namestitev in začetek dela z Linuxom
    Extent: 173 kW
  14. <gnpo.H> GNU PO localisation
    GNU PO lokalizacije
    Extent: 13 kW
  15. <orwl.H> G. Orwell: Nineteen Eighty-Four
    G. Orwell: 1984
    Extent: 195 kW
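
The per-text extents listed above can be cross-checked against the corpus total given in the Edition Statement. The following is a minimal sketch with the figures copied from this header; the variable names are illustrative only and are not part of the corpus distribution.

```python
# Extents in kilowords (kW), copied from the bibliography list above.
extents_kw = {
    "usta": 20, "kuca": 69, "parl": 20, "ecmr": 239, "ekol": 70,
    "spor": 34, "anx2": 25, "stra": 89, "kmet": 29, "ekon": 23,
    "vade": 24, "vino": 69, "ligs": 173, "gnpo": 13, "orwl": 195,
}

# Word counts from the Extent line of the corpus header.
words_sl = 501_437
words_en = 590_575
words_total = 1_092_012

total_kw = sum(extents_kw.values())

# The corpus header announces 15 texts.
assert len(extents_kw) == 15
# The two language counts add up to the stated total.
assert words_sl + words_en == words_total
# The rounded per-text extents agree with the total to the nearest kW.
assert total_kw == round(words_total / 1000)

print(total_kw)  # 1092
```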

Encoding Description

Project description:

This corpus is the (updated version of the) LJU1 site (IJS) contribution to the EU MLIS project ELAN: European Language Activity Network. For more information see the IJS-ELAN homepage <http://nl.ijs.si/elan/> and the ELAN project homepage.

Editorial declaration:
Normalisation:

All formatting removed from originals.

Only ASCII characters and SGML entities used: see the DTD for defined entities.

List bullets normalised to -, or left as *.

No line contains more than one start/end tag or more than one element; white space between tokens is preserved before RE.

Quotation:

Quotation marks converted to " or '

Start / end quote is indicated with the open / close values of the TYPE attribute of C.
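
The normalisation of quotation marks and list bullets described above could be approximated as follows. This is only a sketch: the header does not say which source characters occurred in the originals, so the character sets below are assumptions, and the function name is hypothetical.

```python
# Assumed source characters; the header does not enumerate them.
DOUBLE_QUOTES = "\u201c\u201d\u201e\u00ab\u00bb"  # typographic double quotes and guillemets
SINGLE_QUOTES = "\u2018\u2019\u201a"              # typographic single quotes
BULLETS = "\u2022\u25aa"                          # bullet characters other than '*'

TABLE = str.maketrans({
    **{ch: '"' for ch in DOUBLE_QUOTES},
    **{ch: "'" for ch in SINGLE_QUOTES},
    **{ch: "-" for ch in BULLETS},
})

def normalise(text: str) -> str:
    """Map typographic quotes to " or ' and bullets to -, leaving * untouched."""
    return text.translate(TABLE)

print(normalise("\u2022 \u201cSlovenija\u201d"))  # - "Slovenija"
```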

Segmentation:

Segmentation into translation units and segments was semi-automatic, using various tools.

Tokenisation with MULTEXT mtseg, correcting the results with Perl & Emacs.

Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

Tags declaration:
group = 15
Element 'Group'. Attributes are LANG and ID.
text = 30
Element 'Text'. Attributes are LANG and ID.
body = 30
Element 'Body'.
p = 30
Element 'Paragraph'.
seg = 63800
Element 'Translation segment'. Attribute is ID.
s = 13386
Element 'Sentence'. Only in 'orwl' text! Attribute is ID (value identical to original MTE bundle).
w = 1092012
Element 'Word'. Attributes are TYPE (only "special" words), CTAG (English only), ANA, LEMMA (only known words).
c = 174040
Element 'Punctuation'. Attributes are TYPE (only "special" punctuation), CTAG.
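
To illustrate how the elements declared above fit together, the sketch below parses a small invented fragment and tallies its elements in the same "name = count" form. The fragment, its ID and all attribute values are hypothetical, and it is written as well-formed XML with lower-case names; the corpus itself is SGML, so real files may first need entity resolution and case normalisation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: the ID, ANA codes and CTAG values are invented for illustration.
SAMPLE = """
<seg id="demo.en.1">
  <c type="open">"</c>
  <w ana="Ncns" lemma="war" ctag="NN">War</w>
  <w ana="Vmip3s" lemma="be" ctag="VBZ">is</w>
  <w ana="Ncns" lemma="peace" ctag="NN">peace</w>
  <c type="close">"</c>
  <c>.</c>
</seg>
""".strip()

root = ET.fromstring(SAMPLE)

# Count every element, as the tags declaration does.
counts = Counter(el.tag for el in root.iter())
for name, n in sorted(counts.items()):
    print(f"{name} = {n}")

# Collect (word form, lemma, morphosyntactic description) for each word element.
words = [(w.text, w.get("lemma"), w.get("ana")) for w in root.iter("w")]

# Quote boundaries are the punctuation elements whose TYPE is open or close.
quotes = [c.get("type") for c in root.iter("c") if c.get("type") in ("open", "close")]
```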

Profile Description

Language use:
sl-en: Translation from Slovene to English
en-sl: Translation from English to Slovene
sl: Slovene
en: English
bg: Bulgarian
cs: Czech
et: Estonian
hr: Croatian
hu: Hungarian
ro: Romanian

Revision Description



TEI header

type: text
id: usta.H
creator: ET
status: update
created: 1999-01-28
updated: 2002-04-01

File Description

Title Statement:
Naslov:
Ustava Republike Slovenije
Title:
Constitution of the Republic of Slovenia
Responsibility Statement:
Peter Holozan, Amebis
Leksikalne oznake
Lexical annotation
Responsibility Statement:
Tomaž Erjavec, IJS
Poravnava, tokenizacija, tagiranje, pretvorba v TEI
Alignment, tokenisation, tagging, conversion to TEI
Edition Statement:
Edition: Version 2.0
Extent: 364 Kb, 20 kW
Publications Statement:
Distributer:
Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
Distributor:
Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
Place of publication:
<http://nl.ijs.si/elan/>
Availability:

This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by the Constitutional Court of the Republic of Slovenia, <http://www.gov.si/us/>.

To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila so v lasti Ustavnega sodišča Republike Slovenije, <http://www.gov.si/us/>.

Source Description:
Bibliography list:
  1. Ustava Republike Slovenije <http://www.gov.si/us/sus-usta.html> 17. julij 1997
    Publisher:
    Ustavno sodišče Republike Slovenije
  2. The Constitution of the Republic of Slovenia <http://www.gov.si/us/eus-usta.html> July 17, 1997. Translators: Sherill O'Connor-Sraj, Garry Moore
    Publisher:
    Constitutional Court of the Republic of Slovenia

Encoding Description

Project description:

This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

Editorial declaration:
Normalisation:

Introductory and back matter removed; not suitable for alignment.

HTML elements removed

Segmentation:

The original HTML documents were converted to TEI Lite, and then to the input format of the Vanilla aligner; the process loses all markup.

Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

Tags declaration:
text = 3
group = 1
body = 2
seg = 1630
w = 20265
c = 2163
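
A tags declaration in this "name = count" form is easy to read programmatically. The sketch below parses the counts just listed and derives two figures from them; the function name is hypothetical, and reading the kW extent as the word count rounded to the nearest thousand is an observation consistent with the headers in this document, not a documented rule.

```python
# Tags declaration of the 'usta' text, copied from the header above.
TAGS_DECLARATION = """\
text = 3
group = 1
body = 2
seg = 1630
w = 20265
c = 2163
"""

def parse_tag_counts(block: str) -> dict:
    """Parse 'name = count' lines into a dictionary of integers."""
    counts = {}
    for line in block.splitlines():
        if "=" not in line:
            continue  # skip blank or descriptive lines
        name, value = line.split("=", 1)
        counts[name.strip()] = int(value.strip())
    return counts

counts = parse_tag_counts(TAGS_DECLARATION)

# Word count expressed in kilowords, rounded to the nearest thousand.
extent_kw = round(counts["w"] / 1000)
# Average number of word tokens per translation segment (both languages together).
words_per_seg = counts["w"] / counts["seg"]

print(extent_kw)                # 20
print(round(words_per_seg, 1))  # 12.4
```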

Revision Description



TEI header

type: text
id: kuca.H
creator: ET
status: update
created: 1999-04-13
updated: 2002-04-01

File Description

Title Statement:
Naslov:
Govori predsednika RS, M. Kučana
Title:
Speeches by the President of Slovenia, M. Kučan
Responsibility Statement:
Andrej Skubic, FF
Poravnava, zagotovitev digitalnega originala
Alignment, provision of digital original
Responsibility Statement:
Peter Holozan, Amebis
Leksikalne oznake
Lexical annotation
Responsibility Statement:
Tomaž Erjavec, IJS
Tokenizacija, tagiranje, pretvorba v TEI
Tokenisation, tagging, conversion to TEI
Edition Statement:
Edition: Version 2.0
Extent: 1102 Kb, 69 kW
Publications Statement:
Distributer:
Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
Distributor:
Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
Place of publication:
<http://nl.ijs.si/elan/>
Availability:

This parallel aligned text is freely available.

To vzporedno poravnano besedilo je prosto dostopno.

Source Description:
Bibliography list:
  1. Govori predsednika Republike Slovenije Milana Kučana 1990 - 1995
    Publisher:
    Urad predsednika Republike Slovenije
    <http://www.gov.si/upr/slo/govori.html>
  2. Speeches by the President of the Republic of Slovenia, Milan Kučan 1990 - 1995
    Publisher:
    The Office of the President of the Republic of Slovenia
    <http://www.gov.si/upr/ang/govori.html>

Encoding Description

Project description:

This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

Editorial declaration:
Normalisation:

HTML markup removed; Quotes normalised to ", list bullets to -.

Segmentation:

The digital original was converted, segmented and aligned with Atril and the alignments hand corrected.

Tokenisation performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

Tags declaration:
text = 3
group = 1
body = 2
seg = 3014
w = 68843
c = 7933

Revision Description



TEI header

type: text
id: parl.H
creator: ET
status: update
created: 1999-04-13
updated: 2002-04-01

File Description

Title Statement:
Naslov:
Delovanje Državnega zbora
Title:
Functioning of the National Assembly
Responsibility Statement:
Andrej Skubic, FF
Poravnava, zagotovitev digitalnega originala
Alignment, provision of digital original
Responsibility Statement:
Peter Holozan, Amebis
Leksikalne oznake
Lexical annotation
Responsibility Statement:
Tomaž Erjavec, IJS
Tokenizacija, tagiranje, pretvorba v TEI
Tokenisation, tagging, conversion to TEI
Edition Statement:
Edition: Version 2.0
Extent: 325 Kb, 20 kW
Publications Statement:
Distributer:
Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
Distributor:
Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
Place of publication:
<http://nl.ijs.si/elan/>
Availability:

This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by the National Assembly of the Republic of Slovenia.

To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo Državnemu zboru Republike Slovenije.

Source Description:
Bibliography list:
  1. Delovanje Državnega zbora 07-Dec-98
    Publisher:
    Državni zbor Republike Slovenije
    Address:
    Šubičeva 4, 1000 Ljubljana, Slovenija
    <http://www.dz-rs.si/>
    <http://www.dz-rs.si/si/zgodovina&delo/delovanje_desno.html>
  2. Functioning of the National Assembly 11-Jan-99
    Publisher:
    The National Assembly of the Republic of Slovenia
    Address:
    Šubičeva 4, SI-1000 Ljubljana, Slovenija
    <http://www.dz-rs.si/>
    <http://www.dz-rs.si/en/zgodovina&delo/delovanje_desno.html>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    HTML markup removed; Quotes normalised to ", list bullets to -.

    Segmentation:

    The digital original was converted, segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 1006
    w = 19545
    c = 2031

    Revision Description



    TEI header

    type: text
    id: ecmr.H
    creator: ET
    status: update
    created: 1999-04-13
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Ekonomsko ogledalo; 13 številk 98/99
    Title:
    Slovenian Economic Mirror; 13 issues, 98/99
    Responsibility Statement:
    Andrej Skubic, FF
    Zagotovitev digitalnega originala, poravnava
    Provision of digital original, alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 4056 Kb, 239 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by the Institute of Macroeconomic Analysis and Development of the Republic of Slovenia.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo Uradu Republike Slovenije za makroekonomske analize in razvoj.

    Source Description:
    Bibliography list:
    1. Ekonomsko ogledalo, 1 - 11/98; 1,2/99 1998/1999
      Publisher:
      Urad Republike Slovenije za makroekonomske analize in razvoj
      <http://www.gov.si/zmar/arhiv/kazalo.html>
    2. Slovenian Economic Mirror, 1 - 11/98; 1,2/99 1998/1999
      Publisher:
      Institute of Macroeconomic Analysis and Development of the Republic of Slovenia
      <http://www.gov.si/zmar/arhiv/kazalo.html>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    HTML tags removed; Quotes normalised to ", list bullets to -.

    Segmentation:

    The digital original was converted, segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources, and the results patched with Perl.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 10010
    w = 238864
    c = 36961

    Revision Description



    TEI header

    type: text
    id: ekol.H
    creator: ET
    status: update
    created: 1999-04-13
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Nacionalni program varstva okolja
    Title:
    National Environmental Protection Programme
    Responsibility Statement:
    Andrej Skubic, FF
    Poravnava, zagotovitev digitalnega originala
    Alignment, provision of digital original
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 1222 Kb, 70 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by the Republic of Slovenia, Ministry of the Environment and Physical Planning.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo Republiki Slovenija, Ministrstvu za okolje in prostor, Uprava RS za varstvo narave.

    Source Description:
    Bibliography list:
    1. Nacionalni program varstva okolja 24-Feb-99
      Publisher:
      Republika Slovenija, Ministrstvo za okolje in prostor, Uprava RS za varstvo narave
      <http://www.gov.si/mop/vsebina/npvo.html>
    2. National Environmental Protection Programme 30-Mar-99
      Publisher:
      Republic of Slovenia, Ministry of the Environment and Physical Planning
      <http://www.gov.si/mop/vsebina/angl/okolje.html>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    HTML markup removed; Quotes normalised to ", list bullets to -.

    Segmentation:

    The digital original was converted, segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 3482
    w = 70203
    c = 9201

    Revision Description



    TEI header

    type: text
    id: spor.H
    creator: ET
    status: update
    created: 1999-03-31
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Evropski sporazum
    Title:
    Europe Agreement
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 589 Kb, 34 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve.

    Source Description:
    Bibliography list:
    1. Evropski sporazum o pridružitvi med Republiko Slovenijo na eni strani in Evropskimi skupnostmi in njihovimi državami članicami, ki delujejo v okviru Evropske unije, na drugi strani. 10. junij 1996, Luksemburg
    2. Europe Agreement Establishing an Association Between the European Communities and their Member States, Acting within the Framework of the European Union, of the One Part, and the Republic of Slovenia, of the Other Part. June 10, 1996, Luxembourg

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    Segmentation:

    The digital original was segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 1912
    w = 33758
    c = 4210

    Revision Description



    TEI header

    type: text
    id: anx2.H
    creator: ET
    status: update
    created: 1999-03-31
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Evropski sporazum - Priloga II
    Title:
    Europe Agreement - Annex II
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 483 Kb, 25 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve.

    Source Description:
    Bibliography list:
    1. Evropski sporazum o pridružitvi med Republiko Slovenijo na eni strani in Evropskimi skupnostmi in njihovimi državami članicami, ki delujejo v okviru Evropske unije, na drugi strani. 10. junij 1996, Luksemburg
    2. Europe Agreement Establishing an Association Between the European Communities and their Member States, Acting within the Framework of the European Union, of the One Part, and the Republic of Slovenia, of the Other Part. June 10, 1996, Luxembourg

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network. For more information see the IJS-ELAN homepage <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.

    Segmentation:

    The digital original was segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 2382
    w = 24523
    c = 4640

    Revision Description



    TEI header

    type: text
    id: stra.H
    creator: ET
    status: update
    created: 1999-02-15
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Strategija Slovenije za vključevanje v EU
    Title:
    Slovenia's Strategy for Integration into EU
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 1511 Kb, 89 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve.

    Source Description:
    Bibliography list:
    1. Strategija Republike Slovenije za vključevanje v Evropsko unijo september 1997
      Publisher:
      SVEZ: Služba Vlade RS za evropske zadeve
    2. Strategy of the Republic of Slovenia for Integration into the European Union September 1997
      Publisher:
      SVEZ: Office of the Government of the Republic of Slovenia for European Affairs

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    Segmentation:

    The digital original was segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 3824
    w = 89057
    c = 11180

    Revision Description



    TEI header

    type: text
    id: kmet.H
    creator: ET
    status: update
    created: 1999-03-31
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Državni program za prilagajanje zakonodaje - kmetijstvo
    Title:
    Slovenia's programme for accession to EU - agriculture
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 543 Kb, 29 kW
    Publications Statement:
    Distributer:
    Odsek za inteligentne sisteme Institut "Jožef Stefan" Jamova 39 1000 Ljubljana
    Distributor:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve.

    Source Description:
    Bibliography list:
    1. Državni program Republike Slovenije za prilagajanje zakonodaje - kmetijstvo
      Publisher:
      SVEZ
    2. National programme of the Republic of Slovenia for accession to the European Union - agriculture
      Publisher:
      European Union

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.

    Segmentation:

    The digital original was segmented and aligned with Atril and the alignments hand corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT,QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 2068
    w = 28830
    c = 4896

    Revision Description



    TEI header

    type: text
    id: ekon.H
    creator: ET
    status: update
    created: 1999-03-31
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Državni program za prilagajanje zakonodaje - gospodarstvo
    Title:
    Slovenia's programme for accession to EU - economy
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 394 Kb, 23 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve

    Source Description:
    Bibliography list:
    1. Državni program za prilagajanje zakonodaje - gospodarstvo
      Publisher:
      Služba Vlade RS za evropske zadeve
    2. National programme of the Republic of Slovenia for accession to the European Union - economy
      Publisher:
      Office of the Government of the Republic of Slovenia for European Affairs

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.

    Segmentation:

    The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words were automatically marked with a context-disambiguated lemma and a MULTEXT-East morphosyntactic description. English words were additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 1042
    w = 23763
    c = 2282

    Revision Description



    TEI header

    type: text
    id: vade.H
    creator: ET
    status: update
    created: 1999-04-13
    updated: 2002-04-01

    File Description

    Title Statement:
    Naslov:
    Vademecum Lekove domače lekarne
    Title:
    Vademecum by Lek
    Responsibility Statement:
    Dragana Milikič, Janez Jelnikar, LEK, OTC Division
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment; conversion from digital source
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 471 Kb, 24 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by Lek, d.d., Verovškova 57, 1000 Ljubljana.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila so v lasti podjetja Lek, d.d., Verovškova 57, 1000 Ljubljana

    Source Description:
    Bibliography list:
    1. Vademecum Lekove domače lekarne 1995 Gorazd Hladnik, Nataša Kapelj, Jože Kopač, Darja Temlin-Mihelič 1995
      Publisher:
      Lek d.d., OTC sekcija
    2. Vademecum Lek; OTC Division 1995 Gorazd Hladnik, Nataša Kapelj, Jože Kopač, Darja Temlin-Mihelič 1995
      Publisher:
      Lek d.d.; OTC Division

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Digital original converted to text with Atril.

    Typography codes and ToC information removed

    Quotes normalised to ", list bullets to -
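
The quote and bullet normalisation described above can be illustrated with a short sketch. This is not the tool used for the corpus; the set of quote and bullet characters and the function name `normalise` are assumptions chosen for the example.

```python
import re

# Typographic double-quote variants mapped to the plain ASCII quote.
QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark, common in Slovene
    "\u00ab": '"',  # left guillemet
    "\u00bb": '"',  # right guillemet
})

# A bullet character at the start of a line, after optional indentation.
BULLET = re.compile(r"^([ \t]*)[\u2022\u00b7\u25aa*][ \t]+", re.MULTILINE)

def normalise(text: str) -> str:
    """Replace typographic quotes with " and list bullets with -."""
    text = text.translate(QUOTES)
    return BULLET.sub(r"\1- ", text)

# Hypothetical input line pair, not taken from the corpus.
print(normalise("\u201eLek\u201c\n\u2022 tablete"))
```

Keeping the character sets in two small tables makes it easy to extend them when a source text uses other quote or bullet glyphs.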

    Segmentation:

    Digital original segmented and aligned with Atril, and the alignments hand-corrected.

    Tokenisation into W and C (punctuation) elements by MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 2144
    w = 24036
    c = 4448

    Revision Description



    TEI header

    type: text
    id: vino.H
    creator: ET
    status: update
    created: 1999-02-15
    updated: 2002-04-01

    File Description

    Title Statement:
    Title:
    EC Council Regulation No 3290/94 - agriculture
    Naslov:
    Uredba sveta ES št. 3290/94 - kmetijstvo
    Responsibility Statement:
    Jasna Belc, SVEZ
    Zagotovitev digitalnega originala
    Provision of digital original
    Responsibility Statement:
    Špela Vintar, FF
    Poravnava
    Alignment
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 1182 Kb, 69 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this corpus is held by SVEZ: Office of the Government of the Republic of Slovenia for European Affairs.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi. Avtorske pravice nad digitalnima originaloma tega besedila pripadajo SVEZ: Služba Vlade RS za evropske zadeve

    Source Description:
    Bibliography list:
    1. COUNCIL REGULATION (EC) No 3290/94 of 22 December 1994 on the adjustments and transitional arrangements required in the agriculture sector in order to implement the agreements concluded during the Uruguay Round of multilateral trade negotiations December 22 1994
      Publisher:
      European Council
    2. UREDBA SVETA (ES) št. 3290/94 z dne 22. decembra 1994 o prilagoditvah in prehodnih dogovorih, ki so potrebni v kmetijskem sektorju za izvajanje sporazumov, sklenjenih med Urugvajskim krogom večstranskih trgovinskih pogajanj 22. december 1994
      Založnik:
      Evropski svet

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Typography codes and ToC information removed; Quotes normalised to ", list bullets to -.

    This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.

    Segmentation:

    The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.

    Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.

    Words were automatically marked with a context-disambiguated lemma and a MULTEXT-East morphosyntactic description. English words were additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 3150
    w = 68743
    c = 9610

    Revision Description



    TEI header

    type: text
    id: ligs.H
    creator: ET
    status: update
    created: 1999-04-06
    updated: 2002-04-01

    File Description

    Title Statement:
    Title:
    Linux Installation and Getting Started
    Naslov:
    Namestitev in začetek dela z Linuxom
    Matt Welsh, Phil Hughes, David Bandel, Boris Beletsky, Sean Dreilinger, Robert Kiesling, Evan Liebovitch, Henry Pierce Translator: Roman Maurer
    Responsibility Statement:
    Roman Maurer
    Translation, alignment
    Prevod, poravnava
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 3044 Kb, 173 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This text of the corpus is available under the GNU General Public License, published by the Free Software Foundation.

    To besedilo korpusa je dostopno po GNU General Public License, spisano pri Free Software Foundation.

    Source Description:
    Bibliography list:
    1. Linux Installation and Getting Started
      Matt Welsh, Phil Hughes, David Bandel, Boris Beletsky, Sean Dreilinger, Robert Kiesling, Evan Liebovitch, Henry Pierce
      Publisher:
      Specialized Systems Consultants <http://www.ssc.com/>
      <http://metalab.unc.edu/LDP/LDP/gs/gs.html> <ftp://metalab.unc.edu/pub/Linux/docs/linux-doc-project/install-guide/>
    2. Namestitev in začetek dela z Linuxom Translator: Roman Maurer
      Publisher:
      LUGOS: Linux User Group Of Slovenia <http://www.lugos.si/>
      <http://www.lugos.si/delo/slo/LIGS-sl/> <ftp://ftp.lugos.si/pub/lugos/doc/install-guide-sl/>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    All formatting removed.

    Quotation:

    All quotation marks converted to "

    Segmentation:

    Manual segmentation into translation segments; tokenisation with mtseg.

    Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 11546
    w = 173481
    c = 28993

    Revision Description



    TEI header

    type: text
    id: gnpo.H
    creator: ET
    status: update
    created: 1999-05-03
    updated: 2002-04-01

    File Description

    Title Statement:
    Title:
    GNU PO localisations
    Naslov:
    GNU lokalizacije PO
    Responsibility Statement:
    Primož Peterlin, Roman Maurer
    Translation into Slovene
    Prevod v slovenski jezik
    Responsibility Statement:
    Peter Holozan, Amebis
    Leksikalne oznake
    Lexical annotation
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Tokenizacija, tagiranje, pretvorba v TEI
    Tokenisation, tagging, conversion to TEI
    Edition Statement:
    Edition: Version 2.0
    Extent: 353 Kb, 13 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This text of the corpus is available under the GNU General Public License, published by the Free Software Foundation.

    To besedilo korpusa je dostopno po GNU General Public License, spisano pri Free Software Foundation.

    Source Description:

    The source 'texts' of the GNPO text are GNU localisation files (.po) for the following programs:

    Bibliography list:
    1. enscript-1.6.2.sl.po, fileutils-4.0e.sl.po, gettext-0.10.35.sl.po, grep-2.2f.sl.po, hello-1.3.4.sl.po, lyx-1.0.1.sl.po, recode-3.4l.sl.po, tar-1.12.sl.po, wget-1.5.3.sl.po; 1999
      Publisher:
      Free Software Foundation <http://www.fsf.org/>
    2. enscript-1.6.2.sl.po, fileutils-4.0e.sl.po, gettext-0.10.35.sl.po, grep-2.2f.sl.po, hello-1.3.4.sl.po, lyx-1.0.1.sl.po, recode-3.4l.sl.po, tar-1.12.sl.po, wget-1.5.3.sl.po; 1999
      Publisher:
      Free Software Foundation <http://www.fsf.org/>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    Editorial declaration:
    Normalisation:

    Source files were merged, and empty and repeated translations removed. All formatting was removed from the original.
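
The merging step described above can be sketched as follows. This is an illustration only, not the conversion program used for the corpus: it assumes single-line msgid/msgstr entries, treats an exact repeat of a source/translation pair as a "repeated translation", and uses made-up sample entries. The name `merge_po` is hypothetical.

```python
import re

# Matches one single-line entry; multi-line PO strings are not handled here.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def merge_po(sources: list[str]) -> list[tuple[str, str]]:
    """Merge PO file contents, dropping empty and repeated translations."""
    seen = set()
    pairs = []
    for source in sources:
        for msgid, msgstr in ENTRY.findall(source):
            # Skip the header entry (empty msgid) and untranslated entries.
            if not msgid or not msgstr:
                continue
            # Skip pairs already collected from this or an earlier file.
            if (msgid, msgstr) in seen:
                continue
            seen.add((msgid, msgstr))
            pairs.append((msgid, msgstr))
    return pairs

# Hypothetical file contents for illustration.
PO_A = '''msgid ""
msgstr "Project-Id-Version: demo"

msgid "Hello, world!"
msgstr "Pozdravljen, svet!"

msgid "Usage"
msgstr ""
'''

PO_B = '''msgid "Hello, world!"
msgstr "Pozdravljen, svet!"

msgid "File"
msgstr "Datoteka"
'''

for english, slovene in merge_po([PO_A, PO_B]):
    print(f"{english}\t{slovene}")
```

Each surviving pair corresponds to one pre-segmented translation unit of the kind the Segmentation note below refers to.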

    Quotation:

    All quotation marks converted to "

    Segmentation:

    The source localisation files come pre-segmented into translation units. This source was tokenised with MULTEXT mtseg and pre- and post-edited with Perl conversion programs.

    Words were automatically marked with a context-disambiguated lemma and a MULTEXT-East morphosyntactic description. English words were additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 3312
    w = 13377
    c = 4102

    Revision Description



    TEI header

    type: text
    id: orwl.H
    creator: ET
    status: update
    created: 1999-04-13
    updated: 2002-04-01

    File Description

    Title Statement:
    Title:
    Nineteen Eighty-Four
    Naslov:
    1984
    George Orwell
    Responsibility Statement:
    Tomaž Erjavec, IJS
    Pretvorba iz korpusa MULTEXT-East
    Conversion from MULTEXT-East corpus
    Edition Statement:
    Edition: Version 2.0
    Extent: 6698 Kb, 195 kW
    Publications Statement:
    Distributor:
    Naslov:
    Odsek za inteligentne sisteme, Institut "Jožef Stefan", Jamova 39, 1000 Ljubljana
    Address:
    Dept. of Intelligent Systems, Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia
    Place of publication:
    <http://nl.ijs.si/elan/>
    Availability:

    This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged.

    To vzporedno poravnano besedilo korpusa IJS-ELAN je prosto dostopno, pod pogojem, da se citira njegove vire, dokumentirane v tej glavi.

    Source Description:
    Bibliography list:
    1. Nineteen Eighty-Four George Orwell 1948 <http://nl.ijs.si/ME/CD/docs/1984.html>
    2. 1984 George Orwell Prevod: Alenka Puhar 1983
      Publisher:
      Knjižnica Kondor
      Publisher:
      Mladinska knjiga
      Ljubljana <http://nl.ijs.si/ME/CD/docs/1984.html>

    Encoding Description

    Project description:

    This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.

    The digital source of this text is the updated Slovene and English '1984' produced in the scope of the MULTEXT-East project. <http://nl.ijs.si/ME/>

    Editorial declaration:
    Normalisation:

    Elements POEM and LIST from MULTEXT-East digital originals removed.

    Quotation:

    Elements FOREIGN, HI, TITLE, MENTIONED and DISTINCT from the MULTEXT-East digital originals marked with opening and closing ' characters; element Q likewise, with ".

    Segmentation:

    Segmentation into sentences and translation segments taken from MULTEXT-East cesAlign documents.

    Tokenisation and tagging taken from the MULTEXT-East cesAna documents.

    Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with BROWN-like tagset by two taggers (TnT, QTAG).

    Tags declaration:
    text = 3
    group = 1
    body = 2
    seg = 13278
    s = 13386
    w = 194724
    c = 41390

    Revision Description