type: | corpus |
---|---|
id: | ijs-elan.H |
creator: | ET |
status: | update |
created: | 1999-04-14 |
updated: | 2002-04-01 |
This parallel aligned corpus is freely available, provided that the sources described in this Header or in the Headers of its TEI.2 text elements are acknowledged.
This corpus is composed of 15 texts: usta, kuca, parl, ecmr, ekol, spor, anx2, stra, kmet, ekon, vade, vino, ligs, gnpo and orwl.
This corpus is the updated version of the LJU1 site (IJS) contribution to the EU MLIS project ELAN: European Language Activity Network. For more information, see the IJS-ELAN homepage <http://nl.ijs.si/elan/> and the ELAN project homepage.
All formatting removed from the originals.
Only ASCII characters and SGML entities used; see the DTD for the defined entities.
List bullets normalised to -, or left as *.
No line contains more than one start/end tag or more than one element; white space between tokens is preserved before RE.
Quotation marks converted to " or '.
Start and end quotes are indicated by the open and close values of the TYPE attribute of C.
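The quote convention above, combined with the one-element-per-line layout, can be illustrated with a minimal sketch. It assumes already tokenised input, an assumed set of typographic quotes folded to ASCII, and a simple alternation heuristic for deciding open versus close; the function name `tag_tokens` is hypothetical, and this is not the MULTEXT mtseg/Perl pipeline actually used for the corpus.

```python
import string
from xml.sax.saxutils import escape

# Assumed set of typographic quotes folded to their ASCII counterparts.
FOLD = {"\u201c": '"', "\u201d": '"', "\u201e": '"', "\u2018": "'", "\u2019": "'"}

def tag_tokens(tokens):
    """Return one element per line: <w> for words, <c> for punctuation.

    A standalone quote token gets type="open" or type="close" by simple
    alternation per quote character (a heuristic, not the corpus's rule).
    """
    is_open = {'"': True, "'": True}
    lines = []
    for tok in tokens:
        tok = FOLD.get(tok, tok)
        if tok in is_open:
            kind = "open" if is_open[tok] else "close"
            is_open[tok] = not is_open[tok]  # the next quote of this kind flips role
            lines.append(f'<c type="{kind}">{tok}</c>')
        elif all(ch in string.punctuation for ch in tok):
            lines.append(f"<c>{escape(tok)}</c>")  # escape &, <, >
        else:
            lines.append(f"<w>{escape(tok)}</w>")
    return lines

print("\n".join(tag_tokens(["He", "said", "\u201c", "yes", "\u201d", "."])))
```

Alternation fails on unbalanced quotes, and Slovene „…“ would need a language-specific mapping, since “ closes a quotation in Slovene but opens one in English.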
Segmentation into translation units and segments was semi-automatic, using various tools.
Tokenisation with MULTEXT mtseg; the results were corrected with Perl and Emacs.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | usta.H |
creator: | ET |
status: | update |
created: | 1999-01-28 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by the Constitutional Court of the Republic of Slovenia (Ustavno sodišče Republike Slovenije), <http://www.gov.si/us/>.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Introductory and back matter removed; not suitable for alignment.
HTML elements removed
The original HTML documents were converted to TEI Lite, and then to the input format of the Vanilla aligner; this process loses all markup.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | kuca.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This parallel aligned text is freely available.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
HTML markup removed; quotes normalised to ", list bullets to -.
The digital original was converted, segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | parl.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by the National Assembly of the Republic of Slovenia (Državni zbor Republike Slovenije).
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
HTML markup removed; quotes normalised to ", list bullets to -.
The digital original was converted, segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | ecmr.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by the Institute of Macroeconomic Analysis and Development of the Republic of Slovenia (Urad Republike Slovenije za makroekonomske analize in razvoj).
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
HTML tags removed; quotes normalised to ", list bullets to -.
The digital original was converted, segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources, and the results were patched with Perl.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | ekol.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by the Republic of Slovenia, Ministry of the Environment and Physical Planning, Nature Protection Authority (Uprava RS za varstvo narave).
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
HTML markup removed; quotes normalised to ", list bullets to -.
The digital original was converted, segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | spor.H |
creator: | ET |
status: | update |
created: | 1999-03-31 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | anx2.H |
creator: | ET |
status: | update |
created: | 1999-03-31 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network. For more information see the IJS-ELAN homepage <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | stra.H |
creator: | ET |
status: | update |
created: | 1999-02-15 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | kmet.H |
creator: | ET |
status: | update |
created: | 1999-03-31 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | ekon.H |
creator: | ET |
status: | update |
created: | 1999-03-31 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | vade.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by the company Lek, d.d., Verovškova 57, 1000 Ljubljana.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Digital original converted to text with Atril.
Typography codes and ToC information removed
Quotes normalised to ", list bullets to -
Digital original segmented and aligned with Atril, and the alignments hand-corrected.
Tokenisation into W (word) and C (punctuation) elements by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | vino.H |
creator: | ET |
status: | update |
created: | 1999-02-15 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged. Copyright of the two digital originals for this text is held by SVEZ (Služba Vlade RS za evropske zadeve), the Office of the Government of the Republic of Slovenia for European Affairs.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Typography codes and ToC information removed; quotes normalised to ", list bullets to -.
This document was originally formatted as a table. For easier processing, large segments of text containing numerical data were omitted.
The digital original was segmented and aligned with Atril, and the alignments were hand-corrected.
Tokenisation into word and character (punctuation) elements was performed by the MULTEXT program mtlex with MULTEXT-East Slovene segmentation resources.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | ligs.H |
creator: | ET |
status: | update |
created: | 1999-04-06 |
updated: | 2002-04-01 |
This text of the corpus is available under the GNU General Public License of the Free Software Foundation.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
All formatting removed.
All quotation marks converted to "
Segmentation into translation segments was manual; tokenisation was performed with mtseg.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | gnpo.H |
creator: | ET |
status: | update |
created: | 1999-05-03 |
updated: | 2002-04-01 |
This text of the corpus is available under the GNU General Public License of the Free Software Foundation.
The source 'texts' of the GNPO text are GNU localisation files (.po) for the following programs:
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
Source files were merged, and empty and repeated translations removed. All formatting was removed from the original.
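The merging step described above can be sketched as follows. This is a minimal illustration, assuming each .po file has already been parsed into (msgid, msgstr) pairs (parsing is not shown), that an "empty" translation is one with an empty msgstr, and that a "repeated" translation is an identical pair already seen in an earlier file. The function name `merge_po_entries` is hypothetical; the Perl programs actually used for GNPO are not described in this header.

```python
def merge_po_entries(files):
    """Merge (msgid, msgstr) pairs from several already-parsed .po files.

    Drops the PO header entry (empty msgid), untranslated entries
    (empty msgstr) and exact repeats of a pair already seen, keeping
    first-seen order.
    """
    seen = set()
    merged = []
    for entries in files:
        for msgid, msgstr in entries:
            if not msgid.strip() or not msgstr.strip():
                continue  # header entry or empty translation
            pair = (msgid, msgstr)
            if pair in seen:
                continue  # repeated translation from another program
            seen.add(pair)
            merged.append(pair)
    return merged
```

Treating only identical pairs as repeats keeps cases where one English string received different Slovene translations in different programs.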
All quotation marks converted to "
The source localisation files come pre-segmented into translation units. This source was tokenised with MULTEXT mtseg and pre- and post-edited with Perl conversion programs.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).
type: | text |
---|---|
id: | orwl.H |
creator: | ET |
status: | update |
created: | 1999-04-13 |
updated: | 2002-04-01 |
This bi-text of the IJS-ELAN corpus is freely available, provided that the sources described in this Header are acknowledged.
This text is part of the LJU1 site contribution to the EU MLIS project ELAN: European Language Activity Network <http://nl.ijs.si/elan/>.
The digital source of this text is the updated Slovene and English '1984' produced in the scope of the MULTEXT-East project. <http://nl.ijs.si/ME/>
Elements POEM and LIST from MULTEXT-East digital originals removed.
Elements FOREIGN, HI, TITLE, MENTIONED and DISTINCT from the MULTEXT-East digital originals marked as opening and closing '; likewise element Q, marked with ".
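The conversion of inline elements into quote marks can be sketched with regular expressions. This is a minimal sketch, assuming none of these elements nests inside another element of the same name, and that start tags may carry attributes. It produces plain quote characters rather than whatever C-element encoding the corpus finally used. The function name `flatten_quotes` is hypothetical.

```python
import re

# Element names taken from the header; content is wrapped in ' (or " for Q).
SINGLE = ("foreign", "hi", "title", "mentioned", "distinct")

def flatten_quotes(sgml):
    """Replace selected inline elements by their content wrapped in quotes.

    Matching is case-insensitive, as SGML element names are; \\b stops
    e.g. <hi> from matching <hint> or <q> from matching <quote>.
    """
    for name in SINGLE:
        sgml = re.sub(rf"<{name}\b[^>]*>(.*?)</{name}>", r"'\1'", sgml,
                      flags=re.IGNORECASE | re.DOTALL)
    return re.sub(r"<q\b[^>]*>(.*?)</q>", r'"\1"', sgml,
                  flags=re.IGNORECASE | re.DOTALL)

print(flatten_quotes('He said <Q>read <title rend="it">1984</title></Q>.'))
```

The single-quote elements are processed before Q, so a title inside a quotation ends up as nested quotes.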
Segmentation into sentences and translation segments taken from the MULTEXT-East cesAlign documents.
Tokenisation and tagging taken from the MULTEXT-East cesAna documents.
Words automatically marked with context-disambiguated lemma and MULTEXT-East morphosyntactic description. English words additionally tagged with a BROWN-like tagset by two taggers (TnT, QTAG).