Multext-East by Language

This page gives, for each language, the contact point responsible for the language resources, a hyperlink to the Ethnologue, the relevant ISO designations, a table of derived HTML resources, and pointers to the directories holding the actual resources (WWW access restricted).

Also included here are the additional languages whose resources were added through the TELRI Concerted Action.


English

[Contact point]

English is the meta-language of the project.

ISO designations:
ISO 639 code: en
ISO 8859 character set: ISO 8859-1 (Latin 1)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 1//EN

HTML resources:
1984: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-en.*
Speech Corpus: corp/spch/*-en.*
Word-Form Lexicon: lexi/*-en.*
MULTEXT tool resources: tool/Multext/en


Bulgarian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: bg
ISO 8859 character set: ISO 8859-5 (ISO Cyrillic)
ISO 8879 entities: ISO 8879-1986//ENTITIES Russian Cyrillic//EN
ISO 8879-1986//ENTITIES Non Russian Cyrillic//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-bg.*
Comparable Corpus: corp/comp/*-bg.*
Speech Corpus: corp/spch/*-bg.*
Word-Form Lexicon: lexi/*-bg.*
MULTEXT tool resources: tool/Multext/bg
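
The resource paths listed for each language follow one naming pattern, with the ISO 639 code as a suffix on the file name. As a minimal sketch of that pattern in Python: the function names and the sample file name below are hypothetical and only illustrate the convention; they are not part of the project's tools.

```python
from fnmatch import fnmatchcase

# Path patterns as listed on this page; "{lang}" stands for the ISO 639 code.
RESOURCE_PATTERNS = {
    "1984 corpus": "corp/1984/*-{lang}.*",
    "comparable corpus": "corp/comp/*-{lang}.*",
    "speech corpus": "corp/spch/*-{lang}.*",
    "word-form lexicon": "lexi/*-{lang}.*",
}


def patterns_for(lang):
    """Return the resource patterns with the language code filled in."""
    return {name: p.format(lang=lang) for name, p in RESOURCE_PATTERNS.items()}


def matches(path, lang, resource):
    """Check whether a relative path belongs to the given resource and language."""
    return fnmatchcase(path, patterns_for(lang)[resource])


# "example-bg.txt" is an invented file name used only for illustration.
print(matches("corp/1984/example-bg.txt", "bg", "1984 corpus"))
```

Not every language has every resource type (English, for instance, lists no comparable corpus), so a pattern may simply match nothing.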


Czech

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: cs
ISO 8859 character set: ISO 8859-2 (Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-cs.*
Comparable Corpus: corp/comp/*-cs.*
Speech Corpus: corp/spch/*-cs.*
Word-Form Lexicon: lexi/*-cs.*
MULTEXT tool resources: tool/Multext/cs


Estonian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: et
ISO 8859 character set: ISO 8859-10 (ISO Latin 6)
sloppily: ISO 8859-2 (ISO Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 1//EN
ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-et.*
Comparable Corpus: corp/comp/*-et.*
Speech Corpus: corp/spch/*-et.*
Word-Form Lexicon: lexi/*-et.*
MULTEXT tool resources: tool/Multext/et


Hungarian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: hu
ISO 8859 character set: ISO 8859-2 (Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-hu.*
Comparable Corpus: corp/comp/*-hu.*
Speech Corpus: corp/spch/*-hu.*
Word-Form Lexicon: lexi/*-hu.*
MULTEXT tool resources: tool/Multext/hu


Romanian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: ro
ISO 8859 character set: ISO 8859-2 (Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-ro.*
Comparable Corpus: corp/comp/*-ro.*
Speech Corpus: corp/spch/*-ro.*
Word-Form Lexicon: lexi/*-ro.*
MULTEXT tool resources: tool/Multext/ro


Slovene

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: sl
ISO 8859 character set: ISO 8859-2 (Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment
Fiction: Report, Header, Sampler
Newspapers: Report, Header, Sampler
Speech: Report, Header, Sampler
Lexicon: Report, Sampler
Morphosyntax: Report

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-sl.*
Comparable Corpus: corp/comp/*-sl.*
Speech Corpus: corp/spch/*-sl.*
Word-Form Lexicon: lexi/*-sl.*
MULTEXT tool resources: tool/Multext/sl
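
The ISO 8859 character sets named in the entries above correspond to standard Python codec names, so text in these encodings can be decoded directly. The following is a minimal sketch, assuming the files are plain 8-bit text in the primary character set listed for each language; the table and function names are hypothetical, and the sample bytes are not taken from the corpus.

```python
# Primary ISO 8859 character set per language, as listed in the entries above.
CHARSETS = {
    "en": "iso-8859-1",
    "bg": "iso-8859-5",
    "cs": "iso-8859-2",
    "et": "iso-8859-10",
    "hu": "iso-8859-2",
    "ro": "iso-8859-2",
    "sl": "iso-8859-2",
}


def decode_text(raw: bytes, lang: str) -> str:
    """Decode raw bytes using the character set listed for the language."""
    return raw.decode(CHARSETS[lang])


# The same byte decodes to different characters in Latin 1 and Latin 2,
# which is why the per-language designation matters.
print(decode_text(b"\xe8", "en"), decode_text(b"\xe8", "cs"))
```

Languages from the later sections of this page can be added to the table in the same way.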



TELRI Additions:

Latvian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: lv
ISO 8859 character set: ISO 8859-10 (ISO Latin 6)
sloppily: ISO 8859-2 (ISO Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 1//EN
ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment


Lithuanian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: lt
ISO 8859 character set: ISO 8859-10 (ISO Latin 6)
sloppily: ISO 8859-2 (ISO Latin 2)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 1//EN
ISO 8879-1986//ENTITIES Added Latin 2//EN

HTML resources:
1984: Report, Header, Sampler, Alignment

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-lt.*


Serbo-Croatian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: sh
ISO 8859 character set: ISO 8859-2 (ISO Latin 2)
or ISO 8859-5 (ISO Cyrillic)
ISO 8879 entities: ISO 8879-1986//ENTITIES Added Latin 1//EN
ISO 8879-1986//ENTITIES Added Latin 2//EN
or ISO 8879-1986//ENTITIES Russian Cyrillic//EN
ISO 8879-1986//ENTITIES Non Russian Cyrillic//EN

HTML resources:
1984: Report, Header, Sampler, Alignment

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-sc.*



Xtra Xmas Addition:


Russian

[Contact point]

Data on the language from the Ethnologue.

ISO designations:
ISO 639 code: ru
ISO 8859 character set: ISO 8859-5 (ISO Cyrillic)
ISO 8879 entities: ISO 8879-1986//ENTITIES Russian Cyrillic//EN

HTML resources:
1984: Header, Sampler

Resources (WWW access restricted):
'1984' Corpus: corp/1984/*-ru.*


Last updated 2003-07-28 by et