
Multilingual Parallel Speech Corpus

 COP project 106 MULTEXT-East Deliverable D2.1 F Multilingual Speech Corpus

For the MULTEXT-East speech corpus, a small sample of the English part of the EUROM1 multilingual speech database was selected. This text was translated into the six MULTEXT-East languages and, except for Bulgarian and Czech, recorded by one male speaker and digitized in accordance with the EUROM recommendations. In addition to the EUROM format, the seven texts (the English original and its six translations) were also encoded in CES, as cesDoc elements.

The texts chosen comprise 40 passages (4 x 10, designated blocks: O, P, Q, R / 0-9) of 5 thematically linked sentences. As an example we give below the English blocks O0 and O1:

                              BLOCK: O0

Last week my friend had to go to the doctors to have some injections. She
is going to the Far East for a holiday and she needs to have an injection
against cholera, typhoid fever, hepatitis A, polio and tetanus. I think she
will feel quite ill after all those. She is going to get them all done at
once, at one session. I shan't feel sorry for her though!


                              BLOCK: O1

I have a problem with my water softener.  The water-level is too high and
the overflow keeps dripping. Could you arrange to send an engineer on
Tuesday morning please? It's the only day I can manage this week. I'd be
grateful if you could confirm the arrangement in writing.
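
The block designation scheme described above (four series O, P, Q, R, each with passages 0-9) can be enumerated mechanically. The following is a minimal sketch in Python; the names BLOCK_LETTERS, BLOCK_DIGITS and block_ids are illustrative assumptions and are not part of the EUROM or MULTEXT-East distributions.

```python
from itertools import product

# Assumed constants, taken from the designation "O, P, Q, R / 0-9" in the text.
BLOCK_LETTERS = "OPQR"      # four block series
BLOCK_DIGITS = range(10)    # passages 0-9 within each series


def block_ids():
    """Return the passage identifiers in order: O0, O1, ..., R9."""
    return [f"{letter}{digit}"
            for letter, digit in product(BLOCK_LETTERS, BLOCK_DIGITS)]


ids = block_ids()
print(len(ids))             # 4 x 10 = 40 passages
print(ids[0], ids[1], ids[-1])
```

The identifiers produced this way match the labels used in the headers of the sample passages (BLOCK: O0, BLOCK: O1).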

The translations into the MULTEXT-East languages were 'localised', i.e. the situations described in the monologues were translated as if the speaker were describing a situation in his native land. For example, local place names were used instead of the British ones.

The texts were spoken by one native male speaker. The recording followed the EUROM guidelines as closely as possible; these are as follows:

The digitisation features of the recordings were the following:



 