In the corpus there are two types of files: speech sample files and description files.
All filenames have the structure name.suffix.
The name is composed of:
The speech sample file suffix is, for example, .pss (p = passage; s = Slovene; s = speech sample).
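The suffix scheme can be decoded mechanically. The Python sketch below is illustrative only. `decode_suffix` is a hypothetical helper. Only the language codes that appear in this section are filled in: s = Slovene, and e = English from the .peo example. The type letters are s for a speech sample and o for a description file (the pXo scheme). Treating suffixes case-insensitively is an assumption, based on the upper-case SRC name FAO00079.PES in the example header.

```python
# Hypothetical helper: decode a suffix of the form p<language><type>.
# Assumption: only the codes stated in this section are listed; the full
# MULTEXT-East language table is NOT reproduced here.
LANGUAGES = {"s": "Slovene", "e": "English"}          # partial table
FILE_TYPES = {"s": "speech sample", "o": "description"}


def decode_suffix(filename: str) -> dict:
    """Split name.suffix and decode the three-letter suffix."""
    name, sep, suffix = filename.rpartition(".")
    if not sep or len(suffix) != 3:
        raise ValueError(f"expected name.suffix with a 3-letter suffix: {filename!r}")
    # Assumption: case is not significant (SRC: FAO00079.PES is upper case).
    kind, lang, ftype = suffix.lower()
    if kind != "p":
        raise ValueError(f"not a passage file: {filename!r}")
    return {
        "name": name,
        "language": LANGUAGES.get(lang, f"unknown code {lang!r}"),
        "type": FILE_TYPES.get(ftype, f"unknown type {ftype!r}"),
    }


print(decode_suffix("FAO00079.PES"))
# {'name': 'FAO00079', 'language': 'English', 'type': 'speech sample'}
```

Unknown codes are reported rather than rejected, so the helper still works on files whose language letter is missing from the partial table.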
The following one letter codes were used for the MULTEXT-East languages:
The description file suffix is pXo with:
Here is an example of an English description file (i.e., suffix .peo):
--------------------------------------------------------------------------
HD: V3.0
TYP: orthographic
DBN: EUROM_1
VOL:
DIR:
SRC: FAO00079.PES
TXF: O0.TXT
CMT: Information about the recording session
SAM: 20000
BEG: 0
END: 406271
RED: 18/Jan/91
RET: 14:15:50
REP: UCL
SNB: 2
SBF: 01
SSB: 16
RCC: 2
NCH: 2
SPI: M, 48, BRITISH
PCF: PASSAGE.DES
CMT: Information about the labelling session
EXP:
SYS:
DAT:
SPA:
CMT: Item: label start, end, input gain, min level, max level, string
LBD:
LBR: 0, 406271, 6, -3706, 4534, Last week my friend had to go to the doctors to
EXT: have some injections. She is going to the Far East for a holiday and she
EXT: needs to have an injection against cholera, typhoid fever, hepatitis A,
EXT: polio and tetanus. I think she will feel quite ill after all those. She is
EXT: going to get them all done at once, at one session. I shan't feel sorry
EXT: for her though!
LB2: 0, 406271, 0, -6143, 9339
ELF:
-----------------------------------------------------------------------------
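Every line of a description file is a three-letter label followed by a colon, so the file is easy to read programmatically. A minimal Python sketch follows. `parse_description` is a hypothetical helper, and it makes three assumptions. First, dashed lines are only separators. Second, repeated labels such as CMT are kept in order. Third, EXT lines continue the value of the field before them, as they continue the LBR text in the example above. Splitting LBR into six items follows the CMT line in the example (label start, end, input gain, min level, max level, string).

```python
from collections import defaultdict


def parse_description(text: str) -> dict[str, list[str]]:
    """Map each label to its values; EXT lines extend the previous value."""
    fields: dict[str, list[str]] = defaultdict(list)
    last_label = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed separator lines
        label, _, value = line.partition(":")  # split at the first colon only
        label, value = label.strip(), value.strip()
        if label == "EXT" and last_label is not None:
            # Continuation: join onto the last value of the preceding field.
            fields[last_label][-1] = f"{fields[last_label][-1]} {value}".strip()
            continue
        fields[label].append(value)  # repeated labels (e.g. CMT) accumulate
        last_label = label
    return dict(fields)


sample = """\
--------------------
SRC: FAO00079.PES
RET: 14:15:50
CMT: Item: label start, end, input gain, min level, max level, string
LBR: 0, 406271, 6, -3706, 4534, Last week my friend had to go to the doctors to
EXT: have some injections.
--------------------
"""

fields = parse_description(sample)
# The six LBR items, in the order given by the CMT line above.
start, end, gain, min_level, max_level, text = fields["LBR"][0].split(", ", 5)
print(end, text)
# 406271 Last week my friend had to go to the doctors to have some injections.
```

Splitting only at the first colon keeps values such as RET: 14:15:50 and CMT: Item: ... intact.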
The following are the minimal fields that must be present in a description file:
TYP, DBN, SRC, TXF, SAM, SNB, SBF, SSB, BEG, END, RED, SPI, PCF, LBR and the EXT lines that follow LBR.
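A quick completeness check against this minimal field list could look like the Python sketch below. `missing_fields` is a hypothetical helper. It assumes that a field counts as present when some line starts with its label, even if the value is empty. It does not check EXT lines, since these only carry the continuation of the LBR text.

```python
# Minimal field set for a description file, as listed above
# (EXT continuation lines are not checked by this sketch).
REQUIRED = ("TYP", "DBN", "SRC", "TXF", "SAM", "SNB", "SBF", "SSB",
            "BEG", "END", "RED", "SPI", "PCF", "LBR")


def missing_fields(text: str) -> list[str]:
    """Return the required labels that do not start any line of `text`."""
    present = {line.split(":", 1)[0].strip()
               for line in text.splitlines() if ":" in line}
    return [label for label in REQUIRED if label not in present]


print(missing_fields("TYP: orthographic\nDBN: EUROM_1")[:3])
# ['SRC', 'TXF', 'SAM']
```

An empty result means every required label occurs at least once. The check does not validate the values themselves.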