Each of the six translations of 1984 has been aligned with the English original at the sentence (S) level, and the alignments have been hand-validated. The alignment is not hierarchical, i.e. division- and paragraph-level alignments have not been retained, although they were used in the alignment process. The S-level elements that have been aligned are the following:
The initial S alignment was performed automatically, using different aligners for different languages:
With Vanilla (see http://svenska.gu.se/PEDANT/workshop/workshop.html), the texts were first aligned at the paragraph level, and these alignments were checked and, where necessary, corrected. Once the paragraph alignment was correct, the paragraph-level links were taken as 'hard' links, and S-level alignment was performed. The result was again hand-validated; in addition to alignment errors, this validation often exposed errors of sentence segmentation.

Automatic alignment can produce, in addition to 1-1 links, 2-1, 1-2, 2-2, 0-1, and 1-0 links. Manual verification uncovered a number of other links as well. First, sequences of 0-1 or 1-0 links were (typically) merged into 0-n or n-0 links. Such links were due to translators not translating a portion of the text. Furthermore, other link arities were found, e.g. 1-6 and 2-4 links. The table below summarizes all the link arities encountered in the six translation-original alignments of MULTEXT-East:
Link | BG-EN | CS-EN | ET-EN | HU-EN | RO-EN | SL-EN |
---|---|---|---|---|---|---|
0 - 1 | 16 | 21 | 2 | 19 | 10 | 3 |
0 - 2 | 1 | 1 | 3 | 2 | ||
0 - 3 | 2 | |||||
0 - 4 | 1 | |||||
1 - 0 | 2 | 1 | 1 | 2 | ||
1 - 1 | 6623 | 6439 | 6428 | 6477 | 6047 | 6572 |
1 - 2 | 36 | 78 | 100 | 47 | 259 | 53 |
1 - 3 | 2 | 1 | 14 | |||
1 - 4 | ||||||
1 - 5 | 1 | 1 | 1 | |||
1 - 6 | 1 | |||||
2 - 1 | 22 | 110 | 58 | 108 | 85 | 48 |
2 - 2 | 2 | 3 | 2 | |||
2 - 3 | 3 | |||||
2 - 4 | 1 | |||||
3 - 1 | 2 | 2 | 7 | 3 | ||
3 - 3 | 1 | |||||
4 - 1 | 1 | 1 |
Link arities in the "1984" alignments
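As a rough illustration of the two steps above, merging runs of 0-1 or 1-0 links and tallying link arities per language pair, the Python sketch below may help. It is not the procedure used for MULTEXT-East (the merging was done, typically, during manual verification). It assumes a hypothetical in-memory representation in which each link is a pair of tuples of sentence IDs, translation side first and English side second; the names `merge_empty_runs` and `arity_table` and all sentence IDs are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical representation: (translation-side IDs, English-side IDs).
Link = Tuple[Tuple[str, ...], Tuple[str, ...]]


def merge_empty_runs(links: List[Link]) -> List[Link]:
    """Merge consecutive 0-k links into one 0-n link and consecutive k-0 links into one n-0 link."""
    merged: List[Link] = []
    for src, tgt in links:
        if merged:
            prev_src, prev_tgt = merged[-1]
            # Both links have an empty translation side: extend the English side.
            if not src and not prev_src:
                merged[-1] = ((), prev_tgt + tgt)
                continue
            # Both links have an empty English side: extend the translation side.
            if not tgt and not prev_tgt:
                merged[-1] = (prev_src + src, ())
                continue
        merged.append((src, tgt))
    return merged


def arity_table(alignments: Dict[str, List[Link]]) -> str:
    """Render one row per arity (n translation : m English sentences), one column per pair."""
    counts = {pair: Counter((len(s), len(t)) for s, t in links)
              for pair, links in alignments.items()}
    pairs = list(alignments)
    arities = sorted(set().union(*counts.values()))
    lines = ["Link | " + " | ".join(pairs) + " |", "---|" + "---|" * len(pairs)]
    for n, m in arities:
        # Leave the cell empty where a pair has no link of this arity.
        cells = [str(counts[p][(n, m)]) if counts[p][(n, m)] else "" for p in pairs]
        lines.append(f"{n} - {m} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Usage with made-up sentence IDs
bg_en = [(("bg1",), ("en1",)), ((), ("en2",)), ((), ("en3",)), (("bg2",), ("en4", "en5"))]
cs_en = [(("cs1", "cs2"), ("en1",)), (("cs3",), ("en2",))]
# Prints a header, a separator, and rows for 0 - 2, 1 - 1, 1 - 2 and 2 - 1.
print(arity_table({"BG-EN": merge_empty_runs(bg_en), "CS-EN": merge_empty_runs(cs_en)}))
```

Counting after merging means a run of untranslated English sentences shows up as a single 0-n link rather than several 0-1 links, which is the convention the table follows.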
In the following section we give the details of the CES encoding of the alignment documents. As these documents do not contain the aligned sentences directly, an HTML version of the alignments was also prepared, using the NSL software from LTG. For details on this software, see http://www.ltg.ed.ac.uk/software/.
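The HTML version was produced with NSL; the Python sketch below only illustrates the underlying idea of resolving links against the sentence texts and is not the original procedure. It assumes the alignment document is well-formed XML in which each `<link>` element carries an `xtargets` attribute of the form `"src-ids;tgt-ids"` (space-separated IDs on each side, as in cesAlign-style documents), and that the sentences of both texts are already loaded into dictionaries keyed by ID. The element nesting, IDs, sentences, and function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical representation: (translation-side IDs, English-side IDs).
Pair = Tuple[List[str], List[str]]


def links_from_ces(xml_text: str) -> List[Pair]:
    """Collect the sentence IDs on each side of every <link xtargets="..."> element."""
    root = ET.fromstring(xml_text)
    pairs: List[Pair] = []
    for link in root.iter("link"):
        # Assumed format "id1 id2;id3"; an empty side (0-n or n-0 link) yields [].
        src, _, tgt = link.get("xtargets", "").partition(";")
        pairs.append((src.split(), tgt.split()))
    return pairs


def alignment_to_html(pairs: List[Pair], src_text: Dict[str, str], tgt_text: Dict[str, str]) -> str:
    """Render each link as one table row with the linked sentences side by side."""
    rows = []
    for src_ids, tgt_ids in pairs:
        left = " ".join(html.escape(src_text[i]) for i in src_ids)
        right = " ".join(html.escape(tgt_text[i]) for i in tgt_ids)
        rows.append(f"<tr><td>{left}</td><td>{right}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Usage with made-up IDs and sentences
ces = ('<cesAlign><linkGrp>'
       '<link xtargets="bg1;en1"/>'
       '<link xtargets=";en2"/>'
       '</linkGrp></cesAlign>')
bg = {"bg1": "BG sentence 1"}
en = {"en1": "It was a bright cold day in April.", "en2": "Untranslated <note> & text."}
print(alignment_to_html(links_from_ces(ces), bg, en))
```

Escaping the sentence text keeps characters such as `<` and `&` in the corpus from being read as markup in the generated page.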