Slovene Dependency Treebank
http://nl.ijs.si/sdt/

SDT V0.4
2006-05-17

This is the preliminary release of the Slovene Dependency Treebank,
SDT V0.4, which contains Prague Dependency Treebank-like annotation
of the first part of the Slovene translation of Orwell's "1984", taken
from the MULTEXT-East parallel corpus, V3.0, cf.
  http://ufal.mff.cuni.cz/pdt/
  http://nl.ijs.si/ME/V3/
  http://nl.ijs.si/ME/V3/doc/index.html#mtev3-doc-div2-id2305296

The SDT comes in three formats:

*TEI: encoded in XML according to the TEI P4 Guidelines, with a corpus
  header containing, inter alia, the complete list of Slovene
  morphosyntactic descriptions encoded as feature structures.
  This is the canonical format of the corpus, cf.
  http://www.tei-c.org/P4X/

*CONLL: tabular file in the format of the CoNLL-X Shared Task on
  Multi-lingual Dependency Parsing, suitable for training parsers.
  The directory includes some scripts that reconfigure the trees
  for use with various parsers, cf.
  http://nextens.uvt.nl/~conll/

*FS: format for use with the TrEd editor, cf.
  http://ufal.mff.cuni.cz/~pajas/tred/index.html

This corpus is made available under the condition that it will be
used for research purposes only, and that its use will be acknowledged
in publications by citing the paper "Towards a Slovene Dependency
Treebank", published in the Proceedings of the Fifth International
Conference on Language Resources and Evaluation, LREC'06,
24-26 May 2006, Genoa, cf.
  http://nl.ijs.si/sdt/bib/SDT-LREC06.pdf

===============================================================
Tomaž Erjavec                | Dept. of Knowledge Technologies
email: tomaz.erjavec@ijs.si  | Jozef Stefan Institute
www:   http://nl.ijs.si/et/  | Jamova 39
fax:   (+386 1) 477-3131     | SI-1000 Ljubljana, Slovenia
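
Note on reading the CONLL format: the following is a minimal Python
sketch, not part of the distribution, of how a file in the CoNLL-X
tabular format could be read. It assumes the standard CoNLL-X layout:
one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG,
POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and a blank line between
sentences. The function names and the sample sentence are invented for
illustration and are not taken from the SDT data.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Column names of the CoNLL-X shared task format (10 tab-separated fields).
CONLL_X_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                  "feats", "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll_x(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLL_X_FIELDS):
            raise ValueError("expected 10 columns, got %d: %r"
                             % (len(columns), line))
        sentence.append(dict(zip(CONLL_X_FIELDS, columns)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def dependency_edges(sentence: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head form, relation, dependent form) for every token."""
    forms = {tok["id"]: tok["form"] for tok in sentence}
    forms["0"] = "ROOT"  # HEAD value 0 marks the root of the tree
    return [(forms[tok["head"]], tok["deprel"], tok["form"])
            for tok in sentence]


# Invented two-sentence sample (NOT from the treebank), for illustration.
SAMPLE = (
    "1\tBig\tbig\tA\tA\t_\t2\tAtr\t_\t_\n"
    "2\tBrother\tbrother\tN\tN\t_\t3\tSb\t_\t_\n"
    "3\twatches\twatch\tV\tV\t_\t0\tPred\t_\t_\n"
    "\n"
    "1\tYes\tyes\tQ\tQ\t_\t0\tExD\t_\t_\n"
)

if __name__ == "__main__":
    for sent in read_conll_x(SAMPLE.splitlines()):
        for head, relation, dependent in dependency_edges(sent):
            print(head, relation, dependent)
        print()
```

To read an actual file, pass an open file object instead of the sample,
e.g. read_conll_x(open(path, encoding="utf-8")), where path is the
location of the CONLL file in your copy of the distribution.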