Slovene Dependency Treebank


The Slovene Dependency Treebank (SDT) project built a small syntactically annotated corpus of Slovene texts. The corpus was annotated with dependency analyses, taking the Prague Dependency Treebank as its model. SDT is annotated with Analytic Tree Structures and covers part of the morphosyntactically annotated Slovene component of the parallel MULTEXT-East corpus, namely the first third of the Slovene translation of the novel "1984" by G. Orwell, amounting to 30,000 words.
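To make the idea of a dependency analysis concrete, the following minimal Python sketch (not part of SDT or its tools) stores a sentence as a list of head indices, where tokens are numbered from 1 and 0 stands for the artificial root node, as in the CoNLL-X convention. It checks that every token reaches the root without a cycle. It deliberately does not require the root to have exactly one dependent, since in Prague-style analytic trees elements such as sentence-final punctuation may also attach directly to the root. The function names `children_map` and `is_tree` are hypothetical.

```python
from collections import defaultdict


def children_map(heads):
    """Map each head index to the list of its dependents.

    heads[i] is the head of token i + 1 (tokens are numbered from 1);
    0 denotes the artificial root node (CoNLL-X convention).
    """
    kids = defaultdict(list)
    for dep, head in enumerate(heads, start=1):
        kids[head].append(dep)
    return dict(kids)


def is_tree(heads):
    """Return True if every token reaches the root 0 without a cycle."""
    n = len(heads)
    # Every head must be the root (0) or an existing token (1..n).
    if any(h < 0 or h > n for h in heads):
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        # Walk up the head chain until we hit the root.
        while node != 0:
            if node in seen:
                return False  # the chain loops back: not a tree
            seen.add(node)
            node = heads[node - 1]
    return True
```

For a hypothetical four-token sentence whose first two tokens depend on the third, and whose third and fourth tokens attach to the root, `children_map([3, 3, 0, 0])` returns `{3: [1, 2], 0: [3, 4]}` and `is_tree([3, 3, 0, 0])` is `True`.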

SDT took part in the CoNLL-X Shared Task: Multi-lingual Dependency Parsing. The data for this shared task, including the Slovene data, is available from LDC and ELRA.

SDT alone can also be downloaded from http://nl.ijs.si/sdt/data/. Two versions of SDT are offered there: the data used for CoNLL-X, and a somewhat more recent release that fixes some annotation errors and provides the treebank encoded in TEI P4 as well as in the derived CoNLL tabular format. More information about the current version of SDT is given in its TEI header.
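As an illustration of working with the CoNLL tabular format, here is a minimal Python reader, assuming the standard CoNLL-X layout: one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), "_" for empty values, and a blank line between sentences. The `Token` type and `read_conllx` function are hypothetical names, and the sketch assumes HEAD is always an integer, which holds for annotated data but not for unparsed input.

```python
from typing import Iterable, List, NamedTuple


class Token(NamedTuple):
    """One row of a CoNLL-X file (the ten standard columns)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # 0 = artificial root
    deprel: str
    phead: str
    pdeprel: str


def read_conllx(lines: Iterable[str]) -> List[List[Token]]:
    """Group CoNLL-X lines into sentences (lists of Token)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                             cols[4], cols[5], int(cols[6]), cols[7],
                             cols[8], cols[9]))
    # The last sentence may not be followed by a blank line.
    if current:
        sentences.append(current)
    return sentences
```

For example, `read_conllx(open(path, encoding="utf-8"))`, with `path` pointing to a downloaded CoNLL file, returns one list of `Token`s per sentence; passing a file object keeps memory use to one line at a time while reading.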

If you report on your research involving SDT in a published paper, please cite the first reference below.

In subsequent work we changed to a simpler, local format for annotation. Treebanks annotated in this format are available from the JOS project (jos100k, with 100,000 words) and the SSJ project (ssj500k, with 250,000 words treebanked). More recently, we moved to the Universal Dependencies framework, which hosts the Slovene UD treebank (derived from ssj500k).

Tree samples

Example of annotated tree

References

Further links


Last change 2015-12-06, et
