Slovene Dependency Treebank


The Slovene Dependency Treebank project built a small syntactically annotated corpus of Slovene texts. The corpus was annotated with dependency analyses, taking the Prague Dependency Treebank as the model. The Slovene Dependency Treebank is annotated with Analytic Tree Structures and contains a part of the morphosyntactically annotated Slovene component of the parallel MULTEXT-East corpus, i.e. the first third of the Slovene translation of the novel "1984" by G. Orwell, containing 30,000 words.

SDT took part in the CoNLL-X Shared Task: Multi-lingual Dependency Parsing; the shared task site also has a page with comparative results.

The SDT is freely available for research and can be found in the directory http://nl.ijs.si/sdt/data/. If you report on your research involving SDT in a published paper, please cite the first reference below.

We offer two versions of SDT: the data used for CoNLL-X, and a more recent release which fixes some annotation errors and also offers the treebank encoded in TEI P4, as well as in the derived CoNLL tabular format. More information about the current version of SDT is given in its TEI header.
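For readers who want to work with the tabular release, the following is a minimal Python sketch of a reader. It assumes the file follows the CoNLL-X shared task layout: one token per line, tab-separated columns beginning with ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD and DEPREL, and a blank line between sentences. The names Token and read_conll are hypothetical, and the sample sentence is invented for illustration; it is not taken from SDT, and its tag columns are left as "_".

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token line; only the first eight CoNLL-X columns are kept."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int      # ID of the governing token, 0 for the artificial root
    deprel: str    # dependency relation to the head


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences (lists of Token) from CoNLL-X style lines."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                              cols[4], cols[5], int(cols[6]), cols[7]))
    if sentence:
        # The last sentence may lack a trailing blank line.
        yield sentence


# Invented sample, NOT actual SDT data; tags are placeholders.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "Winston", "Winston", "_", "_", "_", "3", "Sb", "_", "_"),
    ("2", "je", "biti", "_", "_", "_", "3", "AuxV", "_", "_"),
    ("3", "spal", "spati", "_", "_", "_", "0", "Pred", "_", "_"),
    ("4", ".", ".", "_", "_", "_", "0", "AuxK", "_", "_"),
]) + "\n\n"

sentences = list(read_conll(SAMPLE.splitlines()))
for tok in sentences[0]:
    print(tok.id, tok.form, "->", tok.head, tok.deprel)
```

To read an actual file, the same function can be given an open file object, e.g. read_conll(open(path, encoding="utf-8")), where path is wherever the downloaded data was saved.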

In further work, we switched to a simpler, local annotation format. Treebanks annotated according to this format are available from the JOS project (jos100k, with 100,000 words) and the SSJ project (ssj500k, with 250,000 words treebanked).

Tree samples

[Figures: two examples of annotated trees]

References

Further links

Local


Last change 2012-12-21, et
