The MULTEXT-East resources are a multilingual dataset for language engineering research and development. Version 4 of this dataset contains multilingual language resources comprising the MULTEXT-East morphosyntactic specifications, lexica, and annotated "1984" corpus; the MULTEXT-East parallel and comparable text and speech corpora; and associated documentation.
The MULTEXT-East morphosyntactic specifications define attributes and values used for word-level syntactic annotation, i.e., they provide a formal grammar for the morphosyntactic properties of the languages covered. The specifications also contain commentary, bibliography, notes, etc.
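As a rough illustration of how attribute-value specifications of this kind can be applied, the sketch below expands a compact tag into attribute-value pairs. It assumes a positional encoding in which the first character names the category and each following character gives the value of one attribute, with `-` marking an attribute that does not apply. The `TOY_SPEC` table and the `decode_tag` function are hypothetical and heavily simplified; the actual attributes and values for each language are those given in the specifications themselves.

```python
# Toy specification: category code -> (category name, ordered attributes).
# Each attribute is (name, {value code: value name}).
# Illustrative only; not copied from the MULTEXT-East specifications.
TOY_SPEC = {
    "N": ("Noun", [
        ("Type", {"c": "common", "p": "proper"}),
        ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
        ("Number", {"s": "singular", "p": "plural"}),
    ]),
}


def decode_tag(tag, spec=TOY_SPEC):
    """Expand a positional tag into a dict of attribute-value pairs."""
    if not tag or tag[0] not in spec:
        raise ValueError(f"unknown category in tag {tag!r}")
    category, attributes = spec[tag[0]]
    codes = tag[1:]
    if len(codes) > len(attributes):
        raise ValueError(f"tag {tag!r} has too many positions")
    features = {"Category": category}
    for (name, values), code in zip(attributes, codes):
        if code == "-":  # attribute does not apply; leave it out
            continue
        if code not in values:
            raise ValueError(f"invalid code {code!r} for {name} in {tag!r}")
        features[name] = values[code]
    return features


print(decode_tag("Ncms"))
print(decode_tag("Np-p"))
```

Keeping the table as plain data means the same decoder can serve several languages by swapping in a different table, which mirrors the idea of one formal specification per language.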
Chiarcos and Erjavec (2011) describe the semi-automatic conversion of Version 4 of the MULTEXT-East morphosyntactic specifications from the original TEI XML to OWL/DL. TEI is better suited to authoring the specifications and rendering them in a book-oriented format. The OWL encoding, by contrast, makes it possible to formally specify the interrelationships between the various features (concepts, or classes) and to draw logical inferences from those relationships, which is useful when mediating between different tagsets and tools.
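The kind of inference that a class hierarchy enables can be sketched in a few lines of plain Python. This is only a stand-in for what an OWL reasoner does, not a use of the published ontologies: the class names in `SUBCLASS_OF` and the helper functions `ancestors` and `is_a` are hypothetical and chosen purely for illustration.

```python
# Hypothetical subclass axioms (child -> direct parents).
# The names are illustrative and not taken from the published ontologies.
SUBCLASS_OF = {
    "CommonNoun": {"Noun"},
    "ProperNoun": {"Noun"},
    "Noun": {"MorphosyntacticCategory"},
    "Verb": {"MorphosyntacticCategory"},
}


def ancestors(cls, axioms=SUBCLASS_OF):
    """Return every class that subsumes `cls`, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in axioms.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(cls, other, axioms=SUBCLASS_OF):
    """True if `cls` equals `other` or is an inferred subclass of it."""
    return cls == other or other in ancestors(cls, axioms)


# A fine-grained class from one tagset can be matched against a coarser one.
print(is_a("CommonNoun", "Noun"))                     # True
print(is_a("CommonNoun", "MorphosyntacticCategory"))  # True (inferred transitively)
print(is_a("Verb", "Noun"))                           # False
```

Mediating between tagsets then amounts to mapping each tag to a class and asking whether one class subsumes the other, rather than maintaining a hand-written table for every pair of tagsets.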
The resulting ontologies are available as OWL/DL files from this page.
The ontologies are distributed under the Creative Commons Attribution 3.0 Unported (CC BY 3.0) licence. You are free to copy, distribute and transmit the work, to adapt the work, and to make commercial use of the work, provided that you cite:
Christian Chiarcos and Tomaz Erjavec (2011). OWL/DL formalization of the MULTEXT-East morphosyntactic specifications. In: Proceedings of the 5th Linguistic Annotation Workshop (LAW-V), held in conjunction with ACL-HLT 2011, June 2011, Portland, Oregon, USA, pp. 11-20.