JOS
Morphosyntactic Specifications V1.1
The purpose of the JOS morphosyntactic specifications is to provide a well-documented and accessible tagset appropriate for word-level syntactic tagging of Slovene language corpora and texts.
The JOS specifications are compatible with the Slovene part of the multilingual MULTEXT-East morphosyntactic specifications Version 4. They define the part-of-speech categories for Slovene, and for each category its attributes and their values. They also define the mapping from these values into a position-based compact string encoding, the morphosyntactic descriptions (MSDs), and list all valid MSDs for Slovene. So, for example, the specifications state that Noun, Type = common, Gender = masculine, Number = singular, Case = accusative, Animate = no maps to the MSD Ncmsan and that it is a valid MSD for Slovene.
The specifications also provide some brief commentary, and exemplify each MSD with corpus examples. The specifications are written in Slovene and English, so that the commentary, MSDs and feature-value combinations can be expressed in either language, e.g. Ncmsan maps to Sometn and samostalnik, vrsta = občno ime, spol = moški, število = ednina, sklon = tožilnik, živost = ne.
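The position-based encoding described above can be illustrated with a minimal Python sketch. It is not part of the JOS distribution: the function name `decode_msd` and the table layout are assumptions made for illustration, and the tables are deliberately partial, containing only the noun values quoted in the text. The full inventories of categories, attributes and values are defined by the specifications themselves.

```python
from typing import Dict, List, Tuple

# One entry per position after the category letter:
# (attribute name, {one-letter code: value}).
Positions = List[Tuple[str, Dict[str, str]]]

# Partial tables: only the values quoted in the text are listed.
NOUN_EN: Positions = [
    ("Type", {"c": "common"}),
    ("Gender", {"m": "masculine"}),
    ("Number", {"s": "singular"}),
    ("Case", {"a": "accusative"}),
    ("Animate", {"n": "no"}),
]
NOUN_SL: Positions = [
    ("vrsta", {"o": "občno ime"}),
    ("spol", {"m": "moški"}),
    ("število", {"e": "ednina"}),
    ("sklon", {"t": "tožilnik"}),
    ("živost", {"n": "ne"}),
]

# The first character of an MSD selects the category and its position table.
CATEGORIES: Dict[str, Tuple[str, Positions]] = {
    "N": ("Noun", NOUN_EN),
    "S": ("samostalnik", NOUN_SL),
}


def decode_msd(msd: str) -> str:
    """Expand an MSD into a 'Category, attribute = value, ...' string."""
    if not msd or msd[0] not in CATEGORIES:
        raise ValueError(f"unknown category in MSD {msd!r}")
    name, positions = CATEGORIES[msd[0]]
    codes = msd[1:]
    if len(codes) > len(positions):
        raise ValueError(f"MSD {msd!r} has too many positions")
    parts = [name]
    for (attribute, values), code in zip(positions, codes):
        if code not in values:
            raise ValueError(f"code {code!r} not in the partial table for {attribute}")
        parts.append(f"{attribute} = {values[code]}")
    return ", ".join(parts)


print(decode_msd("Ncmsan"))
print(decode_msd("Sometn"))
```

In practice the expansions would be read from the conversion tables or the feature-structure libraries shipped with the specifications rather than hard-coded as here.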
The source specifications are written in XML, in a TEI P5 schema. From this source several directly usable formats are produced with XSLT stylesheets, namely HTML in Slovene and English, tabular files giving conversions between MSDs and feature-sets, and XML libraries for use in corpora.
The specifications (source XML, XSLT stylesheets, HTML and tabular files) are provided under the Creative Commons Attribution 3.0 licence, i.e. you can use them in whatever way you want, but please give credit by citing either the specifications or an appropriate paper in the JOS bibliography.
Download
The complete specifications, consisting of:
- tables/: conversion tables between MSDs and feature-structures (UTF-8, tab-separated columns):
  - josMSD.tbl: full list of MSDs, with the first column giving the collating sequence, the second the MSD in Slovene and the third the MSD in English
  - josMSD-val-sl.tbl, josMSD-val-en.tbl: MSDs with short expansions to feature values
  - josMSD-attval-sl.tbl, josMSD-attval-en.tbl: MSDs with attribute=value expansions for all attributes defined for PoS
  - josMSD-canon-sl.tbl, josMSD-canon-en.tbl: MSDs with attribute=value expansions for all defined attributes
  - josMSD-lib-sl.xml, josMSD-lib-en.xml: MSDs and features expressed as TEI XML feature-structure libraries
- html-sl/: specifications in HTML / Slovene, including the TEI header
- html-en/: specifications in HTML / English, including the TEI header
- xml/: XML source, with schema
  - josMSDv1_1.xml (1.3MB): JOS morphosyntactic specifications, source format
  - teiP5/: TEI P5 teiLite schema used for the specifications
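
As a hedged illustration of how the josMSD.tbl table listed above might be consumed, the following Python sketch reads three tab-separated columns (collating sequence, Slovene MSD, English MSD) and builds a Slovene-to-English lookup. The function name `load_msd_table` is hypothetical, the sample line is invented for the demonstration (in particular the collating value `0001`), and the sketch assumes the file has no header row.

```python
import csv
import io
from typing import Dict, TextIO


def load_msd_table(handle: TextIO) -> Dict[str, str]:
    """Map Slovene MSDs to English MSDs from a josMSD.tbl-style stream."""
    sl_to_en: Dict[str, str] = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        _collation, msd_sl, msd_en = row[:3]
        sl_to_en[msd_sl] = msd_en
    return sl_to_en


# Invented one-line sample standing in for the real file.
sample = io.StringIO("0001\tSometn\tNcmsan\n")
table = load_msd_table(sample)
print(table["Sometn"])
```

For the real file, the stream would be opened with something like `open("tables/josMSD.tbl", encoding="utf-8", newline="")`, since the tables are stated to be UTF-8 encoded.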