sloWNet is the first semantic lexicon for Slovene based on Princeton WordNet. It was developed automatically from various available language resources, such as bilingual dictionaries, parallel corpora and Wikipedia.
sloWNet contains nouns, verbs, adjectives and adverbs (called literals) which are grouped into sets of synonyms (called synsets), each representing a distinct concept. Each synset has a unique id, part-of-speech information and a short definition.
Most synsets also contain one or more example sentences. The most general synsets are organized into three Base Concept Sets, while specialized synsets are marked with a domain label (e.g. Zoology). Synsets in sloWNet are interlinked with semantic and lexical relations, such as hypernymy, meronymy and antonymy.
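The synset structure described above can be pictured with a small in-memory model. This is only a sketch: the class name, field names, relation label, synset ids and definitions below are hypothetical illustrations and do not reflect the actual sloWNet distribution format or its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synset:
    """Hypothetical model of one synset; all field names are assumptions."""
    synset_id: str                 # unique id
    pos: str                       # part of speech, e.g. "n", "v", "a", "b"
    definition: str                # short definition
    literals: List[str] = field(default_factory=list)   # synonyms in the set
    examples: List[str] = field(default_factory=list)   # example sentences
    domain: Optional[str] = None   # domain label, e.g. "Zoology"
    # relation name -> ids of the target synsets
    relations: Dict[str, List[str]] = field(default_factory=dict)


def related(synset, relation, index):
    """Return the synsets reachable from `synset` via `relation`."""
    return [index[t] for t in synset.relations.get(relation, []) if t in index]


# Invented demo entries (ids and definitions are not taken from sloWNet).
animal = Synset("demo-n-0002", "n", "a living organism that can move",
                literals=["žival"])
dog = Synset("demo-n-0001", "n", "a domesticated canine",
             literals=["pes"], domain="Zoology",
             relations={"hypernym": ["demo-n-0002"]})

index = {s.synset_id: s for s in (animal, dog)}
hypernym_literals = [s.literals for s in related(dog, "hypernym", index)]
```

Storing relations as ids rather than object references keeps each synset self-contained and lets the index be built in any order.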
sloWNet 3.0 contains about 39,000 unique literals that are organized into 43,000 synsets. These are mostly nominal (72%). In addition to single words, sloWNet currently contains approximately 9,000 multi-word expressions and 3,000 proper names. On average, one synset contains 1.92 literals, and the same literal appears in 2.07 different synsets.
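The two averages reported above can be computed from a mapping of synsets to their literals. The following is a minimal sketch on invented toy data; the function name and the input layout (synset id mapped to a list of literals) are assumptions, not the format in which sloWNet is distributed.

```python
from collections import defaultdict


def synset_stats(synsets):
    """Compute (avg literals per synset, avg synsets per literal).

    `synsets` maps a synset id to its list of literals (hypothetical layout).
    """
    literal_to_synsets = defaultdict(set)
    for synset_id, literals in synsets.items():
        for literal in set(literals):      # ignore duplicates within a synset
            literal_to_synsets[literal].add(synset_id)

    # Both averages share the same numerator: the number of
    # (literal, synset) membership pairs.
    memberships = sum(len(ids) for ids in literal_to_synsets.values())
    per_synset = memberships / len(synsets)
    per_literal = memberships / len(literal_to_synsets)
    return per_synset, per_literal


# Toy data, not taken from sloWNet.
toy = {
    "s1": ["a", "b"],
    "s2": ["b"],
    "s3": ["a", "b", "c"],
    "s4": ["c"],
}
per_synset, per_literal = synset_stats(toy)
```

Because both figures are derived from the same count of literal-synset pairs, they can be cross-checked against each other: the rounded numbers given above imply roughly 43,000 × 1.92 ≈ 82,600 and 39,000 × 2.07 ≈ 80,700 pairs, which agree to within the rounding of the reported totals.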
sloWNet is freely available for research under the Creative Commons licence.
For more information, please see:
FIŠER, Darja, SAGOT, Benoît. Combining multiple resources to build reliable wordnets. Text, Speech and Dialogue (LNCS 2546). Berlin; Heidelberg: Springer, 2008, pp. 61-68.