This page offers resources for learning Japanese as a second or foreign language, which have been produced in cooperation between the Dept. of Asian Studies at the University of Ljubljana and the Dept. of Knowledge Technologies at the Jožef Stefan Institute:
The jaSlo dictionary contains almost 10,000 entries, many of them linked to corpus examples. More information about the dictionary can be found in its TEI header.
The complete dictionary can be downloaded in source TEI P5 format from the CLARIN.SI repository via the permanent URL hdl.handle.net/11356/1050.
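As a minimal sketch of how the TEI header of a downloaded file might be inspected, the Python snippet below reads the title from a TEI P5 document using only the standard library. The sample XML string and its title are hypothetical stand-ins (and not a complete, valid TEI file); the internal structure of the actual jaSlo file is not described on this page, so only the standard `teiHeader` path is assumed.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature document standing in for the downloaded dictionary.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample dictionary title</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text><body/></text>
</TEI>"""


def header_title(xml_text: str) -> str:
    """Return the first title found in the TEI header, or an empty string."""
    root = ET.fromstring(xml_text)
    title = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    return (title.text or "").strip() if title is not None else ""


print(header_title(SAMPLE_TEI))
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the same namespace-qualified path.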
The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled from jaSlo. The data were first extracted and converted automatically; duplicate or inappropriate entries were then cleaned up manually, the Slovenian headwords were labelled with part-of-speech and difficulty tags according to the CEFR scale as available in the Core Vocabulary of Slovenian (http://hdl.handle.net/11356/1697), and all entries were edited manually using Lexonomy.
Senses of polysemous words and the corresponding translation equivalents were manually glossed with hints on their meaning and, in part, with examples extracted from the jaSlo parallel corpus and manually adapted for the learner's dictionary. Japanese translation equivalents from different registers were tagged according to their level of politeness and with notes on usage restrictions aimed at dictionary users who are learning Japanese as a foreign language.
The dictionary can be browsed at https://www.lexonomy.eu/#/sloJa, or downloaded from the CLARIN.SI repository under a CC-BY 4.0 licence at http://hdl.handle.net/11356/1898.
The Japanese-Slovenian parallel corpus jaSlo is sentence-aligned and contains about half a million words per language from 130 sources.
It is available via the CLARIN.SI noSketch Engine:
The Japanese web corpus with difficulty levels jpWaC-L contains over 300 million words, with words and sentences annotated with their difficulty level. The corpus is further split into 5 subcorpora, each for one difficulty level, from 4 (easiest) to 0 (hardest).
The difficulty level of the words comes from a lexicon provided by Prof. Yoshiko Kawamura, Tokyo International University. Words are assigned difficulty levels according to the Japanese Language Proficiency Test Content Specifications (Revised Edition), Japan Foundation & Association of International Education Japan. Tokyo: Bonjinsha 2004. The difficulty level of the sentences is computed using various heuristics based on the difficulty level of the words, sentence length, etc. (cf. Hmeljak et al. 2010).
The corpus was collected from the Web using WaCkY tools and then processed with ChaSen.
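To make the idea of a word-based sentence heuristic concrete, here is a purely illustrative Python sketch. It is not the method of Hmeljak et al. (2010): the rule (take the hardest word's level, then lower the level by one step for long sentences), the function name `sentence_level`, and the threshold of 30 tokens are all assumptions made for illustration. Only the scale, from 4 (easiest) to 0 (hardest), comes from the corpus description.

```python
from typing import Sequence

EASIEST, HARDEST = 4, 0  # scale used by jpWaC-L: 4 is easiest, 0 is hardest


def sentence_level(word_levels: Sequence[int], long_sentence: int = 30) -> int:
    """Estimate a sentence's difficulty from its words' levels (illustrative only)."""
    if not word_levels:
        return EASIEST  # assumption: an empty sentence counts as easiest
    # The hardest word (lowest number) sets the baseline.
    level = min(word_levels)
    # Assumed length penalty: long sentences move one step towards "hardest".
    if len(word_levels) > long_sentence:
        level = max(HARDEST, level - 1)
    return level


print(sentence_level([4, 3, 4]))  # short sentence whose hardest word is level 3
print(sentence_level([2] * 40))   # long sentence, penalised by one level
```

Taking the minimum reflects the intuition that one hard word can make a sentence hard for a learner; a real scoring scheme would likely weigh several signals rather than apply a single cut-off.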
The corpus is available as a downloadable dataset from the CLARIN.SI repository (http://hdl.handle.net/11356/1047) as well as for exploration via the CLARIN.SI installation of noSketch Engine, also split by difficulty level:
The corpora were part-of-speech tagged and lemmatised with ChaSen. The ChaSen tags (originally written in Japanese) have also been translated into English tags. In the concordancers, the ChaSen-jp tags (i.e. those written in Japanese) are stored in the positional attribute "ctag" and the ChaSen-en tags in the attribute "tag". The table below gives the mapping (it is also available as a tabular file):
n | ChaSen-jp ctag | ChaSen-en tag | Expanded | Examples |
---|---|---|---|---|
1 | 名詞 | N | Noun | |
2 | 名詞-一般 | N.g | Noun(general) | ソナタ,年上,耳,好き |
3 | 名詞-固有名詞 | N.Prop | Noun(proper) | |
4 | 名詞-固有名詞-一般 | N.Prop.g | Noun(proper.general) | イスラム教,光が丘… |
5 | 名詞-固有名詞-人名 | N.Prop.n | Noun(proper.name) | |
6 | 名詞-固有名詞-人名-一般 | N.Prop.n.g | Noun(proper.name.general) | お市の方(おいちのかた),太安万侶(おおのやすまろ) |
7 | 名詞-固有名詞-人名-姓 | N.Prop.n.s | Noun(proper.name.surname) | 山田 |
8 | 名詞-固有名詞-人名-名 | N.Prop.n.f | Noun(proper.name.firstname) | 紀子,ひろし |
9 | 名詞-固有名詞-組織 | N.Prop.o | Noun(proper.organization) | NHK, 愛知銀行,パレスホテル… |
10 | 名詞-固有名詞-地域 | N.Prop.p | Noun(proper.place) | |
11 | 名詞-固有名詞-地域-一般 | N.Prop.p.g | Noun(proper.place.general) | 京都,アジア |
12 | 名詞-固有名詞-地域-国 | N.Prop.p.c | Noun(proper.place.country) | 日本,オーストリア |
13 | 名詞-代名詞 | N.Pron | Noun(pronoun) | |
14 | 名詞-代名詞-一般 | N.Pron.g | Noun(pronoun.general) | 私,誰,奴ら,あそこ,あちこち,それ |
15 | 名詞-代名詞-縮約 | N.Pron.sh | Noun(pronoun.shorten) | あたしゃ,そりゃ,そりゃあ,私しゃ |
16 | 名詞-副詞可能 | N.Adv | Noun(adverbial) | いつか,あまり,9月,いちばん,きのう,この先 |
17 | 名詞-サ変接続 | N.Vs | Noun(verbal) | 見学する,我慢する… |
18 | 名詞-形容動詞語幹 | N.Ana | Noun(adjective -na) | あいまい,安全,黄色,気の毒,気がかり,楽天的 |
19 | 名詞-数 | N.Num | Noun(numeral) | 何,四,億,1 |
20 | 名詞-非自立 | N.bnd | Noun(bound) | |
21 | 名詞-非自立-一般 | N.bnd.g | Noun(bound.general) | 作,きらい,ため,どころ,こと |
22 | 名詞-非自立-副詞可能 | N.bnd.Adv | Noun(bound.adverb) | っきり,折り,うち,あいだ,あたり,あまり |
23 | 名詞-非自立-助動詞語幹 | N.bnd.Aux | Noun(bound.auxiliary) | よう,様, やう,よ |
24 | 名詞-非自立-形容動詞語幹 | N.bnd.Ana | Noun(bound.adjective -na) | みたい,ふう |
25 | 名詞-特殊 | N.spec | Noun(special) | |
26 | 名詞-特殊-助動詞語幹 | N.spec.Aux | Noun(special.auxiliary) | そ,そう |
27 | 名詞-接尾 | N.Suff | Noun(suffix) | |
28 | 名詞-接尾-一般 | N.Suff.g | Noun(suffix.general) | OFF,あまり,ごころ,がわり,印 |
29 | 名詞-接尾-人名 | N.Suff.n | Noun(suffix.name) | さん,氏,君 |
30 | 名詞-接尾-地域 | N.Suff.p | Noun(suffix.place) | 駅,区 |
31 | 名詞-接尾-サ変接続 | N.Suff.Vs | Noun(suffix.verbal) | 化,話,分け |
32 | 名詞-接尾-助動詞語幹 | N.Suff.Aux | Noun(suffix.auxiliary) | そ,そう |
33 | 名詞-接尾-形容動詞語幹 | N.Suff.Ana | Noun(suffix.adjective -na) | がち,好き,同然,薄(うす),気(げ),的 |
34 | 名詞-接尾-副詞可能 | N.Suff.Adv | Noun(suffix.adverb) | いっぱい,ころ,時 |
35 | 名詞-接尾-助数詞 | N.Suff.msr | Noun(suffix.measure) | 人,条,ミリバール |
36 | 名詞-接尾-特殊 | N.Suff.spec | Noun(suffix.specific) | 方,たて |
37 | 名詞-接続詞的 | N.Conj | Noun(conjunction) | 兼,対,VS |
38 | 名詞-動詞非自立的 | N.V.bnd | Noun(verbal.bound) | ごらん,御覧,ご覧,ちょ,ちょうだい,頂戴 |
39 | 名詞-引用文字列 | N.Phr | Noun(phrase) | いわく |
40 | 名詞-ナイ形容詞語幹 | N.nai | Noun(+nai) | 味気,申し訳,まちがい |
41 | 接頭詞 | Pref | Prefix | |
42 | 接頭詞-名詞接続 | Pref.N | Prefix(+noun) | いま,ふた,まっ無,両,好(こう)… |
43 | 接頭詞-動詞接続 | Pref.V | Prefix(+verb) | 引き,御 |
44 | 接頭詞-形容詞接続 | Pref.Ai | Prefix(+adjective -i) | お,バカ, 超,真っ… |
45 | 接頭詞-数接続 | Pref.Num | Prefix(+numeral) | No.,およそ,総, 約 |
46 | 動詞 | V | Verb | |
47 | 動詞-自立 | V.free | Verb(free) | つける,書く… |
48 | 動詞-非自立 | V.bnd | Verb(bound) | 始める,もらう,願える,らっしゃる… |
49 | 動詞-接尾 | V.Suff | Verb(suffix) | がかる,がる,さす,させる,しめる,す,せる,られる,れる |
50 | 形容詞 | Ai | Adjective -i | |
51 | 形容詞-自立 | Ai.free | Adjective -i(free) | 近い, 苦い |
52 | 形容詞-非自立 | Ai.bnd | Adjective -i(bound) | イイ,いい,難い,づらい,にくい,欲しい,やすい,良い,よい |
53 | 形容詞-接尾 | Ai.Suff | Adjective -i(suffix) | くさい,臭い(くさい),たらしい,ったらしい,っぽい,深い(ぶかい),ぽい |
54 | 副詞 | Adv | Adverb | |
55 | 副詞-一般 | Adv.g | Adverb(general) | 力一杯,余り,要は,由来 |
56 | 副詞-助詞類接続 | Adv.P | Adverb(+particle) | 余りに,当然,度々,あんなに |
57 | 連体詞 | Adn | Adnominal | あの,あんな,おなじ,おおきな |
58 | 接続詞 | Conj | Conjunction | 例えば,それなのに,したら,あるいは,おなじく |
59 | 助詞 | P | Particle | |
60 | 助詞-格助詞 | P.c | Particle(case) | |
61 | 助詞-格助詞-一般 | P.c.g | Particle(case.general) | から,が,で,と,に,にて,の,へ,より,を,ん |
62 | 助詞-格助詞-引用 | P.c.r | Particle(case.reported) | と, っと |
63 | 助詞-格助詞-連語 | P.c.Phr | Particle(case.phrase) | という,って,について,をもって, に対して… |
64 | 助詞-接続助詞 | P.Conj | Particle(conjunction) | および,けれども,が |
65 | 助詞-係助詞 | P.bind | Particle(binding) | こそ,さえ,しか,すら,ぞ,は,も,や |
66 | 助詞-副助詞 | P.Adv | Particle(adverbial) | だけ,ばかり |
67 | 助詞-間投助詞 | P.ind | Particle(indirect) | (松島)や |
68 | 助詞-並立助詞 | P.coord | Particle(coordinate) | たり,だの,だり,と,とか,なり,や,やら |
69 | 助詞-終助詞 | P.fin | Particle(sentencefinal) | かしら,さ,なあ |
70 | 助詞-副助詞/並立助詞/終助詞 | P.advcoordfin | Particle(adverbial/coordinate/sentencefinal) | か |
71 | 助詞-連体化 | P.prenom | Particle(prenominal) | の |
72 | 助詞-副詞化 | P.advzer | Particle(adverbializer) | と,に |
73 | 助詞-特殊 | P.spec | Particle(special) | かな,けむ,に,にゃ,ん |
74 | 助動詞 | Aux | Auxiliary | まい,たり,たい,っす, じゃん |
75 | 感動詞 | Interj | Phrase/Interjection | ご苦労さま |
76 | 記号 | Sym | Symbol | ? : ; ※ A B c |
77 | 記号-一般 | Sym.g | Symbol(general) | |
78 | 記号-句点 | Sym.p | Symbol(period) | |
79 | 記号-読点 | Sym.c | Symbol(comma) | |
80 | 記号-空白 | Sym.w | Symbol(whitespace) | |
81 | 記号-アルファベット | Sym.a | Symbol(alphabet) | |
82 | 記号-括弧開 | Sym.bo | Symbol(bracketopen) | |
83 | 記号-括弧閉 | Sym.bc | Symbol(bracketclose) | |
84 | その他 | Other | Other | |
85 | その他-間投 | Other.indir | Other(indirect) | |
86 | フィラー | Fill | Filler | あ,うん,そうですね,まあ,あの,なんか |
87 | 非言語音 | Nss | Nonspeechsound | |
88 | 語断片 | Frgm | Fragment | |
89 | 未知語 | Unknown | Unknown | |
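
Both tag sets in the table are hierarchical: levels are separated by `-` in ChaSen-jp and by `.` in ChaSen-en. A lookup table plus a simple split is therefore enough for many queries. The Python sketch below copies a handful of rows from the table; the names `JP_TO_EN`, `top_level` and `coarse_pos` are hypothetical helpers, and a full mapping would have to be loaded from the tabular file, whose exact format is not specified here.

```python
# A few rows copied from the table above (ChaSen-jp ctag -> ChaSen-en tag).
JP_TO_EN = {
    "名詞": "N",
    "名詞-一般": "N.g",
    "名詞-固有名詞-地域-国": "N.Prop.p.c",
    "動詞-自立": "V.free",
    "形容詞-自立": "Ai.free",
    "助詞-格助詞-一般": "P.c.g",
}


def top_level(tag: str) -> str:
    """Return the coarsest category of a ChaSen-en tag, e.g. 'N' for 'N.Prop.p.c'."""
    return tag.split(".", 1)[0]


def coarse_pos(ctag: str) -> str:
    """Map a ChaSen-jp ctag to its coarse English category.

    Raises KeyError for ctags that are not in the excerpt above.
    """
    return top_level(JP_TO_EN[ctag])


print(JP_TO_EN["名詞-固有名詞-地域-国"])
print(coarse_pos("助詞-格助詞-一般"))
```

Grouping by the top-level category in this way can be handy when a query should match, say, all nouns regardless of subtype.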
Kristina Hmeljak Sangawa's work was supported by a Japanese-Language Education Fellowship of the Japan Foundation from March to July 2005, and the work of Tomaž Erjavec by a JSPS scholarship in November 2008. The development of the Slovenian-Japanese dictionary sloJa was supported by CLARIN.SI.