This page offers resources for learning Japanese as a second or foreign language, produced in cooperation between the Dept. of Asian Studies at the University of Ljubljana and the Dept. of Knowledge Technologies at the Jožef Stefan Institute:
The jaSlo dictionary contains almost 10,000 entries, many of them linked to corpus examples. More information about the dictionary can be found in its TEI header.
The corpus is available for exploration via the noSketch Engine @ CLARIN.SI and as a downloadable dataset from the CLARIN.SI repository via the permanent URL http://hdl.handle.net/11356/1047.
The Japanese web corpus with difficulty levels jpWaC-L contains over 300 million words, with words and sentences annotated with their difficulty level. The corpus is further split into 5 subcorpora, each for one difficulty level, from 4 (easiest) to 0 (hardest).
The difficulty level of the words comes from a lexicon provided by Prof. Yoshiko Kawamura, Tokyo International University. Words are assigned difficulty levels according to the Japanese Language Proficiency Test Content Specifications (Revised Edition), Japan Foundation & Association of International Education Japan. Tokyo: Bonjinsha 2004. The difficulty level of the sentences is computed using various heuristics based on the difficulty level of the words, sentence length, etc. (cf. Hmeljak et al. 2010).
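
To make the idea of a sentence-level heuristic concrete, here is a minimal sketch. It is not the method of Hmeljak et al. 2010: the rule (take the hardest word, then lower the level by one step for long sentences), the function name `sentence_level` and the threshold `max_len` are all assumptions made up for illustration. Only the 4 (easiest) to 0 (hardest) scale comes from the corpus description.

```python
from typing import Sequence

# Level scale used by jpWaC-L: 4 is easiest, 0 is hardest.
EASIEST, HARDEST = 4, 0


def sentence_level(word_levels: Sequence[int], max_len: int = 20) -> int:
    """Toy heuristic (hypothetical, not the published one).

    The sentence is as hard as its hardest word; sentences longer than
    `max_len` words are made one step harder, never going below HARDEST.
    """
    if not word_levels:
        raise ValueError("sentence has no words with a difficulty level")
    level = min(word_levels)  # lower number means harder
    if len(word_levels) > max_len:  # assumed length penalty
        level -= 1
    return max(level, HARDEST)


if __name__ == "__main__":
    print(sentence_level([4, 3, 4]))  # short sentence, hardest word is level 3
```

A real implementation would follow the heuristics described in the cited paper rather than this simplified rule.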
The corpus was collected from the Web using WaCkY tools and then processed by ChaSen.
The corpora were part-of-speech tagged and lemmatised with ChaSen. The ChaSen tags (originally written in Japanese) have also been translated to English tags. In the concordancers, the ChaSen-jp tags (i.e. those written in Japanese) are stored in the positional attribute "ctag" and the ChaSen-en tags in the attribute "tag". Here you can see all the defined corpus annotations in noSkE. The table below gives the mapping (it is also available as a tabular file):
n | ChaSen-jp ctag | ChaSen-en tag | Expanded | Examples |
---|---|---|---|---|
1 | 名詞 | N | Noun | |
2 | 名詞-一般 | N.g | Noun(general) | ソナタ,年上,耳,好き |
3 | 名詞-固有名詞 | N.Prop | Noun(proper) | |
4 | 名詞-固有名詞-一般 | N.Prop.g | Noun(proper.general) | イスラム教,光が丘… |
5 | 名詞-固有名詞-人名 | N.Prop.n | Noun(proper.name) | |
6 | 名詞-固有名詞-人名-一般 | N.Prop.n.g | Noun(proper.name.general) | お市の方(おいちのかた),太安万侶(おおのやすまろ) |
7 | 名詞-固有名詞-人名-姓 | N.Prop.n.s | Noun(proper.name.surname) | 山田 |
8 | 名詞-固有名詞-人名-名 | N.Prop.n.f | Noun(proper.name.firstname) | 紀子,ひろし |
9 | 名詞-固有名詞-組織 | N.Prop.o | Noun(proper.organization) | NHK, 愛知銀行,パレスホテル… |
10 | 名詞-固有名詞-地域 | N.Prop.p | Noun(proper.place) | |
11 | 名詞-固有名詞-地域-一般 | N.Prop.p.g | Noun(proper.place.general) | 京都,アジア |
12 | 名詞-固有名詞-地域-国 | N.Prop.p.c | Noun(proper.place.country) | 日本,オーストリア |
13 | 名詞-代名詞 | N.Pron | Noun(pronoun) | |
14 | 名詞-代名詞-一般 | N.Pron.g | Noun(pronoun.general) | 私,誰,奴ら,あそこ,あちこち,それ |
15 | 名詞-代名詞-縮約 | N.Pron.sh | Noun(pronoun.shorten) | あたしゃ,そりゃ,そりゃあ,私しゃ |
16 | 名詞-副詞可能 | N.Adv | Noun(adverbial) | いつか,あまり,9月,いちばん,きのう,この先 |
17 | 名詞-サ変接続 | N.Vs | Noun(verbal) | 見学する,我慢する… |
18 | 名詞-形容動詞語幹 | N.Ana | Noun(adjective -na) | あいまい,安全,黄色,気の毒,気がかり,楽天的 |
19 | 名詞-数 | N.Num | Noun(numeral) | 何,四,億,1 |
20 | 名詞-非自立 | N.bnd | Noun(bound) | |
21 | 名詞-非自立-一般 | N.bnd.g | Noun(bound.general) | 作,きらい,ため,どころ,こと |
22 | 名詞-非自立-副詞可能 | N.bnd.Adv | Noun(bound.adverb) | っきり,折り,うち,あいだ,あたり,あまり |
23 | 名詞-非自立-助動詞語幹 | N.bnd.Aux | Noun(bound.auxiliary) | よう,様, やう,よ |
24 | 名詞-非自立-形容動詞語幹 | N.bnd.Ana | Noun(bound.adjective -na) | みたい,ふう |
25 | 名詞-特殊- | N.spec | Noun(special) | |
26 | 名詞-特殊-助動詞語幹 | N.spec.Aux | Noun(special.auxiliary) | そ,そう |
27 | 名詞-接尾 | N.Suff | Noun(suffix) | |
28 | 名詞-接尾-一般 | N.Suff.g | Noun(suffix.general) | OFF,あまり,ごころ,がわり,印 |
29 | 名詞-接尾-人名 | N.Suff.n | Noun(suffix.name) | さん,氏,君 |
30 | 名詞-接尾-地域 | N.Suff.p | Noun(suffix.place) | 駅,区 |
31 | 名詞-接尾-サ変接続 | N.Suff.Vs | Noun(suffix.verbal) | 化,話,分け |
32 | 名詞-接尾-助動詞語幹 | N.Suff.Aux | Noun(suffix.auxiliary) | そ,そう |
33 | 名詞-接尾-形容動詞語幹 | N.Suff.Ana | Noun(suffix.adjective -na) | がち,好き,同然,薄(うす),気(げ),的 |
34 | 名詞-接尾-副詞可能 | N.Suff.Adv | Noun(suffix.adverb) | いっぱい,ころ,時 |
35 | 名詞-接尾-助数詞 | N.Suff.msr | Noun(suffix.measure) | 人,条,ミリバール |
36 | 名詞-接尾-特殊 | N.Suff.spec | Noun(suffix.specific) | 方,たて |
37 | 名詞-接続詞的 | N.Conj | Noun(conjunction) | 兼,対,VS |
38 | 名詞-動詞非自立的 | N.V.bnd | Noun(verbal.bound) | ごらん,御覧,ご覧,ちょ,ちょうだい,頂戴 |
39 | 名詞-引用文字列 | N.Phr | Noun(phrase) | いわく |
40 | 名詞-ナイ形容詞語幹 | N.nai | Noun(+nai) | 味気,申し訳,まちがい |
41 | 接頭詞 | Pref | Prefix | |
42 | 接頭詞-名詞接続 | Pref.N | Prefix(+noun) | いま,ふた,まっ無,両,好(こう)… |
43 | 接頭詞-動詞接続 | Pref.V | Prefix(+verb) | 引き,御 |
44 | 接頭詞-形容詞接続 | Pref.Ai | Prefix(+adjective -i) | お,バカ, 超,真っ… |
45 | 接頭詞-数接続 | Pref.Num | Prefix(+numeral) | No.,およそ,総, 約 |
46 | 動詞 | V | Verb | |
47 | 動詞-自立 | V.free | Verb(free) | つける,書く… |
48 | 動詞-非自立 | V.bnd | Verb(bound) | 始める,もらう,願える,らっしゃる… |
49 | 動詞-接尾 | V.Suff | Verb(suffix) | がかる,がる,さす,させる,しめる,す,せる,られる,れる |
50 | 形容詞 | Ai | Adjective -i | |
51 | 形容詞-自立 | Ai.free | Adjective -i(free) | 近い, 苦い |
52 | 形容詞-非自立 | Ai.bnd | Adjective -i(bound) | イイ,いい,難い,づらい,にくい,欲しい,やすい,良い,よい |
53 | 形容詞-接尾 | Ai.Suff | Adjective -i(suffix) | くさい,臭い(くさい),たらしい,ったらしい,っぽい,深い(ぶかい),ぽい |
54 | 副詞 | Adv | Adverb | |
55 | 副詞-一般 | Adv.g | Adverb(general) | 力一杯,余り,要は,由来 |
56 | 副詞-助詞類接続 | Adv.P | Adverb(+particle) | 余りに,当然,度々,あんなに |
57 | 連体詞 | Adn | Adnominal | あの,あんな,おなじ,おおきな |
58 | 接続詞 | Conj | Conjunction | 例えば,それなのに,したら,あるいは,おなじく |
59 | 助詞 | P | Particle | |
60 | 助詞-格助詞 | P.c | Particle(case) | |
61 | 助詞-格助詞-一般 | P.c.g | Particle(case.general) | から,が,で,と,に,にて,の,へ,より,を,ん |
62 | 助詞-格助詞-引用 | P.c.r | Particle(case.reported) | と, っと |
63 | 助詞-格助詞-連語 | P.c.Phr | Particle(case.phrase) | という,って,について,をもって, に対して… |
64 | 助詞-接続助詞 | P.Conj | Particle(conjunction) | および,けれども,が |
65 | 助詞-係助詞 | P.bind | Particle(binding) | こそ,さえ,しか,すら,ぞ,は,も,や |
66 | 助詞-副助詞 | P.Adv | Particle(adverbial) | だけ,ばかり |
67 | 助詞-間投助詞 | P.ind | Particle(indirect) | (松島)や |
68 | 助詞-並立助詞 | P.coord | Particle(coordinate) | たり,だの,だり,と,とか,なり,や,やら |
69 | 助詞-終助詞 | P.fin | Particle(sentencefinal) | かしら,さ,なあ |
70 | 助詞-副助詞/並立助詞/終助詞 | P.advcoordfin | Particle(adverbial/coordinate/sentencefinal) | か |
71 | 助詞-連体化 | P.prenom | Particle(prenominal) | の |
72 | 助詞-副詞化 | P.advzer | Particle(adverbializer) | と,に |
73 | 助詞-特殊 | P.spec | Particle(special) | かな,けむ,に,にゃ,ん |
74 | 助動詞 | Aux | Auxiliary | まい,たり,たい,っす, じゃん |
75 | 感動詞 | Interj | Phrase/Interjection | ご苦労さま |
76 | 記号 | Sym | Symbol | ? : ; ※ A B c |
77 | 記号-一般 | Sym.g | Symbol(general) | |
78 | 記号-句点 | Sym.p | Symbol(period) | |
79 | 記号-読点 | Sym.c | Symbol(dotincenter) | |
80 | 記号-空白 | Sym.w | Symbol(whitespace) | |
81 | 記号-アルファベット | Sym.a | Symbol(alphabet) | |
82 | 記号-括弧開 | Sym.bo | Symbol(bracketopen) | |
83 | 記号-括弧閉 | Sym.bc | Symbol(bracketclose) | |
84 | その他 | Other | Other | |
85 | その他-間投 | Other.indir | Other(indirect) | |
86 | フィラー | Fill | Filler | あ,うん,そうですね,まあ,あの,なんか |
87 | 非言語音 | Nss | Nonspeechsound | |
88 | 語断片 | Frgm | Fragment | |
89 | 未知語 | Unknown | Unknown | |
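
As an illustration of how the mapping can be used programmatically, the sketch below translates a ChaSen-jp `ctag` value into its ChaSen-en `tag` and extracts the coarse part of speech. Only a handful of rows are copied from the table above; the names `CTAG_TO_TAG`, `translate_ctag` and `top_level` are hypothetical and not part of any distributed tool.

```python
from typing import Optional

# A few rows copied from the table above; the full mapping has 89 entries.
CTAG_TO_TAG = {
    "名詞": "N",
    "名詞-一般": "N.g",
    "名詞-固有名詞-人名-姓": "N.Prop.n.s",
    "動詞-自立": "V.free",
    "形容詞-自立": "Ai.free",
    "助詞-格助詞-一般": "P.c.g",
    "助動詞": "Aux",
}


def translate_ctag(ctag: str) -> Optional[str]:
    """Return the English tag for a Japanese ChaSen tag, or None if not listed."""
    return CTAG_TO_TAG.get(ctag)


def top_level(tag: str) -> str:
    """Return the coarse category, i.e. the part of an English tag before the first dot."""
    return tag.split(".", 1)[0]


if __name__ == "__main__":
    tag = translate_ctag("名詞-固有名詞-人名-姓")
    print(tag, top_level(tag))
```

Returning `None` for unlisted tags, rather than the string `Unknown`, avoids confusion with the real `Unknown` tag (未知語) in row 89.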
Kristina Hmeljak Sangawa's work was supported by a Japanese-Language Education Fellowship of the Japan Foundation from March to July 2005, and the work of Tomaž Erjavec by a JSPS scholarship in November 2008.