
jaSlo: Japanese-Slovene resources for language learning

This page offers resources for learning Japanese as a second or foreign language, produced jointly by the Dept. of Asian Studies at the University of Ljubljana and the Dept. of Knowledge Technologies at the Jožef Stefan Institute:

  1. Japanese-Slovene Learner's Dictionary jaSlo
  2. Reading Tutor with jaSlo dictionary
  3. Corpora and concordances:
    1. jaSlo parallel corpus
    2. jpWaC-L web corpus
    3. Corpus annotation
  4. References

1. Japanese-Slovene Learner's Dictionary jaSlo

The dictionary can be searched through an interface in English, Japanese or Slovene.

The jaSlo dictionary contains almost 10,000 entries, many of them linked to corpus examples. More information about the dictionary can be found in its TEI header.
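Since the dictionary carries a TEI header, i.e. it is TEI-encoded XML, its header and entries can be read with a standard XML parser. The following is a minimal Python sketch under stated assumptions: it assumes TEI P5 namespacing and one <entry> element per dictionary entry, and the embedded sample document is a hypothetical stand-in, not an excerpt from the actual jaSlo file.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; the real jaSlo entry structure may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>jaSlo sample</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <entry><form><orth>本</orth></form></entry>
      <entry><form><orth>読む</orth></form></entry>
    </body>
  </text>
</TEI>"""


def summarize(xml_text: str) -> dict:
    """Return the header title and the number of <entry> elements."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=TEI_NS
    )
    entries = root.findall(".//tei:entry", TEI_NS)
    return {"title": title, "entries": len(entries)}


print(summarize(SAMPLE))
```

If the real file turns out not to use a namespace, the `tei:` prefixes would have to be dropped from the paths.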

2. Reading Tutor with jaSlo dictionary


Reading Tutor, produced by Prof. Yoshiko Kawamura and her team at Tokyo International University, is a Japanese language learning system: the user pastes a text into a window, the system analyses it and returns the words it finds in the dictionary. Several dictionaries are available, among them the jaSlo dictionary.
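How Reading Tutor analyses text internally is not described here. As an illustration only, the Python sketch below swaps in a much simpler technique, greedy longest-match dictionary lookup, which shows how words can be found in unsegmented Japanese text. The toy dictionary and its glosses are hypothetical.

```python
# Hypothetical toy dictionary: headword -> Slovene gloss (illustrative only).
TOY_DICT = {
    "日本": "Japonska",
    "日本語": "japonščina",
    "を": "(object particle)",
    "勉強": "učenje",
    "する": "delati",
}


def longest_match_lookup(text: str, dictionary: dict, max_len: int = 8) -> list:
    """Scan left to right; at each position take the longest dictionary word."""
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                found.append((candidate, dictionary[candidate]))
                i += length
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return found


for word, gloss in longest_match_lookup("日本語を勉強する", TOY_DICT):
    print(word, gloss)
```

Greedy matching prefers 日本語 over 日本 here, but it cannot handle inflected forms (e.g. 勉強した); a morphological analyser such as ChaSen, which returns lemmas, is needed for that.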

3. Corpora and concordances

3.1 The jaSlo parallel corpus

The Japanese-Slovene parallel corpus jaSlo is sentence-aligned and contains about half a million words per language, drawn from 130 sources. It is available via the noSketch Engine @ nl.ijs.si.
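Sentence alignment means that each Japanese sentence is paired with its Slovene translation, so a concordance hit on one side can be shown together with the other. A minimal Python sketch, assuming a hypothetical in-memory list of aligned pairs; unlike the token-based concordances in noSketch Engine, it matches plain character strings.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    ja: str  # Japanese sentence
    sl: str  # its aligned Slovene sentence


def concordance(pairs, query, width=5):
    """KWIC lines (left, keyword, right, translation) for every occurrence
    of query on the Japanese side, with width characters of context."""
    hits = []
    for pair in pairs:
        start = pair.ja.find(query)
        while start != -1:
            end = start + len(query)
            left = pair.ja[max(0, start - width):start]
            right = pair.ja[end:end + width]
            hits.append((left, query, right, pair.sl))
            start = pair.ja.find(query, start + 1)
    return hits


# Hypothetical sample pairs, not taken from the jaSlo corpus.
PAIRS = [
    AlignedPair("私は本を読む。", "Berem knjigo."),
    AlignedPair("本屋はどこですか。", "Kje je knjigarna?"),
]
for left, kw, right, sl in concordance(PAIRS, "本"):
    print(f"{left:>5}[{kw}]{right:<5}  |  {sl}")
```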

3.2 The jpWaC-L web corpus

The Japanese web corpus with difficulty levels, jpWaC-L, contains over 300 million words; each word and each sentence is annotated with its difficulty level. The corpus is further split into five subcorpora, one per difficulty level, from 4 (easiest) to 0 (hardest).

The corpus can be explored via the noSketch Engine @ nl.ijs.si and downloaded as a dataset from the CLARIN.SI repository via the permanent URL http://hdl.handle.net/11356/1047.

The difficulty levels of words come from a lexicon provided by Prof. Yoshiko Kawamura, Tokyo International University, in which words are assigned difficulty levels according to the Japanese Language Proficiency Test Content Specifications (Revised Edition), Japan Foundation & Association of International Education Japan, Tokyo: Bonjinsha, 2004. The difficulty level of each sentence is computed with several heuristics based on the difficulty levels of its words, sentence length, etc. (cf. Hmeljak et al. 2010).
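The actual heuristics are described in Hmeljak et al. (2010) and are not reproduced here. Purely as an illustration, the Python sketch below uses a hypothetical rule: the hardest word in a sentence sets its level, and an overly long sentence drops one further level. It then groups sentences into one subcorpus per level. The lexicon entries, the length threshold and the treatment of unknown words are all assumptions.

```python
from collections import defaultdict

# Hypothetical word lexicon: level 4 = easiest ... 0 = hardest.
WORD_LEVEL = {"私": 4, "は": 4, "本": 4, "を": 4, "読む": 4,
              "の": 4, "経済": 2, "議論": 2, "政策": 1}


def sentence_level(tokens, lexicon, max_easy_len=10, unknown_level=0):
    """Assumed heuristic: the hardest word sets the level; a sentence
    longer than max_easy_len tokens drops one further level (floor 0).
    Words missing from the lexicon count as unknown_level, and an
    empty sentence defaults to level 4."""
    level = min((lexicon.get(t, unknown_level) for t in tokens), default=4)
    if len(tokens) > max_easy_len:
        level = max(0, level - 1)
    return level


def split_by_level(sentences, lexicon):
    """Group tokenised sentences into one subcorpus per level."""
    subcorpora = defaultdict(list)
    for tokens in sentences:
        subcorpora[sentence_level(tokens, lexicon)].append(tokens)
    return dict(subcorpora)


easy = ["私", "は", "本", "を", "読む"]
hard = ["経済", "政策", "の", "議論"]
print(split_by_level([easy, hard], WORD_LEVEL))
```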

The corpus was collected from the Web using the WaCky tools and then processed with ChaSen.

3.3 Corpus annotation

The corpora were part-of-speech tagged and lemmatised with ChaSen. The ChaSen tags (originally written in Japanese) have also been converted to English-based tags. In the concordancers, the ChaSen-jp tags (i.e. those written in Japanese) are stored in the positional attribute ctag, and the ChaSen-en tags in the attribute tag. All the defined corpus annotations can be viewed, and the ones to be displayed selected, in noSkE on the jpWaC-L0 subcorpus (very difficult sentences). The table below gives the mapping between the two tagsets (it is also available as a tabular file):

| n | ChaSen-jp (ctag) | ChaSen-en (tag) | Expanded | Examples |
| --- | --- | --- | --- | --- |
| 1 | 名詞 | N | Noun | |
| 2 | 名詞-一般 | N.g | Noun(general) | ソナタ,年上,耳,好き |
| 3 | 名詞-固有名詞 | N.Prop | Noun(proper) | |
| 4 | 名詞-固有名詞-一般 | N.Prop.g | Noun(proper.general) | イスラム教,光が丘… |
| 5 | 名詞-固有名詞-人名 | N.Prop.n | Noun(proper.name) | |
| 6 | 名詞-固有名詞-人名-一般 | N.Prop.n.g | Noun(proper.name.general) | お市の方(おいちのかた),太安万侶(おおのやすまろ) |
| 7 | 名詞-固有名詞-人名-姓 | N.Prop.n.s | Noun(proper.name.surname) | 山田 |
| 8 | 名詞-固有名詞-人名-名 | N.Prop.n.f | Noun(proper.name.firstname) | 紀子,ひろし |
| 9 | 名詞-固有名詞-組織 | N.Prop.o | Noun(proper.organization) | NHK,愛知銀行,パレスホテル… |
| 10 | 名詞-固有名詞-地域 | N.Prop.p | Noun(proper.place) | |
| 11 | 名詞-固有名詞-地域-一般 | N.Prop.p.g | Noun(proper.place.general) | 京都,アジア |
| 12 | 名詞-固有名詞-地域-国 | N.Prop.p.c | Noun(proper.place.country) | 日本,オーストリア |
| 13 | 名詞-代名詞 | N.Pron | Noun(pronoun) | |
| 14 | 名詞-代名詞-一般 | N.Pron.g | Noun(pronoun.general) | 私,誰,奴ら,あそこ,あちこち,それ |
| 15 | 名詞-代名詞-縮約 | N.Pron.sh | Noun(pronoun.shorten) | あたしゃ,そりゃ,そりゃあ,私しゃ |
| 16 | 名詞-副詞可能 | N.Adv | Noun(adverbial) | いつか,あまり,9月,いちばん,きのう,この先 |
| 17 | 名詞-サ変接続 | N.Vs | Noun(verbal) | 見学する,我慢する… |
| 18 | 名詞-形容動詞語幹 | N.Ana | Noun(adjective -na) | あいまい,安全,黄色,気の毒,気がかり,楽天的 |
| 19 | 名詞-数 | N.Num | Noun(numeral) | 何,四,億,1 |
| 20 | 名詞-非自立 | N.bnd | Noun(bound) | |
| 21 | 名詞-非自立-一般 | N.bnd.g | Noun(bound.general) | 作,きらい,ため,どころ,こと |
| 22 | 名詞-非自立-副詞可能 | N.bnd.Adv | Noun(bound.adverb) | っきり,折り,うち,あいだ,あたり,あまり |
| 23 | 名詞-非自立-助動詞語幹 | N.bnd.Aux | Noun(bound.auxiliary) | よう,様,やう,よ |
| 24 | 名詞-非自立-形容動詞語幹 | N.bnd.Ana | Noun(bound.adjective -na) | みたい,ふう |
| 25 | 名詞-特殊 | N.spec | Noun(special) | |
| 26 | 名詞-特殊-助動詞語幹 | N.spec.Aux | Noun(special.auxiliary) | そ,そう |
| 27 | 名詞-接尾 | N.Suff | Noun(suffix) | |
| 28 | 名詞-接尾-一般 | N.Suff.g | Noun(suffix.general) | OFF,あまり,ごころ,がわり,印 |
| 29 | 名詞-接尾-人名 | N.Suff.n | Noun(suffix.name) | さん,氏,君 |
| 30 | 名詞-接尾-地域 | N.Suff.p | Noun(suffix.place) | 駅,区 |
| 31 | 名詞-接尾-サ変接続 | N.Suff.Vs | Noun(suffix.verbal) | 化,話,分け |
| 32 | 名詞-接尾-助動詞語幹 | N.Suff.Aux | Noun(suffix.auxiliary) | そ,そう |
| 33 | 名詞-接尾-形容動詞語幹 | N.Suff.Ana | Noun(suffix.adjective -na) | がち,好き,同然,薄(うす),気(げ),的 |
| 34 | 名詞-接尾-副詞可能 | N.Suff.Adv | Noun(suffix.adverb) | いっぱい,ころ,時 |
| 35 | 名詞-接尾-助数詞 | N.Suff.msr | Noun(suffix.measure) | 人,条,ミリバール |
| 36 | 名詞-接尾-特殊 | N.Suff.spec | Noun(suffix.specific) | 方,たて |
| 37 | 名詞-接続詞的 | N.Conj | Noun(conjunction) | 兼,対,VS |
| 38 | 名詞-動詞非自立的 | N.V.bnd | Noun(verbal.bound) | ごらん,御覧,ご覧,ちょ,ちょうだい,頂戴 |
| 39 | 名詞-引用文字列 | N.Phr | Noun(phrase) | いわく |
| 40 | 名詞-ナイ形容詞語幹 | N.nai | Noun(+nai) | 味気,申し訳,まちがい |
| 41 | 接頭詞 | Pref | Prefix | |
| 42 | 接頭詞-名詞接続 | Pref.N | Prefix(+noun) | いま,ふた,まっ無,両,好(こう)… |
| 43 | 接頭詞-動詞接続 | Pref.V | Prefix(+verb) | 引き,御 |
| 44 | 接頭詞-形容詞接続 | Pref.Ai | Prefix(+adjective -i) | お,バカ,超,真っ… |
| 45 | 接頭詞-数接続 | Pref.Num | Prefix(+numeral) | No.,およそ,総,約 |
| 46 | 動詞 | V | Verb | |
| 47 | 動詞-自立 | V.free | Verb(free) | つける,書く… |
| 48 | 動詞-非自立 | V.bnd | Verb(bound) | 始める,もらう,願える,らっしゃる… |
| 49 | 動詞-接尾 | V.Suff | Verb(suffix) | がかる,がる,さす,させる,しめる,す,せる,られる,れる |
| 50 | 形容詞 | Ai | Adjective -i | |
| 51 | 形容詞-自立 | Ai.free | Adjective -i(free) | 近い,苦い |
| 52 | 形容詞-非自立 | Ai.bnd | Adjective -i(bound) | イイ,いい,難い,づらい,にくい,欲しい,やすい,良い,よい |
| 53 | 形容詞-接尾 | Ai.Suff | Adjective -i(suffix) | くさい,臭い(くさい),たらしい,ったらしい,っぽい,深い(ぶかい),ぽい |
| 54 | 副詞 | Adv | Adverb | |
| 55 | 副詞-一般 | Adv.g | Adverb(general) | 力一杯,余り,要は,由来 |
| 56 | 副詞-助詞類接続 | Adv.P | Adverb(+particle) | 余りに,当然,度々,あんなに |
| 57 | 連体詞 | Adn | Adnominal | あの,あんな,おなじ,おおきな |
| 58 | 接続詞 | Conj | Conjunction | 例えば,それなのに,したら,あるいは,おなじく |
| 59 | 助詞 | P | Particle | |
| 60 | 助詞-格助詞 | P.c | Particle(case) | |
| 61 | 助詞-格助詞-一般 | P.c.g | Particle(case.general) | から,が,で,と,に,にて,の,へ,より,を,ん |
| 62 | 助詞-格助詞-引用 | P.c.r | Particle(case.reported) | と,っと |
| 63 | 助詞-格助詞-連語 | P.c.Phr | Particle(case.phrase) | という,って,について,をもって,に対して… |
| 64 | 助詞-接続助詞 | P.Conj | Particle(conjunction) | および,けれども,が |
| 65 | 助詞-係助詞 | P.bind | Particle(binding) | こそ,さえ,しか,すら,ぞ,は,も,や |
| 66 | 助詞-副助詞 | P.Adv | Particle(adverbial) | だけ,ばかり |
| 67 | 助詞-間投助詞 | P.ind | Particle(indirect) | (松島)や |
| 68 | 助詞-並立助詞 | P.coord | Particle(coordinate) | たり,だの,だり,と,とか,なり,や,やら |
| 69 | 助詞-終助詞 | P.fin | Particle(sentencefinal) | かしら,さ,なあ |
| 70 | 助詞-副助詞/並立助詞/終助詞 | P.advcoordfin | Particle(adverbial/coordinate/sentencefinal) | |
| 71 | 助詞-連体化 | P.prenom | Particle(prenominal) | |
| 72 | 助詞-副詞化 | P.advzer | Particle(adverbializer) | と,に |
| 73 | 助詞-特殊 | P.spec | Particle(special) | かな,けむ,に,にゃ,ん |
| 74 | 助動詞 | Aux | Auxiliary | まい,たり,たい,っす,じゃん |
| 75 | 感動詞 | Interj | Phrase/Interjection | ご苦労さま |
| 76 | 記号 | Sym | Symbol | ? : ; ※ A B c |
| 77 | 記号-一般 | Sym.g | Symbol(general) | |
| 78 | 記号-句点 | Sym.p | Symbol(period) | |
| 79 | 記号-読点 | Sym.c | Symbol(dotincenter) | |
| 80 | 記号-空白 | Sym.w | Symbol(whitespace) | |
| 81 | 記号-アルファベット | Sym.a | Symbol(alphabet) | |
| 82 | 記号-括弧開 | Sym.bo | Symbol(bracketopen) | |
| 83 | 記号-括弧閉 | Sym.bc | Symbol(bracketclose) | |
| 84 | その他 | Other | Other | |
| 85 | その他-間投 | Other.indir | Other(indirect) | |
| 86 | フィラー | Fill | Filler | あ,うん,そうですね,まあ,あの,なんか |
| 87 | 非言語音 | Nss | Nonspeechsound | |
| 88 | 語断片 | Frgm | Fragment | |
| 89 | 未知語 | Unknown | Unknown | |
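Each ChaSen-jp tag is a hyphen-separated path from a coarse category to finer subcategories (e.g. 名詞-固有名詞-人名-姓), so a search for a coarse class can be answered by checking a token's ancestor tags. The Python sketch below assumes a hypothetical vertical-format layout with tab-separated columns word, lemma, ctag and tag; the actual column order and attribute set of the jaSlo and jpWaC-L corpora are not given here. CTAG_TO_TAG copies only a few rows of the table above. In noSkE itself, a comparable filter would presumably be a CQL query on the tag attribute, such as [tag="N.Prop.*"].

```python
# A few rows of the ctag -> tag mapping from the table above.
CTAG_TO_TAG = {
    "名詞": "N",
    "名詞-一般": "N.g",
    "名詞-固有名詞": "N.Prop",
    "名詞-固有名詞-人名": "N.Prop.n",
    "名詞-固有名詞-人名-姓": "N.Prop.n.s",
    "助詞": "P",
    "助詞-格助詞": "P.c",
    "助詞-格助詞-一般": "P.c.g",
    "動詞": "V",
    "動詞-自立": "V.free",
}


def ancestors(ctag):
    """All proper prefixes of a ChaSen-jp tag, coarsest first."""
    parts = ctag.split("-")
    return ["-".join(parts[:i]) for i in range(1, len(parts))]


def parse_vertical(lines):
    """Assumed columns: word, lemma, ctag (ChaSen-jp), tag (ChaSen-en)."""
    tokens = []
    for line in lines:
        if not line or line.startswith("<"):  # skip structures such as <s>
            continue
        word, lemma, ctag, tag = line.split("\t")
        tokens.append({"word": word, "lemma": lemma, "ctag": ctag, "tag": tag})
    return tokens


def under(tokens, parent_ctag):
    """Tokens whose ctag is parent_ctag or one of its subcategories."""
    return [t for t in tokens
            if t["ctag"] == parent_ctag or parent_ctag in ancestors(t["ctag"])]


# Hypothetical sample in vertical format.
VERTICAL = [
    "<s>",
    "山田\t山田\t名詞-固有名詞-人名-姓\tN.Prop.n.s",
    "が\tが\t助詞-格助詞-一般\tP.c.g",
    "来る\t来る\t動詞-自立\tV.free",
    "</s>",
]
tokens = parse_vertical(VERTICAL)
print([t["word"] for t in under(tokens, "名詞-固有名詞")])
```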

References

  1. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. The Japanese-Slovene dictionary jaSlo: its developments, enhancement and use. Studia Kognitiva, 2010, no. 10, pp. 211-224.
  2. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž, KAWAMURA, Yoshiko. Automated Collection of Japanese Word Usage Examples from a Parallel and a Monolingual Corpus. In: eLexicography in the 21st century: new challenges, new applications: proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October 2009 (Cahiers du Cental, 7). Louvain: Presses Universitaires de Louvain, 2010, pp. 137-147.
  3. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. A low cost approach to building a Japanese-Slovene parallel corpus. 電子情報通信学会技術研究報告 - IEICE Technical Report (Denshi Jōhō Tsūshin Gakkai gijutsu kenkyū hōkoku), 2008, vol. 108, no. 50, pp. 7-10.
  4. KAWAMURA, Yoshiko, HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Web kôpasu o katsuyô shita reberubetsu reibun kensaku shisutemu no kaihatsu to hyôka [Development and evaluation of a level-based example sentence search system using a web corpus]. In: OKOMURA, Minako (ed.), MIWA, Sei (ed.). Dai 14kai Yôroppa Nihongo kyôiku shimpojiumu hôkoku-happyô rombunshû (Yôroppa Nihongo kyôiku, 14). [Berlin]: Yôroppa nihongo kyôshikai = Association of Japanese Language Teachers in Europe, 2009, pp. 231-238.
  5. SRDANOVIĆ, Irena, ERJAVEC, Tomaž, KILGARRIFF, Adam. A web corpus and word sketches for Japanese. Shizen gengo shori [Journal of Natural Language Processing], 2008, vol. 15, no. 2, pp. 137-159.
  6. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Gakushoushayou nihongo jisho no tame no taizaku reibun kakutoko. In: The Fourteenth Annual Meeting of The Association for Natural Language Processing, 21 March 2008, University of Tokyo. Proceedings of the Workshop on Natural Language Processing for Education. [S. l.]: The Association for Natural Language Processing, 2008, pp. 19-22.
  7. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Črpanje primerov za japonsko-slovenski slovar iz vzporednega korpusa [Extracting examples for a Japanese-Slovene dictionary from a parallel corpus]. In: ERJAVEC, Tomaž (ed.), ŽGANEC GROS, Jerneja (ed.). Proceedings of the Sixth Language Technologies Conference, 16-17 October 2008; Proceedings of the 11th International Multiconference Information Society - IS 2008, Volume C (Informacijska družba). Ljubljana: Institut Jožef Stefan, 2008, pp. 33-36.
  8. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena. jaSlo, A Japanese-Slovene Learners' Dictionary: Methods for Dictionary Enhancement. In: Proceedings of the 12th EURALEX International Congress. Turin, Italy, 2006.
  9. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena, VAHČIČ, Anton Ml. Making an XML-based Japanese-Slovene Learners' Dictionary. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC'04, 26-28 May 2004. Lisbon.
  10. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena. XML zapis japonsko-slovenskega slovarja [XML encoding of a Japanese-Slovene dictionary]. In: Proceedings of the 12th International Electrotechnical and Computer Science Conference ERK 2003, 25-26 September 2003, Ljubljana, pp. 471-474.
  11. HMELJAK SANGAWA, Kristina. Slovar japonskega jezika za študente japonščine [A Japanese dictionary for students of Japanese]. In: Proceedings of the Conference on Language Technologies, SDJT'02, 14-15 October 2002. Ljubljana: Inštitut Jožef Stefan, 2002, pp. 102-105.

Acknowledgements

Kristina Hmeljak Sangawa's work was supported by a Japanese-Language Education Fellowship of the Japan Foundation from March to July 2005, and the work of Tomaž Erjavec by a JSPS scholarship in November 2008.



Last change 2016-05-19, et
