
jaSlo: Japanese-Slovenian resources for language learning

This page offers resources for learning Japanese as a second or foreign language, produced in cooperation between the Dept. of Asian Studies at the University of Ljubljana and the Dept. of Knowledge Technologies at the Jožef Stefan Institute:

  1. Japanese-Slovenian Learner's Dictionary jaSlo
  2. Slovenian-Japanese Learner's Dictionary sloJa
  3. Japanese corpora
    1. jaSlo parallel corpus
    2. jpWaC-L web corpus
    3. Corpus annotation
  4. References
  5. Acknowledgements
  6. Further links

Japanese-Slovenian Learner's Dictionary jaSlo

The dictionary is accessible through an interface in English, Japanese (日本語) or Slovenian.

The jaSlo dictionary contains almost 10,000 entries, many of them linked to corpus examples. More information about the dictionary can be found in its TEI header.

The complete dictionary can be downloaded in source TEI P5 format from the CLARIN.SI repository via the permanent URL hdl.handle.net/11356/1050.
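
As a rough illustration of working with the downloaded TEI source, the sketch below reads headwords and translations from a tiny made-up sample. The element layout (entry, form/orth, sense, cit/quote) follows common TEI P5 dictionary practice but is only an assumption here; the actual jaSlo markup may differ and should be checked against the TEI header. The sample entries and the name read_entries are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; all TEI elements live in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample, NOT taken from jaSlo; the real markup may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <entry>
      <form><orth>耳</orth></form>
      <sense><cit type="translation"><quote>uho</quote></cit></sense>
    </entry>
    <entry>
      <form><orth>年上</orth></form>
      <sense><cit type="translation"><quote>starejši</quote></cit></sense>
    </entry>
  </body></text>
</TEI>"""


def read_entries(xml_text):
    """Return a list of (headword, [translations]) pairs from TEI text."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iterfind(".//tei:entry", TEI_NS):
        # Headword: first <orth> inside <form> (assumed layout).
        orth = entry.findtext("tei:form/tei:orth", default="", namespaces=TEI_NS)
        # Translations: every <quote> inside a <cit> (assumed layout).
        quotes = [q.text for q in entry.iterfind(".//tei:cit/tei:quote", TEI_NS) if q.text]
        entries.append((orth, quotes))
    return entries


if __name__ == "__main__":
    for headword, translations in read_entries(SAMPLE):
        print(headword, translations)
```

For the real file, the same function could be applied to the text read from the downloaded XML, after adjusting the element paths to the structure documented in the TEI header.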

Slovenian-Japanese Learner's Dictionary sloJa

The Slovenian-Japanese online dictionary for Slovenian-speaking learners of Japanese was compiled by extracting and converting data from jaSlo, first automatically and then by manually cleaning duplicate or inappropriate entries, labelling the Slovenian headwords with part-of-speech tags and with difficulty tags according to the CEFR scale, as available in the Core Vocabulary of Slovenian (http://hdl.handle.net/11356/1697), and manually editing all entries using Lexonomy.

Senses of polysemous words and corresponding translation equivalents were manually glossed with hints on their meaning, in part also with examples, extracted from the jaSlo parallel corpus, and manually adapted for the learner's dictionary. Japanese translational equivalents from different registers were tagged according to their level of politeness and with notes on usage restrictions aimed at dictionary users who are learning Japanese as a foreign language.

The dictionary can be browsed at https://www.lexonomy.eu/#/sloJa or downloaded from the CLARIN.SI repository under a CC BY 4.0 licence at http://hdl.handle.net/11356/1898.

Japanese corpora

The jaSlo parallel corpus

The Japanese-Slovenian parallel corpus jaSlo is sentence-aligned and contains about half a million words per language from 130 sources.

It is available via the CLARIN.SI noSketch Engine.

The jpWaC-L web corpus

The Japanese web corpus with difficulty levels jpWaC-L contains over 300 million words, with words and sentences annotated with their difficulty level. The corpus is further split into 5 subcorpora, each for one difficulty level, from 4 (easiest) to 0 (hardest).

The difficulty level of the words comes from a lexicon provided by Prof. Yoshiko Kawamura, Tokyo International University. Words are assigned difficulty levels according to the Japanese Language Proficiency Test Content Specifications (Revised Edition), Japan Foundation & Association of International Education Japan. Tokyo: Bonjinsha 2004. The difficulty level of the sentences is computed using various heuristics, based on the words and their difficulty levels, sentence length, etc. (cf. Hmeljak Sangawa et al. 2010).
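
To make the level scale concrete, here is a purely hypothetical sketch of a sentence-level heuristic of this general kind. It is not the method used for jpWaC-L: the rule (take the hardest word, then demote long sentences by one level), the length threshold, the treatment of unknown words, and the sample word levels are all assumptions made for illustration only.

```python
# Hypothetical word-level lexicon; 4 = easiest, 0 = hardest (scale from the text).
# The level assignments below are invented for illustration.
WORD_LEVELS = {"私": 4, "耳": 3, "安全": 2, "楽天的": 1}

UNKNOWN_LEVEL = 0     # assumption: words missing from the lexicon count as hardest
MAX_EASY_LENGTH = 10  # assumption: sentences longer than this are demoted one level


def sentence_level(tokens):
    """Return an illustrative difficulty level (4 easiest .. 0 hardest)."""
    if not tokens:
        raise ValueError("empty sentence")
    # The hardest word (lowest level number) sets the base level.
    level = min(WORD_LEVELS.get(token, UNKNOWN_LEVEL) for token in tokens)
    # Long sentences are treated as one level harder, but never below 0.
    if len(tokens) > MAX_EASY_LENGTH:
        level = max(level - 1, 0)
    return level


if __name__ == "__main__":
    print(sentence_level(["私", "耳"]))
    print(sentence_level(["私", "楽天的"]))
```

Because lower numbers mean harder text, taking the minimum over the words picks the hardest word in the sentence.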

The corpus was collected from the Web using WaCkY tools and then processed with ChaSen.

The corpus is available as a downloadable dataset from the CLARIN.SI repository (http://hdl.handle.net/11356/1047) as well as for exploration via the CLARIN.SI installation of noSketch Engine, where it is also split by difficulty level.

Corpus annotation

The corpora were part-of-speech tagged and lemmatised with ChaSen. The ChaSen tags (originally written in Japanese) have also been translated into English tags. In the concordancers, the ChaSen-jp tags (i.e. those written in Japanese) are stored in the positional attribute "ctag" and the ChaSen-en tags in the attribute "tag". The table below gives the mapping (it is also available as a tabular file):

| n | ChaSen-jp (ctag) | ChaSen-en (tag) | Expanded | Examples |
|---|---|---|---|---|
| 1 | 名詞 | N | Noun | |
| 2 | 名詞-一般 | N.g | Noun(general) | ソナタ,年上,耳,好き |
| 3 | 名詞-固有名詞 | N.Prop | Noun(proper) | |
| 4 | 名詞-固有名詞-一般 | N.Prop.g | Noun(proper.general) | イスラム教,光が丘… |
| 5 | 名詞-固有名詞-人名 | N.Prop.n | Noun(proper.name) | |
| 6 | 名詞-固有名詞-人名-一般 | N.Prop.n.g | Noun(proper.name.general) | お市の方(おいちのかた),太安万侶(おおのやすまろ) |
| 7 | 名詞-固有名詞-人名-姓 | N.Prop.n.s | Noun(proper.name.surname) | 山田 |
| 8 | 名詞-固有名詞-人名-名 | N.Prop.n.f | Noun(proper.name.firstname) | 紀子,ひろし |
| 9 | 名詞-固有名詞-組織 | N.Prop.o | Noun(proper.organization) | NHK, 愛知銀行,パレスホテル… |
| 10 | 名詞-固有名詞-地域 | N.Prop.p | Noun(proper.place) | |
| 11 | 名詞-固有名詞-地域-一般 | N.Prop.p.g | Noun(proper.place.general) | 京都,アジア |
| 12 | 名詞-固有名詞-地域-国 | N.Prop.p.c | Noun(proper.place.country) | 日本,オーストリア |
| 13 | 名詞-代名詞 | N.Pron | Noun(pronoun) | |
| 14 | 名詞-代名詞-一般 | N.Pron.g | Noun(pronoun.general) | 私,誰,奴ら,あそこ,あちこち,それ |
| 15 | 名詞-代名詞-縮約 | N.Pron.sh | Noun(pronoun.shorten) | あたしゃ,そりゃ,そりゃあ,私しゃ |
| 16 | 名詞-副詞可能 | N.Adv | Noun(adverbal) | いつか,あまり,9月,いちばん,きのう,この先 |
| 17 | 名詞-サ変接続 | N.Vs | Noun(verbal) | 見学する,我慢する… |
| 18 | 名詞-形容動詞語幹 | N.Ana | Noun(adjective -na) | あいまい,安全,黄色,気の毒,気がかり,楽天的 |
| 19 | 名詞-数 | N.Num | Noun(numeral) | 何,四,億,1 |
| 20 | 名詞-非自立 | N.bnd | Noun(bound) | |
| 21 | 名詞-非自立-一般 | N.bnd.g | Noun(bound.general) | 作,きらい,ため,どころ,こと |
| 22 | 名詞-非自立-副詞可能 | N.bnd.Adv | Noun(bound.adverb) | っきり,折り,うち,あいだ,あたり,あまり |
| 23 | 名詞-非自立-助動詞語幹 | N.bnd.Aux | Noun(bound.auxiliary) | よう,様, やう,よ |
| 24 | 名詞-非自立-形容動詞語幹 | N.bnd.Ana | Noun(bound.adjective -na) | みたい,ふう |
| 25 | 名詞-特殊- | N.spec | Noun(special) | |
| 26 | 名詞-特殊-助動詞語幹 | N.spec.Aux | Noun(special.auxiliary) | そ,そう |
| 27 | 名詞-接尾 | N.Suff | Noun(suffix) | |
| 28 | 名詞-接尾-一般 | N.Suff.g | Noun(suffix.general) | OFF,あまり,ごころ,がわり,印 |
| 29 | 名詞-接尾-人名 | N.Suff.n | Noun(suffix.name) | さん,氏,君 |
| 30 | 名詞-接尾-地域 | N.Suff.p | Noun(suffix.place) | 駅,区 |
| 31 | 名詞-接尾-サ変接続 | N.Suff.Vs | Noun(suffix.verbal) | 化,話,分け |
| 32 | 名詞-接尾-助動詞語幹 | N.Suff.Aux | Noun(suffix.auxiliary) | そ,そう |
| 33 | 名詞-接尾-形容動詞語幹 | N.Suff.Ana | Noun(suffix.adjective -na) | がち,好き,同然,薄(うす),気(げ),的 |
| 34 | 名詞-接尾-副詞可能 | N.Suff.Adv | Noun(suffix.adverb) | いっぱい,ころ,時 |
| 35 | 名詞-接尾-助数詞 | N.Suff.msr | Noun(suffix.measure) | 人,条,ミリバール |
| 36 | 名詞-接尾-特殊 | N.Suff.spec | Noun(suffix.specific) | 方,たて |
| 37 | 名詞-接続詞的 | N.Conj | Noun(conjunction) | 兼,対,VS |
| 38 | 名詞-動詞非自立的 | N.V.bnd | Noun(verbal.bound) | ごらん,御覧,ご覧,ちょ,ちょうだい,頂戴 |
| 39 | 名詞-引用文字列 | N.Phr | Noun(phrase) | いわく |
| 40 | 名詞-ナイ形容詞語幹 | N.nai | Noun(+nai) | 味気,申し訳,まちがい |
| 41 | 接頭詞 | Pref | Prefix | |
| 42 | 接頭詞-名詞接続 | Pref.N | Prefix(+noun) | いま,ふた,まっ無,両,好(こう)… |
| 43 | 接頭詞-動詞接続 | Pref.V | Prefix(+verb) | 引き,御 |
| 44 | 接頭詞-形容詞接続 | Pref.Ai | Prefix(+adjective -i) | お,バカ, 超,真っ… |
| 45 | 接頭詞-数接続 | Pref.Num | Prefix(+numeral) | No.,およそ,総, 約 |
| 46 | 動詞 | V | Verb | |
| 47 | 動詞-自立 | V.free | Verb(free) | つける,書く… |
| 48 | 動詞-非自立 | V.bnd | Verb(bound) | 始める,もらう,願える,らっしゃる… |
| 49 | 動詞-接尾 | V.Suff | Verb(suffix) | がかる,がる,さす,させる,しめる,す,せる,られる,れる |
| 50 | 形容詞 | Ai | Adjective –i | |
| 51 | 形容詞-自立 | Ai.free | Adjective –i(free) | 近い, 苦い |
| 52 | 形容詞-非自立 | Ai.bnd | Adjective -i(bound) | イイ,いい,難い,づらい,にくい,欲しい,やすい,良い,よい |
| 53 | 形容詞-接尾 | Ai.Suff | Adjective -i(suffix) | くさい,臭い(くさい),たらしい,ったらしい,っぽい,深い(ぶかい),ぽい |
| 54 | 副詞 | Adv | Adverb | |
| 55 | 副詞-一般 | Adv.g | Adverb(general) | 力一杯,余り,要は,由来 |
| 56 | 副詞-助詞類接続 | Adv.P | Adverb(+particle) | 余りに,当然,度々,あんなに |
| 57 | 連体詞 | Adn | Adnominal | あの,あんな,おなじ,おおきな |
| 58 | 接続詞 | Conj | Conjunction | 例えば,それなのに,したら,あるいは,おなじく |
| 59 | 助詞 | P | Particle | |
| 60 | 助詞-格助詞 | P.c | Particle(case) | |
| 61 | 助詞-格助詞-一般 | P.c.g | Particle(case.general) | から,が,で,と,に,にて,の,へ,より,を,ん |
| 62 | 助詞-格助詞-引用 | P.c.r | Particle(case.reported) | と, っと |
| 63 | 助詞-格助詞-連語 | P.c.Phr | Particle(case.phrase) | という,って,について,をもって, に対して… |
| 64 | 助詞-接続助詞 | P.Conj | Particle(conjunction) | および,けれども,が |
| 65 | 助詞-係助詞 | P.bind | Particle(binding) | こそ,さえ,しか,すら,ぞ,は,も,や |
| 66 | 助詞-副助詞 | P.Adv | partice(adverbial) | だけ,ばかり |
| 67 | 助詞-間投助詞 | P.ind | Particle(indirect) | (松島)や |
| 68 | 助詞-並立助詞 | P.coord | Particle(coordinate) | たり,だの,だり,と,とか,なり,や,やら |
| 69 | 助詞-終助詞 | P.fin | Particle(sentencefinal) | かしら,さ,なあ |
| 70 | 助詞-副助詞/並立助詞/終助詞 | P.advcoordfin | Particle(adverbial/coordinate/sentencefinal) | |
| 71 | 助詞-連体化 | P.prenom | Particle(pronominal) | |
| 72 | 助詞-副詞化 | P.advzer | Particle(adverbializer) | と,に |
| 73 | 助詞-特殊 | P.spec | Particle(special) | かな,けむ,に,にゃ,ん |
| 74 | 助動詞 | Aux | Auxiliary | まい,たり,たい,っす, じゃん |
| 75 | 感動詞 | Interj | Phrase/Interjection | ご苦労さま |
| 76 | 記号 | Sym | Symbol | ? : ; ※ A B c |
| 77 | 記号-一般 | Sym.g | Symbol(general) | |
| 78 | 記号-句点 | Sym.p | Symbol(period) | |
| 79 | 記号-読点 | Sym.c | Symbol(dotincenter) | |
| 80 | 記号-空白 | Sym.w | Symbol(whitespace) | |
| 81 | 記号-アルファベット | Sym.a | Symbol(alphabet) | |
| 82 | 記号-括弧開 | Sym.bo | Symbol(bracketopen) | |
| 83 | 記号-括弧閉 | Sym.bc | Symbol(bracketclose) | |
| 84 | その他 | Other | Other | |
| 85 | その他-間投 | Other.indir | Other(indirect) | |
| 86 | フィラー | Fill | Filler | あ,うん,そうですね,まあ,あの,なんか |
| 87 | 非言語音 | Nss | Nonspeechsound | |
| 88 | 語断片 | Frgm | Fragment | |
| 89 | 未知語 | Unknown | Unknown | |
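
The mapping above lends itself to a simple lookup. The following minimal sketch shows how a few rows could be used to translate a ChaSen-jp "ctag" value into its ChaSen-en "tag" value and to reduce an English tag to its top-level category. Only a handful of rows are copied from the table; the function names are hypothetical, and a real application would presumably load the full tabular file instead.

```python
# A few rows copied from the mapping table (ctag -> tag); not the full table.
CTAG_TO_TAG = {
    "名詞": "N",
    "名詞-一般": "N.g",
    "名詞-固有名詞-人名-姓": "N.Prop.n.s",
    "動詞-自立": "V.free",
    "形容詞-自立": "Ai.free",
    "助詞-格助詞-一般": "P.c.g",
    "未知語": "Unknown",
}


def to_english_tag(ctag):
    """Return the ChaSen-en tag for a ChaSen-jp tag, or None if not listed."""
    return CTAG_TO_TAG.get(ctag)


def coarse_pos(tag):
    """Return the top-level category of a ChaSen-en tag, e.g. 'N' for 'N.Prop.n.s'."""
    return tag.split(".", 1)[0]


if __name__ == "__main__":
    tag = to_english_tag("名詞-固有名詞-人名-姓")
    print(tag, coarse_pos(tag))
```

The English tags mirror the hierarchy of the Japanese ones: hyphen-separated levels in "ctag" correspond to dot-separated levels in "tag", so the part before the first dot gives the main part of speech.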

References

  1. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. The Japanese-Slovene dictionary jaSlo: its developments, enhancement and use. Studia Kognitiva, 2010, no. 10, pp. 211-224.
  2. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž, KAWAMURA, Yoshiko. Automated Collection of Japanese Word Usage Examples from a Parallel and a Monolingual Corpus. In: eLexicography in the 21st century: new challenges, new applications: proceedings of eLex 2009, Louvain-la-Neuve, 22-24 October 2009, (Cahiers du Cental, 7). Louvain: Presses Universitaires de Louvain, 2010, pp. 137-147.
  3. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. A low cost approach to building a Japanese-Slovene parallel corpus. 電子情報通信学会技術研究報告 - IEICE Technical Report (Denshi Jōhō Tsūshin Gakkai gijutsu kenkyū hōkoku), 2008, vol. 108, no. 50, pp. 7-10.
  4. KAWAMURA, Yoshiko, HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Web kôpasu o katsuyô shita reberubetsu reibun kensaku shisutemu no kaihatsu to hyôka. In: OKOMURA, Minako (ed.), MIWA, Sei (ed.). Dai 14kai Yôroppa Nihongo kyôiku shimpojiumu hôkoku-happyô rombunshû, (Yôroppa Nihongo kyôiku, 14). [Berlin]: Yôroppa nihongo kyôshikai = Association of Japanese Language Teachers in Europe, 2009, pp. 231-238.
  5. SRDANOVIĆ, Irena, ERJAVEC, Tomaž, KILGARRIFF, Adam. A web corpus and word sketches for Japanese. Shizen gengo shori, 2008, vol. 15, no. 2, pp. 137-159.
  6. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Gakushoushayou nihongo jisho no tame no taizaku reibun kakutoko. In: The Fourteenth Annual Meeting of The Association for Natural Language Processing, 21 March 2008, University of Tokyo. Proceedings of the Workshop on Natural Language Processing for Education. [S. l.]: The Association for Natural Language Processing, 2008, pp. 19-22.
  7. HMELJAK SANGAWA, Kristina, ERJAVEC, Tomaž. Črpanje primerov za japonsko-slovenski slovar iz vzporednega korpusa. In: ERJAVEC, Tomaž (ed.), ŽGANEC GROS, Jerneja (ed.). Zbornik Šeste konference Jezikovne tehnologije, 16. do 17. oktober 2008: zbornik 11. mednarodne multikonference Informacijska družba - IS 2008, zvezek C: proceedings of the 11th International Multiconference Information Society - IS 2008, volume C, (Informacijska družba). Ljubljana: Institut Jožef Stefan, 2008, pp. 33-36.
  8. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena. jaSlo, A Japanese-Slovene Learners' Dictionary: Methods for Dictionary Enhancement. In: Proceedings of the 12th EURALEX International Congress. Turin, Italy, 2006.
  9. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena, VAHČIČ, Anton Ml. Making an XML-based Japanese-Slovene Learners' Dictionary. In: Proceedings of Fourth International Conference on Language Resources and Evaluation, LREC'04, 26-28 May 2004. Lisbon.
  10. ERJAVEC, Tomaž, HMELJAK SANGAWA, Kristina, SRDANOVIĆ, Irena. XML zapis japonsko-slovenskega slovarja. In: Zbornik 12. Mednarodne elektrotehnične in računalniške konference ERK 2003, 25-26 September 2003, Ljubljana, pp. 471-474.
  11. HMELJAK SANGAWA, Kristina. Slovar japonskega jezika za študente japonščine. In: Zbornik Konference o jezikovnih tehnologijah, SDJT'02, 14-15 October 2002, Ljubljana: Inštitut Jožef Stefan, 2002, pp. 102-105.

Acknowledgements

Kristina Hmeljak Sangawa's work was supported by a Japanese-Language Education Fellowship of the Japan Foundation from March to July 2005, and the work of Tomaž Erjavec by a JSPS scholarship in November 2008. The development of the Slovenian-Japanese dictionary sloJa was supported by CLARIN.SI.

Further links


Last change 2023-11-16, et