Japanese Word Sketches: Advances and Problems

Irena SRDANOVIĆ, Naomi IDA, Chikako SHIGEMORI BUČAR, Adam KILGARRIFF, Vojtěch KOVÁŘ

Abstract


In this paper, we present the results of an evaluation of Japanese word sketches and discuss in detail the issues observed by the evaluators. A word sketch is a list of the salient collocates of a word, organized by the grammatical relations holding between the word and each collocate. The word sketch functionality is incorporated into the Sketch Engine corpus query system, and word sketches have so far been created for more than twenty languages, including Japanese. The issues discovered in the evaluation need to be addressed in order to further enhance the word sketch functionality for Japanese. Other tools and resources that are used in combination with the word sketches and influence their performance should also be examined. We divide the issues into three groups: 1) the lemmatizer and tagger in use, 2) the sketch grammar written specifically for Japanese, and 3) the corpus and the statistical methods.
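To make the notion of a word sketch concrete, the following is a minimal illustrative sketch in Python, not the Sketch Engine implementation. It assumes that the corpus has already been reduced to (word, relation, collocate) triples by a lemmatizer, tagger, and sketch grammar, and it assumes the logDice association score as the salience measure. The relation names (`wo_verb`, `modifier_Adj`), the function names, and the toy counts are all invented for illustration.

```python
import math
from collections import defaultdict


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(triples, headword, top_n=3):
    """Group the collocates of `headword` by grammatical relation.

    `triples` is an iterable of (word, relation, collocate) tuples.
    Returns {relation: [(collocate, score), ...]}, best score first.
    """
    pair_freq = defaultdict(int)  # (relation, collocate) seen with headword
    head_freq = defaultdict(int)  # relation seen with headword
    coll_freq = defaultdict(int)  # (relation, collocate) seen with any word
    for word, rel, coll in triples:
        coll_freq[(rel, coll)] += 1
        if word == headword:
            pair_freq[(rel, coll)] += 1
            head_freq[rel] += 1

    sketch = defaultdict(list)
    for (rel, coll), f_xy in pair_freq.items():
        score = log_dice(f_xy, head_freq[rel], coll_freq[(rel, coll)])
        sketch[rel].append((coll, round(score, 2)))

    for rel in sketch:
        # Highest score first; ties broken by collocate for stable output.
        sketch[rel].sort(key=lambda item: (-item[1], item[0]))
        del sketch[rel][top_n:]
    return dict(sketch)


# Invented toy data: hypothetical relation names and made-up counts.
TRIPLES = (
    [("本", "wo_verb", "読む")] * 3
    + [("本", "wo_verb", "買う")]
    + [("新聞", "wo_verb", "読む")]
    + [("車", "wo_verb", "買う")] * 5
    + [("本", "modifier_Adj", "面白い")] * 2
)

if __name__ == "__main__":
    for relation, collocates in word_sketch(TRIPLES, "本").items():
        print(relation, collocates)
```

In this toy data, 読む ("read") ranks above 買う ("buy") as an object-relation collocate of 本 ("book"), because 買う occurs mostly with another noun. The three groups of issues listed in the abstract correspond to the three inputs of such a procedure: the lemmas and tags, the relations defined by the sketch grammar, and the corpus counts together with the statistic applied to them.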


Keywords


word sketches; Japanese collocations; evaluation; corpus; language technologies

DOI: http://dx.doi.org/10.4312/ala.1.2.63-82


Copyright (c) 2011 Irena SRDANOVIĆ, Naomi IDA, Chikako SHIGEMORI BUČAR, Adam KILGARRIFF, Vojtěch KOVÁŘ

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Ljubljana University Press, Faculty of Arts
(Znanstvena založba Filozofske fakultete Univerze v Ljubljani) 

Online ISSN: 2232-3317