Japanese Word Sketches: Advances and Problems
Keywords: word sketches, Japanese collocations, evaluation, corpus, language technologies
In this paper, we present the results of an evaluation of Japanese word sketches and discuss in detail the issues observed by the evaluators. A word sketch is a list of the salient collocates of a word, organized by the grammatical relations that hold between the word and each collocate. The word sketch functionality is incorporated into the Sketch Engine corpus query system and has so far been created for more than twenty languages, including Japanese. The issues discovered in the evaluation need to be addressed in order to further enhance the Japanese word sketches. The other tools and resources that are used together with the word sketches and influence their performance should also be reviewed. We divide the issues into three groups: 1) the lemmatizer and tagger in use, 2) the sketch grammar written specifically for Japanese, and 3) the corpus and statistical methods.
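To make the notion of a word sketch concrete, the following minimal Python sketch groups the collocates of a headword by grammatical relation and ranks them by salience. It is an illustration under stated assumptions, not the Sketch Engine implementation: the function names `log_dice` and `word_sketch`, the relation labels, and all counts in `TOY_TRIPLES` are hypothetical. The logDice score is the salience statistic described in the Sketch Engine documentation; whether it was the statistic behind the word sketches evaluated here is an assumption.

```python
import math
from collections import defaultdict


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def word_sketch(headword, triples, top_n=3):
    """Build a toy word sketch for `headword`.

    `triples` maps (headword, relation, collocate) -> co-occurrence count.
    Returns {relation: [(collocate, count, score), ...]} sorted by score.
    """
    head_rel = defaultdict(int)  # count of each headword within a relation
    coll_rel = defaultdict(int)  # count of each collocate within a relation
    for (head, rel, coll), n in triples.items():
        head_rel[(head, rel)] += n
        coll_rel[(rel, coll)] += n

    sketch = defaultdict(list)
    for (head, rel, coll), n in triples.items():
        if head != headword:
            continue
        score = log_dice(n, head_rel[(head, rel)], coll_rel[(rel, coll)])
        sketch[rel].append((coll, n, round(score, 2)))

    # Keep only the most salient collocates in each grammatical relation.
    return {
        rel: sorted(rows, key=lambda row: row[2], reverse=True)[:top_n]
        for rel, rows in sketch.items()
    }


# Hypothetical counts (romanized Japanese), invented for illustration only.
TOY_TRIPLES = {
    ("ame", "ga_verb", "furu"): 40,
    ("ame", "ga_verb", "aru"): 5,
    ("yuki", "ga_verb", "furu"): 20,
    ("hon", "ga_verb", "aru"): 95,
    ("ame", "modifier", "tsuyoi"): 10,
    ("kaze", "modifier", "tsuyoi"): 30,
}

if __name__ == "__main__":
    for relation, rows in word_sketch("ame", TOY_TRIPLES).items():
        print(relation, rows)
```

In this toy data, a frequent but unspecific collocate such as "aru" scores lower for "ame" than "furu" does, because the score normalizes the co-occurrence count by how often each item appears in that relation overall.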
Copyright (c) 2011 Irena SRDANOVIĆ, Naomi IDA, Chikako SHIGEMORI BUČAR, Adam KILGARRIFF, Vojtěch KOVÁŘ
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.