Grounding sloWNet on Slovene corpus data

  • Darja Fišer, Department of Translation, Faculty of Arts, University of Ljubljana
  • Maciej Piasecki, Department of Artificial Intelligence, Institute of Informatics, Wroclaw University of Technology
  • Bartosz Broda, Department of Artificial Intelligence, Institute of Informatics, Wroclaw University of Technology
Keywords: lexical semantics, wordnet, semantic similarity, semantic relations

Abstract

Wordnets can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. Its major disadvantage, however, is that the resulting resource does not necessarily reflect the language in question. In this paper we therefore test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can use minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. Since all previous approaches to constructing the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results obtained in the presented experiment will be a welcome complement to the existing semantic network.
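
The general idea of extracting semantically similar words from a lemmatized corpus can be illustrated with a small sketch. The abstract does not name the similarity measure or the context definition used in the paper, so everything below is an assumption made for illustration only: contexts are taken to be lemmas within a fixed window, similarity is plain cosine over raw co-occurrence counts, and the function names (`context_vectors`, `cosine`, `most_similar`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """For each lemma, count the lemmas occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(target, vectors, k=3):
    """Rank all other lemmas by similarity to `target`, highest first."""
    target_vec = vectors.get(target, Counter())
    scores = [
        (other, cosine(target_vec, vec))
        for other, vec in vectors.items()
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Invented toy corpus of lemmatized Slovene sentences (not data from the paper).
toy_corpus = [
    ["pes", "jesti", "meso"],
    ["mačka", "jesti", "meso"],
    ["pes", "piti", "voda"],
    ["mačka", "piti", "mleko"],
    ["avto", "voziti", "cesta"],
]

vectors = context_vectors(toy_corpus)
ranking = most_similar("pes", vectors)
print(ranking[0][0])  # the lemma sharing the most contexts with "pes"
```

In this toy setting "pes" (dog) and "mačka" (cat) share most of their contexts and therefore rank as most similar, while "avto" (car) shares none. A real experiment would use a large annotated corpus and would likely weight or filter the raw counts; those choices are not described in the abstract.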

Published
2013-12-01
How to Cite
Fišer, D., Piasecki, M., & Broda, B. (2013). Grounding sloWNet on Slovene corpus data. Slovenščina 2.0: Empirical, Applied and Interdisciplinary Research, 1(2), 82-112. https://doi.org/10.4312/slo2.0.2013.2.82-112