Grounding sloWNet on Slovene corpus data
Abstract
Wordnets can be translated from another language or built from corpus evidence. The transfer approach is easier and quicker, which is why it has been the most widely used. Its major drawback, however, is that the resulting resource does not necessarily reflect the language in question. In this paper we therefore test a language-motivated approach that uses linguistically annotated corpus data and basic statistical methods to extract lists of semantically similar words, which are then incorporated into the wordnet for Slovene. The approach was originally developed for Polish, but because the algorithm itself is language-independent and can work with minimally annotated corpus resources in any language, it is also attractive for other languages that still lack an extensive wordnet or a similar semantic lexicon. An important advantage of the approach is that it relies on real linguistic evidence harvested from a corpus, yielding a linguistically sound organization of the vocabulary. Since all previous approaches to the construction of the Slovene wordnet were transfer-based and relied on the English Princeton WordNet, the encouraging results obtained in the presented experiment are a welcome complement to the existing semantic network.
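To make the general idea of extracting semantically similar words from corpus evidence more concrete, the following is a minimal sketch in Python. It is not the algorithm used in the paper: it assumes a simple distributional model in which each lemma is described by the lemmas it co-occurs with in the same sentence, and similarity is measured with the cosine of the raw count vectors. The toy corpus and the function names (cooccurrence_vectors, cosine, most_similar) are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy lemmatised corpus, invented for illustration (not data from the paper).
SENTENCES = [
    ["pes", "lajati", "glasno"],
    ["maček", "mijavkati", "glasno"],
    ["pes", "jesti", "meso"],
    ["maček", "jesti", "meso"],
    ["avto", "voziti", "cesta"],
    ["kolo", "voziti", "cesta"],
]


def cooccurrence_vectors(sentences):
    """For each lemma, count the other lemmas seen in the same sentence."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, target in enumerate(sent):
            for j, context in enumerate(sent):
                if i != j:
                    vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, k=3):
    """Return the k lemmas whose context vectors are closest to `word`."""
    scores = [
        (other, cosine(vectors[word], vectors[other]))
        for other in list(vectors)
        if other != word
    ]
    # Sort by descending similarity, then alphabetically for stable ties.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    vecs = cooccurrence_vectors(SENTENCES)
    for lemma in ("pes", "avto"):
        print(lemma, most_similar(lemma, vecs, k=2))
```

In a realistic setting the raw counts would typically be replaced by a statistical association weight and the contexts restricted by the corpus annotation (for example, morphosyntactic tags); both choices are left out here to keep the sketch short.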
Copyright (c) 2013 Darja Fišer, Maciej Piasecki, Bartosz Broda
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All content of Slovenščina 2.0 is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).