Coreference Resolution for Slovene on Annotated Data from coref149
Keywords: coreference resolution, Slovene, ssj500k, coref149, SkipCor algorithm
Coreference resolution is one of the three main tasks of information extraction from text. Its goal is to classify all mentions of entities in a discourse into groups, where each group represents a separate entity. Coreference resolution methods for larger languages have been under development for quite some time, but none has yet been proposed for Slovene. In this paper we present coref149, a new manually annotated Slovene corpus for coreference resolution. We adapt our English-based automatic coreference resolution system SkipCor to Slovene and achieve a CoNLL 2012 score of 76%. We analyse the influence of the developed feature functions and examine the most frequent types of errors. During the analysis we also developed a software library with a web interface, which makes it possible to run all the analyses described in this paper and to browse their predictions. The results are promising and comparable to those reported for coreference resolution in other, larger languages. We show that it is possible to implement algorithms for automatic coreference resolution for Slovene. We therefore propose preparing a larger, higher-quality corpus that covers all the specifics of the language, which would enable the implementation of generally useful coreference resolution methods.
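As an illustration of the task described in the abstract, the following minimal sketch groups mentions into entities from pairwise coreference links and scores the result with B-cubed F1. It is not the SkipCor algorithm. The function names `cluster_mentions` and `b_cubed_f1` and the toy mentions are hypothetical, and using plain strings as mention identifiers is a simplifying assumption (real systems identify mentions by their position in the text). The CoNLL 2012 score is the average of the MUC, B-cubed and CEAF-e F1 scores; only B-cubed is sketched here.

```python
from collections import defaultdict


def cluster_mentions(mentions, links):
    """Group mentions into entities by merging every linked pair (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps lookups short
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for m in mentions:
        groups[find(m)].add(m)
    return list(groups.values())


def b_cubed_f1(gold, predicted):
    """B-cubed F1, computed over mentions that occur in both clusterings."""
    gold_of = {m: c for c in gold for m in c}
    pred_of = {m: c for c in predicted for m in c}
    shared = gold_of.keys() & pred_of.keys()
    if not shared:
        return 0.0
    # Per-mention precision/recall: overlap of the two clusters containing m.
    precision = sum(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in shared) / len(shared)
    recall = sum(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in shared) / len(shared)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): two entities, one link missed by the system.
mentions = ["Ana", "she", "her", "Ljubljana", "the city"]
gold = [{"Ana", "she", "her"}, {"Ljubljana", "the city"}]
predicted = cluster_mentions(mentions, [("Ana", "she"), ("Ljubljana", "the city")])
score = b_cubed_f1(gold, predicted)
```

In the toy example the system misses the link to "her", so that mention ends up as a singleton entity; precision stays perfect while recall drops, which lowers the F1 score.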
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Slovenščina 2.0 applies the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to all published material. Under this license, authors retain ownership of the copyright for their content, but allow anyone to download, reuse, reprint, modify, distribute, copy, remix, transform and/or build upon the content for any purpose, even commercial, as long as the original authors and source are cited. No permission is required from the authors or the publishers. Appropriate attribution can be provided by simply citing the original article. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. For any reuse or redistribution of a work, users must also make clear the license terms under which the work was published.
No separate publishing agreements are signed between the author and the publisher. Authors retain copyright and the publishing rights of their work without any restrictions.
Authors are permitted and encouraged to post the journal’s published version of the work online (e.g., in institutional repositories, on their own websites), with an acknowledgement of its initial publication in Slovenščina 2.0.