In the Search of Lexicographically Relevant Collocation: The Example of Grammatical Relations Containing Adverbs
This paper presents the results of an analysis of grammatical relations that focused on identifying not only collocations relevant for lexicographic purposes, but also problematic areas that need further investigation at both the lexicographic and the grammatical level. In the initial study, collocation candidates for a wide selection of grammatical relations were automatically extracted from the Gigafida reference corpus of Slovene for a heterogeneous sample of 333 lemmas. A group of linguists then annotated the relevance of the collocation candidates, examining both the collocations and their examples of use, and their answers were analysed for agreement. The findings were that relations such as adjective + noun, noun + noun in gerund, and some verb + preposition + noun relations exhibited high agreement and large shares of approved collocation candidates. On the other hand, grammatical relations containing adverbs proved to be among those where disagreement or uncertainty among the linguist annotators was the highest. Consequently, it was decided that these adverbial relations should be analysed first, as a sample set for testing our bottom-up approach to determining which collocation candidates are lexicographically relevant.
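The per-relation comparison described above can be illustrated with a minimal sketch. It assumes that each annotator assigns one label per collocation candidate, and it uses mean pairwise agreement and a simple majority vote as stand-ins; the abstract does not state which agreement measure or approval rule was actually used. The label set (`yes`, `no`, `unsure`), the function names and all data rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: (grammatical relation, candidate, one label per annotator).
# The label set ("yes", "no", "unsure") and all rows are invented for illustration.
ANNOTATIONS = [
    ("adjective + noun", "candidate A", ["yes", "yes", "yes"]),
    ("adjective + noun", "candidate B", ["yes", "yes", "no"]),
    ("adverb + verb", "candidate C", ["yes", "no", "unsure"]),
    ("adverb + verb", "candidate D", ["no", "no", "yes"]),
]


def pairwise_agreement(labels):
    """Share of annotator pairs that gave the same label to one candidate."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def summarise(annotations):
    """Per relation: mean pairwise agreement and share of majority-approved candidates."""
    grouped = defaultdict(list)
    for relation, _candidate, labels in annotations:
        grouped[relation].append(labels)

    summary = {}
    for relation, label_sets in grouped.items():
        n = len(label_sets)
        # Average the per-candidate agreement over all candidates of the relation.
        agreement = sum(pairwise_agreement(ls) for ls in label_sets) / n
        # A candidate counts as approved if more than half of the labels are "yes".
        approved = sum(ls.count("yes") > len(ls) / 2 for ls in label_sets) / n
        summary[relation] = {"agreement": agreement, "approved_share": approved}
    return summary


if __name__ == "__main__":
    for relation, stats in summarise(ANNOTATIONS).items():
        print(f"{relation}: agreement={stats['agreement']:.2f}, "
              f"approved={stats['approved_share']:.2f}")
```

In this toy data the adverbial relation scores lower on both measures, which mirrors the pattern reported above only by construction. A chance-corrected coefficient could replace the raw pairwise share if label frequencies are strongly skewed.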
Further analysis has shown that the decision on the relevance of collocation candidates for dictionary purposes needs to be made separately for each relation and for each group of adverbs within it. One semantically less relevant group proved to be adverbs functioning as intensifiers or having the semantically less relevant role of a participle. Even more problematic is the group of numeral adverbs (once, twice…), whose members have different levels of semantic relevance (e.g. četrtič doktorirati 'to receive a PhD for the fourth time' versus stokrat povedati 'to say something a hundred times') and thus cannot be delimited at the group level within a particular grammatical relation.
The data from the analyses described in this paper will enable further detailed analyses, in particular a description of each grammatical relation from the perspective of its collocationality. In addition, bad collocation candidates that result from errors in morphosyntactic annotation will help to improve the sketch grammar and, in turn, the quality of the automatic extraction output. Furthermore, we intend to use the existing findings to improve the results for grammatical relations that were initially excluded from the automatic extraction procedure due to a high percentage of noise.
Copyright (c) 2019 Eva Pori, Iztok Kosem
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Slovenščina 2.0 applies the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to all published material. Under this license, authors retain ownership of the copyright for their content, but allow anyone to download, reuse, reprint, modify, distribute, copy, remix, transform and/or build upon the content for any purpose, even commercial, as long as the original authors and source are cited. No permission is required from the authors or the publishers. Appropriate attribution can be provided by simply citing the original article. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. For any reuse or redistribution of a work, users must also make clear the license terms under which the work was published.
No separate publishing agreements are signed between the author and the publisher. Authors retain copyright and the publishing rights of their work without any restrictions.
Authors are permitted and encouraged to post the journal’s published version of the work online (e.g., in institutional repositories, on their own websites), with an acknowledgement of its initial publication in Slovenščina 2.0.