In the Search of Lexicographically Relevant Collocation: The Example of Grammatical Relations Containing Adverbs

  • Eva Pori
  • Iztok Kosem
Keywords: lexicography, semantics, collocationality, Collocations Dictionary of Modern Slovene, adverb


This paper presents the results of an analysis of grammatical relations that focused on identifying not only collocations relevant for lexicographic purposes, but also problematic areas that need further investigation at both the lexicographic and the grammatical level. In the initial study, collocation candidates for a wide selection of grammatical relations were automatically extracted from the Gigafida reference corpus of Slovene for a heterogeneous sample of 333 lemmas. A group of linguists then annotated the relevance of the collocation candidates, examining both the collocations and their examples of use, and their answers were analysed for agreement. Relations such as adjective + noun, noun + noun in gerund, and some verb + preposition + noun relations exhibited high agreement and large shares of approved collocation candidates. Grammatical relations containing adverbs, on the other hand, were among those where disagreement or uncertainty among the linguist annotators was highest. Consequently, it was decided that these adverbial relations should be analysed first, as a sample set for testing our bottom-up approach to determining which collocation candidates are lexicographically relevant.

Further analysis has shown that the decision on the relevance of collocation candidates for dictionary purposes needs to be made separately for each relation and for each group of adverbs within it. One semantically less relevant group proved to be adverbs functioning as intensifiers or having the semantically less relevant role of a participle. Even more problematic is the group of numeral adverbs (once, twice, …), whose members have different levels of semantic relevance (e.g. četrtič doktorirati 'to receive a PhD for the fourth time' versus stokrat povedati 'to say something a hundred times') and which therefore cannot be delimited at the group level within a particular grammatical relation.

The data from the analyses described in this paper will enable further detailed analyses, in particular a description of each grammatical relation from the perspective of its collocationality. In addition, bad collocation candidates that result from errors in morphosyntactic annotation will help improve the sketch grammar and, in turn, the quality of the automatic extraction output. Furthermore, we intend to use the existing findings to improve the results for grammatical relations that were initially excluded from the automatic extraction procedure because of a high percentage of noise.


