A COMPARISON OF COLLOCATIONS AND WORD ASSOCIATIONS IN ESTONIAN FROM THE PERSPECTIVE OF PARTS OF SPEECH

The paper provides a comparative study of the collocational and associative structures in Estonian with respect to the role of parts of speech. The lists of collocations and associations of an equal set of nouns, verbs and adjectives, originating from the respective dictionaries, are analysed to find both the range of coincidences and the differences. The results show a moderate overlap, the biggest part of which occurs among the adjectival associates and collocates. There is an overall prevalence of nouns among the associated and collocated items. The coincidental sets of relations are tentatively explained by the influence of grammatical relations, i.e. the patterns of local grammar that bind together the collocations and motivate the associations. The results are discussed with respect to the possible reasons for the association-collocation mismatch and in relation to the application of these findings in the fields of lexicography and second language acquisition.


INTRODUCTION
Both the terms collocation and word association designate an implicit bond between words 1 . Whether collocations and associations are basically the same or represent different kinds of lexical and/or mental organisation is a question that has intrigued researchers for some time (for an overview see De Deyne and Storms, 2015). In the present paper we do not intend to answer the question theoretically and once and for all but aim to bring out the tendencies that occur in the Estonian language in that regard. The existing literature comparing associations and collocations has so far covered data from Indo-European languages (mostly English, see the overview in Kang, 2018; German as in Schulte im Walde et al., 2008; and Russian as in Sinopalnikova, 2004). Evidence from a genetically different language group will hopefully bring more insights into the field. We take advantage of having two recent dictionaries of Estonian available: the Estonian Collocations Dictionary (ECD) and the Dictionary of Estonian Word Associations (DEWA). The PoS analysis is relevant for two reasons. Firstly, Estonian is a Finno-Ugric language that belongs to the agglutinating-flective typological class.
The PoS categorisation in Estonian relies on multiple factors: semantics, morphological inflection, syntactic behaviour and pragmatics (Paulsen et al., 2019). Estonian is characterised by well-formed morphosyntactic structure, among other features. This implies that a word's behaviour in speech (and text) is expected to be predetermined by its implicit PoS, which can further affect the structure of collocations derived from the texts. To what extent the word associations retrieved from memory follow the PoS-determined structure of text production is an interesting question. Secondly, there is a tradition of classifying word associations according to their PoS homogeneity/heterogeneity principle, which has also been applied to the Estonian data (Toim, 1980). Thus, the PoS categories are expected to affect both the collocational and the associative structure of Estonian.
1 By the term word association we refer to the concept used in applied linguistics and psycholinguistics (e.g. De Deyne and Storms, 2015; Fitzpatrick et al., 2015). We do not use word association in the general sense of the term, which would also cover patterns of relatedness of words in text (e.g. Church and Hanks, 1990).
We assume that the Estonian data can contribute to the overall theoretical discussion by elaborating the role that PoS play in the formation of implicit bonds that the collocations and word associations tend to explicate. We consider that there is also some practical importance to elaborating the overlap vs non-overlap of collocations and word associations. So far, the practical interest in the topic has relied on the expectation that the (relatively low-cost) procedures of text mining for collocates would replace the high-cost psycholinguistic testing needed for establishing the relations comprising the mental lexicon (see, e.g. the Word Association Network 4 or Church and Hanks, 1990). We propose applicability also in the fields of lexicography and language teaching.
In this paper we will give a brief theoretical background, introduce the principles of material selection and carry out a systematic comparison of associations and collocations, paying special attention to the role of PoS categories.
The paper ends with a discussion of the reasons for the mismatch between collocations and associations in our data and of the applicability of the results.

COLLOCATIONS AND ASSOCIATIONS
We refer to collocation as a frequent and meaningful combination of content words with other lexical and grammatical units (see, e.g. Firth, 1957). As such, collocations can be detected by computational analysis of a large text corpus by means of corpus query systems (CQS), one of which is Sketch Engine (Kilgarriff et al., 2004; Kilgarriff et al., 2014), a CQS widely used among lexicographers in Europe. For the automatic extraction of the database of the Estonian Collocations Dictionary (ECD; Kallas et al., 2015), the Sketch Engine function Word Sketch (Kilgarriff et al., 2010; Kallas, 2013) was used. A Word Sketch is a one-page summary of a word's grammatical and collocational behaviour; it displays the collocations of a given keyword (or node), grouped according to their grammatical relation (e.g. adjectives as modifiers).
A collocation has the structure of a node and its collocate. Nodes are the words that are being looked at (e.g. dog) and collocates are the words with which they form collocations (e.g. barks → dog barks; bites → dog bites; friendly → friendly dog) (see Sinclair, 1966; Roth, 2013). Any given node occurs in a number of collocations and has a number of collocates. The role of node vs. collocate depends on the perspective. For example, from the perspective of the noun dog as a node, the dog can bark, bite and sniff; from the perspective of the verb bite as a node, dog acts as a collocate, as do bugs, mosquitoes and spiders.
We refer to word association in the psycholinguistic sense of the term. The notion originates in the context of testing people (in a word association test, WAT 5 ) for their first and spontaneous responses to a range of verbal stimuli (for the origins of the method, see Galton, 1879; Jung, 1910; for the peak of popularity see e.g. Rosenzweig, 1961; Kiss et al., 1973; Postman and Keppel, 1970; Deese, 1965; and for the current understanding see e.g. Nelson et al., 2000, and De Deyne and Storms, 2015).
Word association can thus be defined as a person's lexical response to a lexical stimulus: if one says cat, the reply might be dog, or if the stimulus is bread, the response could be butter. Stimulus and response are the basic structural components of a word association.
The responses may vary across respondents (e.g. bread may evoke butter but also breakfast etc.). Thus, one stimulus can have a list of responses and the same response can occur with a number of stimuli (e.g. bank→money and to waste→money). The collections of responses summed up over a number of respondents (usually at least one hundred) and elicited to a certain range of stimuli are called association norms (see e.g. Kent et al., 1910; Postman and Keppel, 1970; Nelson et al., 2004; Schulte im Walde and Borgwaldt, 2015).
Fitzpatrick, 2007; Durrant and Doherty, 2010). It is difficult to provide a general quantitative measure because of the variation in the methodologies and in the statistics used (Kang, 2018).
One of the variables affecting the outcome of the comparison seems to be the inclusiveness of the lists of associations and collocations. The longer the span of text from which the collocations are extracted (e.g. in Kang's (2008)
It has been proposed that the partly controversial results of previous studies comparing collocations and associations may be due to the fact that collocations were misleadingly treated as emerging from texts regarded as »a bag of words« (De Deyne and Storms, 2015), i.e. by ignoring the grammatical relations and syntactic structures that give the flow of language its natural texture. On the other hand, previous studies have reached the conclusion that "...the word association task, as a special method of elicitation, is not of the same kind as the natural task of language production…" (Mollin, 2009, p. 197), which would explain the difference between associations and collocations.
A closer look at the structures represented by collocations and associations is a question of qualitative analysis. In that respect, word associations, if not mere clangs, have traditionally been interpreted as belonging to either a paradigmatic or a syntagmatic class of relations (see e.g. Fitzpatrick, 2007; De Deyne and Storms, 2015). An example of a paradigmatic relation would be red
Theoretically, thus, we can expect some similarities in the qualitative structure of the collocations and associations to occur too. Homogeneity versus heterogeneity (in terms of PoS) of the relations can be a revealing factor in this respect.

THE STUDY
Collocations and associations are similar in structure, as pairs of words, despite the difference in their origin (corpus query procedures versus psycholinguistic testing). Both consist of two structural members with an asymmetry imposed on them: the member that is in focus as a keyword is always the »access member« (AM) and the other is the »related member« (RM). These two are called »stimulus« and »response« in the case of word associations and »node« and »collocate« in the case of collocations (see Figure 1). In the present analysis we use the term access member (AM) to refer both to the stimuli (of associations) and to the nodes (of collocations), and the term related member (RM) to refer both to the responses (of associations) and to the collocates (of collocations).
iii) We assume that the RMs with top positions in the ranking will dominate among the common pairs, while the non-overlapping pairs will include RMs with a relatively low ranking. We are interested in whether this holds for all PoS.

Material and method
As mentioned in the Introduction, we rely on the newest and best organized data available: the Estonian Collocations Dictionary (ECD) and the Dictionary of Estonian Word Associations (DEWA). The dictionaries represent, respectively, collocations extracted from the latest available text corpus (see Kallas et al., 2015, for how the database was generated) and the most recent collection of associations (Vainik, 2018). A more detailed description of the data sources is presented in Table 1.

Table 1. Description of the data sources
Presentation mode of AMs and RMs
  DEWA: Base forms: nouns and adjectives in the nominative singular case, verbs in the ma-infinitive
  ECD: As lemmas or in their most frequent grammatical form
Method of compilation
  DEWA: A citizen science project with more than 400 participants; see the description in Vainik (2018)
  ECD: Semi-automatic; using Sketch Engine for the extraction of collocations from the Estonian National Corpus 2013 (463 million words)
In ECD, the node (AM) and the collocate (RM) are presented as lemmas (e.g.
The coverage of the two sources differs almost tenfold with respect to the number of AMs. The overlap of keywords in the two dictionaries is 1102, which makes up 11.6% of ECD and 85% of DEWA. For the purpose of the study we made a selection that contains 90 AMs present in both dictionaries and is balanced in two ways: by PoS and by corpus frequency 8 . The procedure was as follows: the list of shared keywords was ranked by decreasing frequency, and equal numbers (N = 10) of adjectives, nouns and verbs were retrieved from the top, from the bottom and from around the middle of the frequency list.
This step was taken in order to avoid the possible side effects of varying frequency of AMs across PoS (e.g. that nouns would appear to be more frequent, generally, than verbs or adjectives). The selection of AMs was not based on any semantic criterion.
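The selection procedure can be illustrated with a short sketch. It is not the script used in the study: the function name `select_access_members`, the `(lemma, pos, frequency)` tuple layout and the toy data are assumptions made for illustration, and the three bands are cut within each PoS sublist, which only approximates picking words from the top, middle and bottom of the overall frequency list.

```python
from collections import defaultdict


def select_access_members(shared, per_band=10):
    """Pick per_band lemmas per PoS from the top, middle and bottom
    of a frequency-ranked list of (lemma, pos, frequency) tuples."""
    # Rank all shared keywords by decreasing corpus frequency.
    ranked = sorted(shared, key=lambda item: item[2], reverse=True)

    # Split the ranked list by PoS, preserving the frequency order.
    by_pos = defaultdict(list)
    for lemma, pos, _freq in ranked:
        by_pos[pos].append(lemma)

    selection = {}
    for pos, lemmas in by_pos.items():
        if len(lemmas) < 3 * per_band:
            raise ValueError(f"too few {pos} items for three bands")
        # Assumption: the middle band is centred within the PoS sublist.
        mid_start = (len(lemmas) - per_band) // 2
        selection[pos] = {
            "top": lemmas[:per_band],
            "middle": lemmas[mid_start:mid_start + per_band],
            "bottom": lemmas[-per_band:],
        }
    return selection


# Toy input: 7 invented items per PoS, so per_band=2 is used instead of 10.
toy = [
    (f"{pos}{i}", pos, 1000 - 10 * i - offset)
    for offset, pos in enumerate(["A", "N", "V"])
    for i in range(7)
]
picked = select_access_members(toy, per_band=2)
print(picked["A"])
```

With `per_band=10` and three PoS, a sketch of this kind yields 3 × 3 × 10 = 90 AMs, matching the size of the selection described above.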
The data for comparison (pairs of AMs and RMs) were retrieved from the databases of ECD and DEWA by queries containing equal sets (N = 30) of adjectives, nouns and verbs in the search list. The procedure resulted in data tables containing full lists of collocations (N = 4743) and associations (N = 8138), which were further filtered for recurrent (F ≥ 2) connections.
Subsequently, the two lists were compared automatically in order to find the cases where both the AMs and the RMs coincided. We refer to those coincidental cases as common pairs in the following sections, while the non-coincidental collocations and associations of those 90 AMs are referred to as exclusive collocations and exclusive associations, respectively. By comparing the full lists of recurrent associations and collocations, our method aims to capture as much of the potential overlap as possible.
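The filtering and comparison steps amount to a frequency threshold followed by a set intersection. The sketch below shows the idea under stated assumptions: the helper names `recurrent_pairs` and `compare` are hypothetical, pairs are modelled as `(AM, RM)` tuples, and the toy frequencies are invented (only the word pairs themselves are borrowed from examples in this paper).

```python
from collections import Counter


def recurrent_pairs(observations, min_freq=2):
    """Return the set of (AM, RM) pairs observed at least min_freq times."""
    counts = Counter(observations)
    return {pair for pair, freq in counts.items() if freq >= min_freq}


def compare(collocations, associations):
    """Split two sets of (AM, RM) pairs into common and exclusive parts."""
    common = collocations & associations
    return {
        "common": common,
        "exclusive_collocations": collocations - common,
        "exclusive_associations": associations - common,
        # Share of each list that is covered by the common pairs.
        "share_of_collocations": len(common) / len(collocations),
        "share_of_associations": len(common) / len(associations),
    }


# Toy data with invented frequencies.
toy_associations = recurrent_pairs(
    [("kana", "muna")] * 3 + [("vasak", "parem")] * 2 + [("lugema", "raamat")]
)
toy_collocations = recurrent_pairs(
    [("kana", "muna")] * 2 + [("lugema", "raamat")] * 4 + [("restoran", "sööma")] * 2
)
result = compare(toy_collocations, toy_associations)
print(result["common"])
```

In the toy data, lugema→raamat is dropped from the associations because it occurs only once there, so it ends up among the exclusive collocations even though it appears in both raw lists.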

Comparison in general terms
One of the main results of this study is the list of the common pairs (N = 582).

Comparison in terms of parts of speech
There is an intriguing division of the leading role between the PoS as AMs. Adjectives comprise a larger proportion in the pool of common pairs (see Table
The distribution of RMs follows neither the equal proportions of the test words nor the slightly diverging proportions of the AMs. Table 3 demonstrates that nouns comprise the biggest proportion of RMs among both the common and the exclusive pairs. In the case of exclusive collocations, the prevalence is less pronounced and, in addition, some other PoS (mostly adverbs) emerge as RMs. The prevalence of nouns among RMs can be explained in a few ways. The most obvious explanation is that the proportion of nouns in the lexicon is generally larger (see e.g. Hudson, 1994), a fact that gives this PoS an advantage in forming any kind of relationship. Another explanation is that nouns serve diverse functions in forming relationships. An RM-noun can occur in a paradigmatic relation with an AM-noun (e.g. they form pairs of synonyms, antonyms and cohyponyms, which are both elicited in WATs and co-occur in texts). An RM-noun can also participate in syntagmatic relations, for example as the head of a phrase (e.g. house (N) in the phrase big house) or as an argument of a verb (e.g. house (N) in the phrase building a house). Relations similar to the syntagmatic one can also motivate word asso-
of PoS (Toim, 1980). Table 4 presents the distribution of homogeneous and heterogeneous pairs. It appears that the exclusive associations (and apparently the associations in general) include more homogeneous relations. This finding seems to be in line with the claim that »the word class of the stimulus word plays a role in that it causes the same word class to be over proportionally represented in the responses to it« (Mollin, 2009, p. 196). Whether a percentage of roughly 10 to 25 is overproportional depends on the perspective.
The most prevalent group in the analysed dataset is the N→N relation among the exclusive associations. The relation is also relatively strong among the common pairs. The second most prevalent type of relation is the heterogeneous A→N, which is the leading pattern among the common pairs. The third most prevalent type, V→N, also occurs among the common pairs. All three most prevalent patterns have a noun in the position of RM. It is also worth mentioning that the common pairs lack heterogeneous relations in which nouns are not involved (e.g. A→V, A→D and V→D). These patterns seem to occur only among collocations. Exceptionally, there are some pairs with the structure V→A (e.g. maitsma→hea 'to taste→good', tundma→mõnus 'to feel→pleasant').
Taken together, the homogeneous relations make up a larger proportion among the exclusive associations (47.11%) and the common pairs (42.61%), while their proportion is much lower in the case of exclusive collocations (16.83%). The latter tend to demonstrate a heterogeneous PoS structure and thus reveal syntagmatic relations. This is to be expected, given that collocations are derived from texts, which are syntactically arranged, while associations are drawn from people's memory, where such an arrangement cannot be taken for granted. It is still interesting that the biggest overlaps between associations and collocations occur among heterogeneous relations: A→N and V→N. Apparently, syntagmatic (or syntagmatic-like semantic) relations also play a role in memory and/or in the strategies of association elicitation.
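The homogeneity measure used here can be made concrete with a small sketch. The function names `pos_pattern` and `homogeneous_share` and the four toy pairs are illustrative assumptions; the PoS tags follow the paper's notation (A, N, V, D).

```python
from collections import Counter


def pos_pattern(am_pos, rm_pos):
    """Label a pair by the PoS of its access and related member, e.g. 'A→N'."""
    return f"{am_pos}→{rm_pos}"


def homogeneous_share(pos_pairs):
    """Percentage of pairs whose AM and RM belong to the same PoS."""
    pos_pairs = list(pos_pairs)
    if not pos_pairs:
        return 0.0
    same = sum(1 for am_pos, rm_pos in pos_pairs if am_pos == rm_pos)
    return 100 * same / len(pos_pairs)


# Toy list of (AM PoS, RM PoS) tags; not taken from the study's data.
toy_pairs = [("N", "N"), ("A", "N"), ("V", "N"), ("A", "A")]
patterns = Counter(pos_pattern(am, rm) for am, rm in toy_pairs)
print(patterns)
print(homogeneous_share(toy_pairs))
```

Applied separately to the common pairs, the exclusive associations and the exclusive collocations, a calculation of this kind would yield the three percentages compared in the paragraph above.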

Distribution of grammatical relations
In this section we provide a closer look at the distribution of grammatical relations that motivate the different types of AM→RM pairs. Information about grammatical relations derives from the ECD database.
As stated in Section 2, collocations in ECD are presented according to their grammatical relation in order to make it easier for the learner to acquire them and put them directly into use in their correct grammatical form. The grammatical relations illustrate which word pairs most typically occur in texts written by native speakers. A grammatical relation represents a category that groups collocates with the same relation to the search word (e.g. modifiers of a noun or objects of a verb).
Even though associations do not reveal grammatical relations directly (both stimulus and response are presented in base form in DEWA), we can take the corresponding grammatical relations in ECD as indicators of the potential grammatical relations motivating the emergence of certain associations.
The distribution of grammatical relations among both the common pairs and the exclusive collocations is given in Table 5, and the most salient grammatical relations are discussed below.
Table 5 shows that the and/or relation is the most frequent one, forming about one third of all common pairs. This is because this homogeneous relation is not specific to any PoS. The and/or relation represents semantic relations like synonyms (tähtis ja oluline 'significant and important'), antonyms (kerge või raske 'easy or difficult') and cohyponyms (ema ja laps 'mother and child'), which are paradigmatic in nature. The remarkable intersection between associations and collocations shows that paradigmatic relations are not restricted to memory but also occur as coordinated constituents of a clause at the syntactic level of expression.
The second most frequent grammatical relation among the common pairs is the modifies relation between AM-adjectives and RM-nouns. It is a syntagmatic relation of attribute and its head. The intersection shows that, apparently, qualities tend to make well-established connections to their typical carriers both in memory and written language use. This relation also comprises the third largest proportion of the exclusive collocations, revealing the wealth of attributive constructions in the texts.
When we look at the exclusive collocations, the distribution of grammatical relations is different, as no single relation prevails. The most frequent one is adverbial_semantic case between AM-nouns and RM-verbs, which captures adverbials that are nouns in semantic case forms 10 (e.g. inessive, adessive, comitative etc., as in restoranis sööma 'to eat in a restaurant', inimestega suhtlema 'to communicate with people', naisesse armuma 'to fall in love with a woman'). This grammatical relation contributes to the N→V type of PoS pattern, which is rather infrequent among the common pairs and almost missing among the exclusive associations.
The second most frequent grammatical relation adv_modifier 11 between AM-verbs, AM-adjectives and RM-adverbs captures adverbs that modify verbs (koos mängima 'to play together') and adjectives (tohutu suur 'enormously big'). This type represents the V→D and A→D PoS patterns that were missing among the common pairs and exclusive associations (see Table 4). The third most frequent grammatical relation (modifies; A→N) coincides with the second most prevalent one among the common pairs (see comments above).
Table 5 also shows that in some cases a specific PoS pattern can be motivated by more than one grammatical relation. One of those is N→N, to which two grammatical relations contribute in addition to the and/or relation: genitive_modifies and genitive_modifier. The latter two represent the possessive construction as seen from two perspectives. In the case of the genitive_modifies relation, the AM-noun GEN (e.g. lapse 'child's') modifies the RM-noun NOM (e.g. ema 'mother') (lapse ema 'child's mother'); in the case of genitive_modifier, the AM-noun NOM (e.g. saba 'tail') is modified by the RM-noun GEN (e.g. kassi 'cat's') (kassi saba 'cat's tail'). Another PoS pattern possibly motivated by multiple grammatical relations is N→V. Two grammatical relations contribute to this syntagmatic pattern in addition to the adverbial_semantic case discussed above: subject_of and object_of. The same syntagmatic relations are reflected, from the opposite perspective, in the V→N pattern as the relations object and subject.
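The many-to-one link between grammatical relations and PoS patterns can be sketched as a lookup table that is then inverted. The relation names and patterns below are those mentioned in this section; the table is not a copy of Table 5, and listing V→V under and/or is an assumption based on the statement that this relation is not specific to any PoS.

```python
from collections import defaultdict

# Grammatical relation -> PoS patterns (AM→RM) it can motivate.
RELATION_TO_PATTERNS = {
    "and/or": ["A→A", "N→N", "V→V"],  # V→V is an assumption, see lead-in
    "modifies": ["A→N"],
    "adverbial_semantic case": ["N→V"],
    "adv_modifier": ["V→D", "A→D"],
    "genitive_modifies": ["N→N"],
    "genitive_modifier": ["N→N"],
    "subject_of": ["N→V"],
    "object_of": ["N→V"],
    "subject": ["V→N"],
    "object": ["V→N"],
}


def relations_by_pattern(relation_to_patterns):
    """Invert the mapping: PoS pattern -> relations that can motivate it."""
    inverted = defaultdict(list)
    for relation, patterns in relation_to_patterns.items():
        for pattern in patterns:
            inverted[pattern].append(relation)
    return dict(inverted)


by_pattern = relations_by_pattern(RELATION_TO_PATTERNS)
print(by_pattern["N→N"])
print(by_pattern["N→V"])
```

The inverted view makes the point of the paragraph explicit: a single pattern such as N→N or N→V collects several relations, so a PoS pattern alone does not identify the grammatical relation behind a pair.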
In sum, there are indeed certain types of grammatical relations that are favoured both among collocations and associations. These are the paradigmatic and/or relation, which subsumes different PoS, and the syntagmatic relation modifies, which holds between an adjective and its head noun.

Comparison in terms of ranking
Our data sources (ECD and DEWA) are similar in that both present the RMs of a given AM in decreasing order of frequency (see Table 1 in section 3.1). The rank of an RM reflects its position in an ordered list and as such is an approximate indicator of the (relative) strength of the relation. Rank 1 indicates the strongest relation in a given list, rank 2 the second strongest, etc.
Equal rank of two RMs indicates their equal frequency in a given list.
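A ranking of this kind, in which equal frequencies share a rank, can be sketched as follows. The function name `rank_by_frequency` and the placeholder data are assumptions, and so is the choice of dense ranking (1, 2, 2, 3 rather than 1, 2, 2, 4): the text does not say how ranks continue after a tie.

```python
def rank_by_frequency(freqs):
    """Map each RM to its rank, given a dict RM -> frequency.

    Equal frequencies receive equal ranks; ranks are dense (assumption),
    so the next lower frequency gets the next consecutive rank.
    """
    distinct = sorted(set(freqs.values()), reverse=True)
    rank_of = {freq: position + 1 for position, freq in enumerate(distinct)}
    return {rm: rank_of[freq] for rm, freq in freqs.items()}


# Placeholder RMs of a single AM with invented frequencies.
toy_freqs = {"rm1": 12, "rm2": 7, "rm3": 7, "rm4": 3}
ranks = rank_by_frequency(toy_freqs)
print(ranks)
```

Because ranks are assigned within the list of each AM, a longer list of RMs tends to produce larger ranks, which matters when ranks from the two dictionaries are compared.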
It must be taken into account that the dictionaries also differ, not only in their coverage of headwords (see Table 1) but also in the number of RMs presented. The average number of different RMs (F ≥ 2) associated with an AM in ECD was 43.4 (StDev = 27.2), while in DEWA the average was 27.6 (StDev = 7.9). This indicates that the lists of collocations generally vary more in length than the lists of associations, which further affects the ranking. The mean rank of collocations, in general, is 28.4 (StDev = 23.10), while the mean rank of associations, in general, is 8.6 (StDev = 3.5).
We hypothesised that the RMs in top positions in the ranking would dominate among the common pairs, while the non-overlapping pairs would include RMs with a relatively lower rank. If this is the case, there should be a difference in the mean ranks of the common pairs as compared to the sets of exclusive associations and collocations.
The results of the comparison are presented in Table 6. The set of common pairs is characterised by the mean ranks in both DEWA and ECD, and those two should be compared to the means of the exclusive associations and collocations, respectively. It is indeed the case that the mean ranks of the common pairs are smaller than the mean ranks of exclusive associations and collocations.
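The comparison behind Table 6 can be sketched as the mean rank of the common pairs set against the mean rank of the exclusive pairs, computed within one dictionary at a time. The function name `mean_ranks`, the placeholder pairs and their ranks are illustrative assumptions, not values from the study.

```python
from statistics import mean


def mean_ranks(ranks, common):
    """Return (mean rank of common pairs, mean rank of exclusive pairs).

    ranks: dict mapping (AM, RM) -> rank within one dictionary.
    common: set of (AM, RM) pairs found in both dictionaries.
    """
    in_common = [rank for pair, rank in ranks.items() if pair in common]
    exclusive = [rank for pair, rank in ranks.items() if pair not in common]
    return mean(in_common), mean(exclusive)


# Placeholder ranks from one dictionary (invented values).
toy_ranks = {
    ("am1", "rm1"): 1,
    ("am1", "rm2"): 3,
    ("am1", "rm3"): 8,
    ("am2", "rm4"): 12,
}
toy_common = {("am1", "rm1"), ("am1", "rm2")}
common_mean, exclusive_mean = mean_ranks(toy_ranks, toy_common)
print(common_mean, exclusive_mean)
```

A smaller mean for the common pairs than for the exclusive pairs, in both dictionaries, is the pattern that the hypothesis predicts.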
The means are rather even across the PoS, except for the mean for the collocations of adjectives among the common pairs, which is lower (16.29) than the mean for the collocations of verbs and nouns. This could mean that adjectives as AMs are selected for stronger collocative relations. Another explanation could lie in the fact that adjectives are provided with shorter lists of collocates in ECD than verbs and especially nouns. The longer lists of AM-nouns in ECD are reflected in their larger mean rank (37.43) among the exclusive collocations. It is still not the case that all of the strongest relations (with ranks 1-5) appear among the common pairs. There is actually a great deal of variation in the ranks among the common pairs (StDev in DEWA = 3.8 and StDev in ECD = 18.7) and, on the other hand, the exclusive lists of associations and collocations also contain strong relations (with ranks 1-5) which are not mutually present.
There were, for example, only a few common pairs that shared the first rank both among associations and collocations: beež→pruun 'beige→brown', kana→muna 'hen→egg', lahutama→abielu 'to separate→marriage', laps→väike 'child→small', lugema→raamat 'to read→book', naine→mees 'woman→man', tantsima→laulma 'to dance→to sing', võidupüha→paraad
Examples of the strongest exclusive associations (rank = 1) include pairs of the most obvious antonyms (meeldiv→ebameeldiv 'pleasant→unpleasant', vasak→parem 'left→right'), pairs of an attribute and its typical carrier (oranž→apelsin 'orange→orange', triibuline→sebra 'striped→zebra'), pairs of synonyms (sõjavägi→armee 'army→army', ostukeskus→pood 'shopping centre→shop') and many more. These kinds of pairs are interpretable as strong relations in memory which are, at the same time, not represented as collocations in language usage. It seems that the words either exclude each other or are semantically too obvious to be used in close proximity in speech or writing. It has also been proposed that strongly associated pairs which do not occur in the corpus reflect world knowledge rather than information that needs to be expressed in context (Schulte im Walde et al.,

DISCUSSION
The main result of our study (section 3.2.1) is that the coincidental part of the AM→RM relations is much smaller than the divergent parts, i.e. the exclusive AM→RM relations. This finding is well in line with a previous study of English (Mollin, 2009). The common pairs (N = 582) make up 9% of the total set of recurrent associations and collocations, a share that is low, like Mollin's 3%. However, the proportion of coincidental pairs in our study is three times bigger. We can give two reasons for this difference.
Firstly, Estonian as a morphologically rich language does not exploit function words widely to indicate grammatical relations. The presence of content word→function word collocations that were missing among associations was one of the main arguments for the collocation-association mismatch in Mollin's study of English. Secondly, the lists of associations in the Estonian data were elicited from ca. 300 respondents (Vainik, 2018), while Mollin (2009) used the data of EAT, which contains the responses of 100 undergraduate students (Kiss et al., 1973). A bigger number of respondents leads to longer lists of recurrent associations, which increases the probability of coincidence with some of the collocations.

The association-collocation mismatch
It was mentioned above that ECD is a much richer source of information both in terms of coverage of the headwords and the number of collocates presented.
This is a quantitative factor inducing an overflow of collocations resulting inevitably in a larger proportion of mismatches on the side of collocations. There are also some qualitative factors affecting the incompatibility of the outcome.
One of the factors is the nature of the data that stems from the method of data gathering. The material presented in ECD is influenced by the size and character of the corpora on which it is based (Kallas et al., 2015;Koppel et al., 2019b). The material in DEWA, on the other hand, is influenced by the number of respondents, by the selection of the stimuli, etc. (see Vainik, 2018) and, apparently, also by following the common strategies of association elicitation by respondents (see Clark, 1970).
The nature and quality of the corpus influence, for example, which word pairs emerge as more salient in ECD. In section 3.2. we mentioned that the RMs of the exclusive collocations revealed more abstract concepts related to the aspects and values of social life (e.g. regionaalne 'regional', riiklik 'national', koostöö 'collaboration'). This may well be due to the more official register of the corpus, which includes an abundance of official documents and texts. One can also notice vocabulary related to certain specific fields like sports (e.g. märg rada 'wet track', naiste turniir 'women's tournament') and weather forecasting (märg lumi 'wet snow'). Another aspect that may reduce the number of coinciding AM→RM relations is the fact that the semi-automatically gathered material of ECD was checked manually, and collocations pointing to obvious idioms and proverbs were deliberately excluded 12 .
There are also some systematic characteristics of the material in DEWA that may have caused its partial incompatibility with the collocations. One of them is the form of the stimuli, which are presented in the base form, i.e. the nominative singular case (in the case of declinable words) (see section 3.1). Another reason for formal incompatibility may be that the association stimuli were given in the singular, which influences the form of the responses. Therefore, cases in which a collocation is frequent but the AM is in the plural, e.g. kohalikud valimised 'local elections', are not found among the common pairs.
Another notable form-related difference is the scarcity of comparative forms among associations. There were common collocations found in the corpus which contained comparative adjectives (e.g. suurem laps 'older child') that did not occur in associations.
In section 3.2.2 (Table 4) we highlighted that adverbs were almost missing from the RMs in the case of associations and were totally absent in the case of the common pairs. The lack of pairs with adverbs is likely due to both semantics and word order in Estonian. For example, since adverbs are placed before adjectives in the sentence, the response to an adjective stimulus is probably less likely to be the preceding word than the following one. The general semantics of adverbs as a PoS also plays a role.
One can speculate that adverbs, though frequent collocates in corpora, are often semantically emptier, as they mostly function as intensifiers (e.g. tohutult (D) suur (A) 'enormously big') or modifiers (e.g. peamiselt kohalik 'mainly local', enamasti kohalik 'mostly local', etc.). Such adverbs express the extent of a quality rather than a true relation between two content words, and are thus less likely to occur in WATs. People prefer to give lexical rather than function words as responses (Clark, 1970, p. 283).
In conclusion, the constituency of corpora as well as form, word order and semantics all play a role in creating the difference between associations and collocations.

Practical implications
We foresee the applicability of knowledge about the common pairs of collocations and associations in lexicography and language teaching. In both fields, a strategy of prioritisation is needed because of the constant demand for efficiency in the face of a rich flow of information. Deliberately mimicking the structure of a native speaker's mental lexicon would be one possible strategy of prioritisation when presenting the material in web dictionaries and supporting materials targeted at learners.
In that respect, one could formulate a tentative principle, "the first relations first", while deciding where to start learning from or to which type of constructions to pay the most attention. If a dictionary, language portal or teaching material contains a lot of collocations, associations can offer an alternative strategy to corpus frequency in deciding which ones should be given priority.
For example, the collocations dictionary is very sizable (e.g. some frequent nouns can have over 100 collocates) and can be difficult for a learner to absorb. The supporting information about the presence of these relations in the native speaker's mental lexicon would be a valuable key for the first approximation. Common pairs, as the more focal relations, could be marked for learners by adding key-symbols, for example.
In ECD, collocations are presented as constructions in order to make it easier for learners to use them and include them in their active vocabulary.
Based on the findings of this analysis, we could suggest that the paradigmatic relations represented by the and/or relation and the syntagmatic relation of attribution (the grammatical relation modifies) should also be given special attention when compiling materials for language teaching.
From the perspective of PoS, one could infer that the combinations A+N and V+N seem to be more central in the mental lexicon than, for example, combinations including verbs, adverbs and adjectives.
The results may also be applicable to writing definitions in dictionaries that strive for familiarity to the user. In such cases associations could play a major role. For example, if a paradigmatic relation is found to be more relevant for certain words or groups of words, providing synonyms/antonyms next to or as part of the definition would be useful 13 . It has also been suggested that associations reveal information about the domain and the relevance of the senses for ordinary speakers (Sinopalnikova, 2004).
This should hold even more strongly for the association-collocation overlap.

C O N C L U S I O N
The main goal of the present paper was to systematically compare word associations and collocations in Estonian in order to gain new insights regarding the role of PoS. We assumed that Estonian, as a language with a well-developed morphosyntactic structure, would reveal some constructions that may favour the occurrence of certain PoS combinations. The analysis was based on a representative selection of test words (N = 90) and their related items from two recent dictionaries, ECD and DEWA.
The results revealed an overlap of 14.9% of all collocations and 23.4% of all associations related to the test words. We interpreted the common pairs (N = 582) as a similarity of collocations and associations and the exclusive pairs as a mismatch.
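The two percentages describe the same set of common pairs measured against two different totals. A minimal sketch of this computation follows, assuming each dictionary is reduced to a set of (headword, related word) pairs; the function name `overlap_shares` and the toy pairs are hypothetical and do not reproduce the ECD or DEWA data.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def overlap_shares(collocations: Set[Pair], associations: Set[Pair]):
    """Return the common pairs and their share of each list (in percent)."""
    common = collocations & associations
    share_of_collocations = 100 * len(common) / len(collocations)
    share_of_associations = 100 * len(common) / len(associations)
    return common, share_of_collocations, share_of_associations


# Invented toy pairs, for illustration only.
toy_collocations = {
    ("suur", "maja"), ("suur", "osa"), ("suur", "väike"), ("suur", "tohutult"),
}
toy_associations = {("suur", "maja"), ("suur", "väike"), ("suur", "lai")}

common, col_share, assoc_share = overlap_shares(toy_collocations, toy_associations)
print(sorted(common))
print(f"{col_share:.1f}% of collocations, {assoc_share:.1f}% of associations")
```

Because the collocation list is longer than the association list, the same number of common pairs yields a smaller share on the collocation side, which matches the direction of the reported figures (14.9% versus 23.4%).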
With regard to the PoS, it was discovered that adjectives tend to make proportionally more common pairs than nouns and verbs. A well-established, recurring combination of adjectives and nouns was explained as being motivated by the attributive grammatical relation modifies. It also appeared that adjectives tend to make somewhat stronger collocations, which is a topic that needs further study. We tentatively concluded that there is a remarkable consensus concerning attributing qualities in both memory and language use.
It was also discovered that, regardless of the PoS of the headword/stimulus, there occurred proportionally more nouns as collocates/responses among the common pairs. The biggest overlaps between associations and collocations were found among heterogeneous relations comprising different PoS: in addition to the A→N relation mentioned above, the relation V→N was salient. Apparently, the syntagmatic (or syntagmatic-like semantic) relations play a role not only in texts but also in the semantic memory and/or in the strategies of association elicitation. Interestingly, the common pairs lacked heterogeneous relations when nouns were not involved, which also reveals the tendency for nouns to recur as the related members.
The and/or relation was found to be the dominant grammatical relation among the common pairs because it subsumes different PoS and expresses paradigmatic relations (e.g. synonymy, antonymy, cohyponymy). On the other hand, a totally different grammatical relation (adverbial_semantic case) was found to prevail among the exclusive collocations. This is obviously because Estonian is a morphologically rich language that uses semantic cases, whereas English, for example, uses prepositions.
The most frequent combination of PoS was the homogeneous N→N combination, which was prevalent among the exclusive associations. Although the and/or relation seems a convenient and plausible motivation, our analysis showed that other grammatical relations like genitive_modifies and genitive_modifier contribute to this prevailing pattern too.
As the non-coincidental part of collocations and associations was large (85.1% and 76.6%, respectively), we also discussed some possible reasons for the systematic mismatch. Besides the quantitative disproportion of collocations, we proposed that qualitative factors such as the composition of the corpus, the form of the stimuli, word order and semantics play a role.
In sum, we can see several reasons, both quantitative and qualitative, that may cause the mismatch between associations and collocations. It is still remarkable though that these reasons seemingly do not rule out completely the similarities between associations and collocations. We interpret the similarity as revealing a set of core connections that are actively upheld while people think, talk and write texts in Estonian. The core connections seem to share a