An odd couple – Corpus frequency and look-up frequency: what relationship?
Abstract
In this paper, we investigate the relationship between log file records and corpus frequency. The study was motivated by practical considerations of how best to keep an already existing corpus-based dictionary updated. Should the next word in the dictionary be the one that follows next on a list of declining corpus frequency? Or the one that users most frequently look up but don’t find? In order to establish manageable criteria, we analysed log files for The Danish Dictionary from 2009 to 2012 and compared the list of most popular words looked up by the users with the frequency of the same words in the corpus underlying The Danish Dictionary. The users’ actual search behaviour was analysed in order to find answers to questions such as these: Are there words which are never looked up? If so, can we say something meaningful about their corpus frequency patterns – do they belong to particular parts of speech, are they particularly frequent or infrequent, could it even be that the pattern is cumulative, in such a way that a particular threshold can be identified? Ultimately, the question is whether it makes sense to use corpus frequency as a criterion for lemma selection.
Copyright (c) 2014 Lars Trap-Jensen, Henrik Lorentzen, Nicolai H. Sørensen
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All content of Slovenščina 2.0 is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).