Morphological Patterns in the Sloleks Lexicon of Slovene: An Initial Set of Patterns for Nouns
The paper presents the first step toward expanding the Sloleks lexicon of Slovene with morphological patterns, starting with nouns. In the first phase, the patterns were extracted automatically from the lexicon based on a selection of differentiating characteristics (morphosyntactic tags and variant word parts). This was followed by a manual categorization during which we (a) separated patterns that are either systemic or based on actual language use from examples extracted because of noise attributable to either the extraction method or inconsistencies in Sloleks; (b) arranged patterns into groups based on their content and relatedness; (c) analyzed and more clearly defined form variability, covering both standard and non-standard word forms; and (d) proposed future steps for the further development of the extraction method and for lexicon upgrades. The result is a set of formalized morphological patterns for (common and proper) nouns containing 10 groups (64 patterns) for masculine nouns, 9 groups (29 patterns) for feminine nouns, and 8 groups (20 patterns) for neuter nouns. The preparation of the set of formalized patterns also resulted in numerous suggestions on how to upgrade the lexicon, while a machine-focused view of morphological inflection offers opportunities to improve the current grammatical description of Slovene. As part of our future work, we intend to expand the set of patterns with other parts of speech and corpus-based material. The final categorization of patterns will be included in the Sloleks lexicon, and the patterns will also be published in the CLARIN.SI repository in a machine-readable format.
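The automatic extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the simplified tags (`sg.nom`, `sg.gen`, ...), the toy lexicon, and the function names are all assumptions made for illustration, and the "variant word part" is approximated here as whatever follows the longest common prefix of a lemma's forms.

```python
from collections import defaultdict
from os.path import commonprefix

# Hypothetical toy lexicon: lemma -> {simplified tag: word form}.
# Real Sloleks entries use full morphosyntactic tags and full paradigms.
TOY_LEXICON = {
    "miza": {"sg.nom": "miza", "sg.gen": "mize", "sg.dat": "mizi", "sg.acc": "mizo"},
    "lipa": {"sg.nom": "lipa", "sg.gen": "lipe", "sg.dat": "lipi", "sg.acc": "lipo"},
    "korak": {"sg.nom": "korak", "sg.gen": "koraka", "sg.dat": "koraku", "sg.acc": "korak"},
}


def extract_pattern(forms):
    """Return a hashable pattern: sorted (tag, variant ending) pairs.

    Assumption: the invariant part is the longest common prefix of all
    forms, and the variant part is the remainder of each form.
    """
    stem = commonprefix(list(forms.values()))
    return tuple(sorted((tag, form[len(stem):]) for tag, form in forms.items()))


def group_by_pattern(lexicon):
    """Group lemmas that share exactly the same (tag, ending) pattern."""
    groups = defaultdict(list)
    for lemma, forms in lexicon.items():
        groups[extract_pattern(forms)].append(lemma)
    return dict(groups)


groups = group_by_pattern(TOY_LEXICON)
for pattern, lemmas in groups.items():
    print(lemmas, pattern)
```

In this toy example, "miza" and "lipa" fall into one pattern and "korak" into another. A purely prefix-based split like this one would also produce spurious patterns for stem alternations or inconsistent entries, which is the kind of noise the manual categorization step is described as filtering out.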
Copyright (c) 2018 Špela Arhar Holdt, Jaka Čibej
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All content of Slovenščina 2.0 is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
Slovenščina 2.0 applies the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license to all published material. Under this license, authors retain ownership of the copyright for their content, but allow anyone to download, reuse, reprint, modify, distribute, copy, remix, transform and/or build upon the content for any purpose, even commercial, as long as the original authors and source are cited. No permission is required from the authors or the publishers. Appropriate attribution can be provided by simply citing the original article. If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. For any reuse or redistribution of a work, users must also make clear the license terms under which the work was published.
No separate publishing agreements are signed between the author and the publisher. Authors retain copyright and the publishing rights of their work without any restrictions.
Authors are permitted and encouraged to post the journal’s published version of the work online (e.g., in institutional repositories, on their own websites), with an acknowledgement of its initial publication in Slovenščina 2.0.