STATE-OF-THE-ART ON MONOLINGUAL LEXICOGRAPHY FOR CROATIA (CROATIAN)

This minireview presents the state of the art of Croatian monolingual lexicography. The first part provides a brief overview and classification of all existing lexicographic resources, followed by a somewhat more detailed insight into the existing Croatian monolingual dictionaries and monolingual lexicographic projects, orthography dictionaries, and the dictionary writing systems used.


I N T R O D U C T I O N
Croatian lexicography has a remarkably long and rich tradition,1 which is more than five centuries long, and which has always been an important part of European lexicography, built on the tradition of Latin dictionaries (Štrkalj Despot and Möhrs, 2015, p. 329). However, after the important change of lexicographic paradigm brought by the digital era, the discontinuation of this tradition became very apparent, primarily in the number and quality of e-dictionaries compared to other European, and even other Slavic, languages (ibid., p. 330).
1 For a historical overview of Croatian lexicography cf. Vince (2002), Moguš (2009), Samardžija (2011a, 2011b, 2011c), Tafra and Fink Arsovski (2013), etc.
Here, we provide an overview of the current state of the art of Croatian e-lexicography based on Štrkalj Despot and Möhrs (2015, p. 335), who propose a classification of contemporary Croatian lexicographic accomplishments following de Schryver (2003) and Klosa (2013). They classify contemporary Croatian lexicography as follows:
1) Corpus-driven dictionaries: e.g. Croatian Frequency Dictionary (Moguš, Bratanić and Tadić, 1999), Dictionary of Marulić's Judita (Moguš, 2001), etc.
3) Dictionaries that are available in closed digital forms (CD/DVD): First Croatian School Dictionary (Čilaš Šimpraga, Jojić, and Lewis, 2008), Big Dictionary of the Croatian Language (Jojić et al., 2015);
4) Open access network dictionaries: Croatian linguistic portal;
(Birtić et al., 2012), which contains 30,000 entries. This dictionary is aimed at advanced users in secondary education, but it serves as a general monolingual dictionary as well. It is the first Croatian general monolingual corpus-based dictionary. The conception of the dictionary and the lexicographic workflow were determined and led by Lana Hudeček and Milica Mihaljević (see Birtić et al., 2012, Introduction). The workflow was not organized around alphabetically ordered entries, but around semantically or grammatically closely related groups of words (such as colours, week days, seasons, pronouns, conjunctions, etc.), which resulted in consistent lexicographic processing. In addition to thoroughly compiled grammatical units, definitions, and phraseological units, considerable attention has been paid to the systematicity of semantic, morphological and grammatical definitions, relationships between words (synonymy, antonymy, homonymy), and the appropriateness of examples and idioms.
As a predecessor to this dictionary, the First Croatian School Dictionary (Čilaš Šimpraga, Jojić, and Lewis), consisting of 2,500 entries, was published by IHJJ and Školska knjiga in 2008. This dictionary is intended for 1st and 2nd grade elementary school pupils as preparation and practice for the future use of full dictionaries. It consists of everyday words known to children as well as less familiar words from their textbooks.
Its abecedarium was compiled from the lexical material of the school programme for the 1st and 2nd grade of elementary school. The structure of each article is simple, offering simple definitions with illustrations. It can therefore serve as a dictionary for foreigners as well. The dictionary is accompanied by a DVD containing a digital version that can be used together with lexicographic games.
In the past five years, IHJJ has also become known for its numerous online lexicographic resources, described in the previous chapter.
The biggest Croatian lexicographic project, the Online Croatian Dictionary (MREŽNIK), is currently ongoing at IHJJ (with L. Hudeček as its PI) and is funded by the Croatian Science Foundation (see Hudeček, 2018; Hudeček and Mihaljević, 2018). The aim of the MREŽNIK project is to compile the first digital-born, free, monolingual, corpus-based, hypertextual, easily searchable online dictionary of the Croatian standard language with three modules: for adult native speakers, for school children, and for foreigners.
Šonje. Since this dictionary is produced by a public institution, it is obviously supported from public funds.13
Novi Liber publishing house published several general monolingual dictionaries, all based on the material compiled by Vladimir Anić: Dictionary of the Croatian Language (1991, 1994, 1996, 1998, 2000, pocket edition in 2007, printed edition with CD in 2010); Great Dictionary of the Croatian Language (2003, 2004, 2009).14 The Dictionary of the Croatian Language has not been
In 2006, on the basis of the Dictionary of Foreign Words (Anić and Goldstein, 1998, 2000), Encyclopaedic Dictionary, Large Dictionary of the Croatian Language, and some other resources, Novi Liber created the Croatian Language Portal, an open access searchable portal with more than 110,000 entries (containing basic grammatical information, definitions, examples, as well as phraseological, onomastic and etymological information). The Croatian Language Portal is currently the only open access online dictionary of the Croatian language.15 Despite the fact that its content has not been updated, it is very popular and widely used. According to the Croatian results of the survey on dictionary use in Europe,16 approximately 50% of respondents named it as their first and only dictionary choice, obviously because it is currently the only freely available online resource.17 Currently, there are no general monolingual dictionaries of Croatian for foreign learners, but a module is envisaged within the Mrežnik project to accommodate these needs.

C R O A T I A N O R T H O G R A P H Y D I C T I O N A R I E S
As is the case with the majority of Slavic languages, the Croatian language has a long and rich tradition of producing and widely using orthography dictionaries. There are four Croatian orthography dictionaries currently in use, all of which are an integral part of an orthography handbook (Anić and Silić, 2001; Badurina, Marković and Mićanović, 2008; Babić and Moguš, 2011; Jozić et al., 2013). They consist primarily of lemmas causing possible orthographic problems. The Orthography handbook produced and published by the Institute for the Croatian Language and Linguistics (Jozić et al., 2013)
Judging from the number of orthography handbooks throughout history, and especially from the number of their editions and printed copies today, their popularity and market potential, and even their national symbolism, are valued more highly than those of general monolingual dictionaries.20
15 In 2015 this portal was taken over by Znanje publishing house.
16 The results of a survey on dictionary use in Europe, focusing on general monolingual dictionaries, are presented in Kosem et al., 2018. The survey covers close to 10,000 dictionary users in nearly thirty countries. It was completed by 9,562 respondents, over 300 respondents per country on average.

D I C T I O N A R Y W R I T I N G S Y S T E M S
For many years, the only dictionary writing system used in Croatia was Softleks,21 software developed by a small private company in the early 1990s. Softlex is a blend of a custom NoSQL database optimized for storing dictionary data and a WYSIWYG text editor that enables quick and user-friendly entry of complex dictionary data. At the time of its implementation, 16 years before the first XML standard, it was very advanced, but because it was not constantly upgraded, it soon became a bottleneck in the development of national lexicography, resulting in a rather slow inclusion into modern e-lexicographic trends.
The majority of the dictionaries presented here are primarily manually compiled, with very limited corpus support. Although the connection with a corpus does not seem to play an important role for dictionary users (among 11 user criteria of importance, the two related to corpora were ranked 5th and 7th in the survey), it is a conditio sine qua non for modern lexicographers.
18 http://pravopis.hr/rjecnik
19 www.rjecnici.hr
A methodological turning point in Croatian lexicography is the compilation of the digital-born Online Croatian Dictionary (Mrežnik), see ch. 3. It is being compiled in the TLex DWS, which has been customized to the project needs.
Sketch Engine is being used as a corpus query system (CQS). This information is integrated with the TLex program, in which lexicographic processing is done on all levels. For the morphological level descriptions, the project uses the inflectional morphological lexicon hrLex.22 When the lexicographic processing is finished, all the data will be exported to a web application, as well as to the CLARIN repository.
This will enable MREŽNIK to be widely used both by the public and by language technology applications that require access to the full data.
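The export step mentioned above can be illustrated with a minimal sketch. It is not the project's actual export format or TLex functionality: the `Entry` class, its field names, the module labels, and the `export_entries` helper are all hypothetical and serve only to show how one set of entries might feed both a full data dump and a module-specific web view.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Entry:
    """One dictionary entry; every field name here is an illustrative assumption."""
    lemma: str
    pos: str
    definition: str
    examples: List[str] = field(default_factory=list)
    # Hypothetical labels for the three user modules an entry may belong to.
    modules: List[str] = field(default_factory=list)


def export_entries(entries: List[Entry], module: Optional[str] = None) -> str:
    """Serialize entries to JSON, optionally keeping only one user module."""
    selected = [e for e in entries if module is None or module in e.modules]
    # ensure_ascii=False keeps Croatian diacritics readable in the output.
    return json.dumps([asdict(e) for e in selected], ensure_ascii=False, indent=2)


entries = [
    Entry("kuća", "noun", "a building in which people live",
          ["Kuća je velika."], ["adults", "children", "foreigners"]),
    Entry("premda", "conjunction", "although",
          ["Došao je premda je padala kiša."], ["adults"]),
]

full_dump = export_entries(entries)                   # complete data set
children_view = export_entries(entries, "children")   # a single module
```

Keeping one underlying record per entry and filtering at export time is only one possible design; it avoids maintaining three separate copies of the same lemma.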
When it comes to internet availability, Eurostat statistics for 2016 rank Croatia 21st among EU countries.23 According to the social media management agency Hootsuite, the internet penetration rate in Croatia in 2017 was 75%, which is above the average for Southern European countries but below that of Western Europe.24
When it comes to crowdsourcing, it has not been used so far for compiling mono-
are more than 400 publicly available lexicographic papers on Hrčak, an open access portal of Croatian scientific journals), however, it is primarily focused on the lexicographic content (e.g. Nikolić-Hoyt, 2004; Tafra, 2005), while the research on dictionary use is completely neglected.
Dictionary use is unfortunately not systematically taught in primary education, nor is it part of the official curriculum. It is up to teachers to dedicate one or two school hours to teaching children how to use dictionaries, and they usually do this in collaboration with school librarians in the 7th or 8th grade of elementary school, when teaching the history of the Croatian language.
Lexicography is part of the curriculum only for the highest grades of secondary education. However, this might change because Croatia is going through a national educational reform, and the publicly available version of the curriculum for the Croatian language involves training in dictionary use among other learning strategies.26 Surely, an earlier and more formal introduction to using dictionaries and other basic printed and online linguistic resources would be far more fruitful and would benefit future general lexicographic and linguistic culture.

C O N C L U S I O N S A N D O U T L O O K
The overview above shows that the majority of Croatian general monolingual dictionaries (including school dictionaries) are printed dictionaries. Most of them are primarily manually compiled, with no or very limited corpus usage. They are published both by public institutes and by private publishers, and in both cases they are entirely or partially publicly funded. There is no obvious difference in users' trust in either group; ease of access to content seems to play a much more important role. Leading monolingual dictionaries do not seem to be connected with national symbolism; orthography handbooks appear to have a much stronger connection with it.
At the moment, there is only one free-of-charge, web-based dictionary: the Croatian Language Portal. There is also an application-based edition of The Great Dictionary of the Standard Croatian Language, which is available only if the printed version is bought.
The biggest Croatian lexicographic project and the only digital-born and corpus-based dictionary, the Online Croatian Dictionary (Mrežnik), is currently being compiled at the Institute for the Croatian Language and Linguistics.
This dictionary is a methodological turning point in Croatian lexicography, leading it towards inclusion in the newest trends and highest standards of European e-lexicography.
funded by any public funding; instead, it managed to get almost 20,000 subscribers. The edition for schools has been financially supported by the Croatian Ministry of Culture and recommended for use in elementary and secondary schools by the Ministry of Science, Education and Sports of the Republic of Croatia.
Školska knjiga publishing house (founded in 1950) is a private Croatian publisher with an important share in the market of school dictionaries and orthographic dictionaries. Školska knjiga published its first general monolingual dictionary in 2015 (The Great Dictionary of the Standard Croatian Language, ed. Lj. Jojić), containing more than 120,000 entries on 1,800 pages. This dictionary has a digital edition in the form of desktop and mobile applications, which is accessible only by purchasing the printed edition. It has been financially supported by the Ministry of Science, Education and Sports of the Republic of Croatia, and the Croatian Academy of Sciences and Arts.
lingual dictionaries. However, this might change soon due to the results of a 3-year research project, SenseHive: Dynamic Crowdsourcing Models for Incremental Construction of Lexico-Semantic Resources, funded by the Croatian Science Foundation (2015-2018), with Jan Šnajder as its PI, which is aimed at developing a comprehensive crowdsourcing methodology for incremental construction of large-scale lexico-semantic resources. The research combines dynamic crowdsourcing, corpus-based models of semantics (distributional semantics and topic models), and active machine learning methods into a comprehensible and language-independent crowdsourcing framework, the SenseHive.25

D I C T I O N A R Y U S E
The status of lexicography in general in Croatia is relatively satisfactory, both in terms of lexicographic theory and practice, especially considering printed dictionaries. There exists a great amount of lexicographic research (e.g. there
22 https://www.clarin.si/repository/xmlui/handle/1135...
23 https://goo.gl/27HPe8 (link to the Eurostat web page on the internet access)
24 https://www.slideshare.net/wearesocialsg/digital-in-2017-southern-europe
25 The project description and the list of project publications may be retrieved at https://www.researchgate.net/project/SenseHive-Dynamic-Crowdsourcing-Models-for-Incremental-Construction-of-Lexico-Semantic-Resources.
The project functions as an expanded version of Croatian FrameNet. This database lists hierarchically organized conceptual metaphors and metonymies decomposed into source-target relations among cognitive primitives, image schemas, and semantic frames, which are further decomposed into semantic roles (see Despot et al., in press).
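The hierarchical decomposition described above can be pictured with a small data-structure sketch. It is an assumption-laden illustration, not the schema of the database itself: the `Frame` and `Metaphor` classes, their fields, and the sample metaphors (standard textbook examples rather than entries taken from the resource) are hypothetical, and only semantic frames are modelled, leaving out cognitive primitives and image schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """A semantic frame with its semantic roles (illustrative only)."""
    name: str
    roles: List[str]


@dataclass
class Metaphor:
    """A source-to-target mapping between two frames, placed in a hierarchy."""
    name: str
    source: Frame
    target: Frame
    role_map: Dict[str, str] = field(default_factory=dict)  # source role -> target role
    parent: Optional["Metaphor"] = None

    def ancestors(self) -> List[str]:
        """Names of the more general metaphors this one inherits from."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def unmapped_roles(self) -> List[str]:
        """Source roles that have no counterpart among the target roles."""
        return [r for r in self.source.roles
                if self.role_map.get(r) not in self.target.roles]


journey = Frame("JOURNEY", ["traveller", "path", "destination"])
life = Frame("LIFE", ["person", "course_of_life", "life_goal"])
love = Frame("LOVE", ["lovers", "relationship_history", "shared_goal"])

life_is_journey = Metaphor(
    "LIFE IS A JOURNEY", journey, life,
    {"traveller": "person", "path": "course_of_life", "destination": "life_goal"})
love_is_journey = Metaphor(
    "LOVE IS A JOURNEY", journey, love,
    {"traveller": "lovers", "destination": "shared_goal"},
    parent=life_is_journey)
```

In this sketch `love_is_journey.ancestors()` walks up the hierarchy, while `unmapped_roles()` reports source roles that still lack a target counterpart, which is the kind of consistency check such a decomposition makes possible.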
Brozović et al., 2018). The lexicographers select data from the corpus as well as from other Croatian dictionaries, websites, and other resources. The dictionary workflow is supported by Sketch Engine (Kilgarriff et al., 2004). The compilation of the dictionary is based on Word Sketches specially adapted to the needs of the project, which rely on a purpose-built Sketch Grammar and on the GDEx module for finding appropriate examples in the corpus. The TLex software package is used to support the preparation of the dictionary text. The bulk of the project will be finished in 2021.
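To make the idea of automatically finding appropriate examples more concrete, the following toy scorer ranks candidate sentences with a few simple heuristics (sentence length, share of uncommon words, well-formedness). It is only a sketch in the spirit of good-example extraction and not the GDEx implementation or its configuration; the function names, weights, thresholds, and the tiny word list are all assumptions. The headword is matched as an exact word form, whereas a real system for an inflected language such as Croatian would work on lemmas.

```python
import re


def score_example(sentence, headword, common_words, min_len=5, max_len=20):
    """Return a score in [0, 1]; higher means a more suitable dictionary example."""
    tokens = re.findall(r"\w+", sentence.lower())
    if headword.lower() not in tokens:
        return 0.0  # a sentence without the headword cannot illustrate it
    score = 1.0
    # Penalize sentences that are too short or too long (assumed thresholds).
    if not min_len <= len(tokens) <= max_len:
        score -= 0.4
    # Penalize words outside a list of common words (assumed weight).
    rare = [t for t in tokens if t != headword.lower() and t not in common_words]
    score -= 0.4 * len(rare) / len(tokens)
    # Prefer complete sentences: initial capital and final punctuation.
    stripped = sentence.strip()
    if not stripped[0].isupper() or stripped[-1] not in ".!?":
        score -= 0.2
    return max(score, 0.0)


def best_examples(sentences, headword, common_words, n=1):
    """Return the n highest-scoring candidate sentences."""
    ranked = sorted(sentences,
                    key=lambda s: score_example(s, headword, common_words),
                    reverse=True)
    return ranked[:n]


common = {"moj", "svaki", "dan", "trči", "u", "parku", "je"}
candidates = ["Moj pas svaki dan trči u parku.", "pas", "Mačka spava."]
best = best_examples(candidates, "pas", common)
```

Here the full, well-formed sentence containing the headword is ranked first, the bare word form is penalized for length and form, and the sentence without the headword scores zero.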
two Croatian corpora: the Croatian Web Corpus hrWaC (http://nlp.ffzg.hr/resources/corpora/hrwac/, Ljubešić and Klubička, 2014), and the Croatian Language Repository (CLR;
focused on and largely appreciated for encyclopaedic lexicography. The first general dictionary produced by this institute (and co-published with Školska knjiga in 2000) was the Dictionary of the Croatian Language, edited by J.
is recommended for use in elementary and secondary education by the Ministry of Science and Education of the Republic of Croatia. Its spelling dictionary has been freely available on a web page since 2013.18 It is widely used and is the most popular online linguistic resource in Croatia. Since 2015, Babić and Mo-