TermFrame: A Systematic Approach to Karst Terminology

We describe a systematic and data-driven approach to karst terminology where knowledge from different textual sources is structured into a comprehensive multilingual knowledge representation. The approach is based on a domain model which is constructed in line with the frame-based approach to terminology and the analytical geomorphological method of describing karst phenomena. The domain model serves as a basis for annotating definitions and aggregating the information obtained from different definitions into a knowledge network. We provide examples of visual knowledge representations and demonstrate the advantages of a systematic and interdisciplinary approach to domain knowledge.


INTRODUCTION
Karst is a type of terrain that takes its name from the Karst region in the hinterland of the Gulf of Trieste, in present-day Slovenia and Italy. The science of studying karst is called karstology. Its development was greatly accelerated by research in the northern part of the Dinaric Karst, which was the site of the first explorations of this kind of terrain and was hence designated the Classical Karst (Mihevc, 2010). There are several reasons why the Slovenian Karst, and not some other karstic area in Europe, gave its name to the scientific term. The most important factor is its geographic location and its geopolitical position in the period when karstology was developing, between the 16th and 19th centuries, when the southern part of the Balkan Peninsula was part of the Ottoman Empire. At the time, Istria and a part of the Karst belonged to the Habsburg Monarchy, and Trieste had become an important commercial hub. The region in the hinterland of Trieste thus impressed the travellers of that day and became a synonym for a barren, rocky surface (Kranjc, 1994).
Since the beginning of the scientific study of karst in the middle of the 19th century, many other karst terms, in addition to the general term karst, have been derived from South Slavic languages or local dialects within the area of the Classical Karst. They are still used in international karstology today, mostly to describe basic surface karst features such as dolina, uvala, polje, hum, ponor, etc. (Kranjc, 2008).
Because of the strong interactions between the international karst nomenclature and the South Slavic languages spoken in prominent karst regions, within the project TermFrame: Terminology and Knowledge Frames Across Languages we explore, model and systematically represent karst terminology and knowledge in three languages: English, Slovene and Croatian. In line with the state-of-the-art frame-based approach to terminology (Faber et al., 2012), the TermFrame project aims to propose a systematic domain model of karstology comprising concept categories, relations and definition frames. Such a domain model allows us to build a knowledge base for karst using a comprehensive collection of relevant texts as the primary source, and employing advanced methods of text mining and natural language processing to extract the information we do not find in existing reference works for karst terminology.
The aim of this paper is to present the advantages of our frame-based and data-driven approach to describing and representing karstology, especially in comparison with existing karst terminologies. In Section 2 we first describe past attempts to collect and describe karst terminology, and in Section 3 we give a more detailed description of the data sources and methods used in the TermFrame project. Section 4 presents the main outputs, namely the annotated collection of definitions which serves as the basis for the structured knowledge base, and its potential uses by experts, researchers, students and other karst enthusiasts. We conclude with a brief discussion and plans for future work.

KARST TERMINOLOGY - OVERVIEW OF EXISTING WORKS
Several attempts to organise international karst terminology have been made in the past. Among the first was the Glossary of Karst Terminology by Watson H. Monroe (Monroe, 1970). According to its preface, this glossary includes mostly terms used in describing karst geomorphologic features and processes as used in the literature of English-speaking countries, but a few of the more common terms in French, German, and Spanish are included, with references to the corresponding English terms where they are available. The glossary also includes simple definitions of the more common rocks and minerals found in karst terrain, common terms of hydrology, and a number of the descriptive terms used by speleologists. The glossary contains around 450 terms. Unesco's Glossary and Multilingual Equivalents of Karst Terms (1972) was launched by the General Conference of Unesco in order to promote cooperation in scientific hydrology research around the world. The glossary includes 227 terms with definitions in English, and translation equivalents in eight languages (French, German, Greek, Italian, Spanish, Turkish, Russian and Yugoslav). In addition, it incorporates a classification of karst terms in a separate chapter.
In the 1990s, Cave and Karst Terminology (Jennings, 1997) was published as a result of the efforts of the Australian Speleological Federation (Matthews, Matthews, 1968; Jennings, 1979; Australian karst index, 1985). The glossary is a highly selective list of terms recommended for use within Australian karst research and does not aim to be a comprehensive collection of global karst terminology.
About the same time, the British Cave Research Association (B.C.R.A.) published and updated a dictionary that covers the general area of karst and caves, namely the Dictionary of Karst and Caves: A Brief Guide to the Terminology and Concepts of Cave and Karst Science (Lowe, Waltham, 2002).
Since then, many new terms related to karst in general have come into use throughout the world, mostly as a result of the upsurge in environmentalism. A Lexicon of Cave and Karst Terminology with Special Reference to Environmental Karst Hydrology (Field, 2002) was published by the U.S. Environmental Protection Agency with the aim of unifying karst terminology and serving as a technical guide for karst researchers. It includes karst-specific terms and terms related to the field of environmental karst.
Since Slovenian karst terminology is an important part of international terminology and karstology is among the few scientific disciplines that originate from Slovenian territory, we would expect important works by Slovenian authors in this field. Nevertheless, only three basic works covering the field of karstology have been published so far (Gams et al., 1962; Gams, Kunaver, Radinja, 1973; Šušteršič, Knez, 1995).
The first attempt to collect and systematise karst terminology in Slovenian was made in the 1960s in the form of a scientific article. It was presented as a report at the symposium on karst terminology organised by the Association of Slovenian Geographers and the Slovenian Geological Society in 1962. The article was written by Gams, Kunaver, Novak, Jenko and Savnik, and published in the Geographical Bulletin, the Association of Slovenian Geographers' official publication, under the simple title Karst Terminology (Gams et al., 1962). The contents of the article are divided into sections based on the short papers presented at the symposium, each paper addressing a subcategory within the karst domain, e.g. larger karst landforms, karst hydrology, karst caves etc. The terminology is thoroughly described and discussed in the Slovenian language, and was approved by the symposium's programme committee.
This contribution was followed a decade later by the publication of Slovenian Karst Terminology (Gams, Kunaver, Radinja, 1973). This all-encompassing collection of karst terminology in Slovenian was published under the auspices of the Department of Geography (Faculty of Arts, University of Ljubljana). Its approximately 200 core dictionary entries, consisting of karst terms and their descriptions, are often further elaborated and expanded by related karst expressions and definitions. The dictionary entries and all accompanying parts of the dictionary are presented in Slovenian. The majority of entries, however, include English, French and German translation equivalents as well. The dictionary remains the most important reference for karst terminology in Slovenia to this day, although it has never been fully revised.
The third important work covering karst terminology in Slovenian, A Contribution to the Slovenian Speleological Glossary, was published as a scientific article in the Bulletin of the Speleological Association of Slovenia by Šušteršič, Knez (1995). The collection includes the explanation of 88 terms with references to Slovenian Karst Terminology, Slovenian Technical Vocabulary and to the Dictionary of the Slovenian Language, focusing on the latest developments in the field of speleology and related scientific fields. The presented entries do not include translation equivalents.
In both international (English) and Slovenian karst terminologies, the authors attempted to be as inclusive as possible in that their glossaries incorporated terms related to karst geomorphology, speleology, hydrology, and karst rock geology. The glossaries usually include a sufficient number of terms describing karst and do not differ significantly from each other in terms of coverage. However, there are major inconsistencies in the description of terms. These reflect each author's expertise and focus, which may be geological, hydrological or geomorphological, but sometimes also result from definitions being carried over from older sources (e.g.: sifon je odsek rova, kjer sega skalni strop do vode / a syphon is a section of a passage where the cave ceiling reaches the water (Gams, Kunaver, Radinja, 1973); sifon je kolenasta poglobitev jamskega dna, kjer naj bi na krajšo razdaljo podzemska reka tekla ob pritisnjeni gladini / a syphon is a knee-shaped lowering of the cave floor where a short section of the subsurface stream flows along a lowered watertable level (Šušteršič, Knez, 1995)). Furthermore, traditional definitions typically focus on one or two selected attributes of a term rather than presenting a comprehensive overview of all known attributes.
Our approach aims to overcome these drawbacks. Firstly, we rely on data-driven methods to determine the relevance of terms. This means that we first compiled a balanced and representative corpus of texts including the above-mentioned glossaries which we use to extract terms and definitions (see Section 3.2). Thus, our coverage is more comprehensive and less subjective. Secondly, the frame-based approach defines a definition template, a so-called "ideal definition" for each concept category in our domain model. This allows us to generate term descriptions which contain all known attributes of a term, even if these attributes are not explicitly mentioned in any of the definitions. Finally, our approach is not aimed towards building a glossary but a knowledge base, the main difference being that all karst concepts are parts of a large knowledge network where the underlying structures reveal true facts about the domain.

Building the domain model
A systematic description of individual shapes, processes and materials is possible by combining existing geomorphological methods with a systematic and comprehensive approach. Amongst different approaches, we believe that the analytical geomorphological method (Pavlopoulos, Evelpidou, Vassilopoulos, 2009) is the most appropriate and the most systematic for the description of geomorphologic features and processes.
The analytical geomorphological method (Pavlopoulos, Evelpidou, Vassilopoulos, 2009) includes five basic aspects of analysis, namely morphographic or morphological, morphometric, morphogenetic, morphochronological, and morphodynamic. To this set of methods we added the morphostructural analysis (Gerasimov, 1946) which is not included in the classical analytical geomorphological approach, but is also crucial from the point of view of an integrated geomorphological approach.
The morphographic (or morphological) analysis contains the identification and qualitative description (documentation) of geomorphic forms and their distribution in the studied area or characteristic environment (geome) of occurrence. The morphometric analysis refers to the quantitative description of geomorphic aspects. The morphostructural analysis is a set of methodological approaches aimed at explaining the direct or indirect connections between today's relief and the structure of the Earth's interior, or to determine important elements of geological structures in the study area (Gerasimov, 1946). The morphogenetic analysis is a detailed description of the formation of geomorphic forms and includes processes, morphogenetic systems and mathematical simulations of relief design. The morphochronological analysis is the determination of the age of an individual geomorphic form on the basis of absolute and relative dates, correlations of sediments and geomorphic forms on the basis of their age and position. The morphodynamic analysis includes all the dynamic processes on Earth that form a relief. It is a study of geomorphic processes operating today and those processes that will be active in the future.
Top-level categories (Figure 1) indicate the type of individual elements in terms of geomorphological form (A. Landform) or process (B. Process). Since typical geomorphological or hydrological environments also appear in definitions, we defined them as geomes (C. Geome). In addition, we also encounter landforms, materials and their characteristics that are not directly related to karst geomorphology or hydrology but still contribute to domain knowledge (D. Element / Entity / Property), as well as methods of study (E. Instrument / Method). Landforms are divided into subcategories according to their spatial distribution (A.1 Surface landform, A.2 Underground landform) and according to the predominant hydrological function (A.3 Hydrologic landform). Forms that are directly related to karst but could not be classified in any of the above subcategories were labelled as such (A.4 Other). We also divided the processes according to their mode of operation into transport (B.1 Movement), erosion or denudation (B.2 Loss), accumulation and aggradation (B.3 Addition) and transformation (B.4 Transformation). Abiotic (D.1 Abiotic) and biotic (D.2 Biotic) forms and processes and their characteristics (D.3 Property) were classified in category D. Under this category, we also include geolocation (D.3.1 Geolocation), which is of special importance in understanding karst geomorphology and hydrology. In the last category (E.) we used two subcategories that divide the means of study into instruments (E.1 Instrument) and methods (E.2 Methods). Relations are more or less closely tied to a more detailed interpretation of individual categories.
The category Landform (A.) invokes relations linked to the geomorphological analytical method (Pavlopoulos, Evelpidou, Vassilopoulos, 2009) and defines morphographic (HAS_FORM), morphometric (HAS_SIZE), morphostructural (COMPOSITION_MEDIUM), morphogenetic (HAS_CAUSE) and morphochronologic (HAS_TIME_PATTERN) attributes of surface, subsurface and hydrological karst features. In addition to the relations that are closely associated with the geomorphological analytical method, we also use relations which spatially associate the categories with geomes (HAS_LOCATION) and geolocations (HAS_POSITION). The category of karst processes (B.) invokes semantic relations connected to the effects and results of these processes (AFFECTS and HAS_RESULT). The category of geomes (C.) is usually tied to the characteristic landforms, materials or groups of processes that shape them, so in addition to other semantic relations, their definition frequently lists typical karst elements they encompass (CONTAINS). The category defining the activities related to karst studies (E.) invokes the relations defining those activities (MEASURES and STUDIES). The category defining forms, processes and characteristics that are not directly related to karst geomorphology or hydrology (D.) may invoke all the listed semantic relations. Where semantic relations denote any other property of a category, we have defined them generally (DEFINED_AS, HAS_ATTRIBUTE).
The typical and expected combinations of categories and relations explained above constitute frames, i.e. cognitive templates which represent fragments of specialized knowledge about the domain (Faber et al., 2012).
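To make the notion of a frame more concrete, the category and relation names given above can be written down as a small lookup structure. The following Python sketch is only an illustration and not the project's actual data format: the names `FRAMES` and `is_expected` are hypothetical, and the assumption that the general relations (DEFINED_AS, HAS_ATTRIBUTE) are available to every category is our reading of the text.

```python
# Relations named in the text, grouped by the category that typically invokes them.
GENERAL = {"DEFINED_AS", "HAS_ATTRIBUTE"}
LANDFORM = {"HAS_FORM", "HAS_SIZE", "COMPOSITION_MEDIUM", "HAS_CAUSE",
            "HAS_TIME_PATTERN", "HAS_LOCATION", "HAS_POSITION"}
PROCESS = {"AFFECTS", "HAS_RESULT"}
GEOME = {"CONTAINS"}
STUDY = {"MEASURES", "STUDIES"}

# Hypothetical frame table: top-level category code -> typical relations.
# Assumption: general relations apply everywhere; category D may use all of them.
FRAMES = {
    "A": LANDFORM | GENERAL,                            # A. Landform
    "B": PROCESS | GENERAL,                             # B. Process
    "C": GEOME | GENERAL,                               # C. Geome
    "D": LANDFORM | PROCESS | GEOME | STUDY | GENERAL,  # D. Element / Entity / Property
    "E": STUDY | GENERAL,                               # E. Instrument / Method
}


def is_expected(category_code: str, relation: str) -> bool:
    """Check whether a relation is typical for a category such as 'A.1' or 'D.3.1'."""
    top_level = category_code.split(".")[0]
    return relation in FRAMES[top_level]


print(is_expected("A.1", "HAS_FORM"))   # surface landform described by its form
print(is_expected("B.2", "HAS_SIZE"))   # a loss process is not typically given a size
```

A table of this kind only records the typical combinations; an annotated definition may still contain a relation that the frame does not predict.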

Resources
Within the project we built English, Slovene and Croatian specialised corpora. All three corpora comprise carefully selected, relevant contemporary works on karstology. The corpora include specialised texts (books, articles, doctoral and master's theses, glossaries and dictionaries) from the field of karstology, whereby individual works partly overlap with one or several related fields such as geomorphology, geology, hydrology, speleology, biology etc. Since the exploration of differences between the international karst terminology in English and local Croatian and Slovene terminologies lies at the core of our project, we took great care to include all major reference works in English, e.g. Karst Hydrology and Geomorphology (Ford, Williams, 2007), Karst Hydrology and Physical Speleology (Bögli, 1980), Encyclopedia of Caves and Karst Science (Gunn, 2004) as well as other relevant works published in the past four decades of karst research (see Section 2). For Croatian and Slovene karstology, fewer comprehensive books have been published; we therefore included more PhD theses and scientific articles.
For definition extraction we used the ClowdFlows definition extractor (Pollak et al., 2012). The tool tries to identify sentences which could be definitions on the basis of various language-specific patterns, e.g. X is a subtype of Y which […]. The definition candidates were later manually validated and only examples with valuable explanatory information about karst concepts were retained (yield ~ 20%). All definition types (intensional, extensional, functional, paraphrase etc.) were considered, therefore not all obtained definitions have the traditional structure: the definiendum may appear in different positions in the sentence, the genus may or may not be present, the term may be defined only through its hyponyms etc. After validation the yield was 215 and 259 definitions for English and Slovene respectively.
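The idea of pattern-based candidate detection can be illustrated with a few regular expressions. The sketch below is not the extractor's implementation: the two patterns, the function name `find_candidate` and the first test sentence are hypothetical and cover only the simplest English cases, so every match would still require the manual validation described above.

```python
import re

# Hypothetical English patterns; the real tool uses its own language-specific patterns.
PATTERNS = [
    # "A/An/The X is a/an/the Y which|that|where ..."
    re.compile(
        r"^(?:An?|The)\s+(?P<term>[\w\- ]+?)\s+is\s+(?:an?|the)\s+"
        r"(?P<genus>[\w\- ]+?)\s+(?:which|that|where)\b",
        re.IGNORECASE,
    ),
    # Glossary style: "X: A Y ..."
    re.compile(r"^(?P<term>[\w\- ]+?):\s+An?\s+(?P<genus>[\w\-]+)\b", re.IGNORECASE),
]


def find_candidate(sentence: str):
    """Return (term, genus) if the sentence looks like a definition, else None."""
    for pattern in PATTERNS:
        match = pattern.search(sentence.strip())
        if match:
            return match.group("term").strip(), match.group("genus").strip()
    return None


# The first sentence is made up for illustration; the second is a glossary definition quoted in this paper.
print(find_candidate("A doline is a closed depression which forms by dissolution."))
print(find_candidate("bedding-plane cave: A passage formed along a bedding plane."))
print(find_candidate("Karst regions are widespread."))
```

Patterns of this kind favour recall over precision, which is consistent with only a fraction of the candidates surviving validation.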

From definitions to structured knowledge
As pointed out in Section 3.1, a systematic approach to describing karst phenomena would propose for each category of concept (e.g. Surface landform, Underground landform, Process etc.) a set of attributes which need to be specified in order to make the description complete. Such attributes include SIZE, FORM, CAUSE, COMPOSITION, FUNCTION, LOCATION or RESULT, but they vary depending on the type of concept we are describing. Thus, a surface landform should ideally be described through its FORM, SIZE, CAUSE, LOCATION and COMPOSITION or MEDIUM in terms of typical geological and/or geographical environment, but it will almost never be described through its FUNCTION or RESULT, as these can be expected in more dynamic karst entities such as hydrological forms and processes.
We can see from the example below that definitions in existing reference works focus on different aspects of the definiendum, but rarely list all of them. In a), bedding-plane cave is defined through its SIZE (has not enlarged by growth into a major tube or canyon) and LOCATION (remained almost entirely on the bedding plane). In b), we have LOCATION and CAUSE (difference in susceptibility to corrosion in the two beds), and in c), we have LOCATION and FORM (elongate in cross-section).
a) The term bedding-plane cave is strictly applied to a passage that has not enlarged by growth into a major tube or canyon, but has remained almost entirely on the bedding plane.
b) bedding-plane cave: A passage formed along a bedding plane, especially when there is a difference in susceptibility to corrosion in the two beds.
c) bedding-plane cave: A cavity developed along a bedding-plane and elongate in cross-section as a result.
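The three definitions above can be merged attribute by attribute once their relations are marked. The following Python sketch shows one possible way of doing so; the tuple layout and the function name `aggregate` are hypothetical, and the LOCATION spans for b) and c) are our own reading of the definitions rather than the project's annotations.

```python
from collections import defaultdict

# (source definition, relation, text span); spans for a) follow the discussion above.
annotations = [
    ("a", "SIZE", "has not enlarged by growth into a major tube or canyon"),
    ("a", "LOCATION", "remained almost entirely on the bedding plane"),
    ("b", "LOCATION", "formed along a bedding plane"),          # assumed span
    ("b", "CAUSE", "difference in susceptibility to corrosion in the two beds"),
    ("c", "LOCATION", "developed along a bedding-plane"),       # assumed span
    ("c", "FORM", "elongate in cross-section"),
]


def aggregate(records):
    """Group annotated spans by relation, remembering which definition they came from."""
    merged = defaultdict(list)
    for source, relation, span in records:
        merged[relation].append((source, span))
    return dict(merged)


description = aggregate(annotations)
for relation, spans in description.items():
    print(relation, "<-", [source for source, _ in spans])
```

No single definition covers more than two relations, whereas the merged description of bedding-plane cave covers SIZE, LOCATION, CAUSE and FORM.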
Our aim is to overcome such limitations of "natural" definitions and aggregate knowledge from different sources in order to create the most comprehensive concept description possible. The definitions we collected from different sources were loaded into the WebAnno annotation environment (Castilho et al., 2014) and manually annotated on several levels. For each definition we mark:
• the definition elements: DEFINIENDUM, GENUS, DEFINITOR
• concept categories: e.g. Surface landform, Underground landform (see Figure 1)
• relations describing the concept, e.g. FORM, SIZE, CAUSE, LOCATION (see Section 3.1)
Each definition was annotated by two persons and any discrepancies between the two annotators were later resolved by a domain expert. In addition to this, regular meetings of annotators and domain experts took place in order to discuss borderline cases and ensure the consistency of annotations.
The two examples below illustrate the result of multi-level annotation, where the term anchialine is defined through its form (pools with no surface connection to the sea), its contents (salt or brackish water) and its time pattern (fluctuates with the tides), and cave is defined through its origin or cause (natural; formed by solution of limestone), location (underground), form (room or series of rooms and passages) and size (large enough to be entered by man).

Figure 2: Examples of annotated definitions for anchialine and cave.
Despite the care taken to produce consistent and logical annotations, many contexts may have multiple meanings or could be assigned different relations. In the example below, the fault cave is defined through its location (developed along a fault or fault zone), but this also indicates the cause of its formation. In such cases the decision was to retain the most overt meaning and not to assign double or triple relations to the same part of a sentence.
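Ambiguous spans of this kind are where two annotators are most likely to diverge. As a purely hypothetical illustration (the project's actual adjudication workflow in WebAnno is not described here), the following Python sketch compares two annotators' relation labels and lists the spans that would be passed on to a domain expert. The disagreement shown is invented for the example.

```python
def find_discrepancies(annotator_a: dict, annotator_b: dict):
    """Return (span, label_a, label_b) for every span the annotators labelled differently."""
    spans = sorted(set(annotator_a) | set(annotator_b))
    return [
        (span, annotator_a.get(span), annotator_b.get(span))
        for span in spans
        if annotator_a.get(span) != annotator_b.get(span)
    ]


# Invented example: the same span of the fault cave definition read in two ways.
annotator_1 = {"developed along a fault or fault zone": "LOCATION"}
annotator_2 = {"developed along a fault or fault zone": "CAUSE"}

for span, first, second in find_discrepancies(annotator_1, annotator_2):
    print(f"{span!r}: {first} vs {second}")
```

Under the rule stated above, the expert would retain a single label for such a span, here the more overt LOCATION reading.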

RESULTS
At the time of writing, the annotation of English and Slovene definitions is complete, while the annotation of Croatian definitions is still in progress. The English data set contains 844 defined terms and the Slovene one 903. For many karst terms the data set contains several annotated definitions, which allows us to combine different attributes and generate a more comprehensive description of the concept.
The multi-layered and multilingual annotated database of definitions allows us to explore patterns of knowledge on a large scale, and to compare conceptualisations across languages. Using the visualization tool NetViz, which was developed specifically for the purposes of this project, we can draw graphs of the entire knowledge network or just of selected parts thereof. A visualization of the entire network of terms and their categories (Figure 4) helps the expert identify the most common groups of karst concepts and explore their members. For English, the largest group is centered around the category Underground landforms, followed by Surface landforms, Abiotic and Hydrological forms. Looking at the network for Slovenian (Figure 5), we can see that the category of Geomes is more productive than in English, with 156 members as opposed to 103 in English.
Since analytical definitions usually contain the genus, i.e. the hypernym of the term explaining it according to the common pattern An X is a Y which…, it is especially interesting to explore the visualization of terms and their hypernyms. We can quickly find members of the class closed depression (Figure 6), and contrast it with the class depression. There are of course inconsistencies which stem from the fact that our database contains definitions from different sources: some authors define polje as closed depression, others as karst depression and still others as depression.
Finally, the structured knowledge base allows us to explore the relevant analytical aspects of selected terms as they are typically expressed in different languages. Thus, the surface landform uvala is in English defined mainly through its morphographic characteristics (with undulating floors, floored by sinkholes, large depression), and the morphogenetic aspect is also provided (coalescence of several dolines).
In Slovenian, the focus is also on the morphographic attributes (v tlorisu nepravilnih oblik, v obliki skledaste vdolbine, dolasta ali vrtačasta), and the morphometric attribute is specified relative to a related form (manjša od kraškega polja), while the morphogenetic aspect is not present.

Figure 8: Uvala and its attributes in Slovene.
By using the systematic domain model which predicts the typical attributes for each category of karst concept, we can generate structured and complete descriptions which inform the user of the most salient properties (Table 1). Such a structured knowledge base also allows us to query according to specific criteria, e.g. surface landforms above a specific size or landforms caused by movement of material. For the concepts where the complete set of attributes cannot be retrieved from annotated definitions, several experiments using state-of-the-art text mining and natural language processing techniques are underway in order to extend the manually constructed database and discover new elements of karst knowledge (Miljkovic et al., 2019; Vintar et al., 2020).
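As a rough illustration of how the domain model can expose gaps and support simple queries, the following Python sketch compares the attributes found for a term with the attributes expected for its category. The record layout and the names `EXPECTED`, `missing_attributes` and `having` are hypothetical. The attribute values are taken from the uvala and cave examples discussed in this paper, and the category assigned to cave is our assumption.

```python
# Expected attributes per category; only the surface landform frame is spelled out in the text.
EXPECTED = {
    "Surface landform": ["FORM", "SIZE", "CAUSE", "LOCATION", "COMPOSITION"],
}

entries = {
    "uvala": {
        "category": "Surface landform",
        "attributes": {
            "FORM": ["with undulating floors", "floored by sinkholes", "large depression"],
            "CAUSE": ["coalescence of several dolines"],
        },
    },
    "cave": {
        "category": "Underground landform",  # assumed category
        "attributes": {
            "CAUSE": ["natural", "formed by solution of limestone"],
            "LOCATION": ["underground"],
            "FORM": ["room or series of rooms and passages"],
            "SIZE": ["large enough to be entered by man"],
        },
    },
}


def missing_attributes(entry: dict) -> list:
    """List expected attributes that no annotated definition has supplied yet."""
    expected = EXPECTED.get(entry["category"], [])
    return [name for name in expected if not entry["attributes"].get(name)]


def having(all_entries: dict, relation: str) -> list:
    """Return the terms for which at least one value of the given relation is known."""
    return [term for term, e in all_entries.items() if e["attributes"].get(relation)]


print(missing_attributes(entries["uvala"]))
print(having(entries, "SIZE"))
```

In this sketch the English description of uvala lacks SIZE, LOCATION and COMPOSITION, which is the kind of gap the text mining experiments mentioned above are meant to fill.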

CONCLUSIONS
We presented the contribution of the TermFrame project towards a comprehensive representation of karst terminology and knowledge. The laborious and complex procedure of compiling the corpora, constructing the domain model, annotating definitions and aggregating knowledge into the final knowledge base required the concerted efforts of an interdisciplinary and multilingual team of experts, including linguists, terminologists, karst researchers, computer scientists and cognitive linguists.
The planned output of the project is a public website delivering the main results of the project through a user-friendly web interface. The basic level of information will provide search and browse functions through the TermFrame Karst Knowledge Base in all three languages. Upon submitting a query, the user will be presented with all the definitions of the query term from different sources, its synonyms and also graphic material. The basic level is intended primarily for a wider audience and for younger students interested in karst. Another level of querying the knowledge base will show a visual representation of the relationships between terms (categories) and semantic relations, thus providing the user with a more detailed and comprehensive overview and allowing for comparisons between languages.