THE INL DICTIONARY WRITING SYSTEM

The INL-DWS is a Dictionary Writing System (DWS) for compiling monolingual and bilingual dictionaries. It has been developed at the Institute of Dutch Lexicology (INL) since 2007 and is now being used for the production of a monolingual dictionary at INL and a bilingual dictionary at the Fryske Akademy. This paper describes the functionalities of the system, on the one hand, from a lexicographical point of view, and on the other hand, from a more technical perspective. The paper concludes with a short evaluation of the advantages and disadvantages of in-house systems versus off-the-shelf systems.


INTRODUCTION
The INL Dictionary Writing System (INL-DWS) originated as a 'homegrown' system developed within the context of the Algemeen Nederlands Woordenboek¹ (ANW) at the Institute of Dutch Lexicology in the Netherlands. The ANW is an online, corpus-based, scholarly dictionary of contemporary standard Dutch in the Netherlands and in Flanders, the Dutch-speaking part of Belgium. As well as being an online dictionary through which a range of users can explore the Dutch vocabulary, the ANW is also a linguistic data resource from which language professionals in particular can extract the data they need for their research. Consultation of the ANW is free.
Although the INL-DWS was originally developed within the context of a particular project, it was set up from the start so that it could also be used for future projects. Within the last year, an effort has been made to isolate the parts of the software code that are specific to the ANW project and to make the code more generic, so that the software is easier to customise for new projects.
The INL-DWS system is currently being used to compile a Dutch-Frisian dictionary at the Fryske Akademy.
Section 2 describes the functionality of the INL-DWS from the point of view of a lexicographer. Section 3 gives a technical overview of the system and is thus more relevant for software engineers. Section 4 discusses the reasons for developing an in-house system instead of using an off-the-shelf one.

OVERVIEW OF THE INL-DWS FOR THE LEXICOGRAPHER
The INL-DWS consists of two parts: an editor and a lexicographic workstation. The lexicographic workstation is essentially a menu bar that appears at the top of the screen and allows lexicographers to invoke various tools and resources supporting the editing process, from raw material to finished dictionary article (Section 2.1). The editor is a program for editing dictionary articles (Section 2.2). Both are discussed from the perspective of the ANW project.

The Lexicographic Workstation
The menu bar of the lexicographic workstation provides the following items, from left to right:

Article ('Artikel'): offers a link to the editor (see Section 2.2).

Corpora: offers a link to corpus query systems including Dutch corpora.

4 […] template', whereas semagram refers to such a 'type template' populated with concrete word data. Each semantic class has its own predefined type template with its own slots (Moerdijk 2008).

The metadata for a lemma is edited in a separate panel by the lexicographer editing the entry (see Figure 2). This is a manual task.
7 For a description of the resource, see van der Vliet (2007).
Opening a lemma for editing opens the editor tool.

The editor
The editor has a user-friendly interface. It has been designed so that the lexicographers editing the entries do not need to learn any special markup language or have advanced computer skills. The editor window is divided into two panels: a navigation panel on the left and an editing panel on the right (see Figure 4).

A QUICK OVERVIEW OF COMPLEX ARTICLE STRUCTURES
The navigation panel uses a tree structure representing the article structure.
For definitions, collocations, etc., the first part of the text is shown, so that it is immediately clear which element a label in the tree represents. Colours indicate whether information is inherited from elsewhere: blue typeface means that the information in an element has been inherited, and information that can be inherited is shown in green. The inheritance feature is explained in more detail below. The elements in the tree structure can be expanded and collapsed at will. This helps the lexicographer keep an overview during the editing process, as the ANW, being a scholarly dictionary, has a rather rich microstructure. There are ten main categories, each subdivided into one or more subcategories depending on the complexity of the data category. For instance, the main category 'Lemma' contains the subcategories 'Lemma form' and 'Lemma type'. In a number of cases, the value chosen in a main category determines which subcategories are shown. If a lexicographer chooses 'noun' as the value for 'syntactic category type', the data sheet for nouns is shown (Figure 5), whereas choosing 'verb' opens the data sheet for verbs.

The editing panel is used for editing the dictionary entry. To support the lexicographer, the editor uses different types of fields, ranging from simple text input fields (e.g. for definitions) to select boxes (e.g. for lemma type). Select boxes lead to greater consistency: they allow the lexicographers to use uniform values in certain places in the microstructure throughout the whole dictionary, and they prevent typing errors.

Apart from offering lexicographers a clear overview of even a complex microstructure, the INL-DWS also supports them in managing the structure of the entries. Right-clicking on an element in the navigation tree opens a menu that allows the lexicographer to add, delete and reorder elements or groups of elements (see Figure 6). When elements are added, deleted or reordered, the system automatically renumbers the whole entry and makes the corresponding changes to the sense numbers in any cross-references. Not every element can be added, deleted or reordered, however: which operations are allowed is defined in the microstructure of the dictionary project.
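The automatic renumbering of senses and cross-references can be illustrated with a minimal Java sketch. It assumes, purely for illustration, that each sense has a stable internal id, that visible sense numbers are simply 1-based positions, and that a cross-reference stores a target lemma plus a sense number; `SenseRenumbering`, `CrossRef` and `moveSense` are hypothetical names, not the INL-DWS API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: senses carry stable ids; visible sense numbers are 1-based positions.
class SenseRenumbering {

    /** A cross-reference pointing at a numbered sense of a target lemma. */
    static final class CrossRef {
        final String targetLemma;
        int senseNumber; // 1-based sense number as shown in the dictionary

        CrossRef(String targetLemma, int senseNumber) {
            this.targetLemma = targetLemma;
            this.senseNumber = senseNumber;
        }
    }

    /**
     * Moves the sense at index `from` to index `to` (both 0-based) within `senseIds`,
     * then rewrites the sense numbers of all cross-references that point into `lemma`.
     */
    static void moveSense(String lemma, List<String> senseIds, int from, int to,
                          List<CrossRef> allCrossRefs) {
        // 1. Record each sense's old number, keyed by its stable id.
        Map<String, Integer> oldNumberById = new HashMap<>();
        for (int i = 0; i < senseIds.size(); i++) {
            oldNumberById.put(senseIds.get(i), i + 1);
        }
        // 2. Perform the move; the list order now defines the new numbering.
        String moved = senseIds.remove(from);
        senseIds.add(to, moved);
        // 3. Build a mapping from old number to new number.
        Map<Integer, Integer> newNumberByOld = new HashMap<>();
        for (int i = 0; i < senseIds.size(); i++) {
            newNumberByOld.put(oldNumberById.get(senseIds.get(i)), i + 1);
        }
        // 4. Update every cross-reference whose target is this lemma (unknown numbers are left alone).
        for (CrossRef ref : allCrossRefs) {
            if (ref.targetLemma.equals(lemma)) {
                ref.senseNumber = newNumberByOld.getOrDefault(ref.senseNumber, ref.senseNumber);
            }
        }
    }
}
```

Computing the old-to-new mapping once, before touching any reference, ensures that no cross-reference is renumbered twice when several senses shift at the same time.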

CROSS-REFERENCES
A cross-reference module has been developed to define relations between entries (cf. work on Vernetziko (Meyer 2011)). Relations are always defined between two elements, a source and a target element, and in the INL-DWS they can only be one-directional: bidirectional relations are not yet supported.
Cross-references can be inserted only at predefined places in the dictionary entry (e.g. in the synonym field). A pop-up window appears in which the lexicographer creates a reference to another entry by typing the target lemma in the lemma field. This field has an autocomplete function to make the process easier for the lexicographer. As soon as the lemma has been typed, all numbered meanings of the target lemma, as well as any idioms or proverbs containing it, are loaded into the pop-up, allowing the lexicographer to choose the desired target.
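An autocomplete field of this kind can be backed by a sorted set, because all lemmas sharing a typed prefix form one contiguous run in sorted order. The following Java sketch is an assumption about how such a lookup might work, not a description of the INL-DWS code; `LemmaAutocomplete` and `suggest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of an index behind the autocomplete lemma field.
class LemmaAutocomplete {

    // Sorted set: all lemmas that share a prefix form one contiguous run.
    private final NavigableSet<String> lemmas = new TreeSet<>();

    LemmaAutocomplete(Iterable<String> lemmaList) {
        for (String lemma : lemmaList) {
            lemmas.add(lemma);
        }
    }

    /** Returns at most `limit` lemmas starting with `prefix`, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> result = new ArrayList<>();
        if (prefix.isEmpty()) {
            return result; // no suggestions until something has been typed
        }
        // tailSet(prefix, true) begins at the first lemma >= prefix.
        for (String lemma : lemmas.tailSet(prefix, true)) {
            if (result.size() == limit || !lemma.startsWith(prefix)) {
                break; // past the run of matching lemmas, or enough suggestions
            }
            result.add(lemma);
        }
        return result;
    }
}
```

In a Swing interface, `suggest` would typically be called from a document listener on the lemma field; that wiring is omitted here.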
A full overview of all cross-references in the dictionary is given in the cross-reference overview window which can be invoked from the menu bar within the editor.
Figure 7 shows the cross-reference overview for the article 'paard' (horse). It shows the source lemma ('Bronartikel'), the type of cross-reference ('Verwijzingstype', i.e. the element in which the cross-reference is found), the target lemma ('Doelartikel'), the target type ('Doeltype') and a description of the target ('Beschrijving'). There are 31 cross-references from the entry for 'paard' in the dictionary database.
The cross-reference overview can be filtered by the spelling of the source lemma, the cross-reference type, the target lemma, the target type and/or the state of the lemmas in the lemma list.
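Filtering the overview amounts to combining one optional condition per column. The Java sketch below shows this with `Predicate.and`; the `Row` fields mirror the columns of Figure 7, while the class name is hypothetical and the filter on lemma state is left out for brevity.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical row model of the cross-reference overview; fields follow the columns of Figure 7.
class CrossRefOverview {

    record Row(String sourceLemma, String refType, String targetLemma, String targetType) {}

    /** Returns the rows matching every non-null criterion; null means "do not filter on this column". */
    static List<Row> filter(List<Row> rows, String sourceLemma, String refType,
                            String targetLemma, String targetType) {
        Predicate<Row> matches = row -> true;
        if (sourceLemma != null) matches = matches.and(row -> row.sourceLemma().equals(sourceLemma));
        if (refType != null) matches = matches.and(row -> row.refType().equals(refType));
        if (targetLemma != null) matches = matches.and(row -> row.targetLemma().equals(targetLemma));
        if (targetType != null) matches = matches.and(row -> row.targetType().equals(targetType));
        return rows.stream().filter(matches).collect(Collectors.toList());
    }
}
```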
Slovenščina 2.0, 2 (2014)

It is also possible to preview dictionary articles and to export them as XML, HTML or Word documents.

LINK BETWEEN DWS AND CQS
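One of the advantages of having full control over the system is that options for exchanging information with other applications, such as a Corpus Query System, can easily be built in. For the ANW project, such a link has been built to the Sketch Engine (Kilgarriff et al. 2004). The lexicographers use the Sketch Engine to search for example sentences in the ANW corpus, which has been loaded into the Sketch Engine. Selected examples are copied to the clipboard in the Sketch Engine. As the INL-DWS recognises example sentences from the Sketch Engine, they can be pasted into the INL-DWS in two clicks. This functionality has been fine-tuned to the particular needs of the ANW dictionary in such a way that not only the example sentences but also the related metadata are automatically copied from the corpus into the right fields in the editor tool.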
In the future, links to other INL databases (e.g. spelling, morphology) are foreseen, so that information can be shared between them.

INHERITANCE
A special feature of the INL-DWS, used in the ANW dictionary project, is 'inheritance'. Each dictionary article contains a general part, called the 'header', which precedes the sense units. In the ANW, information from the header is automatically inherited by the different sense units in the article.
Inherited values are marked in blue, whereas values that can be inherited are marked in green.An inherited value can be overridden lower down in the entry.
In that case, the new value is shown in black.
Inheritance seemed like a useful feature that would save precious editing time, as information such as word class and spelling is often shared by different sense units. However, practice has shown that lexicographers often forget to check the inherited information and consequently do not always adjust it when needed. To prevent such mistakes, it is actually easier to fill in or copy the information explicitly wherever it is needed than to have to remember to correct inherited information that does not apply. This functionality has therefore been switched off in the Dutch-Frisian dictionary project.
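The lookup rule behind inheritance can be sketched in a few lines of Java. This is a hypothetical model which assumes that the header and each sense unit are simple field-to-value maps: a value set on the sense unit itself wins (shown in black in the editor), otherwise the header value is used and counts as inherited (shown in blue). The `inheritanceEnabled` flag mirrors the decision to switch the feature off in the Dutch-Frisian project; all names are illustrative.

```java
import java.util.Map;

// Hypothetical sketch of header-to-sense inheritance with per-sense overrides.
class Inheritance {

    enum Origin { OWN, INHERITED, MISSING }

    /** The value shown for a field in a sense unit, and where that value came from. */
    record Resolved(String value, Origin origin) {}

    /**
     * Resolves `field` for one sense unit: a value set on the sense itself wins;
     * otherwise, if inheritance is switched on, the header value is used.
     */
    static Resolved resolve(String field, Map<String, String> header,
                            Map<String, String> sense, boolean inheritanceEnabled) {
        if (sense.containsKey(field)) {
            return new Resolved(sense.get(field), Origin.OWN);
        }
        if (inheritanceEnabled && header.containsKey(field)) {
            return new Resolved(header.get(field), Origin.INHERITED);
        }
        return new Resolved(null, Origin.MISSING);
    }
}
```

With the flag off, every sense unit must carry its own values, which is exactly the explicit copying that the Dutch-Frisian project preferred.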

TECHNICAL OVERVIEW OF THE INL-DWS
In this section, we give a flavour of the technical details of the INL-DWS; for full details, the reader is referred to the software documentation. The INL-DWS is written in Java and uses Swing for the graphical user interface (GUI). The dictionary articles and their metadata are stored in a central MySQL database in Unicode (UTF-8) encoding. The article XML is simply stored in a binary column. It would of course be desirable to store the XML in a way that allows fast searches (i.e. in a dedicated XML database or in an XML column type in a DBMS), but this possibility has not been explored yet.
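Storing the article XML in a binary column means that the database treats it as opaque bytes. The Java sketch below assumes a hypothetical table `article` with columns `xml` and `lemma_id`; it encodes the XML as UTF-8 for JDBC's `PreparedStatement.setBytes` and illustrates why searching is slow in this set-up: every article has to be decoded and scanned in application code.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: article XML kept as UTF-8 bytes in a binary column.
class ArticleStorage {

    /** Encodes article XML as UTF-8 bytes for a binary (e.g. BLOB) column. */
    static byte[] toColumn(String articleXml) {
        return articleXml.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes the bytes read back from the binary column. */
    static String fromColumn(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    /** Writes one article; the table and column names are hypothetical. */
    static void save(Connection db, String lemmaId, String articleXml) throws SQLException {
        try (PreparedStatement ps = db.prepareStatement(
                "UPDATE article SET xml = ? WHERE lemma_id = ?")) {
            ps.setBytes(1, toColumn(articleXml));
            ps.setString(2, lemmaId);
            ps.executeUpdate();
        }
    }

    /**
     * The database cannot look inside the binary column, so a content search
     * has to decode every stored article and scan it in Java.
     */
    static List<String> lemmasWhoseXmlContains(Map<String, byte[]> rows, String needle) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, byte[]> row : rows.entrySet()) {
            if (fromColumn(row.getValue()).contains(needle)) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }
}
```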
The editor interface (see Section 2.2) is generated automatically from the XML Schema for the project. The schema allows us to determine whether elements in the microstructure are best represented as text fields or as selection lists, and whether the input can be validated while the user is typing.
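How a schema can drive the choice of widget is sketched below in Java, using the JDK's DOM parser. The mapping rules are an assumption for illustration: an element with a complex type becomes a group node in the navigation tree, an element whose inline simple type lists `xs:enumeration` values becomes a select box, and anything else becomes a text field. Named types and `ref` declarations are not resolved in this sketch, and `SchemaToWidgets` and `Widget` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: derive a widget kind for each element declared in an XML Schema.
class SchemaToWidgets {

    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** kind is "group" (tree node), "select" (select box) or "text" (text field). */
    record Widget(String kind, List<String> options) {}

    static Map<String, Widget> widgetsFor(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document schema = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
            Map<String, Widget> widgets = new LinkedHashMap<>();
            NodeList declarations = schema.getElementsByTagNameNS(XS, "element");
            for (int i = 0; i < declarations.getLength(); i++) {
                Element decl = (Element) declarations.item(i);
                String name = decl.getAttribute("name");
                if (name.isEmpty()) {
                    continue; // element references (ref="...") are skipped in this sketch
                }
                Element simpleType = xsChild(decl, "simpleType");
                if (xsChild(decl, "complexType") != null) {
                    widgets.put(name, new Widget("group", List.of()));
                } else if (simpleType != null) {
                    // Collect the enumerated values, if any, of the inline simple type.
                    List<String> options = new ArrayList<>();
                    NodeList enums = simpleType.getElementsByTagNameNS(XS, "enumeration");
                    for (int j = 0; j < enums.getLength(); j++) {
                        options.add(((Element) enums.item(j)).getAttribute("value"));
                    }
                    widgets.put(name, new Widget(options.isEmpty() ? "text" : "select", options));
                } else {
                    widgets.put(name, new Widget("text", List.of()));
                }
            }
            return widgets;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read XML Schema", e);
        }
    }

    /** Returns the first direct child element in the XML Schema namespace with this local name. */
    private static Element xsChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && XS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

For example, a declaration `<xs:element name="lemmaType">` with two enumeration values yields a select box with those two options, while `<xs:element name="lemmaForm" type="xs:string"/>` yields a text field.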
Some aspects of the interface do not follow directly from the XML Schema but are set in separate configuration files: making certain text fields larger than others, replacing certain selection lists with text fields offering automatic suggestions, making certain fields read-only, etc. For instance, the definition and the mini-definition elements in the ANW microstructure are both free text fields. However, as the name suggests, the definition element will normally contain more text than the mini-definition, so it is appropriate to show it as a larger input box in the interface.
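The format of these configuration files is not described here. Assuming, for illustration only, a plain Java properties format with hypothetical keys such as `definition.rows=6` or `lemmaType.widget=autocomplete`, the overrides could be read as follows; fields without an entry fall back to the schema-derived defaults.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: per-field display settings layered on top of the schema-derived widgets.
class FieldConfig {

    private final Properties props = new Properties();

    FieldConfig(String configText) {
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable configuration", e);
        }
    }

    /** Number of visible text rows for a field; fields without an entry get `defaultRows`. */
    int rows(String field, int defaultRows) {
        return Integer.parseInt(props.getProperty(field + ".rows", Integer.toString(defaultRows)));
    }

    /** Whether the field is shown read-only; false unless configured otherwise. */
    boolean readOnly(String field) {
        return Boolean.parseBoolean(props.getProperty(field + ".readOnly", "false"));
    }

    /** Optional widget override, e.g. "autocomplete" instead of a select list; null if none. */
    String widgetOverride(String field) {
        return props.getProperty(field + ".widget");
    }
}
```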
This system works well. Changes in the XML Schema update the interface automatically; no additional programming is required.
The general formatting of the dictionary article is realised by XSLT and cannot be changed by the lexicographers. Sometimes, however, special formatting is required within certain text fields, e.g. in example sentences. This formatting is currently done using tags, e.g. <b> for bold face. These tags are the only kind of special markup the lexicographers need to know. It would of course be nice to offer limited WYSIWYG editing of certain fields, but this functionality has not been built in yet.
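The division of labour between the fixed XSLT layout and the inline `<b>` tags can be illustrated with the JDK's `javax.xml.transform` API. The stylesheet below is a hypothetical fragment, not the ANW stylesheet: it assumes the tags are stored as real child elements of an `example` element and simply carries them over into HTML-like output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of XSLT-based article formatting.
class ArticleFormatter {

    // Fixed layout: an <example> becomes a styled span; lexicographers cannot change this.
    // Inline markup: a <b> typed in a text field is carried over as bold.
    private static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"example\"><span class=\"example\"><xsl:apply-templates/></span></xsl:template>"
      + "<xsl:template match=\"b\"><b><xsl:apply-templates/></b></xsl:template>"
      + "</xsl:stylesheet>";

    static String format(String articleXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(articleXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Cannot format article", e);
        }
    }
}
```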
The INL-DWS application does not need to be installed on the lexicographer's computer: all that is needed is a shortcut to the application file on a network drive, and the intended user's network username needs to be added to the list of authorised users in the database.
Being based on Java, the INL-DWS is expected to work on Windows, Mac and Linux (Windows and Linux have been tested; Mac has not, but should not present any major problems).
As the INL-DWS was originally designed for a monolingual Dutch dictionary, using the software for a different project will require a certain amount of customisation. A different XML Schema is certainly needed to reflect the microstructure of the new dictionary. Certain content in the MySQL database needs to be changed (for example, which parts of dictionary articles require a separate completion state). Finally, it is likely that some of the Java code will need to be customised. The 'hooks' for customisation have, however, been isolated in a single class, which makes this easier.
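Isolating the hooks can be pictured as a single small type through which all project-specific choices pass. The Java sketch below is hypothetical: the project keys and schema file names are invented, and only the inheritance setting reflects something stated in this paper (it is switched off for the Dutch-Frisian dictionary).

```java
// Hypothetical sketch: project-specific behaviour gathered behind one type.
interface ProjectHooks {

    /** XML Schema describing the project's microstructure (file names are hypothetical). */
    String schemaFile();

    /** Whether header values are inherited by sense units. */
    boolean inheritanceEnabled();

    /** Simple value implementation of the hooks. */
    record Settings(String schemaFile, boolean inheritanceEnabled) implements ProjectHooks {}

    /** Picks the customisation for a project key; the keys are hypothetical. */
    static ProjectHooks forProject(String key) {
        switch (key) {
            case "anw":
                return new Settings("anw.xsd", true);
            case "dutch-frisian":
                return new Settings("dutch-frisian.xsd", false); // inheritance switched off
            default:
                throw new IllegalArgumentException("Unknown project: " + key);
        }
    }
}
```

The rest of the application would then consult `ProjectHooks` instead of testing for a particular project, so adding a project means adding one case rather than editing scattered code.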
The INL-DWS is complemented by a number of Java programs that support the import and export of data. For instance, there is a program that has been used to import spelling data from the spelling guide in the ANW, and there are programs that extract individual data categories from the dictionary articles, such as all neologisms and their earliest date. Each of these programs is different enough not to be reusable as-is, but the code they have in common has been collected into reusable classes where possible, making it easier to write new scripts for importing and exporting data.
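As an illustration of such an export script, the following Java sketch collects neologisms and their earliest date with the JDK's XPath API. The element and attribute names (`article`, `lemma`, `neologism`, `earliestDate`) are hypothetical, since the ANW microstructure is not reproduced here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical export script: element and attribute names are invented for illustration.
class NeologismExport {

    /** Returns lemma -> earliest date for every article marked as a neologism, sorted by lemma. */
    static Map<String, String> neologisms(List<String> articleXmls) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Map<String, String> result = new TreeMap<>();
            for (String xml : articleXmls) {
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
                // evaluate(...) returns "" when the attribute is absent.
                String date = xpath.evaluate("/article/neologism/@earliestDate", doc);
                if (!date.isEmpty()) {
                    result.put(xpath.evaluate("/article/lemma", doc), date);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read article XML", e);
        }
    }
}
```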
Although the INL-DWS offers lexicographers the possibility to preview dictionary articles as HTML, the system does not allow users to generate a complete online dictionary.The online ANW dictionary is a separate application (Tiberius and Niestadt 2010).
New features will be added to the INL-DWS when needed.As the database grows, options such as user-friendly entry filtering and bulk correction become more attractive features.

IN-HOUSE VERSUS OFF-THE-SHELF SYSTEMS
The development of the INL-DWS started in 2007 and was instigated by the need to replace the dictionary editor which was then used within the institute.
The old dictionary editor used Altova StyleVision in combination with the Altova XMLSpy editor. During the course of the project, this set-up was refined until the disadvantages of customising commercial software became too much of an obstacle. One disadvantage of using Altova XMLSpy was that it turned out to be difficult to link it to external databases and other software applications (including Corpus Query Systems). Another disadvantage was that, Altova XMLSpy being a commercial product, we were forced to keep up with its release schedule. New versions of the software were released at regular intervals. However, more than once these new versions turned out to be slower than the previous version, or there were compatibility issues with certain aspects of our customisation of the software. Sticking to an older version was not an option either, as sooner or later an update to a newer release would become necessary.
When the need for a new system became clear, a comparative assessment was made of three options: developing an in-house system (based on the older system), using an open-source system or buying a commercial product. Ultimately, the development of an in-house system was chosen for a number of reasons, which are discussed in Niestadt (2009). De Schryver (2011: 648) argues, however, that this decision is questionable, particularly as many of the required features listed in Niestadt (2009), such as the need for a clear overview, the possibility of inheriting information and the need to build in project-specific functionality, already exist in off-the-shelf tools.
Indeed, commercial systems seem to have boomed in recent years. They are quickly developing from pure editing systems and/or authoring tools into increasingly versatile, multifunctional 'all-in-one' tools that work as a dashboard from which a series of processes and tasks in the dictionary production process can be controlled, managed and implemented (Abel 2012: 104). However, in 2007, when the development of the INL-DWS started, the situation was different and less clear-cut (see also Mangeot 2006: 185-186). Building in inheritance may have been possible in TshwaneLex (TLex)⁸ using its built-in scripting language (Lua), and TLex may be maximally extendable, as de Schryver (2011: 648) writes. However, whichever way you look at it, a serious amount of customisation would have been required to tailor TLex or another off-the-shelf package to the ANW dictionary project. The amount of customisation required is also mentioned by Barbierik et al. (2014) as one of the main reasons for developing their own system.
Another important consideration in choosing an in-house system is the advantage of having full control over the software (Niestadt 2009; Barbierik et al. 2014). Requests for changes can be processed and implemented almost immediately, as one does not depend on communication with an external party for which one is only one of many customers. The unsatisfactory experience of depending on an external party with the old editor system was probably the key factor in the ANW's decision to develop its own system.
Furthermore, the price tag of commercial products is often mentioned as another decisive factor in favour of in-house development (Barbierik et al. 2014; Abel 2012).
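So although publishers have tended to switch to off-the-shelf DWS packages (e.g. the Oxford English Dictionary has used a customised version of the IDM DPS system since 2005 (Atkins and Rundell 2008: 114)), the in-house solution still seems to be the most common approach in academic and non-commercial contexts (cf. de Schryver 2011: 647; Barbierik et al. 2014). We do not agree with de Schryver that this is necessarily a bad situation. As noted in Niestadt (2009), we believe that innovative scientific research requires new software with new possibilities. It is therefore important not only to rely on ready-made software packages, but also to keep control over possible technical solutions by developing one's own software. At the same time, it is important to keep one's eyes open for what is happening elsewhere, as we think that building a completely new dictionary writing system from scratch in this day and age is a bad choice. It is much better to start from an existing, freely available system and add the features you need. If these additions are kept generic and contributed back to the system, others can benefit from them as well, and the lexicographic community can together create a dictionary writing system on a par with commercial ones, but with full control over each aspect of it and with the possibility of customisation. The European Network of e-Lexicography⁹ can also play an important role in this.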
More and more in-house systems are being made available as open-source products these days (EELEX¹⁰, Dictionary System DWS¹¹, Viennese Lexicographic Editor¹², Matapuna¹³, etc.). The INL would also be happy to share its INL-DWS software (and the lessons learned while developing it) with any interested parties.

CONCLUSION
In this paper, we have discussed the INL-DWS. Although the system was originally developed within the context of a particular project (i.e. the monolingual Dutch ANW dictionary project), it was set up so that it can also be used for future projects. Within the last year, an effort has been made to isolate the parts of the software code that are specific to the ANW project, making the software more generic and easier to customise for new projects. The INL-DWS is currently used at the INL for the ANW project and at the Fryske Akademy for the compilation of a bilingual Dutch-Frisian dictionary.


Style Guides and User Manuals ('Handleiding'): contains links to the user manual of the INL-DWS as well as to the editorial guidelines.

Dictionaries and other reference resources ('WDB e.d.'): contains links to online dictionaries (e.g. WNT, OED, elexiko), encyclopedias (Encyclo, Wikipedia) and other reference resources (e.g. an acronym finder). This menu item also contains a link to a definition panel, which can be used to look up the definition of a lemma in two existing Dutch dictionaries (i.e. WNT and Van Dale Groot Woordenboek van de Nederlandse Taal).

Lemma lists ('Nomenclatuurlijsten'): contains links to the full lemma list of the ANW corpus³ and the resulting candidate lemma list.

Notes/Memorandums ('Nota's'): contains reports on specific topics that are relevant for the editing of the dictionary, e.g. reports on abbreviations, on the use of labels, on collocations, etc.

Templates ('Sjablonen'): contains information related to the semagram⁴ in the ANW.

Linguistics ('Taalkunde'): contains links to linguistic resources: Haas and Trommelen (1993) for morphology and Haeseryn et al. (1997) for syntax.

Web: contains links to search engines, e.g. Google, WebCorp⁵ and the Wortschatz-Portal⁶.

The remaining buttons open the administrative tool, which shows the lemma list together with metadata; minimise the workstation menu; and close the workstation.

The administrative tool gives an overview of all the lemmas in the dictionary database. This overview can be filtered by status (i.e. online, goes online, to chief editor, being edited, list of neologisms), by orthographic features (i.e. lemmas beginning with, ending with or containing a particular letter), by editing lexicographer and by time of editing. The administrative tool also helps to keep track of the progress of the project: it shows which lemmas are currently being edited by which lexicographers, and these lemmas are locked for editing by others.

Figure 1: Administrative tool with lemma overview. The overview also indicates for each lemma whether it occurs in other word lists (e.g. the Referentiebestand Nederlands⁷ (RB), the spelling list (GB) and the frequency list (FL) by Tiberius & Schoonheim (2014)), the initials of the lexicographer who last edited it ('red'), when it was last edited ('bewerkt'), what its status is ('fase') and its metadata. The metadata is marked by abbreviations, e.g. SP stands for spelling, UI for pronunciation and WV for morphology. The status of each of these is indicated by pictograms, e.g. for 'data is under construction' and 'data has been completed'.
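The overview filters and the locking behaviour of the administrative tool described above can be sketched as follows. This Java sketch is hypothetical and in-memory; a multi-user tool would keep such state in the shared database. `ConcurrentHashMap.putIfAbsent` makes acquiring a lock atomic, so two lexicographers cannot both obtain the same lemma. All names and status strings are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical in-memory sketch of lemma overview filtering and edit locks.
class AdminTool {

    record LemmaRow(String lemma, String status, String editor) {}

    enum Position { BEGINS, ENDS, CONTAINS }

    // lemma -> initials of the lexicographer currently editing it
    private final Map<String, String> locks = new ConcurrentHashMap<>();

    /** Locks `lemma` for `user`; fails if another lexicographer already holds the lock. */
    boolean tryLock(String lemma, String user) {
        String holder = locks.putIfAbsent(lemma, user);
        return holder == null || holder.equals(user);
    }

    /** Releases the lock only if `user` holds it. */
    boolean unlock(String lemma, String user) {
        return locks.remove(lemma, user);
    }

    /** Filters by status (null = any) and by an orthographic feature of the lemma form. */
    static List<LemmaRow> filter(List<LemmaRow> rows, String status, Position pos, String letters) {
        return rows.stream()
                .filter(r -> status == null || r.status().equals(status))
                .filter(r -> switch (pos) {
                    case BEGINS -> r.lemma().startsWith(letters);
                    case ENDS -> r.lemma().endsWith(letters);
                    case CONTAINS -> r.lemma().contains(letters);
                })
                .collect(Collectors.toList());
    }
}
```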

Figure 2: Metadata panel.

A lemma can be opened for editing by right-clicking on it in the overview (see Figure 3) or by selecting the Article ('Artikel') item in the menu bar of the lexicographic workstation.

Figure 3: Selecting a lemma from the lemma list.
DIFFERENT VIEWS AND EXPORT OPTIONS
The interface of the INL-DWS offers the lexicographers different views. Articles can be edited in 'whole article' mode (as shown in Figure 4) or in 'explorer' mode, in which article elements such as 'Lemma' are shown separately (as shown in Figure 8).

Figure 8: Explorer view of the article screen.
8 http://tshwanedje.com/tshwanelex