AN ANALYSIS OF SIMPLIFICATION STRATEGIES IN A READING TEXTBOOK OF JAPANESE AS A FOREIGN LANGUAGE

Reading is one of the bases of second language learning, and it can be most effective when the linguistic difficulty of the text matches the reader's level of language proficiency. The present paper reviews previous research on the readability and simplification of Japanese texts, and presents an analysis of a collection of simplified texts for learners of Japanese as a foreign language. The simplified texts are compared to their original versions to uncover different strategies used to make the texts more accessible to learners. The list of strategies thus obtained can serve as useful guidelines for assessing, selecting, and devising texts for learners of Japanese as a foreign language.


Introduction
Reading is one of the bases of second language learning, and it is most effective for improving a reader's language skills when the text being read is not only appealing but also of the appropriate difficulty level for its reader. The development of reading skills through extensive reading can be supported on one hand by selecting appropriate material from existing texts and grading them according to objective or subjective readability criteria; and on the other hand by [...]

Another early attempt at measuring the readability of Japanese texts was made by Sakamoto (1962), who manually analysed Japanese language textbooks for elementary school grades 1 to 6, using school grades as the scale of difficulty, and found that the ratio of frequent vocabulary, sentence length and the proportion of kanji characters in the text correlate with school grades.
A similar way of estimating the difficulty of written sentences is also proposed in a writing stylebook by Yasumoto (1983), who uses the average number of characters per sentence and the percentage of Chinese characters as indicators of text difficulty, but does not combine these two factors into a single formula.
Two decades after Sakamoto's research, when computers were already available for lengthier calculations, Tateishi et al. (1988a, 1989b) proposed the first readability formula for Japanese on the basis of four surface characteristics: the proportion of types of characters (Roman letters, hiragana, katakana and kanji); the length of continuous strings of the same type of character; the length of sentences; and the number of commas per sentence.
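As a rough illustration of the four surface characteristics listed above, the following Python sketch measures them for a short string. It is not the formula of Tateishi et al.: the way the features are weighted and combined is not given here, so only the raw measurements are computed. The Unicode ranges, the sentence delimiters and the names `char_type` and `surface_features` are assumptions made for this example.

```python
import re
from itertools import groupby

SCRIPTS = ("roman", "hiragana", "katakana", "kanji")


def char_type(ch):
    """Classify one character by script (simplified Unicode ranges, an assumption)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:  # also covers the long-vowel mark and middle dot
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # basic CJK ideographs only
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "roman"
    return "other"  # punctuation, digits, spaces, full-width letters


def surface_features(text):
    """Return the four surface measurements for a Japanese text."""
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    types = [char_type(ch) for ch in text]
    counted = [t for t in types if t != "other"]
    if not sentences or not counted:
        raise ValueError("text contains no analysable sentence")

    # 1. Proportion of each character type among the classified characters.
    proportions = {s: counted.count(s) / len(counted) for s in SCRIPTS}

    # 2. Mean length of continuous strings of the same character type.
    runs = {s: [] for s in SCRIPTS}
    for script, group in groupby(types):
        if script != "other":
            runs[script].append(len(list(group)))
    mean_run_length = {s: (sum(r) / len(r) if r else 0.0) for s, r in runs.items()}

    # 3. Mean sentence length in characters; 4. commas per sentence.
    mean_sentence_length = sum(len(s) for s in sentences) / len(sentences)
    commas_per_sentence = (text.count("、") + text.count("，")) / len(sentences)

    return {
        "proportions": proportions,
        "mean_run_length": mean_run_length,
        "mean_sentence_length": mean_sentence_length,
        "commas_per_sentence": commas_per_sentence,
    }


features = surface_features("カタカナで書く、そして読む。")
print(features["mean_run_length"]["katakana"])  # 4.0
print(features["commas_per_sentence"])          # 1.0
```

Characters outside the four scripts are ignored when computing proportions, which is one possible choice among several.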
A more recent and very productive stream of research is work on readability formulae that predict the difficulty level of texts for Japanese schoolchildren, to be used in mother tongue education (Shibasaki & Tamaoka, 2010). Other formulae have been developed by Sato et al. (2008, for young native speakers of Japanese) and Lee & Hasebe (2016, for learners of Japanese as a foreign language). Another approach to the readability of Japanese, proposed by Sano and Maruyama (2008), is based on Halliday's concept of lexical density within the framework of Systemic Functional Grammar (Halliday, 1993). In this approach, lexical complexity is defined as the ratio of content words to ranking clauses in a text.
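The ratio of content words to ranking clauses described above can be written out as a small function. The sketch assumes that clause boundaries and the content/function distinction have already been annotated by hand; the name `lexical_density` and the toy annotation are illustrative assumptions and do not reproduce Sano and Maruyama's procedure.

```python
def lexical_density(clauses):
    """Content words per ranking clause.

    `clauses` is a list of ranking clauses; each clause is a list of
    (word, is_content_word) pairs produced by manual annotation.
    """
    if not clauses:
        raise ValueError("at least one ranking clause is required")
    content_words = sum(
        1 for clause in clauses for _word, is_content in clause if is_content
    )
    return content_words / len(clauses)


# Toy annotation (hypothetical): two clauses with two content words each.
annotated = [
    [("本", True), ("を", False), ("読む", True)],
    [("友だち", True), ("に", False), ("会う", True)],
]
print(lexical_density(annotated))  # 2.0
```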
A second stream of research on Japanese readability is work on information accessibility and text simplification, aimed at facilitating communication with handicapped and elderly readers (Ichikawa, 2006), and on paraphrase generation to assist handicapped readers with limited linguistic capabilities (Yamamoto et al., 2000; Inui and Yamamoto, 2001; Inui and Fujita, 2004; Nakano et al., 2005; Sato et al., 2004).
A third stream of research which bears on readability is work on computer-aided text revision, where readability criteria are used to highlight potentially incomprehensible passages and suggest more readable substitutions (Hayashi, 1992; Inui and Okada, 2000; Ono et al., 2006; Oono and Inazumi, 2007). These projects often use advice on clear writing from style manuals such as Kabashima (1979), Kinoshita (1981), Honda (1982) and Mishima (1990), which do not deal with numerical measurements of readability, but give hints on what factors can affect readability and should be considered in its measurement.

Linguistic factors which have been found to correlate with text readability in previous research can be divided according to the traditional levels of linguistic analysis: script (ratio of character types, punctuation, phonetic guides etc.), vocabulary, syntax (sentence length, clause length, ellipsis etc.), text and discourse (length, cohesion etc.). Statistical correlations between these factors and collections of graded texts have been described in previous research. However, the factors influencing the readability of a text for learners of Japanese have not yet been thoroughly researched. The following sections present a comparison between simplified texts and their originals and an analysis of the strategies used in this process.

Data
The texts are reading passages in a textbook for intermediate learners of Japanese, the second in the set of textbooks developed by the International Student Center of Sanno University: 日本語を楽しく読む本・中級 (Enjoyable Task Reading in Japanese: Intermediate), published by Bonjinsha in 1991 and reprinted multiple times. It is a popular textbook for reading instruction.
These texts were chosen because they are one of the very few available collections of pairs of authentic and simplified Japanese texts targeted at foreign learners of Japanese.
The reading passages included in the textbook were selected and simplified by the textbook authors, experienced teachers of Japanese as a foreign language. Selection criteria, as stated in the foreword, were content (which should be interesting to adult learners of Japanese: worth reading and intellectually challenging) and text type (as varied as possible, including narratives, expository and scientific writing, in order to offer learners the opportunity of practising different reading strategies). Another criterion that is not stated in the foreword but was evidently applied is length: no text exceeds the length that can be read in a 90-minute lesson. The longest texts are approximately 1300 characters long, spanning one to two pages.
The foreword mentions that texts were rewritten for their target audience, learners of Japanese, while the afterword mentions that all textbook material was developed and used for two years in the Japanese course of Sanno University before being re-edited for publication in book form. Only vocabulary is mentioned in the foreword as a simplification criterion, but it is conceivable that the strategies applied to text rewriting were based on the authors' experience as language teachers and empirically verified or found to be useful in their language classes.
The textbook has been used by the present author for Japanese language instruction in a class of second-year students of Japanese and received a positive response from the students, suggesting that it is a good example of readable writing for students of Japanese. The pairs of texts were also given to a group of eight advanced learners of Japanese, who were asked to choose the easier text in each pair of original and simplified versions. All participating students indicated the simplified versions as the easier to read.
In the forewords to each textbook in the series (この本を使う先生へ To the teachers using this book), the authors mention vocabulary as the main criterion used both when assessing the difficulty of the texts included and when rewriting texts for their intended readers. All texts in this textbook series are graded from one to three stars (one star indicating the easiest texts and three stars the most difficult) according to the percentage of vocabulary included in the text but not present in the vocabulary lists used as a yardstick, and thus expected not to be known by readers at the given level. These percentages and vocabulary lists are shown in Table 1. In the preface to the first volume the authors explicitly mention that vocabulary was overall the main criterion used in assessing text difficulty, while adding that the average length of sentences was also used as a secondary indicator of structural complexity, but no concrete data are given for these aspects of complexity. In the prefaces to the second and third volumes in the series, only vocabulary is mentioned as the yardstick for assessing text difficulty.
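The grading principle described above, the share of a text's vocabulary that falls outside a reference list, can be sketched as follows. The sketch assumes that the text has already been segmented into words and that the share is computed over distinct words, which the forewords do not specify. The names `unknown_share` and `star_grade` are hypothetical, and the thresholds are parameters supplied by the caller: the values in the usage lines are placeholders, not the percentages of Table 1.

```python
def unknown_share(tokens, known_words):
    """Share of distinct words in `tokens` that are absent from `known_words`."""
    distinct = set(tokens)
    if not distinct:
        raise ValueError("the text contains no tokens")
    unknown = distinct - set(known_words)
    return len(unknown) / len(distinct)


def star_grade(share, one_star_max, two_star_max):
    """Map a share of unknown vocabulary to one, two or three stars.

    The two thresholds must be taken from the textbook's own table;
    no default values are assumed here.
    """
    if share <= one_star_max:
        return 1
    if share <= two_star_max:
        return 2
    return 3


# Toy example with a pre-segmented sentence and a tiny reference list.
tokens = ["私", "は", "本", "を", "読む", "本"]
known = {"私", "は", "を", "読む"}
share = unknown_share(tokens, known)
print(share)                         # 0.2
print(star_grade(share, 0.1, 0.3))   # 2 (placeholder thresholds)
```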
Similarly, the level of proficiency which is expected from readers of each of the three volumes is defined in terms of hours (or months) of Japanese instruction received, which is supposed to reflect their vocabulary knowledge: readers who have studied Japanese for a certain period of time are expected to know a certain number of words, which should approximately correspond to the vocabulary prescribed for a certain level of the Japanese Language Proficiency Test (JF and AIEJ, 2004).
Table 2 shows the number of words which readers (learners of Japanese) are expected to know after different periods of study, as stated in the forewords. As can be seen from the above descriptions, the authors of the textbooks have carefully controlled the vocabulary used in the reading passages, considering it the main factor of text difficulty.
Each reading passage is also preceded by lists of 10 to 20 keywords used in the text, with exercises to learn or reinforce vocabulary knowledge, including written form (Chinese characters), morphological, syntactic and collocational patterns, again emphasising the importance of depth and breadth of vocabulary knowledge for reading comprehension.
The textbook is divided into 9 chapters, each chapter containing one or two reading passages. All reading passages were analysed, except reading no. 4, where the original text was in English and only the simplified text used as reading material was in Japanese.
The following table presents the data used in this analysis: the titles of the simplified passages as they appear in the textbook, the title of their originals, the length of both (expressed in number of characters) and their sources.

Procedure and results
All pairs of original and simplified texts were scanned and OCR-processed, and the resulting files were manually checked to correct OCR errors. Pairs of files were then automatically compared using the document comparison software JDiff X (Matsumoto, 2010). All differences found were transcribed into a spreadsheet file and marked according to type, linguistic level and content of modification. Modifications within the same sentence or clause which stem from different rewriting strategies, or are carried out at different linguistic levels, were counted separately. For example, the following rewriting of one original sentence into shorter sentences involved multiple strategies at distinct linguistic levels.
Original sentence:

「亡くなった飲み友だちと約束してね、僕が飲みに行くときは、必ずオレの分も注文して飲んでくれという遺言を実践しているんだ」
Literally: Having made a promise to a drinking pal who's dead, I'm executing his will that says, "always order my part too and drink it" when I go drinking. (Quotation marks are not used in the Japanese.)
Equivalent modified sentences:

「先週、僕の親友が亡くなったんだが、彼が亡くなる前に約束してね」
Literally: A good friend of mine died last week, and having made a promise before he died ...

「僕が飲みに行く時は、必ず、彼の分も注文して飲むということになったんだ。それで、その約束を実行しているってわけさ」
Literally: ... it was decided that when I go drinking, I would always order his part too and drink it.
Firstly, one strategy was simplification at the syntactic level: both adnominal clauses in the first and last part of the original complex sentence (亡くなった飲み友だち drinking pal who died; ... オレの分も注文して飲んでくれという遺言... the will that states always order my part too and drink it...) were split into separate simple sentences, avoiding adnominal modification, a known hurdle for learners of Japanese.
Secondly, a simplification at the discourse level retained the same entity (僕, the first person narrator) as the subject of all clauses. The original shifts from the first person narrator (僕 boku - I) to the dead friend, who is referred to with a more informal first person pronoun (オレ ore - I) within a short clause of reported speech not marked by quotes (必ずオレの分も注文して飲んでくれ kanarazu ore no bun mo chuumon shite nonde kure - always order my part too and drink it), and then back to the first person narrator (遺言を実践している yuigon o jissen shite iru - I am executing his will), which could be confusing. This simplification also brought with it the omission of the very informal pronoun オレ (ore - I), thus resulting in a standardisation of register.
Thirdly, at the semantic level, two pieces of information were made more explicit: a concrete time setting (先週 senshuu - last week) was added, and the pronoun 僕の (boku no - my) was added to the noun 飲み友だち／親友 (nomitomodachi / shin'yuu - drinking pal / good friend).
Fourthly, at the level of vocabulary, three simplifications were made: the less common word 飲み友だち (drinking pal) was replaced with a less specific but more common one, 親友 (good friend), and the words 遺言を実践 (yuigon o jissen - execute a will) were replaced with 約束を実行 (yakusoku o jikkou - keep a promise).
Fifthly, two explicitations occurred at the script and punctuation level: the word とき (toki - when) written in hiragana was rewritten with its commonly used and unambiguous Chinese character 時, and a comma was added after the adverb 必ず (kanarazu - always).
One further boundary explicitation was carried out by inserting a back-channelling expression (はあ Haa -Oh) by the other participant in the conversation, in the middle of the longest remaining sentence.
In all such cases of multiple modifications, each modification was counted and transcribed separately, resulting in a list of 815 modification occurrences in the whole corpus. All transcribed modifications are reported in Appendix 1. Repeated modifications of the same item were counted individually: for example, the rewriting of わたし as 私 three times in the same text was counted as 3 modifications. In such cases, the modification was transcribed once and the number of modifications was noted in the second column of the table in the appendix.
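The comparison and counting steps of this procedure can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it uses the standard library's `difflib` in place of JDiff X, compares texts character by character, and leaves the strategy labels to manual annotation, as in the procedure described here. The function names, the toy sentence and the three sample records are illustrative; the labels follow the categories used in this paper.

```python
from collections import Counter
from difflib import SequenceMatcher


def list_differences(original, simplified):
    """Return (operation, original_span, simplified_span) for each changed span."""
    matcher = SequenceMatcher(None, original, simplified, autojunk=False)
    return [
        (tag, original[i1:i2], simplified[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def tally(records):
    """Sum occurrences per strategy.

    Each record is (before, after, strategy, occurrences): a repeated
    modification is transcribed once and carries its number of occurrences.
    """
    totals = Counter()
    for _before, _after, strategy, occurrences in records:
        totals[strategy] += occurrences
    return totals


# Step 1: automatic comparison of a toy sentence pair.
print(list_differences("わたしは本を読む", "私は本を読む"))
# [('replace', 'わたし', '私')]

# Step 2: manually labelled records; わたし → 私 occurs three times in one text.
records = [
    ("わたし", "私", "explicitation", 3),
    ("とき", "時", "explicitation", 1),
    ("飲み友だち", "親友", "simplification", 1),
]
totals = tally(records)
print(totals["explicitation"], totals["simplification"], sum(totals.values()))
# 4 1 5
```

Keeping the occurrence count in the record, rather than repeating the row, mirrors the layout of the appendix table described above.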

Analysis
Differences found between the simplified texts and their originals were grouped into three categories: simplification (including deletion), explicitation and standardisation (including visualisation). Strategies belonging to these categories were found to be used at different levels of linguistic analysis: from script, to vocabulary, morphology, syntax, semantics, to discourse and style. Let us consider each one in turn.

Simplification
Strategies of simplification were the most commonly used, amounting to 472 (of which 96 deletions) out of the 815 modifications found.

Script simplification
Script simplification occurred a few times, where non-standard or low-frequency Chinese characters were rewritten in hiragana.

Vocabulary simplification
Vocabulary simplification was the most frequent of all changes, obtained mainly by substituting less common content words with more common ones. Difficult words are sometimes replaced with an explanation or definition (古典 → 昔の文学作品), and sometimes even with words of a completely different meaning, provided this does not change the overall gist of the text, such as 語学 instead of the less frequent word 手工芸 as an example of a hobby.

Often, the substitution of a vocabulary item also brings with it syntactic modifications. Vocabulary simplification at times also involves a change in cohesive devices, such as deictics instead of synonyms or paraphrases: 江戸時代に趣味として蓄積してきた高度な教養 → このような高度な教養; 店のおやじさん → 彼.

Syntactic simplification
Syntactic simplification was also frequent. Sentences with coordinate clauses were divided into separate, shorter sentences, and subordinate clauses were separated and turned into sentences of their own, especially in the case of adnominal modifiers. The separation of complex sentences into shorter, simpler ones at times resulted in (or was motivated by the wish for) bringing the subject and predicate of the original complex sentence nearer to each other.

Discourse simplification
Discourse simplification was obtained in three cases by maintaining topic continuity and avoiding topic-shift from one agent to another.

Deletion
Deletion was the most drastic form of simplification and was used quite often: 95 instances of deletion were found. In some instances, whole paragraphs were omitted. One such omitted passage includes culturally-bound terms, where not only the words but also the words' referents are probably not known to learners; at the same time, being only exemplifications of the preceding general statement, these terms are not essential to convey the general meaning of the passage.

Explicitation
Strategies of explicitation were also very frequent: 203 instances of explicitation were found, encompassing all linguistic levels.

Semantic explicitation
Semantic explicitation was the most common, occurring in 110 cases. Concrete time or place settings were added, sometimes even with an extensive description. A more specific word was used instead of a more general hypernym, e.g.: とうとう、 → 次の日、; 連絡をとった → 電話をした; 店のおやじさん → 店の主人; こいつ → この客; やっている → 経営している. A hypernym was used and a definition added: ... ためになされた法行為であって → ... ための行為であった。つまり、法律的な行為であった; or a hypernym was added to a word that readers might not know, instead of a definition: 水割りを → ウィスキー、水割りを. In one case, loan-word synonyms were even added as furigana (here written in brackets):

利己的遺伝子の乗り物 → 利己的遺伝子［セルフッシュジーン］の乗り物［ヴィークル］
A large share of the semantic explicitations (42 of the 110 cases) occurred at the script level: words that were written in hiragana in the original text, but can be and usually are written with Chinese characters, were rewritten using these characters, which made them less ambiguous, both visually, in terms of word delimitation, and semantically, distinguishing between homophones.

Boundary explicitation
Boundary explicitation was the next most common type of explicitation (70 cases), mostly by means of added punctuation. It was also obtained by splitting long complex sentences into shorter ones, which also implies syntactic simplification, as mentioned in the section on simplification, and by using Chinese characters instead of hiragana where possible, as mentioned in the previous subsection, to mark the boundaries between words, which are not marked by blank spaces in standard Japanese script.

Syntactic explicitation
Syntactic explicitation was obtained by adding an omitted argument to a predicate; by substituting an intransitive expression, where the agents are not all immediately obvious, with a transitive one, where agent and patient are clearer (here the semantic value of the verb is also more specific): 一杯目が終わったら → 一杯目をお飲みになったら; by using polite or humble forms which disambiguate the subject of the predicate; and by adding the particle の to split a compound noun into a noun with a nominal modifier.

Other, less common types of explicitation were cohesion and phonetic explicitation.

Standardisation
The third strategy used in the adaptation of texts for foreign language learners was the use of more standard or basic linguistic forms, i.e. forms which are usually learned at the beginning of Japanese language courses, instead of stylistically marked forms, which are usually learned later. A total of 128 modifications were counted in this category, including the following means.

Tense levelling
Single predicates in non-past form were changed to past form in texts which are otherwise written in the past form, to make the tense uniform throughout the text: 確立されるのは → 発見されたのは.

Formality levelling
Formality was levelled from de-aru to plain style (であった → だった) or from plain to formal style (あった → ありました; だから → ですから) in texts which are otherwise written in the target style, to make the style uniform throughout the text.

Visualisation
Throughout the texts, numbers written in Chinese characters in the original texts, which are printed vertically, were replaced with Arabic numerals in the textbook reading passages, which are printed horizontally.

One modification was found that does not clearly belong to any of the categories proposed, but could tentatively be categorised as standardisation, or could also be termed familiarisation or domestication. It was only used in one text: the setting of a story, originally happening in New York to a Mr. Steinberg, apparel vendor, was moved to Ginza, one of the most famous Tokyo districts, with Mr. Sato, publisher, as the main character: ニューヨーク → 東京の銀座; スタインバーグ → 佐藤; 小さな衣料品会社をやっている → 小さな出版会社を経営している.

Other modifications
Some modifications were found for which no clear motive could be guessed: they may have been made for stylistic purposes, according to the rewriter's tastes, or may be the result of multiple modifications, where the original motive became blurred in subsequent modifications. One substitution was probably just a spelling mistake, resulting in a colloquial きてます instead of きます. Other cases where the motive for the substitution was not clear were: one separation of a clause indicating reported speech into a separate paragraph, probably for dramatic effect; the substitution of a more standard full stop with a less standard comma after a polite predicate; and one substitution of a causal connective with a more polysemous connective that is not easier. In six cases, commas separating phrases in short sentences were deleted, which is slightly surprising, given that in 39 other cases commas were added in such positions.

Discussion
As could be seen in the previous section, multiple strategies were used when rewriting texts which were originally written for native speakers of Japanese, to be included in a reading textbook for intermediate learners of Japanese as a foreign language. A summary of all modifications, counting their number at different levels of linguistic analysis and by type of strategy, is given in the following tables. While the quantities of modifications on different linguistic levels cannot be objectively compared, since they refer to linguistic elements which occur in different scales of magnitude (the number of elements of vocabulary in a text is always larger than the number of phrases, clauses, sentences and paragraphs, thus making a comparison impossible), it is still interesting to see how modifications were made on all levels, multiple times.
The modifications at different linguistic levels confirm the central role of vocabulary as declared in the foreword to the textbook and as could be inferred from the structure of the textbook, which contains many vocabulary exercises. It is not, however, the only level at which modifications were made: a considerable number of modifications were made at the script level, and all other linguistic levels were also touched by the modification process. Vocabulary modifications in many cases (167) also brought with them syntactic changes, and other aspects of the text were modified irrespective of the vocabulary used. Many were structural modifications, touching syntax and discourse, to simplify syntactic and discourse structures or make them more explicit; cohesive devices were introduced; and stylistic changes (standardisations) were made at the level of formality.
As for the type of strategy used, simplification was the most common strategy, accounting for more than half of the occurrences. It occurred at the level of vocabulary, where less frequent words were substituted with more common synonyms, explanations, definitions or paraphrases; at the level of morphology, where less common predicate forms were substituted with more basic ones; at the level of syntax, where long sentences with coordinate and subordinate clauses were split into smaller units; and at the level of discourse, where roles were switched to maintain topic continuity, paragraphs of narrative were divided and shorter conversation turns were introduced in dialogues.
However, alongside simplification, explicitation was another strategy that should not be overlooked, as it accounts for one quarter of the modifications, indicating that the authors of the rewritings found it a useful device when rewriting texts for their students.
It was used on the semantic level, adding information that could otherwise be inferred from the text, or cultural background that is not likely to be known to learners, or just inventing concrete settings to help readers reconstruct the narrative being told. Semantic disambiguation also occurred at the script level, where strings of hiragana were often rewritten in Chinese characters to disambiguate homophones. Structural explicitation was also observed at different levels: boundaries between linguistic units were made clearer by the use of punctuation, script or layout; omitted predicate arguments were made explicit; polite forms were used to disambiguate the subject of the sentence; and some phonetic information was added by rewriting Chinese characters in hiragana or by adding furigana.
The third strategy, standardisation, was used at the script level, to standardise punctuation and to make the text visually more familiar (using Arabic instead of Chinese numerals), and at the discourse level, both to make the use of tense or level of formality uniform within one text (choosing one of two possibilities, such as past/non-past or formal/informal, which are both known to learners), and to standardise the text as a whole by removing marked forms which are typical of less standard registers (very colloquial, literary etc.) and not likely to be known by intermediate learners.
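The proportions quoted in this discussion can be checked against the counts reported in the Analysis section (472 simplifications, 203 explicitations and 128 standardisations out of 815 modifications). The short calculation below only restates those figures; the variable names are arbitrary. The remainder of 12 modifications is not assigned to any of the three totals in the text; it may correspond to the cases listed under "Other modifications", but that reading is an assumption.

```python
# Counts as reported in the Analysis section.
counts = {"simplification": 472, "explicitation": 203, "standardisation": 128}
TOTAL = 815

# Share of each strategy in the whole set of modifications.
shares = {name: n / TOTAL for name, n in counts.items()}

# Modifications not covered by the three strategy totals.
unclassified = TOTAL - sum(counts.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# simplification: 57.9%
# explicitation: 24.9%
# standardisation: 15.7%
print(unclassified)  # 12
```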

Conclusion and further work
Overall, it could be seen that the rewriters used some strategies which could be applied to the simplification of texts for most weak readers of Japanese, not only foreign learners of Japanese: short sentences and frequent vocabulary are two aspects of language that have been found to be easier to read in most research on readability in different languages.
However, it is also clear that the authors (rewriters) of these texts, teachers of Japanese as a foreign language, were conscious of the typical progression of formal Japanese language instruction, and tended to prefer vocabulary, morphology and syntactic structures that are learned earlier in language courses. These are very often also the most frequent forms in the Japanese language as a whole and are learned earlier by Japanese children (especially in the case of content words). However, some linguistic elements, such as standard polite language (as opposed to very colloquial or very formal speech), are typical of beginning language courses for foreigners, while colloquial language (including vocabulary and contracted or otherwise colloquial morphology), which is learned quite early by Japanese children, is learned later in formal language instruction and is therefore relatively difficult for foreign learners of Japanese.
All the strategies which were highlighted in this analysis could be useful as guidelines when assessing the readability of texts for foreign learners of Japanese, both in an overall assessment of readability, and when devising methods and systems to pinpoint difficult aspects of particular texts as a first step to text simplification. Especially in the first case, when assessing overall readability, i.e. grading multiple texts on one scale of readability, further and more extensive analysis of the weight of each of these aspects on overall readability is needed. In both cases, it would be useful to devise a system for automatic discovery and assessment of particular aspects of readability.

Table 1: Vocabulary lists used as yardstick and percentage of new vocabulary

Table 2: Expected proficiency of learners of Japanese using the textbook series Enjoyable Task Reading in Japanese
adding commas to separate ambiguous or just long strings of hiragana:

Table 4: Number of modifications by level of linguistic analysis

Table 5: Number of modifications by strategy type