PREFIXATION ABILITY INDEX AND VERBAL GRAMMAR CORRELATION INDEX PROVE

All suggestions about the reality of the Buyeo group have been based on treating a language as a heap of lexemes: such a method allows different scholars to reach different conclusions and does not allow verification. Language is first of all structure, i.e. grammar, and not a heap of lexemes, so the methods of comparative linguistics should be based on the comparison of grammars. Prefixation Ability Index (PAI) and Verbal Grammar Correlation Index (VGCI) are typology-based tools of comparative linguistics. PAI allows us to see whether languages are potentially related: if PAI values differ more than fourfold, it is a sign of unrelatedness; if they differ less than fourfold, further search for proofs of relatedness is possible. VGCI answers the question of relatedness or unrelatedness directly: if the VGCI value is 0.4 or more, the languages are related; if it is 0.3 or less, they are unrelated. The PAI of Japanese is 0.13 and the PAI of Korean is 0.13, which means that they can be related. The VGCI of Japanese and Korean is 0.57, almost the same as the VGCI of English and Afrikaans (0.56), which means that Japanese and Korean belong to the same group and not just to the same family.


Problem introduction
Buyeo is a conventional name for a hypothetical stock that includes the Japanese language, the Ryukyuan languages and the Korean language.
The main problem of all these suggestions (as well as of most hypotheses about the relationship of particular languages) is that they are not based on any verifiable methods. All such suggestions rest mainly on the idea that a language is just a heap of lexemes and not grammar. Such an approach does not allow any verification, so different scholars can reach contradictory conclusions about the same material: some conclusions are probably right, but the absence of appropriate methods of verification makes it impossible to tell what is right and what is wrong. In effect it pushes comparative linguistics outside the field of science: science always presupposes verification and the rejection of unproven hypotheses, while a methodology based on the "artist sees so" principle does not presuppose any verification, so contradictory conclusions can coexist.
As long as language is first of all grammar, a conclusion about the genetic affiliation of a certain language should be made on the basis of an analysis of grammar (Akulov, 2015d).
The current paper presents proofs based on typology/grammar which show that Japanese and Korean are closely related.

Prefixation Ability Index (PAI)
Prefixation Ability Index (PAI) allows us to see whether two languages can potentially be genetically related.
PAI is a method to estimate the percentage of prefixes in a language. It presupposes that any language has its own prefixation ability, which is measured as the percentage of prefixes among affixes. In order to estimate the percentage of prefixes (PAI), the following steps should be undertaken: 1) count the total number of prefixes; 2) count the total number of affixes; 3) calculate the ratio of the total number of prefixes to the total number of affixes.
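The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name and the affix counts in the usage line are hypothetical and are not taken from any of the cited descriptions.

```python
def prefixation_ability_index(prefix_count: int, affix_count: int) -> float:
    """Return PAI: the share of prefixes among all affixes of a language."""
    if affix_count <= 0:
        raise ValueError("affix_count must be positive")
    if not 0 <= prefix_count <= affix_count:
        raise ValueError("prefix_count must lie between 0 and affix_count")
    # Step 3: ratio of the number of prefixes to the number of affixes.
    return prefix_count / affix_count


# Hypothetical inventory: 13 prefixes among 100 affixes in total.
print(prefixation_ability_index(13, 100))  # 0.13
```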
It is generally believed that the PAI values of genetically related languages are close to each other, and tests of PAI on the material of firmly assembled stocks (Indo-European, Austronesian, Afroasiatic) show that the PAI values of distant relatives can differ fourfold at most. (A detailed description of the PAI method can be found in Akulov, 2015a.) Thus PAI can be used as a tool that allows us to see whether certain languages can potentially be related: no conclusions can be made when PAI values differ fourfold or less (for instance, in the case of Indo-European and Austronesian), but if the PAI values of certain languages differ, for instance, tenfold (the case of Ainu and Nivkh; see Akulov, 2015a, p. 13), it is evidence that the languages considered are not related.
PAI could be called a safety valve of comparative linguistics: if its values do not differ more than fourfold, there are no obstacles to a further search for genetic relationship; if the values differ about fourfold, rock-solid proofs of genetic relationship have to be found (like, for instance, those that were shown in the case of the Semitic group and the Coptic language); if the values differ sevenfold to tenfold or even more, the languages considered belong to completely different stocks.
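A minimal sketch of this "safety valve" is given below. The function names, the returned labels and the handling of the band between fourfold and sevenfold (which the text does not describe) are assumptions made for illustration. The sample values are the PAI figures quoted in this paper for Japanese, Korean and Turkic.

```python
def pai_fold_difference(pai_a: float, pai_b: float) -> float:
    """Return how many times the larger PAI exceeds the smaller one."""
    if pai_a <= 0 or pai_b <= 0:
        # Assumption: a zero PAI makes the fold difference undefined.
        raise ValueError("PAI values must be positive")
    return max(pai_a, pai_b) / min(pai_a, pai_b)


def pai_verdict(pai_a: float, pai_b: float) -> str:
    """Map a fold difference onto the bands described in the text."""
    fold = pai_fold_difference(pai_a, pai_b)
    if fold <= 4.0:
        return "no obstacle to further search"
    if fold >= 7.0:
        return "different stocks"
    # Assumption: the text gives no rule for the band between 4 and 7.
    return "not covered by the text"


print(pai_fold_difference(0.13, 0.13))             # 1.0
print(pai_verdict(0.13, 0.13))                     # no obstacle to further search
print(round(pai_fold_difference(0.13, 0.012), 2))  # 10.83
print(pai_verdict(0.13, 0.012))                    # different stocks
```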
It is possible to say that PAI shows the direction in which the search for potential relatives of a certain language can be promising.

Unrelatedness
An important point of the current consideration is the possibility of proving the unrelatedness of languages. This is as necessary a tool of any classification as the possibility of proving relatedness: if there were no possibility of proving unrelatedness, hardly a single stock could be assembled.
The possibility of proving unrelatedness is discussed and proved in the following papers: Akulov (2015c) and Brown (2015).

Verbal Grammar Correlation Index (VGCI)
VGCI is thought to be the main tool in a search for language relatedness, so a more detailed description of VGCI method is given below.

VGCI method background
As seen in the previous section, PAI allows us to see whether languages are potentially related. However, in order to be able to say whether two languages are related, we need a method that pays attention to grammar and consequently gives precise results.
As long as language is structure, i.e. grammar, the relatedness of languages should be established through the comparison of their grammars.
Grammar is first of all positional distributions of grammatical means, i.e. an ordered pair of the following form: <A; Ω>, where A is a set of grammatical meanings and Ω is a set of operations defined on A, i.e. positional distributions.
In order to understand whether two languages are genetically related we should analyze the degree of correlation among sets of grammatical meanings and estimate the proximity of positional distributions of common grammatical meanings.

Why does the method entail verbs?
Why is it possible to draw conclusions on the relatedness or unrelatedness of languages by considering only verbal grammar? The answer lies in the fact that there are many languages with poor or almost no grammar of nouns, while there is no language without verbal grammar. In other words, there are languages with no grammatical case or gender (even very closely related languages can differ in this respect, for instance English and German, or Russian and Bulgarian), but there are no languages without modalities, moods, tenses and aspects. Therefore the verb is thought to be the backbone of any grammar and the backbone of the comparative method.

General scheme of VGCI calculation
As written in section 2.3.1, the following steps should be taken to estimate grammar correlation:
1. The correlation of the sets of grammatical meanings is estimated in the following way. First, the intersection of the two sets of grammatical meanings should be found. Then the ratio of the intersection to each set is calculated, and the arithmetic mean of both ratios is taken. This value is the index of correlation of the sets of grammatical meanings.
2. Sets of meanings alone do not yet fully describe grammar systems. The second step is to estimate the correlation of the positional distributions of the common grammatical meanings. The intersection of the two sets of grammatical meanings is the material on which the degree of positional correlation is measured.
3. In order to calculate the value of VGCI we should take the logical conjunction of the degree of correlation of the sets of grammatical meanings and the degree of correlation of the positional distributions of the common grammatical meanings. In other words, VGCI is the product of the two indexes.
4. It is obvious that closely related languages demonstrate higher values of VGCI (the more alike the sets of grammatical meanings, the higher the intersection ratio and, consequently, the more alike the positions of the common grammatical meanings), while languages with low or no relatedness demonstrate low values of VGCI.
5. According to the previous step, there should be a threshold value of VGCI which determines the border of stocks, i.e. if certain languages exhibit values lower than the threshold, such languages evidently do not belong to the same stock. In order to determine the threshold I will compare distant languages of well-assembled stocks.
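Steps 1 to 3 can be illustrated with a short Python sketch. Several things here are assumptions rather than part of the method as described: the function names, the toy sets of meanings, and in particular the aggregation of per-meaning positional degrees by a simple arithmetic mean. The per-meaning degrees themselves are supplied by the analyst.

```python
def meaning_set_correlation(meanings_a: set, meanings_b: set) -> float:
    """Step 1: mean of the intersection's ratio to each set of meanings."""
    if not meanings_a or not meanings_b:
        raise ValueError("both sets of meanings must be non-empty")
    common = meanings_a & meanings_b
    return (len(common) / len(meanings_a) + len(common) / len(meanings_b)) / 2


def positional_correlation(degrees: dict) -> float:
    """Step 2: aggregate per-meaning degrees (0..1) of positional correlation.

    Assumption: the aggregate is the arithmetic mean of the degrees.
    """
    return sum(degrees.values()) / len(degrees) if degrees else 0.0


def vgci(meanings_a: set, meanings_b: set, degrees: dict) -> float:
    """Step 3: VGCI as the product of the two indexes."""
    common = meanings_a & meanings_b
    missing = common - degrees.keys()
    if missing:
        raise ValueError(f"no positional degree given for: {sorted(missing)}")
    used = {meaning: degrees[meaning] for meaning in common}
    return meaning_set_correlation(meanings_a, meanings_b) * positional_correlation(used)


# Toy data, invented for illustration only.
lang_a = {"past", "future", "passive", "causative"}
lang_b = {"past", "passive", "causative", "imperative", "optative"}
degrees = {"past": 1.0, "passive": 0.5, "causative": 1.0}

print(round(vgci(lang_a, lang_b, degrees), 4))  # 0.5625
```

In the toy data the intersection holds three meanings, so the first index is (3/4 + 3/5) / 2 = 0.675, the second is 2.5 / 3, and their product is 0.5625.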
The above method enables a direct comparison of natural languages that exist or have existed, but not of their reconstructions or of constructed languages. Descriptions of the latter are under the influence of the personal views of their authors and cannot be verified in any way.
I should also note that the method compares meanings and their positional distributions only and does not pay any attention to material exponents at all. This is not a response to radical adherents of megalocomparison (a term introduced by James Matisoff; see Matisoff 1990), who harshly ignore typological issues. It is rather a matter of reality and practice, since material correlation (regular phonetic correspondence) between languages that are only weakly related can be very complicated. The method is intended to prove genetic relatedness or unrelatedness by pure typology. Therefore attention is not paid to technical meanings, such as markers of transitivity, but rather to the so-called contensive grammatical meanings, such as markers of tenses, aspects, modalities etc. In other words, attention is paid to those grammatical categories that have certain contents expressed by lexical means. Only if necessary can items that express technical meanings (meanings of agreement) also be taken into account.
It can be a rather complicated task to distinguish obligatory features of verbs from facultative ones, so first of all attention should be paid to the following categories: a) tense and aspect; b) mood and modality; c) voice; d) agent, patient, object, subject, number. There can also be certain categories that serve as evidence for a kind of modality or spatial orientation/version and that are considered a development of the trigger system. Therefore it is very important to make precise descriptions of the languages compared, though sometimes the same items can be described in a slightly different way.

Results of VGCI testing: values of thresholds
Tests of VGCI on the material of firmly assembled stocks have given us the following values: VGCI (Hawaiian and Lha'alua) ≈ 0.39; VGCI (Chinese and Tibetan) ≈ 0.39.
On the other hand, tests of VGCI on the material of unrelated languages have shown us the following: VGCI (Chinese and English) ≈ 0.32; VGCI (Chinese and Latin) ≈ 0.30; VGCI (Khmer and Latin) ≈ 0.29; VGCI (English and Tibetan) ≈ 0.13.
If the value of VGCI is around or above 0.4, the languages are related (i.e. they belong to the same stock). If, on the other hand, the value of VGCI is about 0.3 or less, the languages are not related. Values such as 0.39 and 0.38 are closer to 0.4, while 0.31 and 0.32 are closer to 0.3. The more closely languages are related, the higher their VGCI.
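The threshold rule can be written down as a small classifier. The sketch below assigns values between 0.3 and 0.4 to the nearer threshold, following the remark that 0.39 is closer to 0.4 and 0.32 is closer to 0.3. The resulting cut at the midpoint, the function name and the "inconclusive" label for a value exactly halfway are assumptions made for illustration.

```python
RELATED_THRESHOLD = 0.4    # around or above: same stock
UNRELATED_THRESHOLD = 0.3  # about or below: not related


def classify_vgci(value: float) -> str:
    """Classify a VGCI value against the two thresholds."""
    if value >= RELATED_THRESHOLD:
        return "related"
    if value <= UNRELATED_THRESHOLD:
        return "unrelated"
    # Assumption: values in between go to the nearer threshold.
    midpoint = (RELATED_THRESHOLD + UNRELATED_THRESHOLD) / 2
    if value > midpoint:
        return "related"
    if value < midpoint:
        return "unrelated"
    return "inconclusive"


# Values reported in this paper.
print(classify_vgci(0.39))  # related   (Chinese and Tibetan)
print(classify_vgci(0.32))  # unrelated (Chinese and English)
print(classify_vgci(0.57))  # related   (Japanese and Korean)
```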

Measurement error
Details on the measurement error are described in a separate paper (Akulov 2015b). The error was calculated to be about 2%.

Applying PAI and VGCI methods to the Buyeo problem
The main problem of the Buyeo stock is the question of the relatedness of Japanese and Korean. In this paper I try to show the relatedness of Japanese and Korean with the use of the PAI and VGCI methods.

PAI suggests that Japanese and Korean can potentially be related
Lavrent'yev (2002) calculated the PAI for Japanese to be 0.13, and Mazur (2004) reported that the PAI for Korean is 0.13.
It is rather interesting that the PAI values are so similar. However, as noted in section 3.2.1, no conclusions can be made from similar PAI values alone. Nevertheless, such similarity is promising, and further research might bring us to proofs of relatedness.

List of Japanese forms
The following list of Japanese forms has been compiled by Lavrent'yev (2002).
[...] used in different contexts), they are numbered as prp1-/prp2-/prp3- and separated by slashes. Positional elements that are components of the same implementation are written as prp- + -sfx, where - means that a certain positional element can optionally be omitted; an element written in square brackets is not obligatory. Such a notation shows grammatical meanings and their positions in relation to a nuclear position rather than their absolute positions in a linear model of a word or phrase. For example, it is of no importance which prefix is placed closer to the nuclear position; for the current tasks it is sufficient to know that all prefixes are placed to the left of the nuclear position.
It is important to note that this way of notation carries information on the places and technical means of expression of grammatical meanings. The so-called "school grammar", which offers, for example, the number of verbal stems in a certain language, is not of interest to me. I consider a language as something like a dark box with many holes, and the implementation of certain grammatical meanings is the light coming out of those holes. My task is to record in which holes the light appears, and then to compare the recordings of different boxes (i.e. different languages).

List of Korean forms
The list of Korean forms has been compiled by Mazur (2004).
[...] its ways of expressing the grammatical meaning. Finally, the number that expresses the degree of correlation is written down. If a certain meaning can be expressed in several ways, the options are separated by a slash; if there are several similar items expressing the same meaning, they are marked by subscript index numbers. If there is no difference in the schemes of positional expression, the point is counted as 1; if there is no correlation, the point is counted as 0; in other cases the particular degree of correlation is estimated. It is supposed that, for instance, -sfx and -pp exhibit the same full correlation as -sfx and -sfx, while -sfx and -sfx + -sfx show zero correlation.
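The scoring rules stated above can be sketched in Python as follows. The text fixes only the end points (identical schemes count as 1, no correlation counts as 0, -sfx and -pp count as the same position), so the way partial degrees are estimated here, as the share of common options among all options, is purely an assumption, as are the function names. Index numbers such as prp1- are not handled in this sketch.

```python
# Per the text, -pp correlates fully with -sfx, so both are treated as one position.
EQUIVALENT = {"-pp": "-sfx"}


def parse_schemes(notation: str) -> frozenset:
    """Split a notation such as 'prp- + -sfx/-pp' into a set of options."""
    options = set()
    for option in notation.split("/"):
        elements = (part.strip() for part in option.split("+"))
        options.add(tuple(sorted(EQUIVALENT.get(e, e) for e in elements)))
    return frozenset(options)


def positional_degree(notation_a: str, notation_b: str) -> float:
    """Degree of positional correlation for one common grammatical meaning.

    Assumption: partial correlation is the share of common options among
    all options used by either language.
    """
    schemes_a = parse_schemes(notation_a)
    schemes_b = parse_schemes(notation_b)
    return len(schemes_a & schemes_b) / len(schemes_a | schemes_b)


print(positional_degree("-sfx", "-pp"))          # 1.0
print(positional_degree("-sfx", "-sfx + -sfx"))  # 0.0
print(positional_degree("-sfx/prp-", "-sfx"))    # 0.5
```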
The VGCI of Japanese and Korean (0.57) is higher than the VGCI of Khmer and Vietnamese (VGCI = 0.53) or the VGCI of English and Russian (VGCI = 0.52), and this brings us to the conclusion that Japanese and Korean belong to the same group rather than just to the same stock. In order to verify whether this is so, the VGCI of Japanese and Korean is compared with the VGCI of languages that evidently belong to the same group.

List of English forms
The list of English forms has been compiled by Barhkhudarov et al., 2000.

Conclusion and further perspectives of the Buyeo group
First, I suppose that it has been shown rather clearly that Japanese and Korean are not just languages of the same stock, but rather languages of the same language group.
Second, the Ryukyuan languages show great proximity to Japanese, so there is no problem at all in showing their closeness to Korean.
Third, in the context of the Altaic hypothesis it is traditionally supposed that the Buyeo group is related to the Tungusic, Mongolian and Turkic languages. However, I suppose that the reality of the so-called Altaic stock/family is highly doubtful, since the PAI value of the Buyeo languages is about 0.13, while the Turkic languages show a PAI value of around 0.012 (Tenishev, 1996), i.e. roughly a tenfold difference. According to section 3.2.1, such a difference of PAI values is a serious reason to doubt the relatedness of the languages considered. In any case, whether the Buyeo group is related to the other above-mentioned groups/stocks is a matter for further research.

Figure 1: Positional distributions of Japanese and Korean grammars. Lines 1-38 represent grammar meanings whereas columns B-G are positional realizations. Japanese (J) is marked green and Korean (K) is marked red. Common positions are marked yellow. Numbers inside cells show the degree of use of corresponding positions by the two languages respectively.

Figure 2: Positional distribution of Japanese grammar and its comparison with Korean.

Figure 3: Distribution of Korean grammar and its comparison with Japanese.

Figure 4: Positional distributions of English (E) and Afrikaans (Af) grammars. English is marked red, common positions are marked yellow.

Figure 5: Positional distribution of English grammar compared to Afrikaans.

Figure 6: Positional distribution of Afrikaans grammar compared to English.