‘Robust chronologies’ or ‘Bayesian illusion’? Some critical remarks on the use of chronological modelling

Appendix 1

Plot()
{
 Sequence()
 {
  Boundary("Start");
  Phase("earliest")
  {
   Phase("southeast")
   {
    Phase("Apc")
    {
     R_Date("OxA-25187", 6290, 40);
    };
    Phase("Balatonszarszo")
    {
     R_Date("OxA-13650", 6292, 33);
     R_Date("OxA-13651", 6330, 33);
     R_Date("OxA-13655", 6339, 32);
    };
   };
   Phase("east")
   {
    Phase("Rosenburg")
    {
     R_Date("VERA-3965", 6245, 40);
     R_Date("VERA-3966", 6180, 40);
     R_Date("VERA-3967", 6210, 35);
    };
    Phase("Vedrovice")
    {
     R_Date("OxA-16650", 6299, 35);
     R_Date("OxA-15367", 6219, 35);
     R_Date("OxA-15385", 6332, 37);
     R_Date("OxA-16617", 6240, 45);
     R_Date("OxA-15131", 6266, 36);
     R_Date("OxA-15429", 6268, 37);
     R_Date("OxA-15425", 6298, 34);
     R_Date("OxA-16621", 6244, 40);
     R_Date("OxA-15363", 6305, 40);
     R_Date("OxA-15426", 6272, 37);
    };
    Phase("Kleinhadersdorf")
    {
     R_Date("VERA-2170", 6135, 35);
    };
   };
   Phase("Eilsleben")
   {
    R_Date("OxA-1624", 6140, 90);
    R_Date("OxA-1625", 6030, 100);
    R_Date("OxA-1626", 6070, 100);
    R_Date("OxA-1627", 6190, 90);
   };
   Phase("Pfäffingen")
   {
    R_Date("ETH-18615", 6115, 70);
    R_Date("ETH-18616", 6325, 70);
   };
  };
  Boundary("End 1");
 };
};

[…]li 1994). This was complemented by the first modelling of ¹⁴C dates, mainly aiming at estimates for the absolute duration of the LBK as a whole and of the house generations of the compound model (Ger. Wohnplatzmodell; Stehli 1989). The estimated absolute date for the LBK of the lower Rhine Valley (5300-4950 cal BC) was soon confirmed by dendrochronological dates from the Kückhoven wells (Fig. 2). Later on, other regional chronologies were added (e.g., Lefranc 2007; Denaire 2009; Pechtl 2009), but without great changes to the overall scheme. In the south-east, chronologies relied until recently mainly upon individual typochronological estimation (e.g., Pavúk 1980; Čižmář 1998; Marton, Oross 2012.Fig. 10).
While the start of the early LBK (also known as the Flomborn and Notenkopf phase) somewhere around 5300 BC is widely accepted, the absolute date of the formation and expansion of the earliest LBK (eLBK) remains contested, with postulated dates up to 5700 BC, but rarely later than 5500 BC. The model of an at least partial overlap of earliest and early LBK, based mainly upon ¹⁴C dates from taphonomically problematic contexts (Stäuble 2005; Cladders, Stäuble 2003), has not received general approval.
Recently, however, the previous consensus on the relative and absolute chronology of the beginning as well as the end of the LBK was disturbed by the formal modelling of ¹⁴C dates using Bayesian statistics. The first attempts (Jakucs et al. 2016; Denaire et al. 2017), postulating an unexpectedly late start of the expansion of the eLBK around 5350 cal BC and a long-lasting hiatus between the final LBK and the beginning of the Middle Neolithic, provoked concerns (Strien 2017). This led to a reply in which the claims of the criticized papers were restated (Bánffy et al. 2018). The problems with ¹⁴C dates on bone collagen (as discussed in Strien 2017) were rejected by the authors, mainly based on the conviction that ¹⁴C dating is technically mature to a degree that excludes major problems. This point shall be addressed below with additional evidence.
To arrive at a sound overall line of argument, it is helpful to briefly review some statements of Eszter Bánffy et al. (2018) concerning the alleged methodological deficits of my line of argument:
• The absolute chronology proposed by Hans-Christoph Strien (2017) is not "based on informal inspection of selected radiocarbon dates" (Bánffy et al. […]
• The succession of house generations as a base for my absolute chronology is not "identified only by study of ceramic motifs" (Bánffy et al. 2018.130), but also by detailed studies of site-formation processes (Strien 2018.94-95, 97-98 and further; illustrated in Strien 2014.Abb. 1-2): "The knowledge of the stylistic development is fundamental for this purpose, but it is supplemented by other, independent information such as the position of pits relative to houses, spatial relations between houses, and stratigraphy" (Strien 1989b.364-365; own translation; in more detail and with comprehensive literature cf. Zimmermann 2012.12-13).
• It should be noted that using (1) the lowest existing estimate for the number of inhabitants of a house, (2) a low estimate for the mean number of houses per settlement based on a model with a low duration of houses (23-25 years), (3) only actually known settlements, and (4) a very high population growth to calculate the minimum number of immigrants is usually termed a 'conservative estimate' and not (Bánffy et al. 2018.129) 'demographic speculations'.
What should be discussed in more detail are some other points: 'robust chronologies' require dates with a statistical error as small as possible, which in ¹⁴C dating is primarily a technical problem. However, the statistical error of a typochronological date in the case of Neolithic ceramics is mainly a function of the number of sherds found in the feature. In consequence, using Correspondence Analysis (hereafter CA) is no guarantee of a 'robust chronology' for all dated features; a critical look at dates based on small samples is necessary. In regions not yet reached by modern statistical methods of relative dating, the uncertainties of individual typochronological judgement enlarge the potential errors considerably.
Looking first at the Transdanubian earliest LBK (eLBK), the only available CA consists of all accessible features of this phase from all over Central Europe (Strien 2018). The alleged earlier date of the so-called 'formative phase' compared to the Bíňa phase and the expansion horizon, which plays a central role in the argument of János Jakucs et al. (2016), is in clear contradiction to the results of this CA (Fig. 1), which show an anteriority of Bíňa, not of 'formative phase' inventories. The detailed results of the CA might be questioned for edge effects (as discussed in Strien 2018.24-25), but an earlier start of Bíňa (Donau-eLBK) seems most probable, although a synchronous start remains possible, and the reverse sequence can be excluded¹. These results are backed by maps (Strien 2018.Abb. B4-B5) showing that contemporaneity between the 'formative phase' and the Bíňa phase, and even some early Moravian sites, all synchronized by CA, is geographically plausible.
It remains to be noted that:
• The only argument for the anteriority of the 'formative phase' mentioned by the authors, the presence of Starčevo-like pottery at Szentgyörgyvölgy-Pityerdomb and "the Starčevo presence in southern Transdanubia and the Balaton, ending perhaps in the 56th century" (Bánffy et al. 2018.128), is somewhat surprising, since no fewer than five of the 11 authors of this paper had strongly dismissed this in another paper only a few months earlier (Jakucs et al. 2018): at Versend-Gilencsa, Starčevo and early (neither 'formative' nor earliest!) LBK were shown to have been contemporaneous in some households, according to formal modelling as late as 5200 cal BC (Jakucs et al. 2018.112), far beyond the suggested start of the earliest LBK at about 5350 cal BC. It remains unexplained why Bánffy et al. (2018) nevertheless claim an end date of Starčevo anterior to the earliest LBK, and in consequence also to the 'formative phase', in direct contradiction to their own paper.
• At Szentgyörgyvölgy-Pityerdomb, the main site of the 'formative phase', pit 16, which together with pit 11 forms the long pit of house 1 (house numbers according to Lüning 2016), provided one of the earliest inventories from the site according to the CA². One of the pots shows a motif composed of three lines, forming an arc standing on the carination of the biconical bowl (Bánffy 2004.138.141, Fig. 71). The same motif in the same position on vessels of related form is not only well known from, but most typical for, the Bíňa phase (Pavúk 1980); at the same time, the technical differences (narrow, smoothed and finely incised lines instead of broad, deeply incised lines) link it with early Vinča parallels (Horváth 2006).
In sum, there is no argument left for the postulated anteriority of the so-called 'formative phase', but manifold evidence against it. Bánffy et al. (2018.128) complain that this "simply reduces the proposed 'formative phase' to a regional variant"; in fact it simply is a regional variant. The term should in consequence be abandoned as misleading; the phase preceding the expansion of eLBK is constituted not only by the earliest pits of the sites in the region between western Balaton and Vienna (only the earliest part of the so-called 'formative phase'), but by all Bíňa phase sites, too.
Fig. 1. Projection 1./2. EV of a CA of eLBK (after Strien 2018).

Turning to the Alsatian chronology, Anthony Denaire et al. (2017) tend towards an uncritical optimism concerning the reliability of CA dates and at the same time towards a readiness to adjust them without mathematical foundation, as may be shown by some examples:
• In the case of Osthouse 227, a single pot is dated to a stylistic phase most probably (84% probability) spanning not more than 10 years according to the formal modelling (Denaire et al. 2017.1106). Dating single pots poses methodological problems, such as possible stylistic interdependencies of rim and body decoration (Strien 1984.23, Abb. 11), which is the main reason why single pots should be excluded from a CA of features (Strien 2000.46). This weak point is combined with a second potential source of dating problems: the assumption that ceramics from graves are representative of the style in use at the time of the funeral. This assumption excludes the possibility that ceramics were produced, or at least selected, for funerary purposes, with the decoration following rules somewhat different from those for everyday items. Indeed, there are hints in this direction at least for the Niedermerz cemetery (Frirdich 1994.336-340). The idea that typochronology resting on such a narrow and problematic base could reach a precision in the range of one decade or less is in remarkable contrast with the negative attitude shown by the same authors towards the much more refined identification of house generations of an estimated 25 years.
• In the case of KV107, the small number of decorated sherds (Denaire 2013) is not the only problem: its typochronological date was also determined quite arbitrarily by drawing diagonal phase boundaries at strange angles in the projection of the 1./2. EV of the CA, which shifts the position of KV107 from between phases IIB and IIC to the beginning of phase III (Denaire et al. 2017.Fig. 5; one may also ask why Bisch 1735 is dated to IVa1 and not to IVa2, where its position in the CA fits better). Connecting chronology in this way with the 1. and 2. EV of a CA at the same time is at best unusual, and would have required some solid justification.
• Another highly problematic methodological decision is shown by the last example: Talheim and the phase to which it can be dated (8A of the Württemberg chronology) had until now always been attributed to the late LBK (Strien et al. 2014.Fig. 5; Lefranc 2007.Tab. 14; Jeunesse, Strien 2009.Fig. 1), corresponding to phases IVa2 or IVb of the Alsatian chronology. Dating it without any explanation to the final LBK³ is not what is usually understood by the term 'robust chronology', but looks more like arbitrarily arranging the relative position to fit the ¹⁴C dates to the authors' own chronological ideas.
In sum, the results of CAs are treated in very different ways by Denaire et al. (2017) and Bánffy et al. (2018): sometimes accepted even for statistically problematic inventories (Osthouse 227 in Alsace), sometimes 'corrected' (features KV107 and Bisch 1735 in Alsace, Talheim), sometimes completely ignored ('formative phase' of the LBK). This is far from "using a rigorous statistical methodology", as claimed by Bánffy et al. (2018.130), for combining ¹⁴C dating and archaeological evidence.
But 'robust chronologies' also require reliable ¹⁴C dates, unaffected by later alterations of the dated material. Two thirds of the paper (Bánffy et al. 2018.121-128) provide a lucid argument as to why, on both methodological and technical grounds, ¹⁴C dates are supposedly highly reliable. In practice, things are a bit different, as some examples show. The first is the start of the eLBK expansion, dated by Jakucs et al. (2016) to c. 5350 cal BC and questioned by me on the grounds of contradictory ¹⁴C dates. The simplest way to test whether my conclusions on the reliability of collagen dates were wrong is a comparison of bone-based with charcoal- and cereal-based formal modelling. This was not chosen, for obvious reasons, as may be shown. As the original code has not been published, the models had to be rebuilt online (Bronk Ramsey 2009a; 2009b; https://c14.arch.ox.ac.uk/oxcal/OxCal.html, Version 4.3). The reconstructed model 2 produces results that are not identical but close to those of Jakucs et al. (2016) (Tab. 1). The differences may be caused by minor typing errors and by the use of different releases of OxCal. The model was then split in two (Appendices 1-2), one version with the collagen dates and a second one with the dates on botanical material. The result is quite clear and supports my position: using collagen, the start of the expansion phase is dated to c. 5290 cal BC (the absolute dates mentioned in this paper are the median values according to OxCal; Tab. 1; Fig. 2), about the same date as for the start of Flomborn in Alsace; using botanical dates, the start goes back to c. 5395 cal BC, with a better overall agreement for the latter.
Approaching the correct archaeological model, i.e. removing the 'formative phase' from the botanical dates, results in a start date for the expansion of 5425 cal BC (Fig. 2). Changing the model by putting all dates from features dated by CA to the pre-expansion horizon into a new 'formative phase' alters the results only slightly and is therefore not shown here (5290 cal BC for collagen, 5400 cal BC for cereals/charcoal), with a date for the start of the pre-expansion horizon of 5325 cal BC and 5440 cal BC, respectively. Evidently, there is a difference between the collagen and botanical dates, the latter giving a date that is more plausible, although too late compared with my archaeological findings. In any case, it should be noted that none of the formal models presented here is meant to be a correct alternative. They are only used to highlight the problems of the disputed models. The deficits of the calibration curve, which make all currently possible models insecure, will be discussed below.
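The splitting of one model into a collagen and a botanical version, as described above, can be sketched as follows. This is only a minimal illustration and not the code behind Appendices 1-2: the sample records, lab codes, site names and material labels are hypothetical placeholders, and the helper names `split_by_material` and `oxcal_model` are invented here. Only the OxCal commands that are written out (Plot, Sequence, Boundary, Phase, R_Date) are those used in Appendix 1.

```python
from collections import defaultdict

# Hypothetical records: lab codes, sites and materials are placeholders only.
SAMPLES = [
    {"lab": "LAB-0001", "bp": 6290, "err": 40, "site": "Site A", "material": "collagen"},
    {"lab": "LAB-0002", "bp": 6330, "err": 35, "site": "Site A", "material": "charcoal"},
    {"lab": "LAB-0003", "bp": 6240, "err": 40, "site": "Site B", "material": "collagen"},
    {"lab": "LAB-0004", "bp": 6385, "err": 45, "site": "Site B", "material": "cereal"},
]


def split_by_material(dates):
    """Return (collagen dates, botanical dates); botanical = charcoal or cereal."""
    collagen = [d for d in dates if d["material"] == "collagen"]
    botanical = [d for d in dates if d["material"] in ("charcoal", "cereal")]
    return collagen, botanical


def oxcal_model(dates, end_name="End 1"):
    """Write an OxCal model: one Sequence with a Phase that holds one Phase per site."""
    by_site = defaultdict(list)
    for d in dates:
        by_site[d["site"]].append(d)
    lines = ["Plot()", "{", " Sequence()", " {",
             '  Boundary("Start");', '  Phase("earliest")', "  {"]
    for site, items in by_site.items():
        lines += [f'   Phase("{site}")', "   {"]
        lines += [f'    R_Date("{d["lab"]}", {d["bp"]}, {d["err"]});' for d in items]
        lines += ["   };"]
    lines += ["  };", f'  Boundary("{end_name}");', " };", "};"]
    return "\n".join(lines)


collagen, botanical = split_by_material(SAMPLES)
print(oxcal_model(collagen))   # paste into OxCal as the collagen-only model
print(oxcal_model(botanical))  # paste into OxCal as the botanical-only model
```

Both outputs share the same boundaries and phase structure, so any difference between the two modelled start dates can be attributed to the dated material rather than to the model design.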
Another point is the end date for eLBK, left open by Jakucs et al. (2016) as the models produced dates in the 52nd/51st centuries cal BC. The authors bypassed the problem by claiming that "for that, a much better data set is required" (Jakucs et al. 2016.318). It remains unexplained why the same dataset should produce robust estimates for the start, but obviously unrealistic ones for the end of eLBK. On the other hand, a very simple method for estimating an end date was omitted: the ¹⁴C dates from Vedrovice and Kleinhadersdorf from phase Ib were included as eLBK, so why not take phase IIa from these sites plus Alsatian phases IIb/IIc as post-eLBK? The explanation might be the unwelcome result. Using the model of Jakucs et al. (2016), as above, but excluding all eLBK dates later than 6100 BP as intrusions and including the dates of seven graves from Vedrovice and Kleinhadersdorf and 11 pits from Alsace as LBK II (Appendix 3), the new model shows low overall agreement (A = 36), mainly caused by the two earliest Alsatian dates (SUERC-46497, OxA-27805) […] Denaire et al. (2017.1106) to realize a contradiction between the archaeological and ¹⁴C chronologies, which had been denied by Bánffy et al. (2018).
The last example relates to the question of the internal chronology of Großgartach in Alsace. Here formal modelling produced a result according to which the typochronological phases could not be established as chronological units⁴. Denaire et al. (2017.1114) concluded that "alternative explanations have now to be found for contemporary variation". With a bit more scepticism, a possible methodological explanation can be found: running separate models with the Oxford, Poznań and SUERC dates (Bruebach-Oberbergen and BORS not included) highlights differences between laboratories (Tab. 2). The Oxford dates are nearest to the usual expectations, with boundaries between main phases 40-70 years earlier compared to the SUERC dates (except the end of Bischheim), which on the other hand are the only series in accordance with the typochronology of Großgartach. The laboratory differences, as well as the lack of chronological differentiation of the Großgartach sequence, might admittedly be due to chance, but problems with collagen dates cannot be excluded, which regrettably cannot be checked without ¹⁴C dates from botanical material.
In addition, the SUERC dates (Appendices 4-5) demonstrate another factor, seemingly completely ignored by the authors: the influence of purely mathematical effects on the results.
• When the differences between the medians of the boundaries are taken as estimates of phase duration, there are important differences between a model separating the Großgartach phases and the model taking Großgartach as one phase (Tab. 3; Fig. 3). The question of how finely the development of ceramic styles is subdivided in the regional chronology is thus of great importance for the modelled start and end dates of the typochronological units, and even more for the relation between their time spans. This may be an extreme case, as the number of dates is quite low, but first experiments with other data sets showed that it is a common effect.
• Moreover, the addition of more phases at the end of a sequence sometimes also influences the start date of the whole sequence (Tab. 3). The changes usually seem to be in a range that is at first sight negligible (rarely more than 40 years), but as soon as the start or end of the model is affected by a plateau, the consequences might be quite significant.
• Finally, OxCal does not produce absolutely stable results: changing the input order of dates within one phase sometimes slightly changes the results.
Even without laboratory differences the three potential mathematical artefacts identified here further weaken the illusion of 'robust chronologies'.
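The comparison in the first point above rests on simple arithmetic with the medians of the modelled boundaries, which can be sketched as follows. The boundary values below are hypothetical and are not those of Tab. 3; the unit names A, B and C and the function names are invented for illustration. The two lists stand in for the outputs of two separate OxCal runs, one with unit B as a single phase and one with B subdivided.

```python
def phase_durations(boundaries):
    """boundaries: list of (name, median in cal BC), ordered from oldest to youngest.

    Each phase runs from its own start boundary to the next boundary in the list;
    the last entry only marks the end of the sequence.
    """
    durations = {}
    for (name, start), (_, end) in zip(boundaries, boundaries[1:]):
        durations[name] = start - end  # cal BC values decrease through time
    return durations


def shares(durations):
    """Percentage of the whole sequence taken up by each phase."""
    total = sum(durations.values())
    return {name: round(100 * years / total, 1) for name, years in durations.items()}


# Hypothetical boundary medians (cal BC) from two model runs on the same dates.
MERGED = [("A", 5000), ("B", 4900), ("C", 4700), ("end", 4600)]
SPLIT = [("A", 5010), ("B-early", 4930), ("B-late", 4800), ("C", 4690), ("end", 4610)]

merged_shares = shares(phase_durations(MERGED))
split_shares = shares(phase_durations(SPLIT))
print(merged_shares)  # share of unit B when modelled as one phase
print(split_shares)   # shares when unit B is subdivided
print(split_shares["B-early"] + split_shares["B-late"])  # combined share of B
```

With these invented numbers, unit B takes up a larger share of the sequence once it is subdivided, which is the kind of shift visualised in Fig. 3.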
In the light of the aforementioned problems, the series from Szederkeny should be reconsidered: here the published LBK finds show a clear typochronological sequence, from Bíňa in the eastern part (Jakucs, Voicsek 2015.Fig. 10, 11) to a probably late eLBK in the middle (Jakucs et al. 2016.Fig. 8, 8.9) and post-eLBK in the western part (Jakucs et al. 2016.Fig. 9, 1.2; even Notenkopf decoration is mentioned, Jakucs et al. 2016.281). The formal modelling nevertheless shows no chronological difference (Jakucs et al. 2016.293-298). This implies that three or four different typochronological or geographical units of the LBK (the earliest phase, i.e. Bíňa, in the eastern part; Milanovce there and/or in the central part; Notenkopf and Malo Korenovo in the western settlement), plus Vinča A and Ražište, are all present at the same time within a few hundred metres, but with restricted contacts between them. Here again the Oxford dates show no sequence of the different parts, whereas modelling only the SUERC and MAMS dates (Appendix 6) produces a different picture, similar to that developed at Balatonszarszo (Tab. 4; Fig. 4). A sequence of eastern, central and western parts is in sufficient overall agreement with the dates (A = 73). Of course, the low number of dates per part of the settlement (and, as a consequence, the possibility that the differences between the laboratories are pure chance) excludes any definite conclusion on the contemporaneity or sequence of the three parts based exclusively on ¹⁴C, as both models are in accordance with the dates.

⁴ Nevertheless Denaire et al. (2017.1128) claim: "The radiocarbon dates are in good agreement with the sequences suggested by the seriations in both the LBK and Middle Neolithic periods", although for the latter this obviously is not the case.
Nevertheless, as typochronology postulates a sequence, we should take into account problems with collagen dates, as seen for the Alsatian Middle Neolithic, possibly caused by diagenetic processes and the resulting difficulties in removing later contamination.
The last two examples clearly reveal the major methodological deficit of the TOTL project: the refusal to date botanical material for the sake of minimizing taphonomic risks, at the cost of losing any control for possible problems with collagen dates.
Given the very small number of dates, the question of the start date of the Central European Middle Neolithic will not be discussed here in detail, as a handful of new dates, especially on botanical material, from early Hinkelstein contexts might change the picture entirely. It should only be remarked that:
• Even Bánffy et al. (2018.130) had to admit that there is at least one contact between late LBK and Hinkelstein (Köln-Lindenthal). The overall number of contacts is irrelevant the moment this single contact is undisputed, so a contemporaneity between late LBK and Hinkelstein cannot be rebutted.
• The alleged "evidence for contacts between users of late LBK and Hinkelstein pottery" in the Worms region has never been shown; the cited papers and books did not present anything of this kind. Only Walter Meier-Arendt (1975) postulates, based on purely typological arguments, a development from LBK IV (!) to Hinkelstein I, a view […] (Spatz 1996.474-475) examples from Worms and its immediate surroundings are missing; they are more generally late and latest Northwestern LBK, and so within the same time range as the 'mixed assemblages' rejected by the authors. Even when interpreted as an evolutionary sequence instead of contacts, they are no argument for a hiatus.

Fig. 3. Percentage of each cultural unit compared to the duration of the whole sequence Hinkelstein-Rössen (SUERC dates only), with (right column) and without (left) subdivision of Großgartach (visualisation of […]
• A phase 'VI', which is in any case indispensable to make the alleged contacts in the Worms region possible when postulating a hiatus between LBK V and Hinkelstein in the neighbouring regions, has never been described by any author familiar with the LBK around the mouths of the Neckar and Main⁵. The only inventories of late LBK from Worms which have been claimed to be near the beginning of Hinkelstein (Meier-Arendt 1972) can be dated to phase IV (Strien 2000.66).
• The use of CA, and more generally the typochronological approach, does in no way "tend … to gloss over any possible disruptions or hiatuses" (Bánffy et al. 2018.131). This statement reflects an obvious misunderstanding of the two cited articles (Shennan, Wilkinson 2001⁶; Pechtl 2015), which do not suggest anything like this. On the contrary, CA tends to overestimate any disruptions, as experiments with test data sets have shown (Strien 2000.41-47). Rapid innovations are such disruptions, causing larger distances on the 1. EV between stratigraphically immediately neighbouring units, as demonstrated at Vinča-Belo Brdo (Schier 2001). This is a well-known effect that has served for the differentiation of stylistic phases for some decades (e.g., Schmidgen-Hager 1993.89), disproving speculations about "default perspectives of slow change". It may be remarked that a slow change from the Early to the Middle Neolithic has never been discussed, although the question of how to explain the obviously rapid change between LBK and Hinkelstein has been noted (e.g., Spatz 2003; Strien et al. 2014.254-255). And when typological similarities and contact finds, be it a single one, suggest it, continuity is and should indeed be the default perspective compared to a large-scale and long-lasting hiatus (the whole Rhine Valley and its tributaries, deserted for up to two centuries: Denaire et al. 2017.1132, 1136), especially if the only argument for this hiatus is a handful of ¹⁴C dates.

Fig. 5. Correlation between number of ¹⁴C dates per phase and phase lengths of Alsatian LBK (difference between upper and lower boundary; visualisation of Table 5).
A last point to be mentioned is the high degree of confidence in the current calibration curve demonstrated by the authors. Looking at known problems, e.g., inaccuracies of the calibration curve around the time of the Thera eruption (Pearson et al. 2018) and within the LBK plateau (Weninger 2019), a more modest judgement concerning the allegedly 'robust' models would perhaps have been appropriate. The low density of measurements (IntCal13: 483 dates for the range 4050-6050 cal BC), the low density of interlaboratory dating, and the extreme smoothing of the IntCal13 curve compared to IntCal98, all well-known facts, exclude any reliable dating, especially within plateaus. In consequence, the idea that the duration of the stylistic phases of the Alsatian LBK, with all boundaries between them lying within the plateau around the 52nd century cal BC, could be reliably estimated at the present state of research is highly dubious. Doubts concerning, for example, the duration of phase IVa2 of "only 1-15 years (95% probability)" (Denaire et al. 2017.1106), based on two (!) ¹⁴C dates (plus one outlier and two old charcoal dates, another date having been arbitrarily assigned to phase IVa1, as shown above), therefore seem to be neither overcautious nor overcritical but self-evident. This holds even when neglecting the fact that the stylistic phases are derived from a CA with its inherent statistical dating errors, consisting of inventories from several sites and different functional and social contexts, with individual filling histories, which makes typochronological divisions at this fine-grained level highly improbable. Moreover, further OxCal mathematical artefacts become visible: (1) for unknown reasons the given estimates for the duration (e.g., "probably for 5-35 years (68% probability)" for phase IIb; Denaire et al. 2017.1104) are evidently too short, with even the sum of the upper limits of the 68% ranges lying slightly below the estimated overall duration (Tab. 5), and (2) there is a correlation between the number of ¹⁴C dates per phase and their length according to Bayesian modelling. Using the means of the modelled boundaries between phases for the calculation of durations (Tab. 5), the correlation is clearly significant (Spearman's rank correlation coefficient: rs = 0.8857, n = 6, p = 0.01; Fig. 5); using the above-mentioned modelled phase lengths, rs is even higher (Tab. 5). OxCal seemingly distributes the dates more or less evenly along the plateau of the IntCal13 curve. Using equal numbers of dates per phase would not cure the fault but would produce equal phase lengths. A robust estimate of phase lengths in the plateau, using the IntCal13 curve, is mathematically impossible. A completely new model for settlement organisation based on such slippery ground (Lefranc, Denaire 2018) will necessarily be highly speculative and no serious alternative to existing models.
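How a rank correlation of this kind is computed can be sketched as follows. The numbers of dates and the phase lengths used here are hypothetical and are not the values of Tab. 5; they were merely chosen so that the squared rank differences sum to 4, which for n = 6 gives rs = 1 - 6·4 / (6·35) ≈ 0.8857. The function names are invented for illustration, and the sketch assumes data without ties. The second function counts how many of the 6! = 720 possible rank orderings reach at least the observed rs, as one possible (one-sided, exact) way of judging such a coefficient with so few phases.

```python
from itertools import permutations


def spearman_rs(x, y):
    """Spearman's rank correlation for data without ties.

    Returns (rs, sum of squared rank differences).
    """
    n = len(x)

    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]

    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1)), d2


def exact_one_sided_p(n, d2_observed):
    """Share of all n! rank orderings with sum d^2 <= observed, i.e. rs >= observed."""
    base = range(1, n + 1)
    hits = total = 0
    for perm in permutations(base):
        total += 1
        if sum((a - b) ** 2 for a, b in zip(base, perm)) <= d2_observed:
            hits += 1
    return hits / total


# Hypothetical values, NOT those of Tab. 5.
dates_per_phase = [3, 5, 8, 12, 20, 31]
phase_length_years = [10, 25, 15, 60, 45, 90]

rs, d2 = spearman_rs(dates_per_phase, phase_length_years)
print(round(rs, 4), d2)
print(exact_one_sided_p(len(dates_per_phase), d2))
```

Working with the integer sum of squared rank differences instead of the rounded coefficient avoids floating-point comparisons in the permutation count.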
The models of Jakucs et al. (2016) and Denaire et al. (2017) suffer from methodological deficits in the typochronologies on the one hand, and from an uncritical attitude towards the reliability of ¹⁴C dates and the deficits of the present calibration curve, as well as a lack of awareness of mathematical artefacts in Bayesian modelling, on the other. They are far from being 'robust chronologies', as claimed by Bánffy et al. (2018). A patchwork of contradictory chronologies for different parts of the Danubian sequence in different regions and even at single sites (as shown in Fig. 2) is no chronological model of any explanatory value. The conclusion of the authors concerning the greater effectiveness of "our collective efforts … if the strengths of the various approaches reviewed in this paper were to be applied more regularly and more systematically" (Bánffy et al. 2018.131) can only be underlined. Bayesian statistics will provide a highly valuable instrument for absolute chronology once the main requirements are fulfilled: a precise calibration curve, better control of the factors influencing dates, and better knowledge of the mathematical properties of the models. At present this instrument only produces an illusion of robustness.

Appendices