Dating ancient manuscripts using radiocarbon and AI-based writing style analysis

Mladen Popović (1), Maruf A. Dhali, Lambert Schomaker, Johannes van der Plicht, Kaare Lund Rasmussen, Jacopo La Nasa, Ilaria Degano, Maria Perla Colombini, Eibert Tigchelaar

1 Qumran Institute, University of Groningen, The Netherlands
2 Artificial Intelligence, University of Groningen, The Netherlands
3 Center for Isotope Research, University of Groningen, The Netherlands
4 Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Denmark
5 Department of Chemistry and Industrial Chemistry, University of Pisa, Italy
6 Faculty of Theology and Religious Studies, KU Leuven, Belgium

2024
Abstract

Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. For the Dead Sea Scrolls, this is particularly important. However, there is an almost complete lack of date-bearing manuscripts, evenly distributed across the timeline and written in similar scripts, that are available for palaeographic comparison. Here, we present Enoch, a state-of-the-art AI-based date-prediction model trained on new 14C-dated samples of the scrolls. Enoch uses established handwriting-style descriptors and applies Bayesian ridge regression. The challenge of this study is that the number of radiocarbon-dated manuscripts is small, whereas current machine learning requires an abundance of training data. We show that, by using combined angular and allographic writing-style feature vectors and applying Bayesian ridge regression, Enoch can predict the 14C-based dates from style, as supported by leave-one-out validation, with mean absolute errors (MAEs) of 27.9 to 30.7 years relative to the 14C dating. Enoch was then used to estimate the dates of 135 unseen manuscripts; in a post-hoc palaeographic evaluation, 79% of these predictions were judged ‘realistic’. We present a new chronology of the scrolls. The 14C ranges and Enoch’s style-based predictions are often older than the traditionally assumed palaeographic estimates. In the range 300–50 BCE, Enoch’s date predictions provide improved granularity. The study is in line with current developments in multimodal machine-learning techniques, and the methods can be used for date prediction in other partially dated manuscript collections. This research shows how Enoch’s quantitative, probability-based approach can serve as a tool for palaeographers and historians, re-dating key ancient Jewish texts and contributing to current debates on Jewish and Christian origins.

keywords:
Palaeography, Artificial Intelligence, Radiocarbon Dating, Dead Sea Scrolls

One of the main aims of palaeography—the study of ancient handwriting—is the dating of manuscripts on the basis of their handwriting Nongbri2019 ; Orsini2018-edit ; OrsiniClarysse2012 . Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. This is particularly important for the Dead Sea Scrolls from ancient Judaea. These contain the oldest manuscripts of the Hebrew Bible and many previously unknown ancient Jewish texts, mostly written in Aramaic/Hebrew script. The discovery of the scrolls in the 1940s–1960s fundamentally transformed our knowledge of Jewish and Christian origins Brooke2018-kl .

Aramaic/Hebrew script in Judaea evolved from the imperial Aramaic script of the fifth and fourth centuries BCE to the Jewish square script in the first and second centuries CE. For the centuries in between, palaeographers have constructed a model of successive developmental stages, each characterized by distinct features of the script. This model was used to date manuscripts and thus affected the study of religious, cultural, and historical developments Tigchelaar2020 ; Puech2017 ; Cross2003 ; Avigad1965 . However, these palaeographic distinctions are not reliably grounded (see Appendix A).

For palaeographic comparison, one requires enough date-bearing manuscripts that are evenly distributed across the timeline and written in similar script. Yet only some of the very oldest manuscripts (fourth century BCE) and the very youngest (first and second centuries CE) have calendar dates (see Section A.1 in Appendix A). To compensate for this scarcity in the intervening centuries, palaeographers have turned to inscriptions on other surfaces that were thought to be historically datable Puech2017 ; Cross2003 ; yardeni2000-mq , but these, too, have no absolute dates (see Section A.2.1 in Appendix A). Historical hypotheses of a slow development of the Aramaic/Hebrew script in the third century BCE and of the emergence and rapid development of a national script around the mid-second century BCE Puech2017 ; Cross2003 ; Yardeni1990 remain unsubstantiated. Neither inscriptions nor historical hypotheses enable us to reliably date the Dead Sea Scrolls (see Section A.2.2 in Appendix A).

In this study, we bridge the palaeographic gap between the fourth century BCE and the second century CE, and advance palaeography in general, by combining new radiocarbon (14C) dates with state-of-the-art artificial intelligence (AI)-based writing style analysis. The task calls for machine-learning algorithms that are able to learn from a small set of labeled, i.e., dated, examples. This requirement conflicts with the data needs of current supervised deep learning, which typically uses a thousand labeled examples per class Krizhevsky2017 . An abundance of data points is needed for the stable estimation of, e.g., a neural-network model with millions of coefficients, and to minimize the risk of arriving at a model that only seems ‘good’ Vapnik2000 . A simple example is a linear function, which has two coefficients and consequently needs at least two data points to be determined. There is therefore a fundamental problem if the number of manuscript-date reference points is on the order of two dozen, while a computational model requires hundreds of thousands of coefficients or more. We address this challenge by applying methods that (a) operate under sparse-data conditions, (b) are explainable, and (c) do not require (pre)training on an extraneous, alien image collection. So, while it is tempting to use modern deep-learning methods, as we have done before He2019 ; He2020 ; He2021 ; Zhang2022 ; Ameryan2023 , we present several arguments against using such approaches for style-based date prediction on a very small data set, i.e., at the current stage of Dead Sea Scrolls research on handwriting-style-based manuscript dating (Appendix F).
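The coefficient-versus-data-point argument in the paragraph above can be made concrete with a small numerical sketch. It uses synthetic random data, not scroll features, and the sizes (24 samples, 1000 coefficients) are illustrative assumptions. With far more coefficients than samples, least squares fits the training "dates" exactly, but the solution is not unique, so a perfect training fit says nothing about generalization.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples = 24       # roughly the number of 14C-dated reference manuscripts
n_features = 1000    # a model with many more coefficients than data points

X = rng.normal(size=(n_samples, n_features))
y = rng.uniform(-300, 200, size=n_samples)  # synthetic "dates" (BCE negative)

# Minimum-norm least-squares solution of the underdetermined system X w = y
w, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
train_error = np.max(np.abs(X @ w - y))
print(f"rank = {rank}, max training error = {train_error:.2e}")

# Rows of vt beyond the rank span the null space of X: adding such a
# direction to w leaves every training prediction unchanged.
_, _, vt = np.linalg.svd(X)
null_direction = vt[-1]
w_alt = w + 100.0 * null_direction
print(np.allclose(X @ w_alt, y))  # expected True: same fit, different model
```

Both `w` and `w_alt` reproduce the training data exactly, yet they disagree on new inputs. This is the ‘seemingly good’ model the text warns about.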

Since a general, large, representative, and labeled data set is not available for the period of the scrolls, we apply dedicated pattern recognition and machine-learning models, only using the relevant scrolls data for training a date-prediction model. Given the importance of the topic, it is expected that the use of pretrained deep transfer learning on the basis of extraneous material would elicit valid concerns among palaeographers about the relation between the scrolls’ target data and training data from a (very) different origin and period. Like the Ithaca approach assael2022restoring , a deep neural network making chronological attributions of ancient Greek inscriptions based on the totality of textual content, we focus on predicting chronological development, but unlike Ithaca, we use shape-style information from handwritten manuscript images instead.

We present Enoch, a machine-learning-based date-prediction model using established handwriting-style descriptors and applying Bayesian ridge regression. Enoch, named after the ancient Jewish science hero, was trained on the basis of new 14C-dated samples of the scrolls, providing reliable, absolute time markers that can bridge the palaeographic gap. Because of possible castor oil contamination issues with previous 14C datings of the scrolls Bonani1992 ; Jull1995 , which would give a misleading 14C age that was ‘younger’ than the true age of the samples, new 14C dating was necessary for this study rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects .

Enoch integrates multiple dating methods, using both physical, material-based evidence from 14C dating and geometric, writing style-based evidence from AI methods. With a new set of 14C-dated scrolls as temporal references, the handwriting-style features of these tested manuscripts are used, through machine-learning-based writing style analysis, to estimate dates for undated manuscripts from the collection. Interpolation of writing-style features over time then allows Enoch to make estimates for samples that have no 14C date and are available only as digital images. Thus, Enoch offers date predictions as probability-based options that can aid palaeographers and historians in their decision-making and contribute to historical debates.

1 Radiocarbon dating

We performed 14C dating on 30 undated manuscripts from 4 sites, spanning an estimated 5 centuries: 25 from the Qumran caves, 1 from Masada, 2 from the Murabbaʿat caves, and 2 from the Naḥal Ḥever caves. Twenty-eight manuscripts were made of animal skin or parchment, and 2 of papyrus.

Samples were selected on the basis of their script and presumed period, whether the manuscripts contained enough characters to train Enoch, and practical and conservation considerations (see Section B.1 in Appendix B). One date-bearing document, Mur19, was used as a validation test for 14C but was not included in Enoch’s training because of its cursive script.

The scrolls are extremely delicate material. As in previous attempts at dating the scrolls Bonani1992 ; Jull1995 , we, too, had to adjust the standard chemical AAA (Acid-Alkali-Acid) pretreatment (see Section B.2 in Appendix B). Moreover, many fragments are contaminated with castor oil, which scholars in the 1950s used to improve the readability of the scrolls’ text rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects . This study is the first to apply, prior to 14C dating, a chemical treatment specifically designed to remove fatty materials by solvent extraction (see Sections B.2 and B.7.1 in Appendix B). Further specialized analytical chemistry methods were applied before and after the sample pretreatment to demonstrate that the total amount of lipid material is below the threshold at which it would significantly skew the 14C date (see Sections B.7.2–B.7.5 in Appendix B).

The samples were dated by two Accelerator Mass Spectrometry (AMS) machines (see Section B.3 in Appendix B).

2 Integration of multiple dating methods

We used 24 manuscripts from the 14C samples with accepted dates as labeled data for Enoch’s primary training set (see Sections B.4–B.6 in Appendix B and Section D.2 in Appendix D). For the data labels, we used OxCal v4.4.2 Oxcal ; Oxcal2 to obtain the raw data points of the probability distributions, because 14C results are not single dates, as with date-bearing documents, but date ranges with probability distributions. The 14C data input for training Enoch consists of the probability distributions of the accepted 2-sigma (2σ) calibrated ranges (see Section E.6 in Appendix E).
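OxCal reports highest-posterior-density (HPD) ranges itself. The sketch below only illustrates how 2σ (95.4%) ranges follow from a calibrated distribution. It assumes the distribution has been exported as (year, probability) pairs on a one-year grid; the function name `hpd_ranges` is hypothetical and does not come from OxCal.

```python
import numpy as np

def hpd_ranges(years, probs, mass=0.954):
    """Highest-probability-density ranges covering `mass` of a distribution.

    years: calendar years on a regular 1-year grid (BCE negative)
    probs: probability (or unnormalized density) for each year
    """
    years = np.asarray(years)
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()

    # Take years in order of decreasing probability until `mass` is reached.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    n_keep = np.searchsorted(cumulative, mass) + 1
    kept = np.sort(years[order[:n_keep]])

    # Merge consecutive years into (start, end) ranges; a multimodal
    # calibration curve can yield several disjoint ranges.
    ranges = []
    start = prev = kept[0]
    for y in kept[1:]:
        if y != prev + 1:
            ranges.append((int(start), int(prev)))
            start = y
        prev = y
    ranges.append((int(start), int(prev)))
    return ranges
```

Usage example: for a synthetic Gaussian distribution centred on 150 BCE with a standard deviation of 20 years, `hpd_ranges(np.arange(-400, 201), probs)` returns a single range close to 190–110 BCE. A two-peaked distribution returns two ranges.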

In addition to the primary training set, we created different combinations of training data to perform comparative analyses and to further check the robustness of the model. These combinations include the tentative addition or omission of 4Q52, some previously tested 14C samples Bonani1992 ; Jull1995 , date-bearing documents from the fifth–fourth centuries BCE and the second century CE (see Tables 18 and 19 in Appendix I for complete lists of manuscripts), the Maresha ostracon from 176 BCE (see Section A.1 in Appendix A), and leave-one-out subsets of the training data.

2.1 Deep neural networks for detection of handwritten ink-trace patterns

The 24 physical 14C-dated manuscripts are available as many individual images in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb . We also use images from Brill Publishers lim1995dead , especially where a manuscript is unavailable in the IAA collection. For this study, these images underwent multiple preprocessing steps to make them suitable for pattern-recognition techniques. The images are extremely difficult to work with (for examples, see Figure 12 in Appendix E and Figure 25 in Appendix G; see also Section G.1 in Appendix G): the input is not digitally encoded text but pixel images of highly degraded manuscripts.

We utilize multispectral band images of each fragment and employ an in-house image fusion technique dhali2019binet to generate a three-channel image. The resulting image representation enhances ink-vs-background contrast and therefore facilitates the effective separation of ink from backgrounds, commonly called binarization. For this purpose, we employ BiNet dhali2019binet , an artificial neural network based on an encoder-decoder U-net architecture designed to binarize the diverse range of scroll images. The resulting binarized images consist solely of black foreground pixels (ink) against a white background, ensuring that subsequent analyses focus exclusively on the handwritten patterns while minimizing inadvertent matches due to material-texture attributes. We further correct the rotation of the binarized images and divide them into multiple parts to maintain a balanced distribution of handwritten characters within each new image. No extraneous image material was used to train this binarization method.
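The paragraph above does not specify how the binarized images are divided into parts with a balanced number of characters. The sketch below shows one hypothetical way to do it, and it is not the authors’ procedure. It cuts a binarized page (ink = `True`) into horizontal strips that hold roughly equal amounts of ink, using ink pixels as a proxy for characters. The name `split_by_ink` is illustrative.

```python
import numpy as np

def split_by_ink(binary, n_parts):
    """Split a binarized page (ink = True) into n_parts horizontal strips
    that contain roughly equal amounts of ink."""
    ink_per_row = binary.sum(axis=1)
    cumulative = np.cumsum(ink_per_row)
    total = cumulative[-1]
    # Rows where the cumulative ink first reaches 1/n, 2/n, ... of the total
    targets = total * np.arange(1, n_parts) / n_parts
    cuts = np.searchsorted(cumulative, targets) + 1
    bounds = [0, *cuts.tolist(), binary.shape[0]]
    return [binary[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```

Cutting between rows keeps whole text lines together only if the cut rows fall in interline gaps. A practical version would snap each cut to the nearest row with little or no ink.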

Thus, we obtained a data set of 75 images from the 24 14C-dated manuscripts. We used 62 of these images to train Enoch. The remaining 13 images were randomly selected and deliberately held out as unseen test data to cross-validate the robustness and reliability of Enoch’s performance. Enoch’s predictions for these 13 images show an 85.14% overlap with the original 14C probability distributions (see Table 17 in Appendix I). The image samples typically contain 150–200 characters, which has been shown to be sufficient for the comparable task of writer identification Brink2008 .
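The text does not define how the 85.14% overlap between predicted and 14C probability distributions is computed. One common measure, assumed here purely for illustration, is histogram intersection. Both distributions are normalized on the same year grid, and the overlap is the sum of their pointwise minima: 1.0 for identical distributions and 0.0 for disjoint ones.

```python
import numpy as np

def distribution_overlap(p, q):
    """Histogram intersection of two discrete distributions on the same
    year grid (an assumed overlap measure, for illustration only)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.minimum(p, q).sum())
```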

2.2 Extracting features for style attribution

In this context, ‘style’ is not related to textual content or wording. To characterize the handwritten shapes, we use small shapes along the ink trace that are largely uncoupled from the textual content, because we want to avoid spurious matches or date predictions based on textual content. Once the training images were available, we applied feature-extraction techniques to translate handwriting patterns into feature vectors. The feature vectors relate directly to the shape-based evidence of the ink traces in the manuscripts and have a solid basis in writer identification Bulacu2003 ; Schomaker2004 ; Bulacu2007 and document dating Dhali2020 ; He2016 . We extract features from both the allographic and textural levels of characters PlosOne . An overview of machine-learning methods can be found in Sommerschield2023 .

The first, allographic, method uses a self-organized character map obtained with a Kohonen neural network. As an example, this allographic codebook feature allows a 93% (± σ = 2.3%) classification accuracy for the scripts ‘Hasmonaean’ vs. ‘Herodian’, using PCA, on 590 labeled manuscripts, with results averaged over 32 random odd/even training/testing splits monknet . The second, textural, method uses statistical pattern recognition on angular information: the ‘hinge’ method for estimating the curvature distribution, which has been used extensively in writer verification and dating studies Bulacu2007 ; Adam2018 ; Dhali2020 . Whereas the allographic feature addresses stylistic elements at the character level, the ‘hinge’ method captures a micro-level feature directly related to the original writing movement that produced the curvature of the ink trace. Because the two features describe complementary levels of the handwriting, we make a weighted combination of textural and allographic features to obtain an adjoined feature vector for each manuscript image. This feature vector constitutes the input data to Enoch.
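A simplified sketch of the ‘hinge’ idea and of the weighted adjoining follows. It assumes that ink-trace contours have already been extracted as ordered point lists and that the allographic histogram is given. The original hinge feature also keeps only angle pairs with φ2 ≥ φ1 and aggregates over all contours of an image; both refinements are omitted here. The leg length, bin count, weight `w`, and function names are illustrative assumptions.

```python
import numpy as np

def hinge_histogram(contour, leg=5, n_bins=12):
    """Simplified hinge feature for one closed contour.

    contour: (N, 2) array of (x, y) points ordered along the ink-trace edge.
    At each point, two legs reach `leg` steps backwards and forwards along
    the contour; the joint occurrence of their quantized directions is
    accumulated in an n_bins x n_bins histogram.
    """
    pts = np.asarray(contour, dtype=float)
    back = np.roll(pts, leg, axis=0) - pts    # vector to the point `leg` steps back
    fwd = np.roll(pts, -leg, axis=0) - pts    # vector to the point `leg` steps ahead
    phi1 = np.arctan2(back[:, 1], back[:, 0]) % (2 * np.pi)
    phi2 = np.arctan2(fwd[:, 1], fwd[:, 0]) % (2 * np.pi)
    b1 = np.minimum((phi1 / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    b2 = np.minimum((phi2 / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.zeros((n_bins, n_bins))
    np.add.at(hist, (b1, b2), 1)              # count each (phi1, phi2) pair
    return hist.ravel() / hist.sum()

def combine_features(allograph_hist, hinge_hist, w=0.5):
    """Weighted adjoining of the two style descriptors (w is hypothetical)."""
    return np.concatenate([w * allograph_hist, (1 - w) * hinge_hist])
```

Because both histograms are normalized, the weight `w` directly controls how much each level of description contributes to distances between style vectors.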

2.3 Bayesian ridge regression

Because of the limited size of the data set, we cannot employ high-parametric models such as period-specific temporal codebooks He2016 . Instead, we use conditional modeling with Bayesian ridge regression Hoerl2000 . This approach applies Bayesian inference to estimate model parameters for date prediction. By placing a prior distribution on the parameters and updating it with observed data using Bayes’ rule, we obtain the posterior distribution of the parameters and of the predicted dates. The Bayesian approach is chosen because our target output data are probability curves for 14C dates (i.e., vectors) containing the accepted 2σ calibrated ‘OxCal’ ranges. This probabilistic approach enables us to incorporate all available information while maintaining interpretability. Moreover, instead of producing a single number for the estimated date of a sample, it provides a full posterior distribution that allows us to assess the uncertainty of the estimated dates. Enoch can also use the Bayesian approach to provide error margins for predictions on unseen data.
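As a minimal sketch of Bayesian ridge regression with predictive uncertainty, the example below uses scikit-learn’s `BayesianRidge` on synthetic data. The 62 training images and 20 dimensions mirror numbers given in the text, but the feature values and dates are random stand-ins. Unlike the probability-curve targets described above, this sketch regresses a scalar date per image, as in step 6 of Table 1. `predict(..., return_std=True)` returns the posterior predictive mean and standard deviation, which can serve as 1σ error margins.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(1)

# Synthetic stand-in: 62 training images with 20-dimensional style vectors
X_train = rng.normal(size=(62, 20))
true_w = rng.normal(scale=30, size=20)
y_train = -100 + X_train @ true_w + rng.normal(scale=15, size=62)  # BCE negative

# Gaussian prior on the weights; Gamma hyperpriors on the noise and weight precisions
model = BayesianRidge()
model.fit(X_train, y_train)

X_new = rng.normal(size=(3, 20))
mean, std = model.predict(X_new, return_std=True)
for m, s in zip(mean, std):
    print(f"predicted date {m:7.1f} ± {s:5.1f} (1σ)")
```

The prior shrinks the coefficients towards zero, as in ridge regression. Its strength is estimated from the data rather than hand-tuned, which matters when there are only a few dozen reference samples.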

2.4 Testing Enoch

Once Enoch was trained, we validated its performance with leave-one-out tests. We then applied Enoch’s calibrated style-based date estimation to a collection of 135 unseen Dead Sea Scrolls manuscripts as test data (see Table 16 in Appendix I).

We use two data-balancing techniques to compensate for the imbalanced distribution of the training data over different periods. The first involves data augmentation using random elastic morphing bulacu2009morph to create a balanced training data distribution. The second is applied to the output date predictions: this post-hoc balancing uses accumulated training probabilities and training data point counts, with threshold values of 5%, 10%, and 20%, to avoid under-sampled time regions.
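The cited random elastic morphing method (bulacu2009morph) is not described in detail here. As a stand-in, the sketch below uses a generic elastic distortion: a random per-pixel displacement field, smoothed with a Gaussian filter and applied with `scipy.ndimage.map_coordinates`. It then tops up each 14C-dated reference to the same number of images. Parameter values and the helper names `elastic_morph` and `balance_by_augmentation` are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_morph(image, alpha=34.0, sigma=4.0, rng=None):
    """Warp a 2-D binarized image with a random, smooth displacement field."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Random per-pixel displacements, smoothed so neighbouring pixels move together
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([rows + dy, cols + dx])
    warped = map_coordinates(image.astype(float), coords, order=1, mode="nearest")
    return warped > 0.5  # re-binarize after bilinear interpolation

def balance_by_augmentation(images_per_ref, target_count, rng):
    """Top up every 14C-dated reference to target_count images with morphed copies."""
    balanced = {}
    for ref, images in images_per_ref.items():
        n_extra = max(0, target_count - len(images))
        extra = [elastic_morph(images[i % len(images)], rng=rng) for i in range(n_extra)]
        balanced[ref] = list(images) + extra
    return balanced
```

Smoothing the displacement field is the key design choice. Unsmoothed random displacements would scatter individual pixels, whereas smoothed ones bend whole strokes in a way that resembles natural variation in handwriting.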

The general recipe for Enoch’s analysis of manuscript images is presented in Table 1. Before applying it to the scrolls, we successfully tested it on MPS, a known benchmark data set of dated mediaeval charters He2016 ; Koopmans2023 .

Table 1: Style-based date prediction recipe for Enoch
  1. Select and crop the relevant manuscript images based on scholarly identification criteria;
  2. In the images, perform a separation of the ink trace from the material background texture by using a deep-learning-based U-net variant for multispectral image-intensity binarization dhali2019binet ;
  3. For each manuscript, compute two shape descriptors: a histogram of allographic fraglet occurrence and a histogram of angular co-occurrence along the ink-trace edges Bulacu2007 ; Schomaker2004 ;
  4. Adjoin the two feature vectors, properly weighted, to a single handwriting-style vector bulacu2009morph ;
  5. In order to decorrelate the features, avoid collinearity, and minimize the necessary number of parameters in the next stage, perform a strong dimensionality reduction (PCA, 20 dimensions);
  6. Take the 14C-dated manuscript-image samples for training Enoch as a style-based Bayesian ridge-regression model with a scalar date estimate as the target output. In this training, augment the image data set by using random elastic morphing to obtain a sufficient and balanced number of examples per 14C-dated reference. This step is an essential, new contribution that allows 14C-based and style-based information to be merged in the date estimation. To validate Enoch, use the leave-one-out approach: each sample under evaluation does not occur in the training data;
  7. Harvesting: estimate style-based dates for undated manuscripts.
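Steps 5 and 6 of the recipe, together with the leave-one-out validation, can be sketched with scikit-learn. This is a minimal sketch on synthetic stand-in data, not Enoch itself: the style vectors and dates are random, and data augmentation is left out. Placing PCA inside the pipeline means it is refitted in every fold, so the held-out sample never influences the dimensionality reduction either.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)

# Synthetic stand-ins for adjoined style vectors (step 4) and 14C-based dates
n_images, n_dims = 62, 300
latent = rng.normal(size=(n_images, 5))
X = latent @ rng.normal(size=(5, n_dims)) + 0.1 * rng.normal(size=(n_images, n_dims))
dates = -100 + 40 * latent[:, 0] + rng.normal(scale=10, size=n_images)

# Step 5: strong dimensionality reduction; step 6: Bayesian ridge regression
enoch_like = make_pipeline(PCA(n_components=20), BayesianRidge())

# Leave-one-out: each prediction comes from a model that never saw that sample
loo_pred = cross_val_predict(enoch_like, X, dates, cv=LeaveOneOut())
mae = np.mean(np.abs(loo_pred - dates))
print(f"leave-one-out MAE: {mae:.1f} years")
```

When several images come from the same manuscript, scikit-learn’s `LeaveOneGroupOut`, with manuscript identifiers as groups, would keep all images of the held-out manuscript out of training. The paragraph above does not state which variant Enoch’s validation used.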

3 14C dates and palaeographic estimates

The AMS results yielded 26 accepted 14C dates (see Sections B.4–B.6 in Appendix B), which are listed in the summary table of 14C dates in Appendix B. The historical date preserved in Mur19 is consistent with the calibrated age range obtained by 14C (see Section B.4 in Appendix B). Overall, we improved and extended the existing series of 14C-dated Dead Sea Scrolls Bonani1992 ; Jull1995 .

Figure 1 compares the 2σ calibrated ranges (blue) with traditional palaeographic estimates (red). For 17 of the 26 sampled manuscripts, the two overlap wholly or partially, whereas 9 of the 26 samples yield calibrated ages that do not overlap with previous palaeographic estimates (see Appendix D).

Overall, the 14C results indicate older date ranges for individual manuscripts as well as for the emergence of the so-called ‘Hasmonaean’ and ‘Herodian’ scripts. Only two manuscripts have date ranges that extend towards younger dates. The 14C results for most manuscripts confirm the basic distinction between older Hasmonaean-type and younger Herodian-type manuscripts, and also between so-called ‘Archaic’ and Hasmonaean-type manuscripts.

However, the 14C date ranges for manuscripts that are traditionally considered Hasmonaean and Herodian are distributed quite differently across the timeline. As can be seen in Figure 1 (in blue), Hasmonaean-type manuscripts are all grouped together in a narrower part of the timeline, whereas Herodian-type manuscripts are spread out more widely, extending from the second century CE all the way back to the second century BCE (see Sections D.1.1–D.1.3 in Appendix D).

Sample 4Q114 is one of the most significant findings of the 14C results. The manuscript preserves Daniel 8–11, which scholars date on literary-historical grounds to the 160s BCE SchmidSchroeter ; zenger9 . The accepted 2σ calibrated peak for 4Q114, 230–160 BCE, overlaps with the period in which the final part of the biblical book of Daniel was presumably authored (see Section D.1.2 in Appendix D).

4 Validation of Enoch

Figure 1 (in green) also shows the results of cross-validation and leave-one-out tests for training Enoch. The choice of bandwidths (2σ date ranges for 14C, 1σ uncertainties of the ridge regression for style-based predictions) reflects the intrinsic reliability of the two information sources: 14C date ranges are evidently more reliable than style-based predictions.

Enoch’s style-based predictions largely follow the 14C results, even though the validation samples (rows) are in no way present in the training data. In the range 300–50 BCE, Enoch’s estimates provide a more fine-grained distribution than the 14C results. For samples 5/6Hev1b, Mas1k, and XHev/Se2, the style-based estimate is earlier and more uncertain. However, 11Q5 shows that in this late date range a fairly certain style-based date estimate after 100 CE can also be achieved. This may conflict with historical reconstructions according to which the scrolls were hidden in the Qumran caves before the summer of 68 CE Popovic2012 . We nevertheless did not impose a chronological limit on the model, because of the 14C result for 11Q5 and in order to examine the possibility that the style continued after 70 CE.

Regarding the differences between the 14C date ranges and Enoch’s script-style-based estimates, the mean absolute error (MAE) is 30.7 years. The MAE drops to 27.9 years when minor peaks (below 4% in all cases except 5.2% in 4Q2 and 9.4% in 4Q416) are ignored (see Figure 29 in Appendix H). In manuscript dating, MAE is commonly used to evaluate regression methods HamidMAE2019 , and its difference from the root-mean-square (RMS) error is limited HodsonMAE202 . With the chosen 2σ (14C) and 1σ (AI) bandwidths, the error for the leftmost margin is 6.4 years, while for the rightmost margin it is −38.4 years, indicating that Enoch’s style-based estimate range ends earlier than the 14C range. For each sample, the date ranges of the two information sources overlap partially to fully, with an average overlap of 88.8%. In Ithaca assael2022restoring , AI and epigraphy were used as two information sources to predict dates for ancient Greek inscriptions; its predictions have an average distance of 29.3 years from the target dating brackets, with a median distance of 3 years, based on the totality of texts. We also address date prediction, but, unlike Ithaca, we use three information sources: 14C, shape-based writing style analysis (AI), and palaeography.
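The margin errors and the average overlap above compare per-sample date ranges. The sketch below shows one way to compute such statistics, with assumptions stated plainly. Ranges are (start, end) pairs with BCE negative. A margin error is the AI bound minus the 14C bound, so a negative right-margin error means the AI range ends earlier. Overlap is taken as the fraction of the 14C range covered by the AI range, a definition the text does not specify. The function name `range_agreement` and the example ranges are hypothetical.

```python
import numpy as np

def range_agreement(c14_ranges, ai_ranges):
    """Compare per-sample date ranges (start, end), years with BCE negative."""
    c14 = np.asarray(c14_ranges, dtype=float)
    ai = np.asarray(ai_ranges, dtype=float)
    left_error = np.mean(ai[:, 0] - c14[:, 0])    # > 0: AI range starts later
    right_error = np.mean(ai[:, 1] - c14[:, 1])   # < 0: AI range ends earlier
    # Length of the intersection of each pair of ranges (0 if disjoint)
    inter = np.clip(np.minimum(c14[:, 1], ai[:, 1]) - np.maximum(c14[:, 0], ai[:, 0]), 0, None)
    overlap = inter / (c14[:, 1] - c14[:, 0])     # fraction of the 14C range covered
    return float(left_error), float(right_error), float(np.mean(overlap))
```

Usage example: `range_agreement([(-200, -100), (-50, 50)], [(-190, -120), (-40, 10)])` gives a left-margin error of 10 years, a right-margin error of −30 years, and an average overlap of 0.6.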

Figure 1: Overview of date estimations by three information sources and a calendar date: (accepted) 2σ calibrated 14C ranges (blue), Enoch (green), palaeography (red), and historical (black). The vertical axis lists the manuscript numbers, and the horizontal axis gives dates: BCE negative and CE positive.

Figure 1 shows the general result that, on average, 14C date ranges and Enoch’s predictions indicate older dates than palaeography. Only 4Q201 and 11Q5 have older palaeographic date estimates, although there is an overlap with the 14C results (see Section D.1.1 in Appendix D).

5 Harvest of Enoch’s date predictions for previously undated manuscripts

Table 2: Expert validation of Enoch’s date predictions
Prediction | Subcategory | Manuscript count | Percentage
Realistic | | 107 | 79.26%
Unrealistic | Indecisive | 4 |
 | Too old | 13 |
 | Too young | 11 |
 | Subtotal | 28 | 20.74%
Total manuscripts | | 135 | 100.00%

Table 2 summarizes Enoch’s date predictions for 135 previously undated manuscripts. Expert palaeographers evaluated the style-based date predictions and sorted them into two main categories, realistic and unrealistic, with the latter subdivided into too old, too young, and indecisive (see Appendix G).

As Table 2 shows, 107 (79%) of the undated manuscripts were dated realistically, according to the palaeographers. Enoch’s date prediction task is not a binary decision with a 50/50 chance level but a regression task, with many possible years in the interval 300 BCE–200 CE. Assuming a coarseness of 25 years, as in the MPS project He2016 , the date range would consist of 20 bins, giving a prior-probability hit rate of 5%. A success rate of 79% is therefore unlikely to be accidental. For 21% of the manuscripts, the palaeographers judged Enoch’s date predictions to be unrealistic. Enoch’s 28 unrealistic predictions were divided into too old (46%), too young (39%), and indecisive (14%).
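The chance-level argument above can be worked through numerically. The sketch assumes, as the text does, 20 equally likely 25-year bins (a 5% hit rate per guess). It further treats each expert verdict of ‘realistic’ as a hit and the 135 manuscripts as independent trials. This is a simplification, because ‘realistic’ may correspond to a window wider than one bin. Under these assumptions, the binomial tail probability of reaching at least 107 hits by chance is computed exactly.

```python
from math import comb

n_manuscripts = 135
n_realistic = 107
span_years = 500          # approximately 300 BCE to 200 CE
bin_width = 25            # coarseness assumed in the MPS project
n_bins = span_years // bin_width
p_chance = 1 / n_bins     # prior hit rate of a random guess

# Binomial tail: probability of at least 107 'realistic' hits out of 135 by chance
p_tail = sum(
    comb(n_manuscripts, k) * p_chance**k * (1 - p_chance) ** (n_manuscripts - k)
    for k in range(n_realistic, n_manuscripts + 1)
)
print(n_bins, p_chance, round(n_realistic / n_manuscripts, 4))
print(f"P(at least {n_realistic} hits by chance) = {p_tail:.1e}")
```

The tail probability is vanishingly small, far below any conventional significance threshold. Even a much coarser bin width would not make a 79% success rate plausible as chance.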

Figure 2: a, from full spectrum colour image to binarized image to 14C plot for 4Q259 that went into the training of Enoch. b, from full spectrum colour image to binarized image to Enoch’s date prediction plot for 4Q319 (see also Fig. 12). Red bars represent the probability of each date bin. The blue curve shows the smoothed distribution. Grey spikes indicate the local uncertainty of the estimate.

Samples 4Q259 and 4Q319 show that Enoch can accurately find the same date estimate for the same writing style. The accepted 2σ calibrated range of 4Q259 was used to train Enoch. Images of 4Q319 were already part of a test set in 2021. 4Q259 contains text that is part of the so-called Rule of the Community, whereas 4Q319 contains a calendrical text. Because of perceived generic differences, 4Q319 received a separate classification number, but materially it is part of the same manuscript as 4Q259 Hempel2020 . At the time of the test, 6 July 2021, the AI experts did not know of this identity. Figure 2 shows that Enoch gave a date prediction for 4Q319 that matches the accepted 2σ calibrated range of 4Q259 (see Section G.5 in Appendix G).

Previously, we demonstrated that two scribes were at work in the Great Isaiah Scroll PlosOne . Enoch now shows no temporal difference between the two halves of the manuscript, so there is no indication that one part was written significantly later than the other. On the contrary, both scribes are estimated to have worked on their respective parts of 1QIsaa in the same period. Figure 3 shows that Enoch consistently dates both halves to 180–100 BCE.

Figure 3: Enoch’s date prediction plots for 6 of the 54 columns from the two halves of 1QIsaa (the left 3 columns are from the first half of the manuscript, the right 3 columns are from the second half of the manuscript).

6 Discussion and conclusions

6.1 Aramaic/Hebrew script development in ancient Judaea

This study of style-based date prediction with the Enoch approach is a first step. Combining the 14C data generated in this study with machine-learning-based writing style analysis enabled us to examine Aramaic/Hebrew script in individual manuscripts with an empirically based precision that was not possible before. We combined palaeography, AI, and 14C to create a date-prediction model that leads to a new chronology of the scrolls from the third century BCE to the second century CE. We offer four novel insights into Aramaic/Hebrew script development during this period and into the dates of individual manuscripts.

First, 14C date ranges and Enoch’s style-based estimates are overall older than previous palaeographic estimates. These older dates for the scrolls are realistic. Hasmonaean-type manuscripts have accepted 2σ calibrated ranges that allow for older dates in the first half of the second century BCE, and sometimes slightly earlier, instead of only circa 150–50 BCE. There are no compelling palaeographic or historical reasons that preclude these older dates as reliable time markers for the ‘Hasmonaean’ script. This also applies to the accepted 2σ calibrated range for 4Q70 and its ‘Archaic’ script.

Second, ‘Herodian’ script emerged earlier than previously thought. This suggests that the ‘Hasmonaean’ and ‘Herodian’ scripts were not transitioning from the mid-first century BCE onward, but that they existed next to each other at a considerably earlier date.

Third, this novel approach to palaeography leads to a new chronology of the scrolls that affects our understanding of the history of ancient Judaea and of the people behind the scrolls. Hypotheses about whether the movement behind the scrolls originated in the second or first century BCE will need to be reconsidered in light of Enoch’s second-century BCE date predictions for Hasmonaean-type manuscripts such as 1QS and 4Q163 (see Appendix G), which bear texts regarded as typical of the movement. Scholars often assume that the rise and expansion of the Hasmonaean kingdom from the mid-second century BCE onward caused a rise in literacy and gave a push to scribal and intellectual culture. Yet the results of this study attest to the copying of multiple literary manuscripts before this period. One example is 4Q109, a copy of the biblical book of Ecclesiastes, which scholars tentatively date to the end of the third century BCE SchmidSchroeter . Enoch gives 4Q109 a third-century BCE date prediction (see Appendix G), close to Archaic-type manuscripts such as 4Q52 and 4Q70, copies of the biblical books of Samuel and Jeremiah.

Fourth, this study’s 14C result for 4Q114 and Enoch’s date prediction for 4Q109 now establish these as the first known fragments of biblical books that date from the time of their presumed authors [42]. In addition, by integrating multiple dating methods, Enoch strengthens the evidential value of each source and allows the evidence from the two sources, physical (material) and geometric (shape-based), to confirm each other.

The results of this study thus dismantle unsubstantiated historical suppositions and chronological limitations, and call into question the validity of the default model’s relative typology. This relative typology can be maintained only with qualifications. The spread of the Hasmonaean-type manuscripts over the timeline does not substantially affect the default relative typology, but the older, second-century BCE date ranges of the Herodian-type manuscripts do challenge it. More research is needed to resolve this issue.

6.2 The Enoch approach to dating ancient manuscripts

To our knowledge, Enoch is the first complete machine-learning-based model that takes raw images as input, delivers probabilistic date predictions for handwritten manuscripts using the entire probability distribution of the 14C output, is complemented by palaeographic input, and ensures transparency and interpretability through its explainable design. Palaeographers and historians may now use Enoch’s quantitative, probability-based approach to palaeography as a tool to examine date predictions. The probability-based options can support decision-making and help make qualitative palaeographic reasoning explicit. The methods underpinning Enoch can also be used for date prediction in other partially-dated manuscript collections.
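As an illustration of how a probability-based prediction can feed into such decisions, the sketch below assumes that a style-based prediction is summarized as a Gaussian with a mean and a standard deviation in years, which is the form of a Bayesian ridge regressor's predictive distribution. It derives a central 95.4% interval and the probability that the manuscript predates a chosen threshold. The function name, the threshold, and the numbers in the usage line are hypothetical and are not Enoch output.

```python
from statistics import NormalDist


def summarise_prediction(mean_year, sd_years, threshold_year):
    """Summarise a Gaussian date prediction for decision-making.

    Years are signed (negative = BCE; the absence of a year 0 is ignored).
    Returns the central 95.4% interval (2.3% in each tail) and the
    probability that the date is earlier than `threshold_year`.
    """
    pred = NormalDist(mu=mean_year, sigma=sd_years)
    interval = (pred.inv_cdf(0.023), pred.inv_cdf(0.977))
    p_earlier = pred.cdf(threshold_year)
    return interval, p_earlier


# Hypothetical prediction: 160 BCE with a standard deviation of 30 years.
(lo, hi), p = summarise_prediction(-160, 30, threshold_year=-150)
print(f"95.4% interval: {lo:.0f} to {hi:.0f}; P(before 150 BCE) = {p:.2f}")
```

Reporting a probability such as "before 150 BCE" rather than a single date keeps the uncertainty visible when the prediction is weighed against palaeographic arguments.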

It could be argued that the style-based predictions are influenced by the 14C-based training of the model. However, the leave-one-out validation results indicate that unseen samples obtain their interpolated position on the time axis from the handwriting style detected in the images. The placement of an unseen sample on the time axis is not fundamentally constrained: judging from the full span of style-based dates the model produced empirically, any date between 300 BCE and 200 CE could have been reached.
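The leave-one-out procedure can be sketched as follows. This is a minimal, dependency-free illustration, not Enoch's published code (which is available on Zenodo, see Section 7). It fits Bayesian linear regression by iteratively re-estimating the weight precision α and the noise precision β with the evidence-approximation updates described by Bishop [119], a scheme closely related to scikit-learn's BayesianRidge [118]. It then holds out each sample in turn and reports the mean absolute error (MAE) of the held-out predictions. The feature vectors and dates are synthetic placeholders standing in for the angular and allographic style features and the 14C-based target dates, and all function names are hypothetical.

```python
import random


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]


def fit_bayesian_ridge(X, y, n_iter=300, tol=1e-8):
    """Bayesian linear regression with evidence-approximation updates.

    alpha: precision of the Gaussian prior on the weights.
    beta:  precision of the Gaussian observation noise.
    """
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]  # centred features
    yc = [t - y_mean for t in y]                                 # centred targets
    G = [[sum(r[i] * r[j] for r in Xc) for j in range(d)] for i in range(d)]  # X^T X
    Xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]          # X^T y

    def posterior(alpha, beta):
        A = [[beta * G[i][j] + (alpha if i == j else 0.0) for j in range(d)]
             for i in range(d)]                   # posterior precision matrix
        m = [beta * v for v in _solve(A, Xty)]    # posterior mean of the weights
        return A, m

    alpha, beta = 1.0, 1.0 / (sum(t * t for t in yc) / n + 1e-12)
    for _ in range(n_iter):
        A, m = posterior(alpha, beta)
        # gamma = effective number of well-determined parameters = d - alpha * trace(A^-1)
        trace_S = sum(_solve(A, [float(k == i) for k in range(d)])[i] for i in range(d))
        gamma = d - alpha * trace_S
        sse = sum((t - sum(w * v for w, v in zip(m, r))) ** 2 for r, t in zip(Xc, yc))
        new_alpha = gamma / (sum(w * w for w in m) + 1e-12)
        new_beta = (n - gamma) / (sse + 1e-12)
        converged = (abs(new_alpha - alpha) <= tol * alpha
                     and abs(new_beta - beta) <= tol * beta)
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    A, m = posterior(alpha, beta)
    return {"A": A, "w": m, "beta": beta, "x_mean": x_mean, "y_mean": y_mean}


def predict(model, x):
    """Predictive mean and standard deviation (years) for one feature vector."""
    xc = [v - mu for v, mu in zip(x, model["x_mean"])]
    mean = model["y_mean"] + sum(w * v for w, v in zip(model["w"], xc))
    var = 1.0 / model["beta"] + sum(a * b for a, b in zip(xc, _solve(model["A"], xc)))
    return mean, var ** 0.5


def leave_one_out_mae(X, y):
    """Hold out each manuscript once, train on the rest, and average |error|."""
    errors = []
    for i in range(len(y)):
        model = fit_bayesian_ridge(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        mean, _ = predict(model, X[i])
        errors.append(abs(mean - y[i]))
    return sum(errors) / len(errors)


# Synthetic stand-in data: 30 "manuscripts", 3 style features, dates in years.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(30)]
y = [-150 + 40 * a - 25 * b + 10 * c + rng.gauss(0.0, 5.0) for a, b, c in X]
print(f"Leave-one-out MAE: {leave_one_out_mae(X, y):.1f} years")
```

Because each held-out sample never enters the fit that predicts it, its predicted date is driven by its own feature vector rather than by its 14C label.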

In this study, we have avoided using palaeographic estimates as target values for machine learning because our goal is to provide physical (material) and geometric (shape-based) evidence for manuscript dating. Although using palaeographic estimates as target values is technically possible, we consider it too risky, given the existing uncertainties and lack of consensus surrounding the precise dating of individual manuscripts.

A broader time axis, with a sufficient number of samples at both tails (the BCE and the CE ends), would allow for a wider time range of predictions. It would therefore be very valuable to add new manuscript samples to the current collection. With the Enoch approach, the consequences of adding each new manuscript sample to the Dead Sea Scrolls 14C reference collection can now easily be computed.

Enoch’s 79% success rate in date prediction is noteworthy, given that the test manuscripts were entirely undated before the analysis was performed. Moreover, the test images were not prepared with the same care as the training images. All training images underwent rotation and alignment correction, followed by a clean arrangement of the smaller fragments within each manuscript, to obtain accurate feature extractions for the style periods represented by those manuscripts. If the same preparation were applied to every test image, we would expect the percentage of realistic date predictions to exceed 79%. The 28 manuscripts (21%) that received an unrealistic date prediction in the current test may be affected by image-quality issues (see Figure 13 in Appendix E). The results for the test samples would likely have been better had more accurate manual cropping and rotation correction been performed, as was done for the training samples.
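As a check on the reported proportions, assuming that all 135 test manuscripts listed in Appendix G form the denominator:

```latex
\[
135 - 28 = 107, \qquad \frac{107}{135} \approx 0.793 \approx 79\%, \qquad \frac{28}{135} \approx 0.207 \approx 21\%.
\]
```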

In its manuscript analysis, Enoch differs from traditional palaeographic approaches. Enoch emphasizes shared characteristics and similarity matching between training and test manuscripts, whereas traditional palaeography focuses on subtle differences that are assumed to be indicative of style development. Combining dissimilarity matching with adaptive reinforcement learning can uncover hidden patterns. Such an interdisciplinary fusion may enrich our understanding of textual content, material properties, and historical context, leading to better-informed interpretations of the past. This remains a task for the future. New 14C evidence or, with new discoveries, a whole range of date-bearing manuscripts can be added to Enoch’s training data to refine the model and continuously improve its accuracy. The limited data were insufficient for a full deployment of deep learning in the prediction task (see Appendix F); future research will need to address the problems of sparse labeling and high dimensionality. New solutions can be expected here, because these problems are encountered in many application domains. If palaeographers are willing to accept ‘black box’ deep-learning models pre-trained on entirely unrelated large image and photograph collections, future research may be directed at adapting the output of such models to the vectorial regression-based date-prediction task proposed in this article.
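The idea of similarity matching can be illustrated with a small nearest-neighbour sketch. Writing-style features of the kind used here are typically normalized histograms (probability distributions), and the writer-identification literature in the reference list (e.g. [31, 32]) compares such histograms with the χ² distance. The code below, with hypothetical toy histograms and dates, ranks training manuscripts by χ² distance to a test histogram and averages the dates of the k closest. It illustrates similarity matching in general and is not Enoch's regression step; the function names are illustrative.

```python
def chi2_distance(p, q, eps=1e-12):
    """Chi-square distance between two normalised histograms (one common form)."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def knn_date(test_hist, train, k=3):
    """Estimate a date from the k most similar training manuscripts.

    train: list of (histogram, date) pairs; dates are signed years (negative = BCE).
    Returns the mean date of the k nearest neighbours and their (distance, date) pairs.
    """
    ranked = sorted((chi2_distance(test_hist, hist), date) for hist, date in train)
    nearest = ranked[:k]
    return sum(date for _, date in nearest) / len(nearest), nearest


# Hypothetical 3-bin style histograms with known dates.
train = [([0.50, 0.30, 0.20], -200), ([0.45, 0.35, 0.20], -190),
         ([0.48, 0.32, 0.20], -180), ([0.20, 0.30, 0.50], -20),
         ([0.10, 0.30, 0.60], 10)]
estimate, neighbours = knn_date([0.47, 0.33, 0.20], train, k=3)
print(estimate, [date for _, date in neighbours])  # -190.0 [-180, -190, -200]
```

A dissimilarity-oriented variant would instead inspect which histogram bins contribute most to the distance, which is closer to the traditional palaeographic focus on distinguishing features.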

7 Online content

All data, code, and test film associated with this article are publicly available on Zenodo with the following DOIs:
Data and prediction plots (v3): https://doi.org/10.5281/zenodo.10998958
Code and feature files (v5): https://doi.org/10.5281/zenodo.11371749
Film (see details in Appendix G): https://doi.org/10.5281/zenodo.8167946

8 Supplementary information

This article has twelve supplementary appendices:

  • Appendix A: The dating problem of the Dead Sea Scrolls

  • Appendix B: Radiocarbon dating of the Dead Sea Scrolls

  • Appendix C: 14C determinations and calibrated date plots

  • Appendix D: Palaeography and radiocarbon dating of the Dead Sea Scrolls

  • Appendix E: Artificial intelligence (AI) in dating the scrolls

  • Appendix F: On the use of pre-trained deep learning methods for image-based dating

  • Appendix G: Enoch’s date predictions for 135 previously undated manuscripts

  • Appendix H: Comparative plots for different information sources

  • Appendix I: List of images for different tests

  • Appendix J: Radiocarbon sample information

  • Appendix K: Data-sheet radiocarbon runs

  • Appendix L: Worksheet of comparative data for 2σ 14C dates and traditional palaeographic estimates

9 Acknowledgments

The authors thank P. Shor, J. Uziel, T. Bitler, H. Libman, B. Riestra, O. Rosengarten, and S. Halevi at the Dead Sea Scrolls Unit of the Israel Antiquities Authority (IAA) and E. Boaretto (advisor to the IAA from the Weizmann Institute of Science, Jerusalem) for providing physical samples and multispectral images of the scrolls, courtesy of the Leon Levy Dead Sea Scrolls Digital Library; Brill Publishers for the Dead Sea Scrolls images from the Brill Collection; A. Aerts-Bijma and D. Paul for handling and measuring the 14C samples at the Center for Isotope Research (Groningen); S. Legnaioli for the Raman analyses performed at the CNR-ICCOM (Pisa); A. Krauss and T. van der Werff for their contributions to developing and testing Enoch; L. Bouma for cleaning images; D. Longacre, G. Hayes, A.W. Aksu, H. van der Schoor, C. van der Veer, and M. van Dijk for their contributions to preparing images for training Enoch; M.W. Dee for advising on and inspecting the code and data acquisition process from OxCal to the Enoch model at the Center for Isotope Research (Groningen). This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 640497 (HandsandBible). M.P. and E.T. were also supported by NWO, the Netherlands Organisation for Scientific Research, and FWO, the Research Foundation – Flanders (SV-15-29).

Declarations

  • Funding:
    The work has been supported by an ERC Starting Grant of the European Research Council (EU Horizon 2020): The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls (HandsandBible # 640497).

  • Competing interests: the authors declare no competing interests.

  • Availability of data, materials, and code: see Section 7

  • Authors’ contributions: all the authors contributed equally to the article.

References

  • (1) Nongbri, B.: Palaeographic analysis of codices from the early Christian period: A point of method. Journal for the Study of the New Testament 42, 84–97 (2019). https://doi.org/10.1177/0142064x19855582
  • (2) Orsini, P.: Introduction. In: Studies on Greek and Coptic Majuscule Scripts and Books, p.VII–XVI. De Gruyter, Berlin (2018). https://doi.org/10.1515/9783110575446-000
  • (3) Orsini, P., Clarysse, W.: Early New Testament manuscripts and their dates: A critique of theological palaeography. Ephemerides Theologicae Lovanienses 88, 443–474 (2012)
  • (4) Brooke, G.J., Hempel, C. (eds.): T&T Clark Companion to the Dead Sea Scrolls. T&T Clark, London (2018)
  • (5) Tigchelaar, E.: Seventy years of palaeographic dating of the Dead Sea Scrolls. In: Drawnel, H. (ed.) Sacred Texts and Disparate Interpretations: Qumran Manuscripts Seventy Years Later, pp. 258–278. Brill, Leiden (2020). https://doi.org/10.1163/9789004432796_014
  • (6) Puech, E.: La paléographie des manuscrits de la mer Morte. In: Fidanzio, M. (ed.) The Caves of Qumran, pp. 96–105. Brill, Leiden (2017). https://doi.org/10.1163/9789004316508_008
  • (7) Cross, F.M.: The development of the Jewish scripts. In: Leaves from an Epigrapher’s Notebook: Collected Papers in Hebrew and West Semitic Palaeography and Epigraphy, pp. 1–43. Eisenbrauns, Winona Lake, IN (2003). https://doi.org/10.1163/9789004369887_002 (originally published in 1961)
  • (8) Avigad, N.: The palaeography of the Dead Sea Scrolls and related documents. In: Scripta Hierosolymitana, Volume IV: Aspects of the Dead Sea Scrolls, pp. 56–87. Magnes Press, Jerusalem (1965)
  • (9) Yardeni, A.: Textbook of Aramaic, Hebrew and Nabataean Documentary Texts from the Judaean Desert and Related Material, 2 Vols. The Hebrew University, Jerusalem (2000)
  • (10) Yardeni, A.: The palaeography of 4QJera – a comparative study. Textus 15, 233–268 (1990). https://doi.org/10.1163/2589255x-01501012
  • (11) Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 84–90 (2017). https://doi.org/10.1145/3065386
  • (12) Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer (2000). https://doi.org/10.1007/978-1-4757-3264-1
  • (13) He, S., Schomaker, L.: Deep adaptive learning for writer identification based on single handwritten word images. Pattern Recognition 88, 64–74 (2019). https://doi.org/10.1016/j.patcog.2018.11.003
  • (14) He, S., Schomaker, L.: FragNet: Writer identification using deep fragment networks. IEEE Transactions on Information Forensics and Security 15, 3013–3022 (2020). https://doi.org/10.1109/tifs.2020.2981236
  • (15) He, S., Schomaker, L.: GR-RNN: Global-context residual recurrent neural networks for writer identification. Pattern Recognition 117, 107975 (2021). https://doi.org/10.1016/j.patcog.2021.107975
  • (16) Zhang, Z., Schomaker, L.: DiverGAN: An efficient and effective single-stage framework for diverse text-to-image generation. Neurocomputing 473, 182–198 (2022). https://doi.org/10.1016/j.neucom.2021.12.005
  • (17) Ameryan, M., Schomaker, L.: How to limit label dissipation in neural-network validation: Exploring label-free early-stopping heuristics. Journal on Computing and Cultural Heritage 16(1), 1–20 (2023). https://doi.org/10.1145/3587168
  • (18) Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., de Freitas, N.: Restoring and attributing ancient texts using deep neural networks. Nature 603, 280–283 (2022). https://doi.org/10.1038/s41586-022-04448-z
  • (19) Bonani, G., Ivy, S., Wölfli, W., Broshi, M., Carmi, I., Strugnell, J.: Radiocarbon dating of fourteen Dead Sea Scrolls. Radiocarbon 34, 843–849 (1992). https://doi.org/10.1017/s0033822200064158
  • (20) Jull, A.J.T., Donahue, D.J., Broshi, M., Tov, E.: Radiocarbon dating of scrolls and linen fragments from the Judean Desert. Radiocarbon 37, 11–19 (1995). https://doi.org/10.1017/s0033822200014740
  • (21) Rasmussen, K.L., van der Plicht, J., Cryer, F.H., Doudna, G., Cross, F.M., Strugnell, J.: The effects of possible contamination on the radiocarbon dating of the Dead Sea Scrolls I: castor oil. Radiocarbon 43, 127–132 (2001). https://doi.org/10.1017/S0033822200031702
  • (22) Rasmussen, K.L., van der Plicht, J., Doudna, G., Cross, F.M., Strugnell, J.: Reply to Israel Carmi (2002): “Are the 14C dates of the Dead Sea Scrolls affected by castor oil contamination?”. Radiocarbon 45, 497–499 (2003). https://doi.org/10.1017/S0033822200032847
  • (23) Rasmussen, K.L., van der Plicht, J., Doudna, G., Nielsen, F., Højrup, P., Stenby, E.H., Pedersen, C.T.: The effects of possible contamination on the radiocarbon dating of the Dead Sea Scrolls II: empirical methods to remove castor oil and suggestions for redating. Radiocarbon 51, 1005–1022 (2009). https://doi.org/10.1017/S0033822200034081
  • (24) Ramsey, C.B.: Development of the radiocarbon calibration program. Radiocarbon 43, 355–363 (2001). https://doi.org/10.1017/s0033822200038212
  • (25) Ramsey, C.B., van der Plicht, J., Weninger, B.: ‘Wiggle matching’ radiocarbon dates. Radiocarbon 43, 381–389 (2001). https://doi.org/10.1017/s0033822200038248
  • (26) Israel Antiquities Authority: The Leon Levy Dead Sea Scrolls Digital Library. https://www.deadseascrolls.org.il/explore-the-archive. Accessed: 2023-04-10
  • (27) Lim, T., Alexander, P.: The Dead Sea Scrolls Electronic Library (Volume 1). Brill Publishers (1995)
  • (28) Dhali, M.A., de Wit, J.W., Schomaker, L.: BiNet: Degraded-manuscript binarization in diverse document textures and layouts using deep encoder-decoder networks. arXiv preprint (2019). https://doi.org/10.48550/arXiv.1911.07930
  • (29) Brink, A., Bulacu, M., Schomaker, L.: How much handwritten text is needed for text-independent writer verification and identification. In: 2008 19th International Conference on Pattern Recognition (ICPR). IEEE, Piscataway (2008). https://doi.org/10.1109/icpr.2008.4761908
  • (30) Bulacu, M., Schomaker, L., Vuurpijl, L.: Writer identification using edge-based directional features. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. IEEE Computer Society (2003). https://doi.org/10.1109/icdar.2003.1227797
  • (31) Schomaker, L., Bulacu, M.: Automatic writer identification using connected-component contours and edge-based features of uppercase western script. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 787–798 (2004). https://doi.org/10.1109/tpami.2004.18
  • (32) Bulacu, M., Schomaker, L.: Text-independent writer identification and verification using textural and allographic features. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 701–717 (2007). https://doi.org/10.1109/tpami.2007.1009
  • (33) Dhali, M.A., Jansen, C.N., de Wit, J.W., Schomaker, L.: Feature-extraction methods for historical manuscript dating based on writing style development. Pattern Recognition Letters 131, 413–420 (2020). https://doi.org/10.1016/j.patrec.2020.01.027
  • (34) He, S., Samara, P., Burgers, J., Schomaker, L.: Image-based historical manuscript dating using contour and stroke fragments. Pattern Recognition 58, 159–171 (2016). https://doi.org/10.1016/j.patcog.2016.03.032
  • (35) Popović, M., Dhali, M.A., Schomaker, L.: Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsaa). PLOS ONE 16, e0249769 (2021). https://doi.org/10.1371/journal.pone.0249769
  • (36) Sommerschield, T., Assael, Y., Pavlopoulos, J., Stefanak, V., Senior, A., Dyer, C., Bodel, J., Prag, J., Androutsopoulos, I., de Freitas, N.: Machine learning for ancient languages: A survey. Computational Linguistics, 1–45 (2023). https://doi.org/10.1162/coli_a_00481
  • (37) Schomaker, L.: Monk - Search and annotation tools for handwritten manuscripts. http://monk.hpc.rug.nl/. Accessed: 2023-07-17 (2023)
  • (38) Adam, K., Baig, A., Al-Maadeed, S., Bouridane, A., El-Menshawy, S.: KERTAS: dataset for automatic dating of ancient Arabic manuscripts. International Journal on Document Analysis and Recognition (IJDAR) 21, 283–290 (2018). https://doi.org/10.1007/s10032-018-0312-3
  • (39) Hoerl, A.E., Kennard, R.W.: Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000). https://doi.org/10.1080/00401706.2000.10485983
  • (40) Bulacu, M., Brink, A., van der Zant, T., Schomaker, L.: Recognition of handwritten numerical fields in a large single-writer historical collection. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 808–812 (2009). https://doi.org/10.1109/ICDAR.2009.8
  • (41) Koopmans, L., Dhali, M., Schomaker, L.: The effects of character-level data augmentation on style-based dating of historical manuscripts. In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods, pp. 124–135. SCITEPRESS - Science and Technology Publications, Setubal, Portugal (2023). https://doi.org/10.5220/0011699500003411
  • (42) Schmid, K., Schröter, J.: The Making of the Bible: From the First Fragments to Sacred Scripture. Belknap Press, Cambridge, MA (2021)
  • (43) Zenger, E., Frevel, C.: Einleitung in das Alte Testament, 9th (updated) edn. Kohlhammer, Stuttgart (2016)
  • (44) Popović, M.: Qumran as scroll storehouse in times of crisis? A comparative perspective on Judaean Desert manuscript collections. Journal for the Study of Judaism 43, 551–594 (2012). https://doi.org/10.1163/15700631-12341239
  • (45) Hamid, A., Bibi, M., Moetesum, M., Siddiqi, I.: Deep learning based approach for historical manuscript dating. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 967–972 (2019). https://doi.org/10.1109/ICDAR.2019.00159
  • (46) Hodson, T.O.: Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not. Geoscientific Model Development 15(14), 5481–5487 (2022). https://doi.org/10.5194/gmd-15-5481-2022
  • (47) Hempel, C.: The Community Rules from Qumran: A Commentary. Mohr Siebeck, Tübingen (2020). https://doi.org/10.1628/978-3-16-157027-8
  • (48) Gropp, D.M.: Discoveries in the Judaean Desert: Volume XXVIII. Wadi Daliyeh II and Qumran Miscellanea, Part 2. Clarendon Press, Oxford (2001)
  • (49) Naveh, J., Shaked, S.: Aramaic Documents from Ancient Bactria (Fourth Century BCE.). The Khalili Family Trust, London (2012)
  • (50) Porten, B., Yardeni, A.: Textbook of Aramaic Documents from Ancient Egypt, 4 Vols. The Hebrew University, Jerusalem (1986–1999)
  • (51) Benoit, P., Milik, J.T., de Vaux, R.: Discoveries in the Judaean Desert: Volume II. Les grottes de Murabbaʿât, 2 Vols. Clarendon Press, Oxford (1961)
  • (52) Cotton, H.M., Yardeni, A.: Discoveries in the Judaean Desert: Volume XXVII. Aramaic, Hebrew and Greek Documentary Texts from Naḥal Ḥever and Other Sites. With an Appendix Containing Alleged Qumran Texts. Clarendon Press, Oxford (1997)
  • (53) Eshel, E., Kloner, A.: An Aramaic ostracon of an Edomite marriage contract from Maresha, dated 176 BCE. Israel Exploration Journal 46, 1–22 (1996)
  • (54) Geraty, L.T.: The Khirbet el-Kôm bilingual ostracon. Bulletin of the American Schools of Oriental Research 220, 55–61 (1975)
  • (55) Eck, W., Cotton, H.M., Di Segni, L.: Corpus Inscriptionum Iudaeae/Palaestinae: Volume 1, Part 1, pp. 414–416. De Gruyter, Berlin (2010)
  • (56) Avigad, N.: Ancient Monuments in the Kidron Valley. Bialik Institute, Jerusalem (1954)
  • (57) Barag, D.: The 2000-2001 exploration of the tombs of Benei Ḥezir and Zechariah. Israel Exploration Journal 53, 78–110 (2003)
  • (58) Naveh, J.: Dated coins of Alexander Janneus. Israel Exploration Journal 18, 20–26 (1968)
  • (59) Baillet, M., Milik, J.T., de Vaux, R.: Discoveries in the Judaean Desert of Jordan: Volume III. Les ‘petites grottes’ de Qumrân, 2 Vols. Clarendon Press, Oxford (1962)
  • (60) Puech, E.: Discoveries in the Judaean Desert: Volume XXXI. Qumrân Grotte 4.XXII. Textes araméens, première partie: 4Q529-549. Clarendon Press, Oxford (2001)
  • (61) Puech, E.: Inscriptions funéraires palestiniennes: Tombeau de Jason et ossuaires. Revue biblique 90, 481–533 (1983)
  • (62) Cross, F.M.: The oldest manuscripts from Qumran. Journal of Biblical Literature 74, 147–172 (1955)
  • (63) Magness, J.: Ossuaries and the burials of Jesus and James. Journal of Biblical Literature 124, 121–154 (2005)
  • (64) Cross, F.M.: The papyri and their historical implications. In: Lapp, P.W., Lapp, N.L. (eds.) Discoveries in the Wâdī ed-Dâliyeh, pp. 17–29. American Schools of Oriental Research, Cambridge, MA (1974)
  • (65) Cross, F.M.: The development of the Jewish scripts. In: Wright, G.E. (ed.) The Bible and the Ancient Near East, pp. 133–202. Doubleday, Garden City, NY (1961)
  • (66) Cross, F.M.: Palaeography and the Dead Sea Scrolls. In: Flint, P.W., VanderKam, J.C. (eds.) The Dead Sea Scrolls After Fifty Years: A Comprehensive Assessment, Volume One, pp. 379–402. Brill, Leiden (1998)
  • (67) Bonani, G., Broshi, M., Carmi, I., Ivy, S., Strugnell, J., Wölfli, W.: Radiocarbon dating of the Dead Sea Scrolls. Atiqot 20, 27–32 (1991)
  • (68) Popović, M.: Book production and circulation in ancient Judaea: Evidenced by writing quality and skills in the Dead Sea Scrolls Isaiah and Serekh manuscripts. In: Williams, T.B., Keith, C., Stuckenbruck, L. (eds.) The Dead Sea Scrolls in Ancient Media Culture, pp. 199–265. Brill, Leiden (2023). https://doi.org/10.1163/9789004537804_007
  • (69) Longacre, D.: Disambiguating the concept of formality in palaeographic descriptions: Stylistic classification and the ancient Jewish Hebrew/Aramaic scripts. Comparative Oriental Manuscript Studies Bulletin 5, 101–128 (2019). https://doi.org/10.25592/UHHFDM.739
  • (70) Queffelec, A., Bertran, P., Bos, T., Lemée, L.: Mineralogical and organic study of bat and chough guano: implications for guano identification in ancient context. Journal of Cave and Karst Studies 80, 1–17 (2018). https://doi.org/10.4311/2017es0102
  • (71) Dee, M.W., Palstra, S.W.L., Aerts-Bijma, A.T., Bleeker, M.O., de Bruijn, S., Ghebru, F., Jansen, H.G., Kuitems, M., Paul, D., Richie, R.R., Spriensma, J.J., Scifo, A., van Zonneveld, D., Verstappen-Dumoulin, B.M.A.A., Wietzes-Land, P., Meijer, H.A.J.: Radiocarbon dating at Groningen: New and updated chemical pretreatment procedures. Radiocarbon 62, 63–74 (2019). https://doi.org/10.1017/rdc.2019.101
  • (72) Van der Plicht, J., Wijma, S., Aerts, A., Pertuisot, M., Meijer, H.: Status report: the Groningen AMS facility. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 172, 58–65 (2000). https://doi.org/10.1016/S0168-583X(00)00284-6
  • (73) Synal, H.-A., Stocker, M., Suter, M.: MICADAS: A new compact radiocarbon AMS system. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 259, 7–13 (2007). https://doi.org/10.1016/j.nimb.2007.01.138
  • (74) Mook, W.G., van der Plicht, J.: Reporting 14C activities and concentrations. Radiocarbon 41, 227–239 (1999). https://doi.org/10.1017/S0033822200057106
  • (75) Reimer, P.J., Austin, W.E., Bard, E., Bayliss, A., Blackwell, P.G., Ramsey, C.B., Butzin, M., Cheng, H., Edwards, R.L., Friedrich, M., et al.: The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020). https://doi.org/10.1017/RDC.2020.41
  • (76) van der Plicht, J.: Variations in atmospheric 14C. In: Reference Module in Earth Systems and Environmental Sciences. Encyclopedia of Quaternary Science, 3rd Edition, pp. 1–10. Elsevier, Amsterdam (2022). https://doi.org/10.1016/b978-0-323-99931-1.00014-3
  • (77) Tigchelaar, E.: Identification and reidentification of some fragments of 4Q70 (4QJera). Textus 29, 193–200 (2020). https://doi.org/10.1163/2589255x-02901006
  • (78) Koffmahn, E.: Zur Datierung der aramäisch/hebräischen Vertragsurkunden von Murabbaʿat. Wiener Zeitschrift für die Kunde des Morgenlandes 59, 119–136 (1963)
  • (79) Yadin, Y.: The excavation of Masada—1963/64: preliminary report. Israel Exploration Journal 15, 1–120 (1965)
  • (80) Goodblatt, D.: Dating documents in Provincia Iudaea: A note on papyri Murabbaʿat 19 and 20. Israel Exploration Journal 49, 249–259 (1999)
  • (81) Eshel, H.: Documents of the first Jewish revolt from the Judean Desert. In: Berlin, A.M., Overman, J.A. (eds.) The First Jewish Revolt: Archaeology, History, and Ideology, pp. 171–177. Routledge, London (2003)
  • (82) Eshel, H., Broshi, M., Jull, A.J.T.: Four Murabbaʿat papyri and the alleged capture of Jerusalem by Bar Kokhba. In: Katzoff, R., Schaps, D. (eds.) Law in the Documents of the Judaean Desert, pp. 45–50. Brill, Leiden (2005). https://doi.org/10.1163/9789047403999_006
  • (83) Wise, M.O.: Language and Literacy in Roman Judaea. Yale University Press, New Haven, CT (2015)
  • (84) Pajunen, M.: 4QSapiential Admonitions B (4Q185): Unsolved challenges of the Hebrew text. In: Brooke, G., Høgenhaven, J. (eds.) The Mermaid and the Partridge, pp. 191–220. Brill, Leiden (2011). https://doi.org/10.1163/ej.9789004194304.i-310.41
  • (85) Degano, I., Modugno, F., Bonaduce, I., Ribechini, E., Colombini, M.P.: Recent advances in analytical pyrolysis to investigate organic materials in heritage science. Angewandte Chemie International Edition 57, 7313–7323 (2018). https://doi.org/10.1002/anie.201713404
  • (86) La Nasa, J., Modugno, F., Degano, I.: Liquid chromatography and mass spectrometry for the analysis of acylglycerols in art and archeology. Mass Spectrometry Reviews 40, 381–407 (2021). https://doi.org/10.1002/mas.21644
  • (87) La Nasa, J., Biale, G., Sabatini, F., Degano, I., Colombini, M.P., Modugno, F.: Synthetic materials in art: a new comprehensive approach for the characterization of multi-material artworks by analytical pyrolysis. Heritage Science 7, 1–14 (2019). https://doi.org/10.1186/s40494-019-0251-4
  • (88) Colombini, M.P., Modugno, F.: Organic Mass Spectrometry in Art and Archaeology. John Wiley & Sons, Hoboken, NJ (2009)
  • (89) La Nasa, J., Ghelardi, E., Degano, I., Modugno, F., Colombini, M.P.: Core shell stationary phases for a novel separation of triglycerides in plant oils by high performance liquid chromatography with electrospray-quadrupole-time of flight mass spectrometer. Journal of Chromatography A 1308, 114–124 (2013). https://doi.org/10.1016/j.chroma.2013.08.015
  • (90) Ghioni, C., Hiller, J.C., Kennedy, C.J., Aliev, A., Odlyha, M., Boulton, M., Wess, T.J.: Evidence of a distinct lipid fraction in historical parchments: a potential role in degradation? Journal of Lipid Research 46, 2726–2734 (2005). https://doi.org/10.1194/jlr.M500331-JLR200
  • (91) Charlesworth, J., Cotton, H., Flint, P.: Discoveries in the Judaean Desert: Volume XXXVIII. Miscellaneous Texts from the Judaean Desert. Clarendon Press, Oxford (2000)
  • (92) Tigchelaar, E.: 4Q1 (4QGen-Exoda): Identification of fragments and comments. Textus 32, 19–38 (2023). https://doi.org/10.1163/2589255X-bja10028
  • (93) Ulrich, E., Cross, F.M.: Discoveries in the Judaean Desert: Volume XIV. Qumran Cave 4.IX. Deuteronomy, Joshua, Judges, Kings. Clarendon Press, Oxford (1995)
  • (94) Langlois, M.: Le premier manuscrit du Livre d’Hénoch: Étude épigraphique et philologique des fragments araméens de 4Q201 à Qumrân. Cerf, Paris (2011)
  • (95) Puech, E.: Les copies du livre de Josué dans les manuscrits de la mer Morte: 4Q47, 4Q48, 4Q123 et XJosué. Revue biblique 122, 481–506 (2015). https://doi.org/10.2143/RBI.122.4.3149591
  • (96) Cross, F.M., Parry, D.W., Saley, R.J., Ulrich, E.: Discoveries in the Judaean Desert: Volume XVII. Qumran Cave 4.XII. 1–2 Samuel. Clarendon Press, Oxford (2005)
  • (97) Strugnell, J.: Notes en marge du volume V des “Discoveries in the Judaean Desert of Jordan”. Revue de Qumran 7, 163–276 (1970)
  • (98) Tigchelaar, E.: Lamentations 4:21-22 as another word of consolation in 4Q176. Revue de Qumran 31, 3–9 (2019). https://doi.org/10.2143/RQ.31.1.3286503
  • (99) Milik, J.T.: The Books of Enoch: Aramaic Fragments of Qumrân Cave 4. Clarendon Press, London (1976)
  • (100) Sanders, J.A.: Discoveries in the Judaean Desert: Volume IV. The Psalms Scroll of Qumran Cave XI. Clarendon Press, Oxford (1965)
  • (101) Charlesworth, J.H., Milgrom, J., Qimron, E., Schiffman, L.H., Stuckenbruck, L.T., Whitaker, R.E. (eds.): The Dead Sea Scrolls. Hebrew, Aramaic, and Greek Texts with English Translations, Volume 1: Rule of the Community and Related Documents. JCB Mohr (Paul Siebeck), Tübingen (1994)
  • (102) Puech, E.: L’alphabet cryptique A en 4QSe (4Q259). Revue de Qumran 18, 429–435 (1998)
  • (103) Puech, E.: Discoveries in the Judaean Desert: Volume XXV. Qumrân Grotte 4.XVIII. Textes hébreux: 4Q521-4Q528, 4Q576-4Q579. Clarendon Press, Oxford (1998)
  • (104) Ulrich, E., Cross, F.M., Davila, J.R.: Discoveries in the Judaean Desert: Volume XII. Qumran Cave 4.VII. Genesis to Numbers. Clarendon Press, Oxford (1995)
  • (105) Baumgarten, J.M.: Discoveries in the Judaean Desert: Volume XVIII. Qumran Cave 4.XIII. The Damascus Document (4Q266-273). Clarendon Press, Oxford (1996)
  • (106) Sirat, C.: Les manuscrits en caractères hébraïques: Réalités d’hier et histoire d’aujourd’hui. Scrittura e civiltà 10, 239–288 (1986)
  • (107) Ulrich, E.: Discoveries in the Judaean Desert: Volume XVI. Qumran Cave 4.XI. Psalms to Chronicles. Clarendon Press, Oxford (2000)
  • (108) Cross, F.M.: The Ancient Library of Qumran, 2nd edn. Doubleday, Garden City, NY (1961)
  • (109) Drawnel, H.: Qumran Cave 4: The Aramaic Books of Enoch, 4Q201, 4Q202, 4Q204, 4Q205, 4Q206, 4Q207, 4Q212. Oxford University Press, Oxford (2019)
  • (110) Broshi, M., Eshel, E., Fitzmyer, J.: Discoveries in the Judaean Desert: Volume XIX. Qumran Cave 4.XIV. Parabiblical Texts, Part 2. Clarendon Press, Oxford (1996)
  • (111) Strugnell, J., Harrington, D., Elgvin, T.: Discoveries in the Judaean Desert: Volume XXXIV. Qumran Cave 4.XXIV. Sapiential Texts, Part 2: 4QInstruction (Mûsār lĕ Mēvîn): 4Q415 ff. Clarendon Press, Oxford (1999)
  • (112) Kimball, S., Mattis, P.: GNU Image Manipulation Program - GIMP (version 2.8.6). https://www.gimp.org/ (2023)
  • (113) Mumuni, A., Mumuni, F.: Data augmentation: A comprehensive survey of modern approaches. Array 16, 100258 (2022). https://doi.org/10.1016/j.array.2022.100258
  • (114) Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 59–69 (1982). https://doi.org/10.1007/bf00337288
  • (115) Joachims, T.: Learning to Classify Text Using Support Vector Machines, 2002 edn. The Springer International Series in Engineering and Computer Science. Springer, Dordrecht (2002). https://doi.org/10.1007/978-1-4615-0907-3
  • (116) Bulacu, M., Schomaker, L.: Combining Multiple Features for Text-Independent Writer Identification and Verification. In: Lorette, G. (ed.) Tenth International Workshop on Frontiers in Handwriting Recognition. Université de Rennes 1, Suvisoft, La Baule, France (2006). https://hal.inria.fr/inria-00104189
  • (117) Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin/Heidelberg (2009)
  • (118) scikit-learn developers: scikit-learn: Bayesian Ridge Regression. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html. Accessed: 2021-04-15
  • (119) Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin/Heidelberg (2006)
  • (120) Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
  • (121) Villalobos, P., Sevilla, J., Heim, L., Besiroglu, T., Hobbhahn, M., Ho, A.: Will we run out of data? An analysis of the limits of scaling datasets in machine learning. arXiv preprint arXiv:2211.04325 (2022)
  • (122) Epoch: Parameter, Compute and Data Trends in Machine Learning. https://epochai.org/data/pcd (2022)
  • (123) Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q.: A comprehensive survey on transfer learning. Proceedings of the IEEE 109, 43–76 (2021). https://doi.org/10.1109/jproc.2020.3004555
  • (124) Ribani, R., Marengoni, M.: A survey of transfer learning for convolutional neural networks. In: 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T), pp. 47–57 (2019). https://doi.org/10.1109/sibgrapi-t.2019.00010. IEEE
  • (125) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv (2020). https://doi.org/10.48550/ARXIV.2010.11929
  • (126) Willemink, M.J., Roth, H.R., Sandfort, V.: Toward foundational deep learning models for medical imaging in the new era of transformer networks. Radiology: Artificial Intelligence 4(6), 210284 (2022). https://doi.org/10.1148/ryai.210284
  • (127) Thambawita, V., Strümke, I., Hicks, S.A., Halvorsen, P., Parasa, S., Riegler, M.A.: Impact of image resolution on deep learning performance in endoscopy image classification: An experimental study using a large dataset of endoscopic images. Diagnostics 11(12), 2183 (2021). https://doi.org/10.3390/diagnostics11122183
  • (128) Haja, A., Schomaker, L.R.B.: A fully automated end-to-end process for fluorescence microscopy images of yeast cells: From segmentation to detection and classification. In: Lecture Notes in Electrical Engineering, pp. 37–46. Springer (2021). https://doi.org/10.1007/978-981-16-3880-0_5
  • (129) Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301–1309 (2019). https://doi.org/10.1038/s41591-019-0508-1
  • (130) Ivezić, Ž., Kahn, S.M., Tyson, J.A., Abel, B., Acosta, E., Allsman, R., Alonso, D., AlSayyad, Y., Anderson, S.F., Andrew, J., et al.: LSST: from science drivers to reference design and anticipated data products. The Astrophysical Journal 873(2), 111 (2019)
  • (131) Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive neural architecture search. In: Computer Vision – ECCV 2018, pp. 19–35. Springer (2018). https://doi.org/10.1007/978-3-030-01246-5_2
  • (132) Ulrich, E.C., Cross, F.M., Fuller, R.E.: Discoveries in the Judaean Desert XV. Qumran Cave 4.X, The Prophets. Clarendon Press, Oxford (1997)
  • (133) Brooke, G., Collins, J., Elgvin, T., Flint, P., Greenfield, J., Larson, E., Newsom, C., Puech, E., Schiffman, L., Stone, M., Trebolle Barrera, J., VanderKam, J.: Discoveries in the Judaean Desert XXII: Qumran Cave 4. XVII, Parabiblical Texts, Part 3. Clarendon Press, Oxford (1997)
  • (134) Puech, E.: La Lettre essénienne MMT dans le manuscrit 4Q397 et les parallèles. Revue de Qumran 27, 99–135 (2015). https://doi.org/10.2143/RQ.27.1.3129665
  • (135) Qimron, E., Strugnell, J.: Discoveries in the Judaean Desert: Volume X. Qumran Cave 4.V, Miqṣat Maʿaśe Ha-Torah. Clarendon Press, Oxford (1994)
  • (136) Doudna, G.: Dating the scrolls on the basis of radiocarbon analysis. In: Flint, P.W., VanderKam, J.C. (eds.) The Dead Sea Scrolls After Fifty Years: A Comprehensive Assessment, Volume One, pp. 430–471. Brill, Leiden (1998)
  • (137) Carmi, I.: Are the 14C dates of the Dead Sea Scrolls affected by castor oil contamination? Radiocarbon 44, 213–216 (2002). https://doi.org/10.1017/s0033822200064808

Appendix A The dating problem of the Dead Sea Scrolls

There is broad agreement in scholarship about the long-term lines of development of Aramaic and Hebrew script in Judaea from the fourth century BCE until the second century CE as an evolution from imperial Aramaic chancery script of the fourth century BCE to what became the dominant Jewish square script in the first and second centuries CE. However, when we zoom into the specifics of the centuries in between, the finer typological and chronological distinctions—misleadingly connected with historical-political eras—are not reliably grounded in the data; rather, they rely on so-called absolute pegs that are not absolute at all and on unsubstantiated suppositions about historical processes that would have influenced palaeographic developments.

The main problem is a palaeographic gap between the third century BCE and the second century CE: absolute dates are lacking across the time period of the scrolls.

A.1 Too few date-bearing manuscripts to compare with

For most of the scrolls, palaeographic comparison with dated manuscripts written in a similar script is not possible. Only a few date-bearing manuscripts have survived, and those lie at the outer limits of the date range. The oldest, from fourth-century BCE Wadi Daliyeh Gropp2001-dy and fourth-century BCE Bactria Naveh2012-vv , have scripts comparable to only one or two manuscripts, 4Q52 and 4Q70 (see also Section A.2.2 in Appendix A and Section D.1.1 in Appendix D), but not to the vast majority of the scrolls. The manuscripts from fifth-century BCE Elephantine are even further removed in time PortenYardeni19861999 .

The youngest, from first- and second-century CE Murabbaʿat yardeni2000-mq ; Benoit1961 and Naḥal Ḥever yardeni2000-mq ; CottonYardeni1997 , are mostly in cursive script and cannot be used to compare and date the vast majority of the Hebrew and Aramaic scrolls, which are written in more formal scripts. These dated manuscripts include about 30 documentary texts, mainly from Murabbaʿat and Naḥal Ḥever. From the same period come 15 undated but datable letters, mostly in cursive script, to and from Simon bar Kokhba, the leader of the revolt against the Romans in 132–135 CE. Dated documents written in formal or bookhand script are limited to a farming contract from Murabbaʿat (Mur24) and three leases of land from Naḥal Ḥever (5/6Ḥev 44, 45, 46), from 133 and 134 CE.

Only one dated ostracon, from Maresha (176 BCE) eshel1996aramaic , is known from the crucial period between the third century BCE and the first century CE. Another ostracon, from Khirbet el-Qom, is partially dated and could date from 277, 241, or 217/6 BCE geraty1975khirbet . Yet these ostraca can hardly be used for dating formal hands, and they cannot even serve as indicative time markers for manuscripts written in a semicursive hand.

A.2 Weak workarounds

The workaround adopted by Cross and others for the lack of date-bearing documents in formal, semiformal, or semicursive script from the third century BCE to the first century CE does not solve the problem. They determined the relative development and absolute chronology of the scrolls’ palaeography by combining (a) supposed absolute pegs and (b) unsubstantiated palaeographic and historical suppositions:

A.2.1 Not so absolute time markers

Cross Cross2003 claimed that his model was pegged by a series of absolute datings, in scores if not hundreds of documents inscribed on a variety of materials, especially from the late first century BCE and first century CE. Puech Puech2017 provided additional pegs, specifically for the less formal Hasmonaean hands. Cross and Puech relied on inscriptions on other surfaces such as stone and metal, but here too there are no absolute dates, not even for the most important pegs, such as the Benei Ḥezir tomb and the Jason’s tomb inscriptions. Avigad Avigad1965 acknowledged this, but his caution seems to have been forgotten.

A telling example is the estimated date of the Benei Ḥezir tomb inscription in Jerusalem’s Kidron Valley (CIIP 137 eck2010corpus ), which, according to Cross, had been dated securely, on the basis of archaeological and historical evidence, to the end of the first century BCE. Based on the architectural typology of the Hellenistic-style façade and on Josephus’s description of the Maccabees’ family tomb in Modiʿin, Avigad avigad1954 initially suggested dating the tomb to the mid-second century BCE. He then estimated that the inscription, which lists eight priests from two generations who had been interred in the tomb, was made on the façade one or two generations after the construction of the tomb, in the first half of the first century BCE. Later, he dated the inscription palaeographically to the second half of the first century BCE, or to the Herodian period, and on that basis redated the tomb to the end of the Hasmonaean period Avigad1965 . The precise length of time between the construction of the façade and the writing of the inscription (how many years are one to two generations?) is a conjecture.

After the 2000–2001 exploration of the Benei Ḥezir and Zechariah tombs, Barag Barag2003 put forward new data and interpretations which would indicate that the tomb dates to the period of flourishing in Jerusalem between ca. 132/1 and 63 BCE, most likely to the first century BCE. For example, it features the new type of tomb typical of the Hasmonaean period, which became common in the first century BCE. Correspondences with Nabataean tomb architecture (undated, but supposed to go back to the first century BCE), which Barag argued likely inspired the Benei Ḥezir tomb, point in the same direction. As for the inscription, which he conjectured to be 50–100 years younger than the tomb itself, he compared its writing to that of the bronze coins of the 25th year of Alexander Jannaeus (79/8 BCE), and posited that the script of the Benei Ḥezir inscription seems to be slightly later, from the late Hasmonaean or early Herodian period.

Without mentioning the Benei Ḥezir inscription, Naveh Naveh1986 had identified the script on the Alexander Jannaeus coinage as ‘vulgar semiformal’ and saw its closest parallels to the letters found on ossuaries. Cross Baillet1962 had described this style as a “crude simplified derivative” of the formal Herodian hand. Naveh’s aligning of the letters of the coins with those of the ossuaries might suggest that this type of Herodian hand was already anticipated by the Jannaeus coins. Naveh therefore referred to the palaeographical significance of these coins. One should note, however, that neither Naveh nor Barag carefully analysed the letters of the coins.

In summary, all scholars associate the Benei Ḥezir tomb with the Maccabees or the Hasmonaean period (either early or late), and date its inscription to the first century BCE. Yet Cross’s claim that a late first-century BCE date is secure and constitutes an absolute peg cannot be sustained. The date estimates of the tomb and its inscription are based not only on architectural typology but also on palaeographic typology, so the inscription cannot serve as an independent peg for that same palaeography. None of the evidence argues against a mid-first-century BCE or even earlier date of the inscription.

This is one example demonstrating that inscriptions in Hebrew and Aramaic on other surfaces, such as stone and metal, cannot fill the void of absolute dating pegs between the third century BCE and the first century CE. Besides the Benei Ḥezir burial inscription, this also applies, for example, to the so-called Queen Helena inscription (CIIP 123 eck2010corpus ) and the Uzziah plaque (CIIP 602 eck2010corpus ) from the first century CE: strictly speaking, these are not absolutely dated. Nor is the date of the Jason’s tomb inscriptions (CIIP 392-397 eck2010corpus ) fixed. Puech Puech2001 ; Puech1983 initially argued, on the basis of his reconstruction of the historical background of the inscriptions, that the main one in Aramaic (CIIP 392) must be dated to 82/1 BCE, but more recently he stated that the Aramaic inscription dates palaeographically to about the middle of the first century BCE or slightly earlier Puech2017 . Yardeni, by contrast, dated the inscription shortly before the destruction of the tomb by an earthquake in 31 BCE eck2010corpus .

Another example is the hundreds of ossuary inscriptions, virtually all of which Cross Cross1955 assigned to the Herodian era. A post-20/15 BCE date for the ossuaries may be archaeologically correct Magness2005 , but framing them politically and historically within the Herodian period does not limit the emergence of the script exhibited on the ossuaries to that period. The question of when the so-called Herodian script came into being is decided somewhat arbitrarily: Cross took 30 BCE, Milik and Baillet 50 BCE Tigchelaar2020 , and Avigad also 50 BCE or slightly earlier. Furthermore, Avigad already acknowledged that scrolls referred to as ‘Herodian’ may easily be earlier than this period Avigad1965 . In other words, even for the ‘Herodian’ script, just as for the ‘Hasmonaean’ script (see below), the emergence is difficult to establish. In terms of typological development, we have to reckon with the possibility of longer, broader time frames for both scripts.

A.2.2 Unsubstantiated palaeographic and historical premises

Even if one were to accept Cross’s recourse to a series of absolute datings, these would support mainly late first-century BCE and first-century CE comparisons. They do not help to establish the beginnings of the ‘Hasmonaean’ script. Lacking dated material from the third and second centuries BCE, Cross had to take further recourse to two premises to attempt to establish the upper range of the oldest scripts, ‘Archaic’ and ‘Hasmonaean’, from the scrolls, and to limit the earliest dating of the scrolls mainly to the second century BCE, with only a few exceptions for older ‘Archaic’ manuscripts such as 4Q52 and 4Q70.

In addition to a lack of time markers, two palaeographic and historical premises by Cross, Yardeni and others stand out: a slow development of the Aramaic/Hebrew script in the early Hellenistic period (third century BCE); and the emergence of a national script as a watershed around 200–150 BCE.

The presumed slow development of the Aramaic/Hebrew script in the early Hellenistic period is not supported by any dated evidence of that period. The assumption was in part based on a few undated cursive Aramaic papyri from Egypt containing Greek names (hence assumed to be from the third century BCE), but the later discovery of the dated Wadi Daliyeh papyri showed that there were different lines of development, some having taken place much earlier Cross1974 ; Yardeni1990 , thus challenging the premise of the slow development, and reducing the importance of those Hellenistic Egyptian Aramaic papyri for establishing the evolution of the Aramaic/Hebrew script. For Judaea, Cross Cross1955 also referred in passing to a conservative palaeography for the copying of sacred texts, but without further explanation or supporting evidence.

Cross initially dated 4Q52 (4QSamb) to “the last quarter of the third century B.C.” Cross1955 , “no doubt late in the century” Cross1961 , but after the discovery of the Wadi Daliyeh manuscripts, simply to “ca. 250 B.C.” Cross1974 or “the mid-third century BCE” Cross1998 ; Cross2003 . He seems to have been reluctant to date 4Q52 and also 4Q70 (4QJera) earlier, and therefore assumed a very slow evolution of the script, so as not to have a large time gap with the manuscripts written in what he called the “early Hasmonaean” script and which he dated to ca. 150 BCE.

Yardeni, too, regarded 4Q52 and 4Q70 as examples of a transitional stage from the fourth and third century BCE Aramaic script in the direction of the ‘Hasmonaean’ script Yardeni1990 . Her conclusion that these two manuscripts could therefore be dated to the late third or early second century BCE seems to be based rather on the supposed proximity to this national script than on the correspondences with the earlier Aramaic scripts.

However, the palaeographic principle is to date an undated manuscript by comparing its script to that of dated writings with a similar script. This means that the oldest manuscript of the scrolls, 4Q52, must be compared to the Aramaic evidence from Wadi Daliyeh from the fourth century BCE. 4Q52 should then be chronologically closer to those manuscripts, especially WDSP 1 (335 BCE).

The hypothesis of the emergence of a national script around 200–150 BCE and the supposition that the ‘Hasmonaean’ script was a development of the Hasmonaean period after 150 BCE are not supported by any dated evidence but based on historical assumptions, given in passing, about a “nationalistic expansion and resurgent Orientalism” Cross2003 after the death of the Seleucid king Antiochus IV Epiphanes (164 BCE). These unfounded assumptions were then imposed as an interpretative framework on the manuscript evidence. But, given the absence of dated material from the third and second century BCE, there are no historical, typological or other palaeographic reasons for limiting the rise of the script which Cross called ‘Hasmonaean’ to the mid-second century BCE.

This means that manuscripts written in ‘Hasmonaean’ script may date already from the early second century or from the third century BCE. This older dating is also realistic when manuscripts written in so-called ‘Archaic’ script, such as 4Q52 or 4Q70, can be dated earlier in the third century BCE or, for 4Q52, even perhaps in the late fourth century BCE. Furthermore, this older dating can be independently supported by the 14C dating results in this study (see Section D.1 in Appendix D).

A.3 The way out of the gap

Summarizing, the dating problem of the Dead Sea Scrolls, due to the absence of calendar dates, is further confounded by the fact that there are no other date-bearing manuscripts in similar script available for palaeographic comparison. This lack of date-bearing documents cannot be overcome by using inscriptions on other surfaces instead because these, too, have no absolute dates. Also, datable inscriptions mainly date from the first century BCE and first century CE and thus cannot shed light on script developments in the third and second centuries BCE. Historical premises and assumptions remain unsubstantiated and devoid of factual support, and they fail to support a chronological framework for the palaeography and the manuscript evidence. These assumptions cannot determine or sufficiently constrain the dates connected to the writing of the scrolls.

Therefore, 14C dates derived from manuscript samples are needed as absolute time markers to lead the way out of the palaeographic gap. In the absence of an abundance of date-bearing manuscripts in similar script available for palaeographic comparison, 14C dating, a scientific measurement (“yardstick of time”), provides more reliable time markers and, in combination with our style-based date-prediction model Enoch, even more precise ones.

Appendix B Radiocarbon dating of the Dead Sea Scrolls

Two series of Dead Sea Scrolls were radiocarbon dated in the 1990s, in Zurich and in Tucson, Arizona bonani1991radiocarbon ; Bonani1992 ; Jull1995 . In addition, three samples were submitted to Oxford but in all three cases the chemistry is recorded as having “failed,” i.e., no sample to measure; probably the samples completely dissolved during the pretreatment phase (communication from R. Hedges, Research Laboratory for Archaeology, Oxford, 7 January 2005).

Although scrolls were radiocarbon dated in the 1990s, new radiocarbon dating was necessary because of castor oil contamination issues with these previous dates. Furthermore, since then, radiocarbon dating methods and procedures have improved significantly in terms of better calibration, higher precision obtained by more modern methods and instruments, and also more effective cleaning procedures for dealing with contaminated samples.

In this study, we have taken the following analytical steps for the samples:

  1. They were precleaned by a Soxhlet procedure in Odense (see Sections B.2 and B.7.1);
  2. Subsequently, they were further pretreated in Groningen, using an acid-only pretreatment adapted to the fragile material (see Section B.2);
  3. The cleaned samples were dated by Accelerator Mass Spectrometry (AMS) in Groningen (see Sections B.3–B.6);
  4. During the study, the residual lipids in the extracts of all 30 samples after the Soxhlet cleaning were analysed, and 17 samples were further investigated by specialized analytical chemistry methods in Pisa regarding the nature of the contamination (see Sections B.7.2–B.7.5).

B.1 Selection of Samples

The 30 samples we received from the Israel Antiquities Authority (IAA) were selected on the basis of script and presumed period so as to obtain reliable time markers in the palaeographic gap between the fourth century BCE and the second century CE. We made this selection at the start of the project on the basis of the default model in the field (see Appendix A). The dates associated with the manuscripts according to this traditional model provided balanced coverage of the timeline under investigation (as can be seen in Figure 4). Also, because the 14C dates serve as input to the date-prediction model, we selected manuscripts whose extant material contains a sufficient number of characters (150–200) Brink2008 . The identity and presumed palaeographic period of each sampled manuscript were not known to the laboratory staff in Groningen, Odense, and Pisa at the time of measurement. One of the 30 samples, from a date-bearing document (Mur19), was added as a control; its identity and date were likewise unknown to the laboratories at the time of measurement. Furthermore, in consultation with the IAA, the final selection of samples was also determined by practical and conservational considerations regarding specific manuscript remains. The IAA provided general indications concerning where the physical samples were taken from (see Appendix J). Our sample set comprises 28 manuscripts of animal skin and 2 of papyrus (4Q255/4Q433a and Mur19).

From the first century CE onward, a clear distinction appears in the manuscript evidence between the square bookhand script and the standard cursive style yardeni2000-mq , but such a distinction is less pronounced in the manuscript evidence of earlier periods. This also applies to the distinctions made between formal, semiformal, and semicursive styles. Across the continuum of the chronological range covered by the scrolls, exemplary specimens for some styles are lacking Cross2003 . Often manuscripts exhibit a mixture of these presumed styles Popovic2023 ; Tigchelaar2020 ; https://doi.org/10.25592/uhhfdm.739 . Therefore, our sampled manuscripts cover all three categories and their mixtures. The cursive style has been excluded from our sampling, except for Mur19 which was used as a validation test for 14C.

Figure 4: The selection of 30 manuscript samples according to their traditional palaeographic date estimates.

B.2 Soxhlet Treatment and AAA Pretreatment

Castor oil was used in the 1950s by the original team of scholars reconstructing and editing the Dead Sea Scrolls to clean the manuscripts and to improve the readability of the text. This castor oil needs to be removed: as a recent plant product it contains modern carbon, and any residue would make the 14C age of a sample appear younger than its true age. Later testing showed that not all castor oil is removed even by the standard AAA (Acid-Alkali-Acid) protocol, let alone by the reduced form of the standard protocol used in the 1990s in Zurich and Tucson rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects .
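The direction of this bias follows from a simple two-component mass balance: carbon from a recently grown plant oil has close to modern 14C content, so any residue raises the sample's 14C fraction and lowers its apparent age. The Python sketch below illustrates this under explicit assumptions: the "true" age of 2100 BP is hypothetical, the contaminant is assumed to have a fraction modern (F14C) of 1.0, and contamination is expressed as a fraction of total carbon. None of these values comes from the measurements in this study.

```python
import math

# Libby mean-life in years (5568 / ln 2), fixed by convention for conventional 14C ages.
LIBBY_MEAN_LIFE = 8033.0


def age_from_f14c(f14c: float) -> float:
    """Conventional radiocarbon age (BP) from a fraction-modern value F14C."""
    return -LIBBY_MEAN_LIFE * math.log(f14c)


def f14c_from_age(age_bp: float) -> float:
    """Inverse of age_from_f14c."""
    return math.exp(-age_bp / LIBBY_MEAN_LIFE)


def contaminated_age(true_age_bp: float, contam_fraction: float,
                     contam_f14c: float = 1.0) -> float:
    """Apparent age after mixing contaminant carbon into the sample.

    contam_fraction: fraction of the total carbon that comes from the contaminant.
    contam_f14c = 1.0 is an ASSUMED value for a modern plant oil.
    """
    f_sample = f14c_from_age(true_age_bp)
    f_mix = (1.0 - contam_fraction) * f_sample + contam_fraction * contam_f14c
    return age_from_f14c(f_mix)


if __name__ == "__main__":
    true_age = 2100.0  # hypothetical true 14C age (BP)
    for frac in (0.0, 0.01, 0.05, 0.10):
        apparent = contaminated_age(true_age, frac)
        print(f"{frac:5.0%} modern carbon -> {apparent:7.1f} BP "
              f"({true_age - apparent:5.1f} years too young)")
```

Under these assumptions, a contaminant contributing just 1% of the carbon makes a 2100 BP sample appear roughly 24 years too young, which is comparable to the 1σ measurement uncertainties reported in Section B.4.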

Before the actual start of the project, we received 2 relatively large test samples (tens of milligrams) from the IAA. These were fragments without context but, according to the IAA, of scroll origin. Both samples were subjected to the standard AAA treatment, but the material started dissolving immediately during the first acid step. This meant that we could not apply the standard method, all the more so because the test samples were much larger than the identified manuscript samples we were to receive.

In our project the first step was to pre-clean the samples by a liquid extraction with suitable solvents, performed inside a Soxhlet apparatus to remove the castor oil contamination. The Soxhlet treatment was carried out in Odense, initially in three but subsequently in four extraction steps. The latter was done for redundancy, and not because of suspicion that the three-step procedure was not sufficient within the given dating uncertainty. The fourth step furthermore guaranteed that even more potential contaminants were removed. Castor oil is a plant product, which consists of several triglycerides and free fatty acids. The Soxhlet treatment is designed to remove lipid material to a high extent, and the analyses done in Pisa by HPLC-MS and Py-GC/MS are performed to demonstrate that the amount of the remaining lipid material, including fatty acids, is below a threshold which does not significantly skew the radiocarbon date (see section B.7).
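The notion of a threshold can be made concrete by inverting the same mass balance: for a tolerable age shift, it gives the largest fraction of carbon that a modern contaminant may contribute. The sketch below is illustrative only. The sample age, the assumed F14C of 1.0 for the contaminant, and the use of the 15–28-year precision range as the tolerance are our assumptions. The result is a fraction of total carbon, not of sample mass; converting it into a lipid mass fraction would require the carbon contents of both components.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years, conventional (5568 / ln 2)


def max_contaminant_fraction(true_age_bp: float, max_shift_years: float,
                             contam_f14c: float = 1.0) -> float:
    """Largest contaminant carbon fraction that makes the age at most max_shift_years younger.

    Solves (1 - f) * F_s + f * F_c = F_s * exp(shift / tau) for f.
    contam_f14c = 1.0 (modern carbon) is an ASSUMED illustrative value.
    """
    f_sample = math.exp(-true_age_bp / LIBBY_MEAN_LIFE)
    f_target = f_sample * math.exp(max_shift_years / LIBBY_MEAN_LIFE)
    return (f_target - f_sample) / (contam_f14c - f_sample)


if __name__ == "__main__":
    # Hypothetical 2100 BP sample; tolerances taken from the 15-28 year 1-sigma range.
    for shift in (15.0, 28.0):
        f = max_contaminant_fraction(2100.0, shift)
        print(f"shift <= {shift:.0f} yr requires modern-carbon fraction <= {f:.2%}")
```

For a hypothetical 2100 BP sample, this gives a tolerable modern-carbon fraction of roughly 0.6% for a 15-year shift and 1.2% for a 28-year shift.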

The scrolls are extremely delicate material. Their fragility is an issue for the chemical cleaning of samples for radiocarbon dating. In the 1990s, the Zurich and Tucson laboratories had to adjust or stop the AAA pretreatment because the samples were dissolving Bonani1992 ; Jull1995 . Most of these samples were much larger than the samples in the current study. With much smaller sample materials, we also had to adjust the standard chemical pretreatment.

Following the Soxhlet treatment in Odense, the samples were further prepared for dating in Groningen. The pretreatment was reduced to a “soft” acid-only step: 0.5–1% HCl at refrigerator temperature (ca. 4°C) for only 10 minutes. Next, the samples were dried in an oven at 80°C overnight. Using diluted HCl and skipping the alkali step is necessary because of the delicate nature of the samples, and it is justified by the conditions under which the scrolls were kept (see below). No significant amounts of foreign materials that could cause errors larger than the measurement uncertainties were observed. The procedure is supported by the control sample with a known historical date (Mur19), which was 14C dated correctly. Combined with the Soxhlet treatment, this is the optimum treatment for this delicate material, and generally effective.

The scrolls were stored in caves in the Judaean Desert, in the absence of humic acids and constant groundwater. Humic acids in particular constitute a problem at many other archaeological sites worldwide, and they are the main reason for the alkaline bath in the standard pretreatment protocol (the second A in AAA). The environment in the caves can be characterized as limestone, gypsum, and marls, none of which has the potential to introduce alkali-soluble compounds into the parchments. Similarly, bat guano and excretions from other small animals that may have found their way into the caves over the centuries are unlikely to contain humic acids. Their deposits are therefore likely to be soluble either in the more polar solvents of the Soxhlet treatment (i.e., the ethanol) or in the acid bath of the pretreatment in the radiocarbon laboratory. Furthermore, the pyrolysis-gas chromatography measurements did not reveal any unaccounted-for compounds (see Section B.7.4), including the alkanes that can be considered markers for bat guano Queffelec2018 .

B.3 AMS Measurements

The 14C content of each sample is measured by AMS. The original AMS was a 2.5 MV Tandetron accelerator van2000status . It was decommissioned in 2017 and replaced by a Micadas system synal2007micadas . Because this took place during the project, both machines were used to date the scroll samples, which allows for internal intercomparison (see Table LABEL:tab:summarized-c14). The Tandetron dates have laboratory code GrA; the Micadas dates have laboratory code GrM.

After cleaning, the samples were combusted into CO2 gas. For the GrA dates, the gas was subsequently reduced to graphite using H2, and the 14C content was measured in this graphite. The same method is used on the Micadas (GrM) for routine dating. However, the Micadas can also measure the 14C content directly in CO2, skipping the graphite production step. This is very useful for small samples, as is the case for many scroll samples. Therefore, for scroll samples measured by the Micadas, the gas source was used. For more details on measurement procedures, see Dee2019 .

For the 30 samples in this study, there is a grand total of 131 individual AMS runs, including duplicate samples and multiple runs. In most cases, the separate runs for a particular scroll can be averaged into a single robust date. The reported numbers reflect the AMS measurements only; in addition, there are aspects of sample integrity and pretreatment which are hard or even impossible to quantify. We rejected 10 AMS runs for technical reasons, resulting in a final number of 121 accepted runs.
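Averaging the runs for one scroll, and comparing GrA with GrM results, is commonly done with an inverse-variance weighted mean combined with the Ward and Wilson chi-squared consistency test, which is also what OxCal's R_Combine applies. Whether exactly this procedure was used for the averages reported here is an assumption. The sketch below only shows the technique, with hypothetical run values.

```python
import math
from typing import List, Tuple

# 95% critical values of the chi-squared distribution for 1..10 degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
           6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def combine_runs(runs: List[Tuple[float, float]]) -> Tuple[float, float, float, bool]:
    """Combine (age_bp, sigma_bp) measurements of one sample.

    Returns (weighted mean, sigma of the mean, test statistic T, consistent?),
    where T is compared with chi-squared at the 5% level and n - 1 degrees of freedom.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to test consistency")
    weights = [1.0 / s ** 2 for _, s in runs]
    total_w = sum(weights)
    mean = sum(w * t for w, (t, _) in zip(weights, runs)) / total_w
    sigma = 1.0 / math.sqrt(total_w)
    t_stat = sum((t - mean) ** 2 / s ** 2 for t, s in runs)
    df = len(runs) - 1
    if df not in CHI2_95:
        raise ValueError("extend CHI2_95, e.g. with scipy.stats.chi2.ppf(0.95, df)")
    return mean, sigma, t_stat, t_stat <= CHI2_95[df]


if __name__ == "__main__":
    # Hypothetical runs (BP, 1-sigma) for one scroll, e.g. one pooled GrA date and two GrM dates.
    mean, sigma, t_stat, ok = combine_runs([(2045, 25), (2070, 30), (2060, 20)])
    print(f"{mean:.0f} +/- {sigma:.0f} BP, T = {t_stat:.2f}, consistent: {ok}")
```

The same test applied to just two values (one GrA and one GrM date) gives a simple machine-to-machine intercomparison with one degree of freedom.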

B.4 AMS Dating Results

Radiocarbon dates are reported by convention in BP, using a defined half-life and reference radioactivity for 14C, and a correction for isotopic fractionation using the stable isotope 13C mook1999reporting . The BP dates are converted to calendar dates using the IntCal20 calibration curve (reimer2020intcal20 ) and OxCal software Oxcal . The calibration results in a non-Gaussian probability distribution of calendar dates. This distribution is given as 1σ (68.3% confidence) and 2σ (95.4% confidence) date ranges.
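The reporting conventions can be written out explicitly. In the sketch below, a measured 14C/12C ratio, expressed relative to the reference standard, is normalized to δ13C = −25‰. The resulting fraction modern F14C is converted into a conventional age using the Libby mean-life of 8033 years (half-life 5568 years / ln 2), and the 1σ age error follows from first-order error propagation. The input values are hypothetical, and the sketch simplifies the full reporting protocol; for instance, it assumes the reference standard has already been normalized.

```python
import math

# Libby mean-life in years (5568 / ln 2), fixed by convention for conventional 14C ages.
LIBBY_MEAN_LIFE = 8033.0


def normalize_to_minus25(ratio_to_standard: float, delta13c_permil: float) -> float:
    """Correct a 14C/12C ratio (relative to the standard) for isotopic fractionation.

    14C fractionates roughly twice as strongly as 13C, hence the squared factor;
    0.975 corresponds to the reference value delta13C = -25 per mil.
    """
    return ratio_to_standard * (0.975 / (1.0 + delta13c_permil / 1000.0)) ** 2


def conventional_age(f14c: float, sigma_f14c: float) -> tuple:
    """Conventional radiocarbon age (BP) and its 1-sigma error from F14C."""
    age = -LIBBY_MEAN_LIFE * math.log(f14c)
    sigma = LIBBY_MEAN_LIFE * sigma_f14c / f14c  # first-order error propagation
    return age, sigma


if __name__ == "__main__":
    # Hypothetical measurement: ratio 0.775 relative to the standard, delta13C = -22 per mil.
    f14c = normalize_to_minus25(0.775, -22.0)
    age, sigma = conventional_age(f14c, 0.0025)
    print(f"F14C = {f14c:.4f} -> {age:.0f} +/- {sigma:.0f} BP")
```

For F14C = 0.77 ± 0.0025 this gives about 2100 ± 26 BP. At this age, 1σ age uncertainties of 15–28 years correspond to an F14C precision of roughly 0.0014–0.0027.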

For the 30 samples, 27 yielded accepted dates; only 3 samples yielded inconsistent results and had to be technically rejected (4Q216, 11Q20, and Mur88; see Section B.6). Also, it appeared that the sample received for 4Q185 could not be ascertained as belonging to that particular manuscript. This sample is not used in our analysis (see Section B.5).

The resulting 14C dates for the 26 samples are shown in Table LABEL:tab:summarized-c14. Each individual 14C sample receives a unique laboratory number. As the table shows, each scroll is dated at least twice. In addition, many measurement batches were repeated (thus yielding two dates per graphite sample). The 14C age shown is the average of all accepted runs. The logistics are complex. For example, the sample 4Q114 (4QDanielc) has been dated in 7 runs. Two samples were received from the IAA. Graphite was prepared from all material of the first sample and dated by the GrA machine in 3 runs from the same graphite (to increase the 14C statistics); these runs therefore share one GrA number, and as triplicates they can be taken together as 1 GrA date. An additional second sample was received later. From this sample we dated 4 subsamples in 4 runs by the GrM machine, hence there are 4 GrM numbers.

The resulting BP dates are very precise, with 1σ uncertainties of only 15–28 years. For the full results of all runs, with more details (in particular carbon yield and δ13C value), see Appendix K.

Table 3 shows the summarized results of the 26 accepted 14C dates: laboratory code, sample identification, 14C age (BP), its sigma (BP), and calibrated dates (both 1σ and 2σ ranges). The OxCal plots are given in Appendix C.

Although the most recent calibration curve, IntCal20, has a resolution of 1 calendar year, this does not mean that a 1-year resolution is significant. The measurement precision of the 14C dates is, at best, 15 14C years, and often a few decades. Moreover, although OxCal can calculate at a 1-year resolution, its default resolution is 5 years, without any interpolation. If the resolution is set to less than 5 years, the curve is interpolated with a cubic function, that is, a polynomial of degree 3, which OxCal uses to compute intermediate points between neighbouring data points of the curve. This is a mathematical construction, not a calibration with 1-year resolution. Hence, we do not use a 1-year interpolated resolution but present the raw 5-year resolution data from OxCal. For more details, we refer to https://c14.arch.ox.ac.uk/oxcalhelp/hlp_analysis_inform.html.
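To make the point concrete that interpolated values add no information, the following Python sketch evaluates a cubic through the four grid points surrounding a requested year. It uses a 4-point Lagrange cubic as a stand-in; OxCal's actual interpolation code is not described here and may differ. The curve values are hypothetical, not IntCal20 data.

```python
from typing import List, Sequence


def cubic_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Value at x of the cubic passing through the four grid points around x.

    Illustrative stand-in (4-point Lagrange form); xs must be sorted ascending.
    """
    if len(xs) < 4 or len(xs) != len(ys):
        raise ValueError("need at least four matching grid points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the grid")
    # Find i with xs[i] <= x < xs[i + 1], then use xs[i-1] .. xs[i+2],
    # shifted inwards at the ends of the grid.
    i = 0
    while i < len(xs) - 2 and xs[i + 1] <= x:
        i += 1
    start = min(max(i - 1, 0), len(xs) - 4)
    px, py = xs[start:start + 4], ys[start:start + 4]
    value = 0.0
    for j in range(4):
        term = py[j]
        for k in range(4):
            if k != j:
                term *= (x - px[k]) / (px[j] - px[k])
        value += term
    return value


# Hypothetical 5-year grid of 14C ages (NOT IntCal20), years in astronomical numbering.
grid: List[float] = [float(y) for y in range(-200, -145, 5)]
ages_bp: List[float] = [2150.0 - 0.6 * (y + 200) + 8.0 * ((y + 175) / 25) ** 3 for y in grid]

# The value at -172 is computed only from the grid points at -180, -175, -170 and -165.
print(round(cubic_interpolate(grid, ages_bp, -172.0), 1))
```

Every intermediate value is a deterministic function of the 5-year grid, so a finer resolution changes the presentation of the result, not its information content.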

Furthermore, for the time range relevant to the scrolls, our calibrated results are often bimodal, especially the 2σ distributions, which we use in our further analyses to give our date-prediction model a firmer grounding. The calibrated results from the 1990s were also often bimodal bonani1991radiocarbon ; Bonani1992 ; Jull1995 . This bimodality is an effect of the calibration curve not being linear: it shows peaks and other irregularities caused by variations in the cosmic-ray flux that produces 14C in the Earth's atmosphere vanderPlicht2022 .
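Why a single calibrated distribution can yield several disjoint ranges can be illustrated with a highest-probability-density (HPD) summary on a 5-year grid. The following Python sketch is a simplified version of that idea, not OxCal's own algorithm (which, for example, treats range boundaries differently); the distribution used is hypothetical.

```python
from typing import Dict, List, Tuple


def hpd_ranges(prob: Dict[int, float], level: float, step: int = 5) -> List[Tuple[int, int]]:
    """Highest-probability ranges of a distribution on a regular calendar grid.

    prob maps grid years (astronomical numbering, 1 BCE = 0) to probabilities.
    Grid points are taken in order of decreasing probability until `level` is
    reached; neighbouring selected points are merged into (first, last) ranges.
    """
    total = sum(prob.values())
    chosen: List[int] = []
    acc = 0.0
    for year, p in sorted(prob.items(), key=lambda kv: kv[1], reverse=True):
        if acc >= level:
            break
        chosen.append(year)
        acc += p / total
    ranges: List[Tuple[int, int]] = []
    for year in sorted(chosen):
        if ranges and year - ranges[-1][1] == step:
            ranges[-1] = (ranges[-1][0], year)
        else:
            ranges.append((year, year))
    return ranges


# Hypothetical bimodal distribution on a 5-year grid (NOT a real OxCal output).
dist = {year: 0.0 for year in range(-360, -175, 5)}
dist.update({-350: 0.05, -345: 0.20, -340: 0.10, -200: 0.15, -195: 0.30, -190: 0.15, -185: 0.05})

print(hpd_ranges(dist, 0.683))  # [(-345, -345), (-200, -190)]
print(hpd_ranges(dist, 0.954))  # [(-350, -340), (-200, -185)]
```

Because the probability mass sits on two separate plateaus of the curve, both the 1σ and the 2σ summaries consist of two ranges, as in many rows of Table 3.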

Table LABEL:tab:result-4q185 shows the valid and acceptable radiocarbon date of the sample received for 4Q185, but this date cannot be used (see Section B.5).

Samples of the 3 scrolls 4Q216, 11Q20 and Mur88 did not produce acceptable 14C dates; these are summarized in Table 4 (see Section B.6).

Table 3: Summarized results of 26 accepted 14C dates: laboratory code, sample identification, 14C age (BP), sigma (BP), calibrated ranges (1σ and 2σ) at 5-year resolution.
lab code | sample | scroll | age (BP) | σ (BP) | calibrated ranges (1σ) | calibrated ranges (2σ)
GrA-68446, GrA-68447 | P421-Fr004 | 4Q504 (4QDibHama) | 2164 | 16 | 345–320, 205–170 BCE | 355–285, 230–150 BCE
GrA-69793, GrM-10677, GrM-10678 | P206-Fr003 | 4Q52 (4QSamb) | 2303 | 26 | 405–365 BCE | 410–355, 285–230 BCE
GrA-69794, GrM-10679, GrM-10680 | P285-Fr002 | 4Q176 (4QTanh) | 2153 | 19 | 345–320, 205–165 BCE | 355–300, 210–100, 70–60 BCE
GrA-69795, GrM-13252, GrM-13253, GrM-13254, GrM-13255 | P224-Fr001 | 4Q114 (4QDanc) | 2168 | 15 | 345–315, 205–175 BCE | 355–285, 230–160 BCE
GrM-10659, GrM-10660 | P891-Fr003 | 5/6Hev1b (Ps) | 1940 | 28 | 25–45, 55–125 CE | 10–205 CE
GrA-69810, GrM-10661, GrM-10662 | P585-Fr001 | 4Q161 (4QpIsaa) | 2028 | 18 | 45 BCE–10 CE | 90–80 BCE, 55 BCE–30 CE, 45–60 CE
GrM-11151, GrM-11152, GrM-11170, GrM-11171 | P1111-Fr010 | 4Q70 (4QJera)¹ | 2226 | 17 | 365–350, 295–205 BCE | 375–345, 320–200 BCE
GrM-11153, GrM-11154, GrM-11172 | P1093-Fr005 | 4Q47 (4QJosha) | 2155 | 19 | 345–320, 200–165 BCE | 355–290, 210–100 BCE
GrM-11155, GrM-11156 | P271-Fr002 | 4Q23 (4QLevNuma) | 2152 | 24 | 350–315, 205–150, 130–120 BCE | 355–285, 230–220, 210–95, 75–55 BCE
GrM-11166, GrM-11167, GrM-11184, GrM-11185 | P177-Fr001 | 4Q255/4Q433a (4QpapSa/4QpapHodayot-like Text B) | 2100 | 17 | 155–90, 75–55 BCE | 170–50 BCE
GrM-11168, GrM-11169, GrM-11186, GrM-11187 | P977-Fr004 | 11Q5 (11QPsa) | 1967 | 18 | 20–80, 100–110 CE | 35–15 BCE, 5–120 CE
GrM-14380, GrM-14381, GrM-14228, GrM-14229 | P393-Fr005 | 4Q3 (4QGenc) | 2123 | 21 | 175–100, 70–60 BCE | 340–325, 200–50 BCE
GrM-13385, GrM-13386 | P1081a-Fr002 | 4Q27 (4QNumb) | 2115 | 26 | 175–95, 75–55 BCE | 340–330, 200–50 BCE
GrM-13387, GrM-13388, GrM-14175, GrM-14223 | Px232-Fr001 | Mas1k (MasShirShabb) | 2007 | 18 | 45 BCE–25 CE | 50 BCE–65 CE
GrM-14382, GrM-14383, GrM-14230, GrM-14241 | P386-Fr001 | 4Q206 (4QEne ar) | 2169 | 21 | 350–310, 210–170 BCE | 360–280, 235–145, 135–120 BCE
GrM-14565, GrM-14566, GrM-14395, GrM-14242, GrM-14243 | P237-Fr007 | 4Q30 (4QDeutc) | 2182 | 18 | 355–290, 210–175 BCE | 360–275, 260–245, 235–165 BCE
GrM-13389, GrM-13390, GrM-14173, GrM-14174 | P904-Fr009 | 4Q201/4Q338 (4QEna ar/4QGenealogical List) | 2077 | 18 | 110–45 BCE | 165–40, 10–1 BCE
GrM-14396, GrM-14397, GrM-14244, GrM-14245 | P810-Fr011 | 4Q259 (4QSe) | 2148 | 19 | 345–320, 205–150 BCE | 350–310, 210–100, 70–55 BCE
GrM-14398, GrM-14399, GrM-14246, GrM-14359 | P180-Fr004 | 4Q416 (4QInstructionb) | 2130 | 22 | 200–100 BCE | 345–320, 205–90, 80–50 BCE
GrM-14400, GrM-14401, GrM-14360, GrM-14361 | P215-Fr004 | 4Q2 (4QGenb) | 2059 | 20 | 100–70, 60–35, 15 BCE–5 CE | 155–130 BCE, 125 BCE–10 CE
GrM-14567, GrM-14568, GrM-14362, GrM-14363 | P122A-Fr001 | 4Q375 (4QapocrMosesa) | 2126 | 23 | 195–185, 180–100, 70–60 BCE | 345–320, 205–50 BCE
GrM-13391, GrM-13392, GrM-14224, GrM-14225 | P534-Fr002 | XHev/Se2 (XHev/Se Numa) | 1998 | 20 | 40–10 BCE, 1–30, 40–60 CE | 45 BCE–75 CE
GrM-14569, GrM-14570, GrM-14364, GrM-14365 | P147-Fr019 | 4Q541 (4QapocrLevib) | 2148 | 22 | 345–320, 205–150 BCE | 355–300, 210–95, 75–55 BCE
GrM-14571, GrM-14572, GrM-14377, GrM-14366 | P330-Fr004 | 4Q521 (4QMessianic Apocalypse) | 2159 | 22 | 350–315, 205–165 BCE | 355–285, 230–100 BCE
GrM-13393, GrM-13394, GrM-14226, GrM-14227 | P107-Fr010 | 4Q267 (4QDamascusb) | 2151 | 21 | 345–315, 205–155 BCE | 355–290, 210–95, 70–55 BCE
GrM-14573, GrM-14574, GrM-14378, GrM-14379 | P879-Fr001 | Mur19 pap WrDiv | 1987 | 21 | 35–15 BCE, 5–65 CE | 45 BCE–85 CE, 95–110 CE
¹ This fragment was previously unidentified; for a positive identification, see now Tigchelaar2020B .

As was done in the 1990s Bonani1992 ; Jull1995 , we also tested our procedure by dating a date-bearing manuscript, Mur19. The text of Mur19 refers to “year 6 of Masada”, which is now understood as a date reckoned from the first Jewish revolt against Rome and corresponding to 71/72 CE Benoit1961 ; koffmahn1963dating ; yadin1965excavation ; goodblatt1999dating ; eshel2003documents ; Eshel2005 ; Wise2015 . The 2σ calibrated range is 45 BCE–85 CE (91.5%), 95–110 CE (3.9%). The 14C date is thus clearly consistent with the historical date of 71/72 CE.
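The consistency check itself is simple, but one pitfall deserves to be explicit: there is no year 0 between 1 BCE and 1 CE, so BCE/CE ranges are best converted to astronomical year numbering (1 BCE = 0) before comparing. The Python sketch below does this for the Mur19 ranges from Table 3; the helper names are illustrative, and the check ignores how the probability is distributed within each range.

```python
from typing import List, Tuple


def to_astronomical(year: int, era: str) -> int:
    """Convert a historical year to astronomical numbering (1 BCE = 0, 2 BCE = -1)."""
    if year < 1:
        raise ValueError("historical years start at 1")
    if era == "CE":
        return year
    if era == "BCE":
        return 1 - year
    raise ValueError("era must be 'BCE' or 'CE'")


def in_ranges(year: int, ranges: List[Tuple[int, int]]) -> bool:
    """True if the (astronomical) year falls inside any inclusive range."""
    return any(lo <= year <= hi for lo, hi in ranges)


# 2-sigma calibrated ranges for Mur19 from Table 3: 45 BCE–85 CE and 95–110 CE.
mur19_2sigma = [
    (to_astronomical(45, "BCE"), to_astronomical(85, "CE")),
    (to_astronomical(95, "CE"), to_astronomical(110, "CE")),
]
# "Year 6 of Masada" corresponds to 71/72 CE, so both calendar years are checked.
print(all(in_ranges(to_astronomical(y, "CE"), mur19_2sigma) for y in (71, 72)))  # True
```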

Figure 5: OxCal plot for Mur19 with red vertical line indicating the calendar date 71/72 CE.

B.5 Result not to be used for palaeography: 4Q185

From a radiocarbon point of view, the dating of the sample is a valid and acceptable result. However, because the sample fragment cannot be attributed to a larger manuscript, the date cannot be used for our palaeographic analysis.

For 4Q185 (4QSapiential Work), we had requested Plate 801 fragment 1. Because this fragment was sewn and encapsulated for exhibition, the IAA sent a sample of Plate 801 fragment 3 instead. Unfortunately, it is highly uncertain whether this sampled fragment belongs to manuscript 4Q185. From a palaeographic perspective, its identification with 4Q185 is doubtful: for example, the letter ayin differs from other occurrences in the manuscript (see also Pajunen2011 ). For that reason, the measurement results cannot be used for our palaeographic purposes.

B.6 Technically rejected results: 4Q216, 11Q20, and Mur88

The various AMS runs for scrolls 4Q216, Mur88, and 11Q20 resulted in internally inconsistent results. No valid 14C date could be deduced. Therefore, the results are rejected for technical reasons.

For all three scrolls, different samples were received from the IAA in successive batches. The first samples were measured by the GrA machine; the subsequent samples were measured later in the project by the GrM machine.

For 4Q216 (4QJuba), the first sample was measured as graphite (GrA-69799). For the second sample, two gas samples were measured (GrM-10675, 10676). The GrA and GrM measurements do not yield mutually consistent dates; in other words, the two samples received from the IAA give inconsistent results. In addition, the measurements yield 14C dates that are impossibly old. We conclude that the sample material may not be homogeneous.

For 11Q20 (11QTempleb), the first sample was measured as graphite in triplicate (GrA-69800). For the second sample, two different parts of the scroll sample were taken, and two gas samples were measured for each (GrM-10681, 10682, 18827, 18828). The 3 GrA measurements are internally consistent, as are the 4 GrM results. However, GrA and GrM do not yield mutually consistent dates. Here too, the two samples received from the IAA give inconsistent results, and we conclude that the sample material may not be homogeneous.

For Mur88 (MurXII), the first sample was measured as graphite in triplicate (GrA-69806). For the second sample, two different parts of the scroll sample were taken, and two gas samples were measured for each of them (GrM-10663, 10664, 18829, 18830). The resulting GrA and GrM measurements yield three different 14C dates. Here too, the sample material may not be homogeneous.

For the full results of these runs, with more details (in particular carbon yield and δ13C value), see Appendix K.

Table 4: Technically rejected results for 4Q216, 11Q20, and Mur88: laboratory code, sample identification, 14C age (BP), sigma (BP)
lab code | sample | scroll | age (BP) | σ (BP)
GrA-69799 | P385-Fr011 | 4Q216 (4QJuba) | 2342 | 51
GrM-10675, GrM-10676 | P385-Fr011 | 4Q216 (4QJuba) | 2979 | 32
GrA-69800 | P577-Fr014 | 11Q20 (11QTempleb) | 2027 | 24
GrM-10681, GrM-10682 | P577-Fr014 | 11Q20 (11QTempleb) | 2183 | 32
GrM-18827, GrM-18828 | P577-Fr014 | 11Q20 (11QTempleb) | 2202 | 26
GrA-69806 | P64-Fr001 | Mur88 (MurXII) | 1950 | 18
GrM-10663, GrM-10664 | P64-Fr001 | Mur88 (MurXII) | 1951 | 30
GrM-18829, GrM-18830 | P64-Fr001 | Mur88 (MurXII) | 2053 | 25
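The section does not state which statistical criterion was used to judge internal consistency. A standard choice is the chi-squared test of Ward and Wilson (1978), which compares the scatter of several ages around their weighted mean with their quoted uncertainties. The Python sketch below applies that test, as an assumption, to the values in Table 4; the 95% critical values are hard-coded for up to three measurements.

```python
from typing import Sequence, Tuple

# 95% critical values of the chi-squared distribution for 1-3 degrees of freedom.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def consistency_test(ages: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float, bool]:
    """Weighted mean, T statistic, and whether the ages pass the 95% consistency test."""
    weights = [1.0 / s**2 for s in sigmas]
    mean = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
    t = sum((a - mean) ** 2 / s**2 for a, s in zip(ages, sigmas))
    dof = len(ages) - 1  # KeyError for more than 4 ages: extend CHI2_95 if needed
    return mean, t, t <= CHI2_95[dof]


# Ages and sigmas (BP) from Table 4, one entry per measurement set.
table4 = {
    "4Q216": ([2342.0, 2979.0], [51.0, 32.0]),
    "11Q20": ([2027.0, 2183.0, 2202.0], [24.0, 32.0, 26.0]),
    "Mur88": ([1950.0, 1951.0, 2053.0], [18.0, 30.0, 25.0]),
}
for scroll, (ages, sigmas) in table4.items():
    mean, t, ok = consistency_test(ages, sigmas)
    print(f"{scroll}: T = {t:.1f}, consistent at 95%: {ok}")
```

All three sets fail the test, in line with the rejection above. Taken on their own, the two GrM results for 11Q20 (2183 ± 32 and 2202 ± 26 BP) pass it, matching the statement that the GrM results are internally consistent.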

B.7 Analytical Chemistry

B.7.1 Soxhlet extraction

Upon arrival in Odense, the samples were photographed, if this had not already been done in Groningen. Expanding on Section B.2, the chemical cleaning procedure developed to remove later contamination, such as castor oil, was as follows. Three Soxhlet apparatuses were operated in parallel, with three samples mounted simultaneously, one in each chamber. The apparatuses had different volumes: the first operated with 100 mL of solvent, the second with 70 mL, and the third with 50 mL. All solvents were of the highest quality available (LC grade, for liquid chromatography).

The cleaning procedure began by running the whole set of solvents with no sample mounted, to clean the apparatus, the stainless-steel cage, and the glass utensils. A sample was then placed in the stainless-steel cage mounted in a Soxhlet chamber, and the first solvent was added to the lower flask. The first solvent was LC-grade ethanol LiChrosolv (1.11727.2500 from Merck), operated for one hour, corresponding to ca. 50 turnovers of the solvent over the sample. The second solvent was LC-grade n-hexane LiChrosolv (1.03701.2500 from Merck), operated for four hours, corresponding to ca. 240 turnovers. The third solvent was again LC-grade ethanol LiChrosolv (1.11727.2500 from Merck), operated for one hour, corresponding to ca. 50 turnovers. After each step, an 8 mL aliquot of the solvent was transferred to a pre-cleaned glass vial, so that three 8 mL aliquots (ethanol, hexane, and ethanol) were obtained per sample. The vials were placed in a heating apparatus operating at 80°C, which evaporated the solvents to dryness, after which the vials were sealed with a lid. The dry residue was later re-dissolved and analyzed by HPLC-MS in Pisa (see Section B.7.2). After cleaning, the samples were removed from the stainless-steel cages and dried overnight at 60°C and zero humidity in a Memmert HCP 108 climate chamber. The samples were then weighed, packed, and shipped to Groningen for pretreatment and dating following 14C protocols.

This three-step Soxhlet protocol, developed by rasmussen2009effects , was applied to the first batch of 10 samples analyzed in the project (4Q52, 4Q114, 4Q161, 4Q176, 4Q185, 4Q216, 4Q504, 11Q20, Mur88, 5/6Hev1b). Following the chromatographic-mass spectrometric analyses of this first set of solvent extracts in Pisa, it was decided to add a fourth cleaning step to the procedure for the remaining 20 samples. This was done for redundancy, not because there was evidence or suspicion that the three-step procedure was insufficient within the given dating uncertainty. The fourth step was added to further ensure that castor oil and many other contaminants were removed, even in the worst-case scenario. It used a 30:70 mixture of dichloromethane:hexane, both of LC-grade purity (dichloromethane CHROMASOLV 34856 by Sigma-Aldrich, and n-hexane as described above), operated for one hour, corresponding to ca. 60 turnovers of the solvent over the sample.
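For reference, the two cleaning protocols described above can be written down as data, which makes the differences between them explicit. This is a minimal Python sketch; the class and function names are illustrative, and the durations and turnover counts are the approximate values reported in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SoxhletStep:
    solvent: str
    hours: float
    turnovers: int  # approximate solvent cycles over the sample, as reported

    @property
    def turnovers_per_hour(self) -> float:
        return self.turnovers / self.hours


THREE_STEP: List[SoxhletStep] = [
    SoxhletStep("ethanol (LC grade)", 1, 50),
    SoxhletStep("n-hexane (LC grade)", 4, 240),
    SoxhletStep("ethanol (LC grade)", 1, 50),
]
# Fourth step added for the remaining 20 samples.
FOUR_STEP: List[SoxhletStep] = THREE_STEP + [SoxhletStep("dichloromethane:hexane 30:70", 1, 60)]


def summarize(protocol: List[SoxhletStep]) -> str:
    hours = sum(s.hours for s in protocol)
    cycles = sum(s.turnovers for s in protocol)
    return f"{len(protocol)} steps, {hours:g} h, ca. {cycles} turnovers"


print(summarize(THREE_STEP))  # 3 steps, 6 h, ca. 340 turnovers
print(summarize(FOUR_STEP))   # 4 steps, 7 h, ca. 400 turnovers
```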

B.7.2 Raman spectroscopy, optical microscopy, Py-GC/MS, and HPLC-MS analysis

The study of the materials constituting the scrolls was performed in Pisa using a multi-analytical approach based on chromatographic and spectroscopic techniques. These complementary approaches allowed us both to characterize the original materials of the parchments and to evaluate the possible presence of modern materials used for consolidating or restoring the scrolls. The results were used to define the best cleaning strategy for removing modern materials that could affect the dating, and to evaluate the efficiency of the purification steps. In detail:

  • Raman spectroscopy and optical microscopy (OM) were used as non-invasive and non-destructive methods to evaluate the general appearance of the parchments and to characterize the possible occurrence of inorganic materials.

  • Analytical pyrolysis coupled with gas chromatography and mass spectrometry (Py-GC/MS) was performed on small (ca. 100 µg) subsamples taken before the samples underwent cleaning by Soxhlet and AAA, to characterize the organic material constituting the scrolls and to evaluate the possible presence of modern synthetic consolidants. This technique is one of the best methods for obtaining a complete picture of the organic materials in a sample degano2018recent . Pyrolysis is the thermal decomposition of organic materials in the absence of oxygen. It produces low-molecular-weight species that can be separated by gas chromatography and identified by mass spectrometry. This approach yields specific molecular markers that can be used to identify the source of organic materials.

  • Liquid chromatography coupled with mass spectrometry (HPLC-MS) was applied to evaluate the content of lipid materials in the Soxhlet extracts obtained from the parchments during the cleaning steps. This is among the best approaches for separating and characterizing complex mixtures of lipid materials, such as castor oil. Using mass spectrometry as the detection system provides information on the chemical structure of the glycerides la2021liquid , which cannot be obtained with more conventional analytical approaches such as GC/MS. Moreover, this method can detect very low amounts of analytes.

B.7.3 Results of the optical microscopy and Raman spectroscopy analyses performed on 17 samples

The microscopy observations and micro-Raman analyses were performed on samples 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q201/4Q338, 4Q206, 4Q216, 4Q259, 4Q267, 4Q375, 4Q416, 4Q521, 4Q541, Mas1k, Mur19, and XHev/Se2. All these samples had a similar appearance, except for sample Mur19, which showed a different morphology, suggesting the use of a different material as a writing support.

Several samples featured microscopic black spots with diameters in the range of 10–200 µm, except for sample 4Q114, which had one black spot of approximately 600 µm. Raman spectroscopy was applied to investigate the chemical composition of the spots. For several samples, the Raman spectra featured the peaks at 1350 and 1580 cm⁻¹ typical of amorphous carbon (signals not detected in the background). For example, Figure 6 shows the spectrum obtained from one spot on sample 4Q216, and Table 5 presents the OM photographs along with a description of the observed surface and summarizes the relevant information obtained by Raman spectroscopy.
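The identification logic can be reduced to a peak-matching check against the two carbon bands. The Python sketch below is a deliberate simplification: a real assignment inspects the full spectrum, including band shapes and the fluorescence background. The ±30 cm⁻¹ tolerance and the example peak lists are hypothetical.

```python
from typing import Iterable

# Band positions cited in the text for amorphous carbon (cm^-1).
CARBON_BANDS = (1350.0, 1580.0)


def looks_like_carbon(peaks: Iterable[float], tolerance: float = 30.0) -> bool:
    """True if every reference carbon band has an observed peak within `tolerance` cm^-1."""
    observed = list(peaks)
    return all(any(abs(p - band) <= tolerance for p in observed) for band in CARBON_BANDS)


print(looks_like_carbon([1345.0, 1592.0]))  # True
print(looks_like_carbon([1086.0, 1350.0]))  # False: no band near 1580
```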

The largest black spot, from scroll 4Q114, was sampled separately and radiocarbon dated to 2390 ± 60 BP (GrM-13256). The spot was ca. 600 µm in diameter, with an observed thickness of ca. 50 µm, corresponding to a calculated mass of ca. 14 µg. Given a mass of 6.2 mg for the radiocarbon-dated 4Q114 sample, the contamination mass fraction from the black spot would be ca. 0.2%; the effect of such contamination is therefore negligible, whatever its age.
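The estimate above can be reproduced with a short calculation. The following Python sketch rests on two assumptions not stated in the text: the spot is treated as a flat cylinder with a density of 1 g/cm³ (which reproduces the ca. 14 µg figure), and its carbon content per unit mass is taken to equal that of the parchment, so that mass fractions can be used as carbon fractions. Mixing is done in F14C space, and the 4Q114 age of 2168 BP is taken from Table 3.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years


def disc_mass_ug(diameter_um: float, thickness_um: float, density_g_cm3: float = 1.0) -> float:
    """Mass in µg of a spot approximated as a flat cylinder (density is an assumption)."""
    radius_cm = diameter_um / 2 * 1e-4
    volume_cm3 = math.pi * radius_cm**2 * thickness_um * 1e-4
    return volume_cm3 * density_g_cm3 * 1e6


def mixed_age_bp(age_bp: float, contaminant_age_bp: float, fraction: float) -> float:
    """14C age of a two-component mixture, mixing in F14C space."""
    f = (1 - fraction) * math.exp(-age_bp / LIBBY_MEAN_LIFE) + fraction * math.exp(
        -contaminant_age_bp / LIBBY_MEAN_LIFE
    )
    return -LIBBY_MEAN_LIFE * math.log(f)


spot_ug = disc_mass_ug(600, 50)  # ≈ 14 µg at 1 g/cm^3
fraction = spot_ug / 6200        # dated 4Q114 sample: 6.2 mg = 6200 µg
shift = mixed_age_bp(2168, 2390, fraction) - 2168
print(f"spot ≈ {spot_ug:.0f} µg, fraction ≈ {fraction:.2%}, age shift ≈ {shift:.1f} yr")
```

With the spot's own measured age, the shift is a fraction of a year, well below the 15 years of the best measurement precision.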

Figure 6: Raman spectrum obtained for one spot in sample 4Q216
Table 5: Optical microscope pictures and observations
Sample | Observation
4Q3 | Only one detectable black spot. Carbon was identified by Raman analysis.
4Q27 | Only one detectable black spot. Carbon was identified by Raman analysis.
Mas1k | The surface of the sample was characterized by high fluorescence and a few dark spots. One spot was identified as carbon by Raman analysis.
4Q206 | The surface was characterized by a few black spots and a few red spots: Raman analysis revealed the presence of carbon and hematite.
4Q30 | The sample was characterized by a few black spots larger than 10 µm. Carbon was identified.
4Q201/4Q338 | The sample was characterized by several black spots. Several red spots were also detected, as for 4Q206 (sample P386). Due to the high fluorescence of the writing support, Raman spectra revealed only the presence of carbon.
4Q259 | Almost clean; no significant spots were detected.
4Q416 | The sample was characterized by a rough surface scattering the Raman laser light. OM observation did not show any significant presence of dark spots.
4Q2 | The sample was fully covered by a material that prevented a proper Raman analysis.
4Q375 | The sample was characterized by a few black spots with diameters greater than 20 µm. Carbon was identified.
XHev/Se2 | The sample was almost clean. Only two black spots, 10 µm in diameter, were detected. Carbon was identified.
4Q541 | The sample was characterized by several black spots. Carbon was identified.
4Q521 | Only one large black spot (100–150 µm) was detected on the surface of the larger fragment. Carbon was identified.
4Q267 | The sample was almost free of dark spots. The identified spots were too small to be investigated by Raman.
Mur19 | The sample was characterized by several black spots and an organic protective coating. Carbon was identified.
4Q216 | The sample was characterized by black spots. Carbon was identified.
4Q114 | The sample was characterized by black spots. Carbon was identified.

B.7.4 Results of the Py-GC/MS analysis performed on 17 samples

Py-GC/MS was used to evaluate the possible presence of synthetic materials used as consolidants on the scrolls and to characterize the original parchment material: the 17 samples (4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q201/4Q338, 4Q206, 4Q216, 4Q259, 4Q267, 4Q375, 4Q416, 4Q521, 4Q541, Mas1k, Mur19, XHev/Se2) were analyzed directly, without any prior pretreatment, using a multi-shot pyrolyzer EGA/PY-3030D (Frontier Lab, Japan) coupled with a 6890 N gas chromatograph with a split/splitless injection port and a 5973 mass selective single quadrupole mass spectrometer (Agilent Technologies). The complete instrumental conditions are reported in la2019synthetic .

The pyrolytic profiles of all 17 samples featured molecular markers related to the pyrolysis of animal hide (pyrrole and diketopiperazines), except for sample Mur19, which was instead characterized by anhydrosugars such as levoglucosan, typical of a cellulose-based material colombini2009organic . This is consistent with the observation that Mur19 is a papyrus fragment. Figure 7 shows the chromatogram obtained for the parchment sample of 4Q521.

Samples 4Q3 and 4Q206 showed the markers of polyethylene glycol. The pyrograms of samples 4Q3 and 4Q30 also contain peaks due to hexadecanonitrile and octadecanonitrile, which are Py-GC/MS markers characteristic of egg. Samples 4Q521 and Mur19 contained an acrylic resin. Finally, samples 4Q2, 4Q267, 4Q541, and Mur19 showed the presence of retene, a marker of the combustion of resinous wood; it may indicate exposure of the scrolls to fire in the space where the writing took place, or it may derive from residues of illumination with torches. Table 6 summarizes the materials detected in the different parchment samples.
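The interpretive step from markers to materials, as described in this section, can be sketched as a lookup table. The following Python sketch encodes only the associations stated above; it is not a reference library, and a real interpretation relies on the full pyrogram and on reference spectra.

```python
from typing import Dict, Iterable, Set

# Marker-to-material associations as described in this section (not exhaustive).
MARKERS: Dict[str, str] = {
    "pyrrole": "proteinaceous material (animal hide)",
    "diketopiperazines": "proteinaceous material (animal hide)",
    "anhydrosugars": "cellulose-based material",
    "levoglucosan": "cellulose-based material",
    "hexadecanonitrile": "egg",
    "octadecanonitrile": "egg",
    "methyl methacrylate": "acrylic resin",
    "retene": "combustion of resinous wood",
}


def interpret(detected: Iterable[str]) -> Set[str]:
    """Map detected pyrolysis markers to material classes; unknown markers are ignored."""
    return {MARKERS[m] for m in detected if m in MARKERS}


# A marker set resembling the one reported for Mur19.
print(sorted(interpret(["levoglucosan", "methyl methacrylate", "retene"])))
```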

Pyrolysis allowed us to pinpoint the presence of exogenous materials, such as synthetic consolidants (acrylic resin) or lipids. Having identified the nature of the contamination, we were able to design appropriate cleaning procedures to remove any unwanted consolidant.

The additional cleaning step with dichloromethane ensured the total removal of all synthetic materials, as shown by pyrolysis analyses performed on a subset of the samples after cleaning and prior to 14C dating.

Figure 7: Py-GC/MS chromatogram of sample 4Q521: parchment sample with methyl methacrylate, an acrylic resin.
Table 6: Summary of the materials detected in the different parchment samples.
Samples Identified organic materials
4Q216 proteinaceous material, lipid material
4Q3 proteinaceous material, lipid material, egg
4Q27 proteinaceous material, lipid material
Mas1k proteinaceous material, lipid material
4Q206 proteinaceous material, lipid material
4Q30 proteinaceous material, lipid material, egg
4Q201/4Q338 proteinaceous material, lipid material
4Q259 proteinaceous material, lipid material
4Q416 proteinaceous material, lipid material
4Q2 proteinaceous material, lipid material, retene
4Q375 proteinaceous material, lipid material
XHev/Se2 proteinaceous material, lipid material
4Q541 proteinaceous material, lipid material, retene
4Q521 proteinaceous material, lipid material, acrylic resin
4Q267 proteinaceous material, lipid material, retene
Mur19 lignocellulose material, acrylic resin, retene
4Q114 proteinaceous material, lipid material

B.7.5 Liquid chromatography-mass spectrometry results of the analysis of residual lipids in the extracts from the 30 samples after cleaning

HPLC-MS was applied to evaluate the presence of lipid materials. The dried extracts were reconstituted in 150 µL of isopropanol/methanol (10:90), filtered (PTFE syringe filter, 0.45 µm pore size), and analyzed. HPLC-ESI-Q-ToF analyses were carried out using a 1200 Infinity HPLC coupled through a Jet Stream ESI interface to a 6530 Infinity quadrupole time-of-flight (Q-ToF) tandem mass spectrometer (Agilent Technologies, USA). The complete instrumental conditions are reported in la2013core .

The analyses were performed on the extracts from the two different sample pretreatments by Soxhlet, i.e., the three-step and the four-step extraction.

Comparing the results obtained on the extracts with reference blanks demonstrated the effectiveness of the cleaning procedures: the glyceride content after the last step did not exceed 7.0 µg for either approach. The cleaning procedure proved effective in removing lipid materials from the scroll samples, since all solutions obtained after the last extraction step contained triglycerides and fatty acids at or below blank level. Figure 8 compares all final cleaning steps with the respective blanks for both the fatty acids and the triacylglycerols.

Figure 8: Top (a): Comparison of the free fatty acid concentrations between the blank samples and the final cleaning steps; Bottom (b): comparison of the abundances of TAGs (triacylglycerols) in the last cleaning step with those found in the blank samples (AU: arbitrary unit).

In particular, the worst case in the entire data set was 7.0 µg of acylglycerols detected in the fourth cleaning step of 4Q3. These triglycerides may originate from the original parchment or from later contamination such as castor oil; their origin cannot be determined, and they may also be a mixture of ancient and recent material. If, as a worst-case scenario, we assume that all the triglycerides detected in 4Q3 were modern contamination, they would shift the 14C age of a 2000-year-old parchment sample by only 12.6 years.

As stipulated, this worst-case scenario assumes that all triglycerides are modern, which is unlikely because triglycerides are a normal constituent of animal skin ghioni2005evidence . Furthermore, all other samples are well below the 7.0 µg level.
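How a modern-carbon admixture rejuvenates an apparent age can be sketched with a two-component mixing model in F14C space. The 12.6-year figure above depends on the carbon mass of the dated 4Q3 sample, which is not given here, so the sketch below takes the modern-carbon fraction as an input instead of reproducing that number. The F14C value of 1.0 for the modern component is an assumption; post-bomb carbon would have a higher value and produce a larger shift.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years


def age_shift_from_modern(age_bp: float, modern_fraction: float, f14c_modern: float = 1.0) -> float:
    """Apparent rejuvenation (years) when a fraction of the sample carbon is modern."""
    f_sample = math.exp(-age_bp / LIBBY_MEAN_LIFE)
    f_mix = (1 - modern_fraction) * f_sample + modern_fraction * f14c_modern
    return age_bp - (-LIBBY_MEAN_LIFE * math.log(f_mix))


# Shifts for a 2000 BP sample and a few hypothetical modern-carbon fractions.
for pct in (0.1, 0.5, 1.0):
    print(f"{pct:.1f}% modern carbon -> {age_shift_from_modern(2000, pct / 100):.1f} yr younger")
```

For a sample of this age, each 0.1% of modern carbon makes the sample appear roughly two years younger, which is why sub-percent contamination levels matter little relative to the measurement uncertainty.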


Appendix C OxCal plots: 14C determinations and calibrated date plots

Here, we present the OxCal plots for the 26 accepted samples. No plots were produced for the 3 technically rejected samples (see Section B.6), nor for the sample whose identity could not be ascertained (see Section B.5).


Appendix D Palaeography and radiocarbon dating of the Dead Sea Scrolls

D.1 Comparing radiocarbon results and palaeographic estimates

We compare the radiocarbon dates (Table 3 in Appendix B) with previous palaeographic estimates, relying on the estimates given in the official publication series, Discoveries in the Judaean Desert (DJD), as these are considered the standard in the field; where relevant, we also refer to estimates by other scholars.

However, we also critically assess previous palaeographic estimates, on two levels. First, we reason according to the relative typology of the so-called Cross model and assess its application to individual manuscripts. This occasionally leads to palaeographic assessments that correct previous ones. Second, we refrain from translating a relative typology into an absolute chronology. Because date-bearing documents for this period are lacking, one cannot impose the traditional framework’s unsubstantiated chronological limits on when the so-called Hasmonaean and Herodian script features began to develop (see Section A.2.2). This also applies to chronological distinctions within the general categories of Hasmonaean-type and Herodian-type scripts. Cross suggested chronological ranges of 50 years, and sometimes even shorter ranges of 25–50 years, because he assumed a rapid development of the script from the Hasmonaean period onward, in contrast to the presumed slow development in the third century BCE. However, this assumption of rapid evolution also remains unsubstantiated (see below). As more reliable time markers, the calibrated 14C ranges of our study demonstrate older date ranges than previously thought, both for individual manuscripts and for the beginnings of the Hasmonaean and Herodian scripts.

We can compare the radiocarbon dates with previous palaeographic estimates only in a general sense, not by rigidly applying these estimates. The early-1990s guideline that editors of manuscripts in the DJD series should date them according to the typological specimens of Cross’s 1961 article has proved unfortunate. One problem is that many of the palaeographic estimates offered in the DJD series since the 1990s reflect an insufficient understanding of Cross’s model and are therefore unreliable Tigchelaar2020 . This unreliability is further exacerbated by problems within Cross’s palaeographic model itself, which conflates supposed historical and political developments with developments in palaeographic style.

Cross Cross2003 presented few specimens for other scholars to work with. This makes it difficult to substantiate style developments within, for example, the Hasmonaean formal script, and to account for the complexity of the script in individual manuscripts. Moreover, Cross also suggested mutual influences between formal, semiformal, and semicursive scripts, such that the typological development of an individual letter is sometimes thought to have occurred earlier in semicursive than in formal script (as for samek).

For the Hasmonaean formal script, Cross Cross2003 ; Cross1998 singled out only three manuscripts, assuming absolute dates between ca. 175 and 30 BCE: 4Q28, 4Q30, and 4Q51 (for a number of individual letters, Cross also referred to 1QIsaa and 4Q1 as middle and early Hasmonaean formal, respectively, and to 4Q109 and 4Q504 as early Hasmonaean semiformal). Thus, 4Q30 is said to be “a typical Hasmonaean” script from the middle of the period, 125–100 BCE, without any explanation of why that is so (Cross might have used 1QIsaa instead, but because he deemed it to have more idiosyncratic forms, he preferred 4Q30, which he took to have been copied by a more conventional scribe; no further substantiation is provided for these claims). The other two manuscripts lie at the outer ends of the Hasmonaean formal script spectrum, apparently because they share script style elements with earlier and later periods. Thus, 4Q28 is presented as transitional between the Archaic script and the beginning of the Hasmonaean development (175–150 BCE), and 4Q51 as a late transitional script from the end of the Hasmonaean period or the beginning of the Herodian period (50–25 BCE).

Under the heading of the Herodian formal script, Cross singled out seven manuscripts, assuming absolute dates between 30 BCE and 70 CE: 1QM, 4Q27, 4Q37, 4Q85, 4Q113, 5/6Hev1b, and Mur24. In fact, only four of these represent the Herodian formal script. 1QM is presented as “a typical early Herodian formal script” (ca. 30–1 BCE), while 4Q113 would represent “a developed Herodian formal script” (20–50 CE), and 4Q37 and 4Q85 late Herodian formal scripts from ca. 50 CE and ca. 50–68 CE, respectively. 4Q27 is said to be “a typical exemplar of the extremely popular Round semiformal style” (also called rustic, and considered distinct from the Vulgar semiformal) from the early Herodian period, ca. 30 BCE–20 CE. The final two manuscripts are actually considered post-Herodian, with assumed absolute dates between 70 and 135 CE: Cross dated 5/6Hev1b to 75–100 CE (Flint suggested 50–68 CE DJD38 ), and Mur24 is a date-bearing document from 133 CE. Cross saw a mixing of individual letter types in some Herodian scripts, such that the semiformal can “invade” the formal, or the Vulgar semiformal “makes its way into the formal character” (e.g., mem).

Apart from evaluating individual letters, there is no method in the field for dating an entire manuscript on the basis of mixed evidence of ‘older’ and ‘later’ forms of individual letters. Some scholars may apply a form of quantification, weighing the instances of ‘older’ and ‘later’ forms, but this is never made explicit. Rather, the general assumption seems to be that ‘later’ forms cannot have developed earlier, whereas ‘older’ forms can still have been in use at a later time, whether or not as a case of ‘archaizing’. While it may well be true that ‘older’ forms remained in use for a long time, the claim that ‘later’ forms cannot have developed earlier remains unproven for lack of dated evidence. This means that what are perceived as, e.g., late Hasmonaean or early Herodian letter forms may have developed earlier than currently thought.

Even if one adopts Cross’s typological development, the issue of the absolute dating or calibration of the types remains Tigchelaar2020 . A mixture of ‘older’ (more ancient) and ‘later’ (more developed) forms can appear in one and the same manuscript. A focus on individual letters alone cannot indicate an earlier or later chronology, whether relative or absolute. The study of individual manuscripts reveals a more complex development (see, e.g., Tigchelaar2023 on 4Q1). Experienced palaeographers have at times proposed widely diverging dates for the same scrolls. Thus, a number of individual manuscripts cannot be fitted precisely into a sequence on the basis of traditional palaeography.

The radiocarbon dates and the palaeographic estimates are two independent information sources about history, based on two different methodologies: one is a physically measured “yardstick of time”, the other is a cultural and qualitative assessment. At present, in the absence of an abundance of date-bearing manuscripts between the third century BCE and the first century CE, radiocarbon dates (14C) derived from manuscript samples are more reliable time markers. The palaeographic estimates do not provide absolute or fixed dates.

With these caveats in mind, Figure 1 in the main article shows the comparison between the (accepted) $2\sigma$ calibrated ranges and previous palaeographic estimates (see the worksheet in Appendix L for the specific data and information). Additional plots can be found in Appendix H, where Figures 29 and 30 present the effect of including or excluding minor peaks in the $2\sigma$ calibrated ranges and Figure 31 presents the outcome of selecting the $1\sigma$ calibrated ranges.

D.1.1 Whole or partial overlap

Comparing previously given palaeographic estimates with our 14C $2\sigma$ results shows that 17 of the 26 sampled manuscripts in our project have whole or partial overlap. This applies to: 4Q23, 4Q47, 4Q52, 4Q70, 4Q161, 4Q176, 4Q201/4Q338, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, 11Q5, Mas1k, Mur19, 5/6Hev1b, XHev/Se2.

4Q47 is a good example of how palaeographic estimates cannot be precise or clearly substantiated. Ulrich DJD14 reports that Cross had identified its script as Hasmonaean (thus dating it probably to the second half of the second century or the first half of the first century BCE) but refrained from offering a more precise estimate within the Hasmonaean period. Langlois Langlois2011 and Puech Puech2015 favoured the first half of the first century BCE. Langlois referred to some letters showing a typologically older form (bet, dalet, vav, khet, nun), while others would have a form more in line with those seen in the late Hasmonaean or early Herodian periods (aleph, he, tet, samek, pe). However, aleph, e.g., appears in two forms, one of them typologically older, with the left leg often connecting to the middle of the diagonal instead of nearer its top; the same holds for samek, which appears in both closed (younger) and open (older) forms. Also, ayin is often small, which is considered an older form, while yod shows the triangular head seen as typical for the late Hasmonaean period. Rather than forcing this manuscript into a linear date estimate, the mixed typological evidence is better explained as demonstrating overlapping or partly adjacent style developments.

In Section B.4 in Appendix B we noted that our calibrated results are often bimodal, especially for the $2\sigma$ distributions. 4Q47 is an example of such a bimodal calibrated result, also in the $1\sigma$ distribution. The 14C $2\sigma$ calibrated range of 210–100 BCE (61.6% probability) overlaps with the broad palaeographic estimate ‘Hasmonaean’, though less with the more specific estimates of Puech and Langlois, and also allows the script style of 4Q47 to be dated to the first half of the second century BCE.

The older $2\sigma$ calibrated range of 355–290 BCE (33.8%) is far removed on the timeline from previous palaeographic estimates. Although the older $2\sigma$ peak is a mathematically valid solution of the dating process, the younger calibrated peak must be preferred for 4Q47. Following the palaeographic principle of comparing the script of an undated manuscript to that of dated writings with a similar script (see Section A.2.2 in Appendix A), it should be noted that 4Q47 does not compare to the extant typological evidence from date-bearing Aramaic manuscripts from the fourth century BCE. Typologically, the script of 4Q47 does not correspond to the script in date-bearing documents from the Persian period, such as those from Bactria or those from Wadi Daliyeh, which lies in the same region as the Dead Sea Scrolls. So, from a palaeographic perspective, 4Q47 is clearly younger than the position of the older calibrated peak on the timeline.

Prior to the discovery of the Wadi Daliyeh documents, Cross argued that 4Q52 was the oldest manuscript among the Dead Sea Scrolls, and it remains the strongest candidate for the oldest biblical manuscript. In the official publication, Cross et al. dated 4Q52 to ca. 250 BCE DJD17 .

The 14C evidence is bimodal for the $2\sigma$ distribution. The younger peak of 285–230 BCE (16.6% probability) agrees well with the palaeographic estimate. The older calibrated range is 410–355 BCE (78.9%). Although it cannot be ruled out completely from a palaeographic perspective, this older date seems typologically slightly too early for the script of 4Q52 when compared with date-bearing documents from Elephantine from the late fifth century BCE and date-bearing documents from Bactria from 353 to 324 BCE, although the effect of geographical variation on script variation is difficult to factor in. A date range in the second half of the fourth century BCE would seem more suitable for 4Q52. Following the palaeographic principle, 4Q52 would have to be dated chronologically nearer to the Wadi Daliyeh manuscripts, especially WDSP 1 from 335 BCE (see Section A.2.2). But for that date range there is no 14C result.

Hence, from a palaeographic perspective, a clear preference for one of the two peaks in the probability distribution cannot be substantiated. The $2\sigma$ range of 410–355 BCE is perhaps only a few decades too old, rather than one to two centuries as for most other bimodal results of our 14C measurements. So, in the case of 4Q52, the older peak cannot be rejected as a possible solution with as much confidence as for most other 14C samples with bimodal evidence.

4Q176 has two script styles (plausibly from two scribes): the script of fragment 1–2 i looks entirely different from that of 1–2 ii. The 14C sample in this study was taken from fragment 1–2 ii. Strugnell Strugnell1970 and Tigchelaar Tigchelaar2019 characterized its script as ‘middle Hasmonaean’, i.e., ca. 125–75 BCE. Strugnell’s palaeographic analysis can easily be misunderstood. He explains that many of the letter forms of the second script style seem to be Herodian, such as bet, tet, mem, and qoph. Yet, because the script is semiformal rather than formal, he argues, these forms must be dated to the middle Hasmonaean period. Fragment 1–2 ii shows less uniformity in letter size than fragment 1–2 i, e.g., in kaph or medial mem. This can be understood as a typologically older feature, in which kaph and medial mem are still larger than other letters. The ideal of a base line does not yet seem well developed. The three-stroke he and the small ayin seem archaic. On the other hand, bet has a broad base stroke that protrudes to the right, which in formal script is generally connected typologically to the late Hasmonaean or early Herodian period. But if the distinction between formal and semiformal cannot be clearly made, 4Q176 is another example of mixed evidence.

4Q176 is another example of a bimodal calibrated result, in addition having minor peaks of low probability. The $2\sigma$ calibrated range of 210–100 BCE (64.2% probability) and the minor peak of 70–60 BCE (0.7%) are consistent with previous palaeographic assessments. The $2\sigma$ calibrated range of 210–100 BCE also makes an older dating of the script style possible.

These assessments for 4Q47, 4Q52, and 4Q176 also apply to 4Q23, 4Q70, 4Q161, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, Mas1k, and XHev/Se2. Only in the cases of 4Q201 and 11Q5 do the 14C results indicate a date range that points toward a younger possible date, whereas in almost all other cases they point toward an older possible date range.

Regarding 4Q201, Milik’s edition Milik1976 suggested the first half of the second century BCE, and most scholars have accepted this estimate. He considered its script to be quite archaic and connected to the third- and second-century BCE semicursive or semiformal scripts, perhaps more dependent on the Aramaic writing of northern Syria or Mesopotamia than on that of Judaea or Egypt. Similar comparisons with northern Syria have been made for 4Q17 and 4Q109, but concrete connections cannot be substantiated. Puech Puech2017 also saw the script as semiformal/semicursive, dating from ca. 200 BCE, while Langlois Langlois2011 gave an estimate of ca. 150 BCE.

4Q201 has a $2\sigma$ calibrated range of 165–40 BCE (93.6%) and a minor peak of 10–1 BCE (1.9%). This overlaps with the palaeographic estimates, but instead of an older date, a younger date than previously considered is also possible.

The script of 4Q201 is hard to assess, in part because the scribe used a pen with a thick, worn nib to write small letters, which may account for the atypical aleph. Yet, apart from archaic forms of samek and shin, nothing is typologically incongruent with the early Hasmonaean script.

As for 11Q5, Sanders DJD4 understood its script as transitional from early to late Herodian, comparing it to 4Q113 and also to 1QM, 4Q27, 4Q37, and 4Q51. He dated its script to the first half of the first century CE, possibly slightly earlier than 4Q113, Cross’s specimen for “a developed Herodian formal script”. However, clear typological distinctions at the level of individual letters between ‘early’, ‘developed’, and ‘late’ Herodian according to Cross’s specimens are not easily made. For example, consider aleph, which from early to late Herodian would develop toward an inverted “v” form of the left leg and an oblique axis, or dalet, where the horizontal stroke breaks through the right leg: there is no difference here between the ‘developed’ and ‘late’ Herodian specimens of 4Q113, 4Q37, and 4Q85. On the other hand, the sharp bend in the right leg of ayin and sin/shin may be seen in early as well as developed and late Herodian exemplars, whereas in some manuscripts considered to be late Herodian, e.g., Mur88 and 5/6Hev1b, the sharp bend is not clearly shown. As to more general features of the Herodian formal script, one may consider a generally uniform letter size, a base line, ligatures, and the development of keraiai or serifs. But beyond a general impression, these features are difficult to use for a clear typological differentiation of manuscripts within the Herodian formal script.

11Q5 has a $2\sigma$ calibrated range of 5–120 CE (92.2%) and a minor peak of 35–15 BCE (3.3%), showing clear overlap with the different presumed Herodian palaeographic periods, and even with the post-Herodian period. The measurement has a standard deviation of only 18 14C years (BP). The width of the full $2\sigma$ calibrated range, 35 BCE–120 CE, is caused by the shape of the calibration curve in this period when converting the BP dates to calendar dates. Scholars of the Dead Sea Scrolls may consider a date later than 70 CE unlikely for 11Q5 because the scrolls found in the Qumran caves are assumed to have been hidden in the summer of 68 CE Popovic2012 .

4Q259 is notorious for its widely varying palaeographic estimates, which range across the second and first centuries BCE. Cross Charlesworth1994 described 4Q259 as written in an unusual semicursive with mixed semicursive and semiformal script features, and gave 50–25 BCE as a date estimate. Earlier, Milik Milik1976 had suggested the second half of the second century BCE (Milik used the older reference number 4Q260), while later Puech Puech1998 suggested the first half of the first century BCE, preferably shortly after 100 BCE. Puech based this date on a combination of a palaeographic analysis of the Cryptic A script (compared to 4Q298 and especially to 4Q249 and 4Q317) and the 14C dating of 4Q317 Jull1995 .

4Q259 has a $2\sigma$ calibrated range of 210–100 BCE (69.7%) and a minor peak of 70–55 BCE (1.4%). The $2\sigma$ calibrated range of 210–100 BCE agrees with the two older palaeographic estimates of Milik and Puech, whereas the minor peak of 70–55 BCE is nearer to Cross’s estimate. The bimodal evidence for 4Q259 also shows an older $2\sigma$ calibrated range of 350–310 BCE (24.3%) but, as for 4Q47, this older peak can be rejected as a possible solution based on typological comparison with date-bearing Aramaic manuscripts from the fourth century BCE.
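The percentages attached to the sub-ranges of a bimodal result are shares of one and the same $2\sigma$ distribution, so for a sample whose sub-ranges are all reported they should add up to roughly 95.4%, the probability mass of a normal distribution within two standard deviations. The following Python sketch checks this for 4Q47, 4Q52, and 4Q259 using the values given above. The dictionary layout and function name are hypothetical illustrations, not the calibration procedure used in this study, and small deviations from 95.45% are expected because the reported percentages are rounded.

```python
import math

# Values copied from the text: each 2σ sub-range is
# (start BCE, end BCE, probability in %). The layout is a hypothetical choice.
CALIBRATED_2SIGMA = {
    "4Q47": [(355, 290, 33.8), (210, 100, 61.6)],
    "4Q52": [(410, 355, 78.9), (285, 230, 16.6)],
    "4Q259": [(350, 310, 24.3), (210, 100, 69.7), (70, 55, 1.4)],
}

# Probability mass of a normal distribution within ±2 standard deviations (~95.45%).
TWO_SIGMA_COVERAGE = 100 * math.erf(2 / math.sqrt(2))


def total_probability(ranges):
    """Sum the reported probabilities (%) of all sub-ranges of one 2σ result."""
    return sum(prob for _start, _end, prob in ranges)


for sample, ranges in CALIBRATED_2SIGMA.items():
    total = total_probability(ranges)
    # Reported percentages are rounded to 0.1%, so totals differ slightly.
    print(f"{sample}: {total:.1f}% (expected ~{TWO_SIGMA_COVERAGE:.1f}%)")
```

These totals say nothing about which peak should be preferred; that choice rests on the typological arguments given above.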

Following Cross’s typology, Puech Puech1998B analysed 4Q521 as a Hasmonaean formal script and dated it to 100–80 BCE. This manuscript was also radiocarbon dated in the 1990s Jull1995 . That BP date (1984 ± 33) now has to be recalibrated with the IntCal20 calibration curve (reimer2020intcal20 ), which results in a $2\sigma$ date range of 45 BCE–120 CE. According to the bimodal evidence of our study, the younger $2\sigma$ calibrated range is 230–100 BCE (57.5%), while the older $2\sigma$ peak of 355–285 BCE (38.0%) can be rejected as a possible solution due to comparative typological evidence from date-bearing Aramaic manuscripts from that period. The difference in age between the two radiocarbon tests may be due to the Soxhlet procedure used to remove castor oil from the sample, but it is not possible to quantify or ascertain this. The palaeographic estimate of 100–80 BCE and our $2\sigma$ calibrated range of 230–100 BCE meet at 100 BCE. So, considering measurement uncertainties, 4Q521 can be counted as a partial overlap.

Cross Cross2003 considered the script of 5/6Hev1b to be post-Herodian formal, dating it to 75–100 CE (Flint DJD38 suggested 50–68 CE). This sample yielded the least precise 14C result in our study, with a standard deviation of 28 14C years (BP), and calibrates at $2\sigma$ to 10–205 CE. The wide calibrated date range, caused by the shape of the calibration curve in this period, clearly encompasses previous palaeographic estimates, but also extends in both a somewhat older and a much younger direction.

D.1.2 No overlap

Nine of the 26 samples yield (accepted) $2\sigma$ calibrated ages that do not overlap with previous palaeographic estimates. In all nine cases, the 14C results give calibrated age ranges that are older than previous palaeographic estimates. Yet, in light of our critical assessment, the older 14C age ranges are in most cases also palaeographically possible and realistic. This applies to: 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q206, 4Q267, 4Q375, 4Q416.
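Once a palaeographic estimate is expressed as a numeric range, the distinction between overlap and no overlap becomes a simple interval comparison. The Python sketch below is a hypothetical illustration, not the procedure used in this study. It converts BCE/CE years to astronomical numbering (1 BCE = 0) so that ranges crossing the era boundary compare correctly, treats ranges that merely touch at an endpoint as overlapping (the judgement made for 4Q521 above, but an assumption here), and reports whether non-overlapping calibrated ranges lie entirely before or after the estimate. Broad labels such as ‘Hasmonaean’ have no fixed numeric bounds and would first have to be turned into a range by hand.

```python
def to_astronomical(year, era):
    """Convert a BCE/CE year to astronomical numbering (1 BCE -> 0, 2 BCE -> -1)."""
    if era == "BCE":
        return 1 - year
    if era == "CE":
        return year
    raise ValueError(f"unknown era: {era!r}")


def bce(year):
    """Shorthand for a BCE year in astronomical numbering."""
    return to_astronomical(year, "BCE")


def classify(palaeo, calibrated):
    """Compare a palaeographic estimate with accepted calibrated sub-ranges.

    `palaeo` is a (start, end) pair and `calibrated` a list of (start, end)
    pairs, all in astronomical years with start <= end. Touching endpoints
    count as overlap (a hypothetical choice).
    """
    p_start, p_end = palaeo
    if any(start <= p_end and p_start <= end for start, end in calibrated):
        return "overlap"
    if all(end < p_start for _start, end in calibrated):
        return "older"
    if all(start > p_end for start, _end in calibrated):
        return "younger"
    return "mixed"  # calibrated ranges on both sides, none overlapping


# 4Q521: palaeographic 100–80 BCE vs accepted 2σ range 230–100 BCE
print(classify((bce(100), bce(80)), [(bce(230), bce(100))]))
# 4Q30: Cross's 125–100 BCE vs 235–165 BCE and the minor peak 260–245 BCE
print(classify((bce(125), bce(100)), [(bce(235), bce(165)), (bce(260), bce(245))]))
```

With these inputs the first call reports an overlap and the second reports calibrated ranges older than the estimate, in line with the classification of 4Q521 and 4Q30 in this appendix.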

4Q30 was Cross’s “typical Hasmonaean” script specimen from the middle of the period, 125–100 BCE, like 1QIsaa Cross2003 . The calibrated result for this sample in our study is bimodal. According to the 14C measurement, 4Q30 has a $2\sigma$ calibrated range of 235–165 BCE (36.7%) and a minor peak of 260–245 BCE (1.4%). The older $2\sigma$ peak of 360–275 BCE (57.4%) can be rejected as a possible solution based on palaeographic comparison with date-bearing manuscripts in Aramaic script from that period. While Cross gave the narrower estimate of 125–100 BCE, White Crawford dated it more broadly to 150–100 BCE DJD14 . An earlier date range, say in the first half of the second century BCE, as indicated by 14C, is realistic and possible. In general, there is no reason to limit the script identified as Hasmonaean chronologically to the political-historical period of the same name, which began in the mid-second century BCE (see Section A.2.2). The relative typological sequence can easily be shifted chronologically to an older age range. Although 4Q30 generally shows a more uniform letter size, at the level of individual letters the often not yet ‘standard’ size of aleph and the often small ayin point to an earlier typology in the Hasmonaean script. As we argued for 4Q47 and 4Q176 (Section D.1.1), 4Q30 shows mixed typological evidence.

4Q27 was Cross’s “typical exemplar of the extremely popular Round semiformal style”, initially dated by him to the early Herodian period (ca. 30 BCE–20 CE) but later slightly revised by Jastram and Cross to the latter half of the first century BCE DJD12 . 4Q27 has a $2\sigma$ calibrated range of 200–50 BCE (94.2%) and a minor peak of 340–330 BCE (1.3%) that can be rejected as a possible solution for palaeographic reasons. The $2\sigma$ calibrated range of 200–50 BCE comes close to the revised palaeographic estimate. The calibrated date range is wide, owing to the measurement’s standard deviation of 26 14C years (BP) in combination with the shape of the calibration curve in this period.
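The stretching effect of the calibration curve, noted above for 11Q5, 5/6Hev1b, and 4Q27, can be made concrete with simple arithmetic. The sketch below compares the width of the $\pm 2\sigma$ measurement window in 14C years (four times the standard deviation) with the width of the reported $2\sigma$ calendar range, using astronomical year numbering so that the absence of a year zero between 1 BCE and 1 CE is handled correctly. It is only a rough indicator under stated assumptions: it ignores the uncertainty of the IntCal20 curve itself and, for 4Q27, the rejected minor peak of 340–330 BCE; all function and variable names are hypothetical.

```python
def to_astronomical(year, era):
    """Astronomical numbering: 1 BCE -> 0, 35 BCE -> -34; CE years unchanged."""
    if era == "BCE":
        return 1 - year
    if era == "CE":
        return year
    raise ValueError(f"unknown era: {era!r}")


def widths(sd_bp, start, end):
    """Return (±2σ window width in 14C years, calendar range width in years).

    `start` and `end` are (year, era) tuples for the reported 2σ range.
    """
    bp_width = 4 * sd_bp  # from -2σ to +2σ around the measured BP age
    cal_width = to_astronomical(*end) - to_astronomical(*start)
    return bp_width, cal_width


# Standard deviations (14C years BP) and 2σ calendar ranges quoted in the text.
SAMPLES = {
    "11Q5": (18, (35, "BCE"), (120, "CE")),
    "4Q27": (26, (200, "BCE"), (50, "BCE")),  # main range only
    "5/6Hev1b": (28, (10, "CE"), (205, "CE")),
}

for name, (sd, start, end) in SAMPLES.items():
    bp_width, cal_width = widths(sd, start, end)
    print(f"{name}: {bp_width} 14C years -> {cal_width} calendar years "
          f"(ratio {cal_width / bp_width:.2f})")
```

In all three cases the calendar range is wider than the measurement window, which is consistent with attributing the wide ranges to the shape of the calibration curve rather than to imprecise measurements.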

Interestingly, the $2\sigma$ calibrated range for another specimen of the Herodian round semiformal, 4Q161, is 55 BCE–30 CE (92.1%), with two minor peaks of 90–80 BCE (1.7%) and 45–60 CE (1.7%). This may suggest a longer and somewhat older age range for this Herodian-type script than only the latter half of the first century BCE. Palaeographically, there are also many differences between 4Q27 and 4Q161. In 4Q161 the long extending base strokes of kaph, broad dalet, ligatures, and strikingly penned tet and shin stand out, whereas 4Q27 shows less tendency to broadening of letters. Although possible from a 14C perspective, a date in the first half of the second century BCE for 4Q27 seems unlikely from a typological perspective in comparison with other manuscripts. Nonetheless, there are four more Herodian-type manuscripts dated to that range by 14C in our study: 4Q3, 4Q267, 4Q375, and 4Q416.

4Q267 is another example of Cross’s early Herodian round semiformal. Yardeni related 4Q267 to 4Q397 as possibly written by the same scribe and dated it to 30 BCE–20 CE DJD18 (Yardeni did not adopt Cross’s round semiformal categorization and understood its script as formal). 4Q267 was also radiocarbon dated in the 1990s Jull1995 . That BP date (2094 ± 29) now has to be recalibrated with the IntCal20 calibration curve (reimer2020intcal20 ), which results in a $2\sigma$ calibrated range of 200–40 BCE (94.0%) and a minor peak of 10 BCE–5 CE (1.5%). According to our study, 4Q267 has a $2\sigma$ calibrated range of 210–95 BCE (65.3%) and a minor peak of 70–55 BCE (1.6%), whereas the older $2\sigma$ range of 355–290 BCE (28.6%) can be rejected as a possible solution due to comparative typological evidence from date-bearing Aramaic manuscripts from that period. The difference in age between the two radiocarbon tests may be due to the Soxhlet procedure used to remove castor oil from the sample, but it is not possible to quantify or ascertain this.

From a typological perspective, it is difficult to see how the script of 4Q267 could be chronologically so close to typologically quite different specimens in the second century BCE. We may have to reckon with overlapping or partly adjacent style developments, but in this case that would severely affect the relative typology dominant in the field, rather than merely shifting it chronologically while keeping it intact.

4Q267 might be an outlier, yet this 14C result raises the fundamental issue of how the absolute, chronological dating of typological differences in a linear sequence has been substantiated. Cross assumed a slow development of the Aramaic/Hebrew script in the third century BCE and he assumed a rapid evolution of the script in the Hasmonaean and Herodian eras, but he could not substantiate either assumption, due to the lack of date-bearing documents. He assumed but did not demonstrate that the finer typological distinctions had to be chronologically sequenced one after the other instead of existing partially next to each other (see, similarly, Sirat1986 ).

Cross wavered in his palaeographic estimate of the ‘semicursive’ 4Q114, moving from the late second century BCE (125–100) to ca. 100–50 BCE and, under the influence of the Wadi Daliyeh finds, back to the late second century BCE DJD16 , “no more than about a half century younger than the autograph”, as Cross put it Cross1961B . Interestingly, Cross dated 4Q114 as contemporary with the formal hand of 4Q30. 4Q114 preserves Daniel 8–11, a part of the book that scholars argue on literary-historical grounds was composed in the 160s BCE. 4Q114 has a $2\sigma$ calibrated range of 230–160 BCE (45.9%) and an older $2\sigma$ range of 355–285 BCE (49.5%) that can be rejected as a possible solution based on comparative typological evidence from date-bearing Aramaic manuscripts from that period.

Because of its scribal errors, it is unlikely that the scribe of 4Q114 was the author. But the early date and low scribal quality of 4Q114 shed new light on the production and circulation of literature in ancient Judaea: its date is indicative of the speed of the text’s spread, and the low quality of the manuscript may indicate that it originated in a social context close to the original author Popovic2023 ; future research may further validate this. 4Q114 would then have been copied very soon after the assumed composition of Daniel 8–11. The 14C $2\sigma$ date of 230–160 BCE for 4Q114 closely matches the comparably early 14C date of 4Q30 (235–165 BCE).

For 4Q206, Milik Milik1976 gave an estimate in the first half of the first century BCE and simply referred to four of the exemplary Hasmonaean manuscripts given by Cross (4Q30, 4Q51, 4Q114, 4Q398), apparently without regard for their differences in style or for the fact that Cross dated them quite differently. In his recent edition, prepared in consultation with Puech, Drawnel Drawnel2019 dated 4Q206 to the middle of the first century BCE. Notably, two of Milik’s typological comparanda, 4Q30 and 4Q114, have 14C results in our study similar to that of 4Q206: the $2\sigma$ calibrated range for 4Q206 is 235–145 BCE (45.8%) with a minor peak of 135–120 BCE (1.1%); the older $2\sigma$ range of 360–280 BCE (48.6%) can be rejected as a possible solution for palaeographic reasons. In each of these cases, the 14C results indicate an earlier chronological date than the palaeographic estimates. Typologically, however, some letters are slightly different and commonly seen as a later development of the letter form, e.g., bet, mem, and ayin. Yet, other letters show varied forms within 4Q206, and some compare well with instances from 4Q30, e.g., aleph and he. So 4Q206 may be another example of mixed typological evidence.

Then there are four Herodian-type manuscripts whose 14C dates extend into the second century BCE: 4Q2, 4Q3, 4Q375, and 4Q416.

The script of 4Q2 has been described as late Herodian or even post-Herodian (ca. 50–68+ CE), in part because of the increasing use of keraiai DJD12 . While the script is typologically certainly Herodian, the assumption that calligraphic features are typical for its latest period cannot be substantiated. 4Q2 has a $2\sigma$ calibrated range of 125 BCE–10 CE (90.3%) with a minor peak of 155–130 BCE (5.2%), providing a date range up to 10 CE, which seems realistic to us.

The case of 4Q3 is more difficult. Its script was tersely described as “an Herodian formal hand dating from the middle to end of that period (c. 20-68 CE)” DJD12 . Indeed, the script of 4Q3 features several letters and elements generally regarded as developed Herodian, like the small tick above the crossbar of the final mem. Yet, some letters, such as the ‘horned’ dalet, have older shapes that are uncommon in developed Herodian formal hands. 4Q3 has a $2\sigma$ calibrated range of 200–50 BCE (92.0%) and a minor peak of 340–325 BCE (3.5%) that can be rejected as a possible solution for palaeographic reasons. The palaeographic rule of thumb that the latest forms in a manuscript indicate its age would militate against the $2\sigma$ range of 200–50 BCE. Yet, when those latest forms first arose has not been established in the field. Future evidence may further validate this.

Strugnell provided a judicious analysis of the palaeography of 4Q375, comparing its style to that of the round or rustic semiformal series, which is generally associated with the early Herodian period, but also arguing that, typologically, it must be an early exemplar, since some letters do not yet have the typically Herodian forms DJD19 . True to his custom, he did not translate this typological assessment into a calendar date, but in Cross’s correspondence between hand and style this would amount to ca. 50–25 BCE, which would nearly agree with the $2\sigma$ calibrated range of 205–50 BCE (89.5%). Considering the uncertainties in the palaeographic estimate, this is acceptable. The older $2\sigma$ peak of 340–320 BCE (6.0%) can be rejected as a possible solution for palaeographic reasons.

Also for 4Q416, Strugnell carefully analysed its individual letters, arguing that in most cases these should be placed between 4Q51 and 1QM, hence “in a date transitional between the late Hasmonaean and the earliest Herodian hands” DJD34 . He judged the script of 4Q416 to be earlier than those of 4Q415, 4Q417, and 4Q418 by some twenty-five years, so that a palaeographic estimate of 50–25 BCE presents itself. 4Q416 has a $2\sigma$ calibrated range of 205–90 BCE (78.1%) and a smaller peak of 80–50 BCE (9.4%). Considering the uncertainties in the palaeographic estimate, this is acceptable. The older $2\sigma$ peak of 345–320 BCE (8.0%) can be rejected as a possible solution for palaeographic reasons.

D.1.3 Concluding the comparison between radiocarbon results and palaeographic estimates

Based on this comparison between (accepted) $2\sigma$ calibrated dates and previous palaeographic estimates, we make the following concluding observations.

Overall, the 14C results indicate older date ranges for individual manuscripts. Only two manuscripts, 4Q201 and 11Q5, have date ranges that go in the direction of a younger possible range (5/6Hev1b has a range both a bit older and much younger). Thus, Hasmonaean-type manuscripts have 14C date ranges that allow for older dates in the first half of the second century BCE, and sometimes also up to the latter part of the third century BCE, instead of the late second century or early first century BCE. There are no compelling palaeographic or historical reasons that preclude these older dates as reliable time markers for the Hasmonaean script (this also applies to the solid third-century BCE range for 4Q70 and its Archaic-type script).

The 14C results for most manuscripts confirm the basic distinction between Hasmonaean-type manuscripts, which are older, and Herodian-type manuscripts, which are younger, and, for that matter, also between Archaic-type (4Q52 and 4Q70) and Hasmonaean-type manuscripts. However, the 14C date ranges for manuscripts that are traditionally considered Hasmonaean and Herodian are distributed quite differently across the timeline.

As can be seen in Figure 1 in the main article, the twelve Hasmonaean-type manuscripts in our sample set have (accepted) $2\sigma$ calibrated date ranges from the second and first centuries BCE, as expected, and most also extend into the late third century BCE. Three Herodian-type manuscripts (4Q161, Mas1k, XHev/Se2) have (accepted) $2\sigma$ calibrated date ranges from the latter half of the first century BCE and the first century CE, as expected. Two Herodian-type manuscripts have date ranges in the first century CE, as expected, but also extend into the second century CE (11Q5 and 5/6Hev1b, the latter even into the early third century CE). 4Q2 has a date range extending from the early first century CE back to the early second century BCE. And five Herodian-type manuscripts have (accepted) $2\sigma$ calibrated date ranges in the second century BCE (4Q3, 4Q27, 4Q267, 4Q375, and 4Q416), though 4Q27 extends into the first century BCE.

This adds a third component to our critical assessment. In addition to critiquing the application of traditional typology to individual manuscripts and dismantling unsubstantiated historical suppositions and chronological limitations, the results of this study also question the validity of the relative typology as such. The traditional relative typology can be maintained but not in all cases. The spread of the Hasmonaean-type manuscripts over the timeline does not affect Cross’s relative typology in a major way but the older, second-century BCE date ranges of the Herodian-type manuscripts do affect the relative typology potentially in a major way.

Individual manuscripts frequently show mixed typological evidence: a manuscript can have different forms of individual letters that are considered ‘older’ or ‘younger’ according to the traditional palaeographic framework. There is, however, no good method to assess what this means for the relative placement of the individual manuscript, nor, for that matter, what it means for relative typology in general. The rule of thumb that the typologically latest forms should determine one’s palaeographic estimate of a manuscript presupposes existing palaeographic date markers and a decision about which features are typologically important or indicative. The dated Wadi Daliyeh discoveries showed that features supposed to be significantly later already appeared many decades earlier than previously expected. Moreover, it is assumed but not substantiated that typological differences must be translated into linear chronological sequences. Instead of a linear development, as Cross and others have assumed, the possibility of overlapping or partly adjacent style developments must be considered.

So, the so-called Hasmonaean script can indeed be regarded as older than the so-called Herodian script, but the 14C results of this study indicate that the Herodian script was present earlier than previously thought. This suggests that these scripts did not transition from one to the other from the mid-first century BCE onward (the so-called late Hasmonaean/early Herodian category of manuscripts) but already existed partially side by side much earlier.

This study shows that there are no cogent reasons for limiting the palaeographic dating of style developments to political-historical periods such as Hasmonaean or Herodian. The terms ‘Hasmonaean’ and ‘Herodian’ might still be employed for types of script, but these cannot be converted to specific date ranges. For date estimations of individual manuscripts, one should rather use concrete age ranges.

D.2 Combining palaeography and radiocarbon data to train the artificial intelligence-based date-prediction model

In this study, we combine palaeography and radiocarbon dating to train our date-prediction model. It should be stressed that this means a combination of qualitative and quantitative approaches and methods. Palaeography is a qualitative approach, based on expert knowledge, which is similar to the role of, for example, epigraphy as expert knowledge in assael2022restoring . This is different from, for example, an archaeological or geological stratigraphy that often can be quantified on the timeline. From a palaeographic approach alone it is not possible to pinpoint an exact date, date range, or date limit on the timeline of the period of the Dead Sea Scrolls, because palaeographic dates are estimates and do not provide absolute or fixed dates. For example, in our research, palaeography tells us that most scrolls cannot date to the fourth century BCE but we cannot assign a limiting number like 300 BCE. Moreover, typological script development is a gradual process, not necessarily following a linear time trajectory.

Most Dead Sea Scrolls are typologically younger than the script in Aramaic date-bearing documents from the fourth century BCE. Therefore, when a knowledgeable palaeographer is presented with bimodal evidence in the 2σ range, as is the case in our study, they will certainly reject the older peak as a possible solution, as we have already explained (Sections D.1.1 and D.1.2); this principle was also tacitly applied by Bonani1992 ; Jull1995 . Hence, in these cases of bimodal evidence, typologically younger is also chronologically younger.

However, it is an open research question whether some of the oldest Dead Sea Scrolls might be older than their palaeographically estimated date in the mid-third century BCE. Following the palaeographic principle of comparing the script of an undated manuscript with that of dated writings in a similar script (see Section A.2.2 in Appendix A), we have already argued that 4Q52 should be dated chronologically nearer to the Wadi Daliyeh manuscripts, especially WDSP 1 from 335 BCE (see Section D.1.1). This may also apply to 4Q70, which has 2σ calibrated ranges of 320–200 BCE (79.2%) and 375–345 BCE (16.3%). The older 2σ peak can be rejected as a possible solution on the basis of typological comparison, but a date in the part of the younger 2σ peak that extends into the fourth century BCE cannot be completely ruled out. Still, 4Q70 is typologically further removed than 4Q52 from the date-bearing Aramaic manuscripts of the fourth century BCE, and from a palaeographic perspective a date in the third century BCE is more likely for 4Q70. However, such a qualitative assessment cannot be expressed as a specific quantitative prior (an expected date with mean and standard deviation) on the timeline. Therefore, palaeographers cannot give an exact date, such as 280, 300, or 320 BCE, before which none of the Dead Sea Scrolls can be dated.

To train our artificial intelligence-based date-prediction model, we use the accepted 2σ calibrated data from 24 of the 26 accepted 14C results (see the summary table of 14C results in Appendix B). The data from two manuscript samples, Mur19 and 4Q52, are not used for training Enoch. Because of its cursive script, the papyrus fragment Mur19 is, at the moment, not relevant for Enoch. We also leave 4Q52 out of consideration because, in this case, we cannot decide between the two peaks in the probability distribution (see Section D.1.1); for this reason, we work with the tentative addition or deletion of 4Q52 in the training of our algorithm. This is how we get from 26 accepted 14C results to the 24 manuscripts used as the primary training set for Enoch.

Appendix E Artificial intelligence (AI) in dating the scrolls

In this project, we use deep learning for image processing (binarisation) but have refrained from using it for the date prediction itself. Appendix F explains our objections to the use of deep learning for date prediction and includes an analysis of an experiment we carried out using transfer learning from a state-of-the-art foundation deep-learning model.

E.1 Data preparation

Our first step is to collect and prepare the data for the date-prediction model. We collect the images of the manuscripts for each of the 14C samples with accepted dates. We have used 24 manuscripts as the primary training set for our date-prediction model (a complete list can be found in Appendix I in the supplementary materials). These 24 physical radiocarbon-dated manuscripts are spread over many individual fragment images in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb . In addition to this primary training set, we have created different combinations of training data to perform comparative analyses and further check the robustness of the model (see Subsection E.9 for details). We obtained a data set of 75 images from the 24 radiocarbon-dated manuscripts. We use 62 of these images to train our model (Figure 9 shows the size distribution of the training images after the preprocessing steps). The remaining 13 images, deliberately chosen at random, are held out as unseen test data to validate the robustness and reliability of the model’s performance. We also select a large number of further images to test the date-prediction model. Once the images are selected, we start with the preprocessing task, in which we use BiNet, a neural network architecture, to extract the characters. The binarization, along with alignment correction and fragment arrangement, provides better-quality images (see Subsection E.1.1). Obtaining binarized images of the highest possible quality is essential, because image quality determines the success of the feature computation and, ultimately, of the date-regression model.

Figure 9: Box-plot showing the height and width spreads of the 62 training images.

E.1.1 Preprocessing: binarization, alignment and arrangement correction

We start with the multispectral band images for each fragment and use an image-fusion technique designed in-house dhali2019binet to create pseudo-colour images from a weighted combination of the band images (see Figure 10). The resulting pseudo-colour images offer high contrast and facilitate better separation of ink from background, a task commonly known as binarization. Training and test images go through the same preprocessing. It is important to note that although many modern deep-learning methods can be trained directly on colour or greyscale images without binarization, this approach is not suitable for dating the scrolls. Direct end-to-end solutions, i.e., classification or clustering of the training images together with the test images, may seem feasible, but they risk producing completely inaccurate results. Artificial neural networks, for instance, may base their decisions on superficial correlations with the texture of the parchment, leading to erroneous outcomes. Therefore, it is crucial to isolate only the ink traces (foreground) and to exclude all other material features in the images (background). BiNet is a deep-learning-based method designed specifically to binarize scroll images. Instead of using a simple filtering technique, BiNet uses a neural network architecture for the binarization task and therefore yields better output dhali2019binet (see Figure 11).
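The fusion step is described above only as a weighted combination of band images. As a minimal sketch, assuming each output channel is a fixed weighted sum of co-registered band images followed by a per-channel min-max stretch, it could look like the following; the function name `fuse_bands` and the weight matrix are hypothetical and are not the in-house settings used for the IAA images.

```python
import numpy as np


def fuse_bands(bands, weights):
    """Fuse co-registered multispectral band images into a pseudo-colour image.

    bands:   array of shape (B, H, W), one grey-level image per wavelength.
    weights: array of shape (3, B); row c holds the (hypothetical) weight of
             each band in output channel c.
    Returns an (H, W, 3) float image with every channel rescaled to [0, 1].
    """
    bands = np.asarray(bands, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted combination: channel c = sum_b weights[c, b] * bands[b]
    fused = np.tensordot(weights, bands, axes=([1], [0]))  # shape (3, H, W)
    # Min-max stretch per channel to increase ink/background contrast
    lo = fused.min(axis=(1, 2), keepdims=True)
    hi = fused.max(axis=(1, 2), keepdims=True)
    fused = (fused - lo) / np.maximum(hi - lo, 1e-12)
    return np.moveaxis(fused, 0, -1)  # channels last: (H, W, 3)
```

With three bands (for example the 595 nm, 924 nm, and 638 nm images of Figure 10) and `weights = np.eye(3)`, each band simply becomes one colour channel; non-diagonal weights mix bands.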

Figure 10: An illustration of creating a pseudo-colour image. The first image from the left is the full-spectrum colour image of 11Q5 from plate 974. The next three are the band images (used to form the three channels from the multispectral images) with wavelengths of 595 nm, 924 nm, and 638 nm, respectively. On the right is the resulting pseudo-colour (fused) image.
Figure 11: The BiNet architecture shows the encoder (contracting path) at the left half and the decoder (expanding path) at the right half of the image. Each step in the decoder part receives a concatenation with the corresponding feature map from the encoder part through the skip connections. This concatenation circumvents the bottleneck issue at the deepest layers of the encoder and ensures the precise localization of the foreground-background pixels. The example shows a pseudo-colour image of 11Q5 (Plate 974) as input on the left and the output binarized image on the right.

Once binarization is complete, the images are cleaned further. This cleaning step removes any residual noise or speckles that the binarization did not fully remove, which is crucial to ensure that features are extracted only from the characters in each image. Subsequently, rotation and alignment correction are performed. If an image is rotated at some angle to the horizontal axis, this can affect feature calculations that assume horizontally aligned text. Therefore, rotation correction is applied to align the text lines horizontally. In some cases, a minor affine transformation and stretching correction are applied selectively to straighten text lines twisted by the degradation of the parchment. Many manuscripts consist of multiple fragments; in these cases, we put the fragments together and arrange them into a single image (see Figure 12). The GIMP tool, a free and open-source graphics editor, is used for rotation and arrangement correction gimp . Note that the alignment and arrangement corrections are done mostly for the training images, to obtain accurate feature extractions for the style periods represented by those images. Because of limited time and resources, these corrections are done for only some of the test images. Most test images are used directly after binarization, which sometimes leads to an unrealistic prediction because of damaged and deformed characters (see Figure 13). If any test images need special attention in the future, extra steps can be performed to obtain a better image and, thus, a better prediction from Enoch.
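The cleaning step is described only as removing residual noise and speckles. One common way to do this, sketched here as an assumption rather than as the authors’ procedure, is to delete 8-connected ink components smaller than a minimum area; the function name `remove_speckles` and the threshold `min_area` are hypothetical.

```python
import numpy as np
from collections import deque


def remove_speckles(binary, min_area=10):
    """Erase 8-connected foreground components with fewer than min_area pixels.

    binary: 2-D array, True/1 for ink (foreground), False/0 for background.
    Returns a boolean array of the same shape with small components removed.
    """
    ink = np.asarray(binary, dtype=bool)
    h, w = ink.shape
    seen = np.zeros_like(ink)
    out = ink.copy()
    for r in range(h):
        for c in range(w):
            if not ink[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill collects one connected component
            component = []
            queue = deque([(r, c)])
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            # Components below the area threshold are treated as speckles
            if len(component) < min_area:
                for y, x in component:
                    out[y, x] = False
    return out
```

The threshold trades off noise removal against the risk of deleting genuine small ink traces such as dots or thin stroke remnants, so it would need tuning per resolution.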

Figure 12: An example of image preparation for 4Q319: a full plate image from IAA is shown on the left side (Plate 810). Then in the middle column, four different fragment images (full spectrum colour images) and their binarized outputs are presented. Finally, further cleaning, alignment correction, and arrangement are performed to produce the final image of 4Q319 on the right.
Figure 13: Characters are deformed (marked in red) near the edges of the physical fragments from one of the test manuscripts, 11Q7 (IAA plate 606-1)—these deformities in the binarized images (with slanted or skewed characters) affect the textural and allographic feature calculations.

E.2 Data augmentation

We have a very limited number of radiocarbon-dated manuscripts from which to derive training images. Moreover, writers naturally introduce variability into any document, even within the same time period. To address both data scarcity and writing variation within a period, we perform data augmentation by introducing acceptable variation into the data. The small random shape perturbations, on the one hand, make the system more robust and, on the other hand, account for variations in writing style within a particular period. In machine learning, augmentation is an often-used method to counteract a lack of data and imbalanced sampling MUMUNI2022augmentation . We augment training and test data by generating synthetic images using random geometric distortions bulacu2009morph .

We perform data augmentation by applying random elastic ‘rubber-sheet’ transforms. For each pixel $(i, j)$ of the column images, a random displacement vector $(\Delta x, \Delta y)$ is generated. The displacement field of the complete image is smoothed using a Gaussian convolution kernel with standard deviation $\sigma$ and then rescaled to an average amplitude $A$. The coordinates $(i', j')$ of the morphed image are obtained from the displacement field, with bilinear interpolation:

\[ i' = i + \Delta x, \qquad j' = j + \Delta y. \tag{1} \]

Two parameters control this morphing process: the smoothing radius $\sigma$ and the average pixel displacement $A$, both measured in pixels. In our experiment, we empirically chose a displacement of $A = 1.0$ and a smoothing radius of $\sigma = 8.0$ (see Figure 14).
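The following NumPy-only sketch follows the description above: one random displacement per pixel (drawn uniformly from [−1, 1], which is our assumption), Gaussian smoothing of the whole field with standard deviation σ, rescaling to a mean displacement length of A pixels, and bilinear resampling. It implements Eq. (1) as backward mapping, i.e., each output pixel (i, j) takes its value from the input at (i + Δx, j + Δy). All function names are hypothetical; this is not the code used for Enoch.

```python
import numpy as np


def _gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum(), radius


def _smooth(field, sigma):
    """Separable Gaussian smoothing; edge padding keeps the output shape."""
    k, r = _gaussian_kernel(sigma)

    def conv1d(v):
        return np.convolve(np.pad(v, r, mode="edge"), k, mode="valid")

    field = np.apply_along_axis(conv1d, 0, field)
    return np.apply_along_axis(conv1d, 1, field)


def _bilinear(img, ii, jj):
    """Sample img at real-valued coordinates (ii, jj) by bilinear interpolation."""
    h, w = img.shape
    ii = np.clip(ii, 0, h - 1)
    jj = np.clip(jj, 0, w - 1)
    i0 = np.floor(ii).astype(int)
    j0 = np.floor(jj).astype(int)
    i1 = np.minimum(i0 + 1, h - 1)
    j1 = np.minimum(j0 + 1, w - 1)
    fi, fj = ii - i0, jj - j0
    top = img[i0, j0] * (1 - fj) + img[i0, j1] * fj
    bottom = img[i1, j0] * (1 - fj) + img[i1, j1] * fj
    return top * (1 - fi) + bottom * fi


def elastic_morph(img, sigma=8.0, amplitude=1.0, rng=None):
    """Random 'rubber-sheet' morph of a 2-D image with values in [0, 1]."""
    rng = np.random.default_rng(rng)
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # One random displacement per pixel, then smooth the whole field
    d_x = _smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma)  # added to row index i
    d_y = _smooth(rng.uniform(-1.0, 1.0, (h, w)), sigma)  # added to column index j
    # Rescale so that the mean displacement length equals `amplitude` pixels
    scale = amplitude / np.hypot(d_x, d_y).mean()
    d_x, d_y = d_x * scale, d_y * scale
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Eq. (1): output pixel (i, j) is read from the input at (i + dx, j + dy)
    return _bilinear(img, ii + d_x, jj + d_y)
```

For a binarized input, the result contains grey values along stroke edges; thresholding at 0.5 restores a binary image if the feature extractors require one.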

Figure 14: The original binarized image of 11Q5 (Plate 974) on top left and three randomly augmented morphed images. A close inspection of the images shows small geometric distortion introduced to the characters using the elastic-morphing technique.

E.3 Allographic codebook with neural networks

After binarization with BiNet dhali2019binet , connected components of ink were split at Y-minima to prevent large blobs of multi-character components, yielding ‘fraglets’. For each fraglet, the contour was determined by running over the edge of the connected component in a counter-clockwise direction. Each contour-pixel sequence is ‘time’-normalized to 200 samples of (cosine, sine) pairs, yielding a feature vector of 400 values. Using the Kohonen self-organizing map neural network Kohonen1982 , codebooks of 70×70 and 80×80 prototypical contours were computed PlosOne (see Figure 15). As a proof of concept, 590 manuscripts from the Dead Sea Scrolls collection were manually labeled by a palaeographer as ‘Hasmonaean’ (N_has = 307) or ‘Herodian’ (N_her = 283). During training on half of the data, each codebook element obtained counts of its occurrences in ‘Hasmonaean’ and in ‘Herodian’ manuscripts. During testing, each manuscript is characterized by its relative occurrence of Herodian-like vs. Hasmonaean-like fraglets. Using the 80×80 map and applying a linear SVM SVMlite to this 2D feature representation, a classification accuracy of 93% (±2.3%) was obtained, computed over 20 random odd/even splits of the 590 manuscripts. Individual accuracy test results: 90.9, 89.2, 93.9, 91.2, 94.2, 90.2, 93.2, 94.2, 91.2, 95.3, 92.9, 96.3, 94.6, 92.2, 95.3, 92.5, 96.6, 88.8, 94.9, 93.2 (%). On the basis of this pilot experiment and earlier work PlosOne , the allographic feature was deemed a usable candidate for the more fine-grained manuscript-dating algorithm using the carbon-dated training samples.
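The text does not spell out the self-organizing map schedule or how the per-element counts are turned into the 2D representation. The NumPy sketch below assumes a standard online Kohonen update (linearly decaying learning rate, Gaussian neighbourhood on the map grid) and treats a fraglet as ‘Herodian-like’ when its nearest codebook element occurred more often in Herodian than in Hasmonaean training manuscripts. Contour extraction and the 400-value fraglet vectors are taken as given; all function names and parameter values are hypothetical.

```python
import numpy as np


def train_som(data, rows, cols, epochs=10, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols Kohonen map; returns (rows*cols, dim) prototypes."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Initialise the prototypes with randomly drawn training vectors
    codebook = data[rng.integers(0, n, rows * cols)].copy()
    # (row, col) position of every unit on the 2-D map
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    radius0 = radius0 or max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking map neighbourhood
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * radius ** 2))      # Gaussian neighbourhood weights
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook


def nearest_unit(codebook, vectors):
    """Index of the closest prototype (Euclidean) for each row of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def unit_class_counts(codebook, vectors, labels):
    """Per-unit counts of training fraglets from class 0 (Hasmonaean) and 1 (Herodian)."""
    counts = np.zeros((len(codebook), 2))
    np.add.at(counts, (nearest_unit(codebook, vectors), np.asarray(labels)), 1)
    return counts


def style_occurrence(codebook, counts, vectors):
    """2-D feature: fractions of one manuscript's fraglets that are Hasmonaean-/Herodian-like."""
    units = nearest_unit(codebook, vectors)
    herodian_like = counts[units, 1] > counts[units, 0]
    frac = herodian_like.mean()
    return np.array([1.0 - frac, frac])
```

For maps of the size used in the study (70×70 or 80×80 units, 400-dimensional vectors), the all-pairs distance computation in `nearest_unit` would need to be batched to keep memory use manageable.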

Figure 15: A visualization of 70x70 Kohonen map of fragmented connected components (200 x,y points per contour centroid) from the Dead Sea Scrolls collection. Image adapted from PlosOne .

E.4 Textural-level features

Similarly, the ‘Hinge’ feature Bulacu2007 ; PlosOne was chosen because of its ability to capture curvature-related differences between samples of handwriting (see Figure 16). It captures the occurrence of different degrees of roundness or sharpness of the path traced by the edge between the ink traces and the writing surface. Its ability to discriminate between the ‘Hasmonaean’ and ‘Herodian’ styles is weaker than that of the allographic fraglets: a nearest-mean classifier with the chi-square distance on a 195-dimensional hinge feature achieves 63.5% accuracy (±2.9%). This dimensionality is still high, and collinearity problems due to feature correlation need to be avoided. Subsequently applying PCA, retaining the 15 eigenvectors with the largest eigenvalues, and applying a linear SVM to this binary classification task yields 73.1% accuracy (±0.24%). Nevertheless, given the complementary nature of the allographic and textural feature methods, we decided to include the Hinge feature for the manuscript-dating problem.
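As a rough illustration of the Hinge feature, the sketch below assumes that a closed, ordered ink contour is already available as an array of points. At every contour point it measures the orientations of two ‘legs’ reaching a fixed number of contour steps backwards and forwards, quantizes both angles into angular sectors, and accumulates a joint histogram, keeping only one ordering of the unordered angle pair. The leg length and number of sectors are hypothetical and do not reproduce the 195-dimensional configuration mentioned above. A chi-square distance of the kind used for the nearest-mean result is included; the exact normalization used in the study is not stated, so the common form Σ(p − q)²/(p + q) is assumed.

```python
import numpy as np


def hinge_histogram(contour, leg=5, n_bins=12):
    """Joint histogram of the two leg orientations at every contour point.

    contour: (N, 2) array of (x, y) points along a closed ink contour, in order.
    leg:     contour steps from the hinge pivot to each leg end.
    n_bins:  number of angular sectors covering [0, 2*pi).
    Returns a unit-sum vector of length n_bins * (n_bins + 1) // 2.
    """
    pts = np.asarray(contour, dtype=float)
    n = len(pts)
    hist = np.zeros((n_bins, n_bins))
    for k in range(n):
        pivot = pts[k]
        back = pts[(k - leg) % n] - pivot
        fwd = pts[(k + leg) % n] - pivot
        # Orientations of both legs, mapped to [0, 2*pi)
        phi = np.mod(np.arctan2([back[1], fwd[1]], [back[0], fwd[0]]), 2 * np.pi)
        b1, b2 = np.minimum((phi / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
        # The pair is unordered, so store it in the upper triangle only
        hist[min(b1, b2), max(b1, b2)] += 1
    upper = hist[np.triu_indices(n_bins)]
    return upper / max(upper.sum(), 1)


def chi_square(p, q, eps=1e-12):
    """Chi-square distance between two histograms (assumed common form)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum((p - q) ** 2 / (p + q + eps))
```

A nearest-mean classifier then assigns a manuscript to the style whose mean hinge histogram has the smallest `chi_square` distance to the manuscript’s own histogram.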

Figure 16: Hinge kernel; the angles and leg-lengths for two different character shapes. Image adapted from PlosOne .

E.5 Adjoined feature

As shown in bulacu2006 , the combination of a fraglet codebook and the hinge feature proved very effective in writer identification. The assumption in the current study is that different historical style periods are revealed by the statistical characteristics both of allographic shape fragments and of angular distributions. Consider, for instance, a manuscript with predominantly vertical and horizontal strokes (‘formal’) and a manuscript written in a more informal (‘cursive’) style, each containing its own characteristic shape elements in individual characters. Using the two feature methods together captures these underlying shape differences. The features are combined by adjoining the two feature vectors, i.e., concatenating their arrays of values into a single array that forms the combined descriptor; the adjoined feature is thus a weighted combination of the Hinge and Fraglet features. The adjoining results in a feature vector of 5,365 dimensions, preserving the handwriting-style description from both feature levels.
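Adjoining amounts to a weighted concatenation. In this minimal sketch, the weights `w_fraglet` and `w_hinge` and the per-part unit-sum normalization are hypothetical. As an illustration only, a 70×70 fraglet codebook histogram (4,900 bins) adjoined with a 465-bin hinge histogram would give 5,365 dimensions; this particular split is our assumption and is not stated in the text.

```python
import numpy as np


def adjoin_features(fraglet_hist, hinge_hist, w_fraglet=0.5, w_hinge=0.5):
    """Concatenate two feature histograms into one weighted style descriptor.

    Each part is first rescaled to unit sum so that neither feature dominates
    merely because it has more dimensions; the weights are hypothetical.
    """
    f = np.asarray(fraglet_hist, dtype=float)
    h = np.asarray(hinge_hist, dtype=float)
    f = f / max(f.sum(), 1e-12)
    h = h / max(h.sum(), 1e-12)
    return np.concatenate([w_fraglet * f, w_hinge * h])
```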

E.6 Date-prediction model

Once the features have been calculated from the images, we apply our date-prediction model. Given a style feature vector for each manuscript, we now address the transformation of this representation into an OxCal-type curve, i.e., a vector containing the estimated date probabilities for the sample. Because of the small size of the data set, high-parametric models such as period-specific temporal codebooks He2016 cannot be used here. Instead, we use conditional modelling with Bayesian ridge regression Hoerl2000 , which applies Bayesian inference to estimate the model parameters for date prediction. First, a prior distribution is placed on the model parameters, expressing known constraints on their values. The prior is then updated with the observed data using Bayes’ rule to obtain the posterior distribution of the parameters and the predicted dates.

We propose the Bayesian approach because of the nature of our target output data: the 14C data are not single points on the timeline but are given as distributions of probable dates within sigma (σ) ranges. The probabilistic approach therefore allows the use of all available information while remaining explainable. Furthermore, we can observe a full posterior distribution, which is used to assess the uncertainty of the estimated dates. Finally, the Bayesian approach also allows the model to indicate error margins for predictions on unseen data.
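To make the Bayesian update concrete, the sketch below implements the textbook closed-form posterior for Bayesian linear regression with a scalar target (such as a single date), a zero-mean isotropic Gaussian prior with precision α on the weights, and Gaussian noise with precision β. Bayesian ridge regression as used for Enoch additionally estimates α and β from the data, and Enoch’s targets are OxCal-type curves rather than single dates; neither aspect is reproduced here, and α, β, and the function names are hypothetical.

```python
import numpy as np


def bayes_linear_fit(X, t, alpha=1.0, beta=1.0):
    """Posterior N(m_N, S_N) over w for prior N(0, alpha^-1 I) and noise precision beta."""
    X = np.asarray(X, dtype=float)
    t = np.asarray(t, dtype=float)
    S_N_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X  # posterior precision
    S_N = np.linalg.inv(S_N_inv)                           # posterior covariance
    m_N = beta * S_N @ X.T @ t                             # posterior mean
    return m_N, S_N


def bayes_linear_predict(X_new, m_N, S_N, beta=1.0):
    """Predictive mean and standard deviation for each row of X_new."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    mean = X_new @ m_N
    # Noise variance plus the uncertainty about w: 1/beta + x^T S_N x
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", X_new, S_N, X_new)
    return mean, np.sqrt(var)
```

The predictive standard deviation is what provides error margins for unseen samples. With far fewer training images than feature dimensions, inverting the M × M matrix is wasteful; the equivalent N × N (Woodbury) form would be preferable in practice.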

E.6.1 Unmodelled values from OxCal

The inputs of the date-prediction model are the feature vectors of the training images, with their probability distributions from radiocarbon dating as labels. We obtain each probability distribution as unmodelled raw values at 5-year resolution after performing the radiocarbon calibration with OxCal v4.4.2 Oxcal ; Oxcal2 . We created a new project code for the 26 sample manuscripts using the 14C age (BP) and sigma (BP) values (see the summary table of 14C results in Appendix B for the BP values; the code is available in the Zenodo repository, https://doi.org/10.5281/zenodo.11371749). Using the measured BP values, the code (entirely reproducible) uses the simple format:

Plot()
{
  R_Date("Q-number", age(BP), sigma(BP));
};

Once the code is run, OxCal generates the clean unmodelled (BCE/CE) probability values in a .csv file. We obtain these values for each individual sample via View > Raw output in OxCal. Note that we do not specify any resolution in our code; hence, our raw data are at the default resolution of 5 years, which is the same as the resolution of the IntCal curves, so no interpolation or binning is needed. It is possible to set the resolution to less than 5 years, in which case OxCal interpolates the curve with a cubic (or, if that option is set, linear) function (as explained in Section B.4 in Appendix B).

E.6.2 Calibrated dates from 2-sigma ranges

Having performed the radiocarbon dating in its entirety (Appendix B) and then the palaeographic evaluation of the calibrated dates (Appendix D), as was also done in Bonani1992 ; Jull1995 , we use the calibrated dates from the 2σ range for firmer grounding of our date-prediction model. In the case of bimodal evidence, palaeography determines that in most cases the younger 2σ peak should be used for analysis. However, palaeography cannot be characterised as a specific quantitative prior (an expected date with mean and standard deviation) on the timeline. The issue is that we cannot assume a single point-spread density (Gaussian) along the time axis; in this sense, palaeography does not deliver a point-wise prior. What palaeography can deliver is the identification of a point on the timeline to the left or right of which a range of dates is historically impossible. This knowledge is based on intersubjective expert knowledge (Section D.2). Therefore, our use of expert knowledge is not based on the usual Bayesian-plus-Gaussian method but on a more direct use of existing, qualitative, and intersubjective palaeographic knowledge, which allows the 2σ calibrated date range to be split into a ‘left-half’ and a ‘right-half’ time region of interest.

Specifically, palaeographic knowledge allows us to make a binary split in the OxCal distribution of bimodal 2σ ranges using the Heaviside step function, with the step placed at an innocuous low-probability point on the curve where the probability has a plateau around zero. Applying a multiplicative Heaviside bias to the empirical density function is a valid Bayesian approach to peak selection. Alternatively, a smooth variant of the step function, e.g., a logistic function or a cumulative Gaussian, could have been used; in either case, this would require specifying a steepness parameter whose value is unknown. Similarly, the palaeographic constraints do not tell us the standard deviation that would be needed under a Gaussian, i.e., point-localized, density assumption in the Bayesian reasoning process. Apart from the Gaussian, many other distribution functions, e.g., Poisson, Weibull, and Gamma, are used in other applications of Bayesian reasoning, and they could be used here if there were reasons to assume them. We have no reason for a detailed choice of distribution and can only choose to disregard ‘impossible time regions’. For most bimodal 2σ calibrated ranges, we assume that the palaeographic evaluation that the left-most (or right-most) region is impossible is correct (see Section D.2 in Appendix D), which collapses the probabilities in that range. The shape of the remaining distribution reflects the likelihood of dates.
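A minimal sketch of this Heaviside peak selection, assuming the OxCal raw output has already been loaded as two arrays (calendar years, with negative years for BCE, and the corresponding probabilities). The cut-off year comes from the palaeographic evaluation; the function name and arguments are hypothetical. The renormalization at the end corresponds to step 5 of the procedure.

```python
import numpy as np


def select_peak(years, probs, cutoff, keep="right"):
    """Zero the calibrated density on one side of `cutoff` and renormalize.

    years:  calendar years on the OxCal grid (negative = BCE), e.g. 5-year steps.
    probs:  unmodelled probability values at those years.
    cutoff: step position, placed where the density is near zero between peaks.
    keep:   "right" keeps the younger peak (years >= cutoff), "left" the older.
    """
    years = np.asarray(years, dtype=float)
    probs = np.asarray(probs, dtype=float)
    step = np.heaviside(years - cutoff, 1.0)  # 0 before the cutoff, 1 from it on
    if keep == "left":
        step = 1.0 - step
    kept = probs * step                       # multiplicative Heaviside bias
    return kept / kept.sum()                  # renormalize the accepted part
```

Because the step sits on a near-zero plateau, the exact cut-off year barely changes the resulting distribution, which is why a hard step is adequate here.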

Thus, the procedure we use is as follows:

  1. We perform the radiocarbon dating in its entirety, generating the calibrated dates in OxCal from only the 14C age (BP) and its σ (BP); see Appendix B.

  2. We use the calibrated dates from the 2σ range for firmer grounding of our date-prediction model. Only in the case of bimodal evidence within these 2σ ranges do we apply the Heaviside function at a near-zero probability point on the curve, rejecting older peaks and accepting younger (‘right-hand’) peaks as a possible solution on the basis of expert palaeographic knowledge (Appendix D).

  3. From OxCal, we obtain the raw probability densities of the 2σ ranges, which are used as input to our date-prediction model.

  4. We work with both the inclusion and the exclusion of so-called minor or smaller probability peaks, which in 12 out of 14 instances have a probability of less than 4%; in the remaining two cases, it is 5.2% and 9.4%. Including or excluding these peaks has minimal and insignificant consequences for the interpretation of the results (see Section E.7).

  5. Because applying the Heaviside function to bimodal evidence leaves less than the full 95.4% 2σ probability for each sample, we normalise the accepted 2σ calibrated probabilities. The output probability predictions of the dating model are also balanced and normalised using both weights and data augmentation (see Section E.7).

  6. Within the accepted part of the 95.4% confidence range, the points of the probability-distribution curve calculated by OxCal are used as target values in the training of Enoch. The output distribution delivered by the Enoch model is a mixture of Gaussians approximating the shape of the OxCal curve. From those outputs, we select the 1σ range for a clear, narrow visualization of the predicted date ranges (see Figure 1 in the main article). This choice is independent of the original selection of OxCal calibration ranges, because the two methods are fundamentally different.
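Step 6 does not specify how the 1σ range is read off the predicted mixture of Gaussians. One plausible reading, sketched here as an assumption, is a highest-density region: evaluate the mixture on the calendar grid and keep the most probable grid years until 68.3% of the probability is covered. The function names are hypothetical, and the mixture parameters in the usage note are invented for illustration.

```python
import numpy as np


def mixture_density(years, weights, means, sds):
    """Gaussian-mixture density evaluated at each calendar year."""
    years = np.asarray(years, dtype=float)[:, None]
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(sds, dtype=float)
    comp = np.exp(-0.5 * ((years - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return comp @ (w / w.sum())


def hdr_years(years, density, mass=0.683):
    """Grid years inside the highest-density region holding `mass` of the probability."""
    p = np.asarray(density, dtype=float)
    p = p / p.sum()
    order = np.argsort(p)[::-1]              # grid points, most probable first
    cum = np.cumsum(p[order])
    n_keep = np.searchsorted(cum, mass) + 1  # smallest set reaching the target mass
    keep = np.zeros(len(p), dtype=bool)
    keep[order[:n_keep]] = True
    return np.asarray(years)[keep]
```

For example, `hdr_years(np.arange(-300, 1), mixture_density(np.arange(-300, 1), [1.0], [-150.0], [20.0]))` returns approximately the years from −170 to −130. Like OxCal’s ranges, a highest-density region of a multimodal mixture can consist of several disjoint intervals.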

We emphasise that we do not perform modelling within the OxCal programme, as is common in radiocarbon-dating practice. This does not compromise the transparency and reproducibility of our procedure. Using the 14C dates in BP and their measurement uncertainties (σ), all plots can be reproduced in OxCal. Our reasoning for rejecting part of the bimodal data (see Appendix D) and the exact 2σ probabilities we use (see Table 8) are provided. Other researchers may also use all of our 14C data instead of following our reasoning for accepting only part of the bimodal data, provided they justify their own reasoning.

Table 8: Unmodelled radiocarbon calibrated dates for the 2σ ranges (calendar years; negative values denote BCE, positive values CE). The 2σ values are the same as in the summary table of 14C results in Appendix B. However, in this table, the highlighted date ranges indicate each sample’s accepted 2σ intervals for the Enoch model.
Q-number | 2σ range 1 | 2σ range 2 | 2σ range 3 | 2σ range 4
4Q504 | -355 to -285 | -230 to -150
4Q52 | -410 to -355 | -285 to -230
4Q176 | -355 to -300 | -210 to -100 | -70 to -60
4Q114 | -355 to -285 | -230 to -160
5_6Hev1b | 10 to 205
4Q161 | -90 to -80 | -55 to 30 | 45 to 60
4Q70 | -375 to -345 | -320 to -200
4Q47 | -355 to -290 | -210 to -100
4Q23 | -355 to -285 | -230 to -220 | -210 to -95 | -75 to -55
4Q255_4Q433a | -170 to -50
11Q5 | -35 to -15 | 5 to 120
4Q3 | -340 to -325 | -200 to -50
4Q27 | -340 to -330 | -200 to -50
Mas1k | -50 to 65
4Q206 | -360 to -280 | -235 to -145 | -135 to -120
4Q30 | -360 to -275 | -260 to -245 | -235 to -165
4Q201_4Q338 | -165 to -40 | -10 to -1
4Q259 | -350 to -310 | -210 to -100 | -70 to -55
4Q416 | -345 to -320 | -205 to -90 | -80 to -50
4Q2 | -155 to -130 | -125 to 10
4Q375 | -345 to -320 | -205 to -50
Xhev_Se2 | -45 to 75
4Q541 | -355 to -300 | -210 to -95 | -75 to -55
4Q521 | -355 to -285 | -230 to -100
4Q267 | -355 to -290 | -210 to -95 | -70 to -55
Mur19 | -45 to 85 | 95 to 110

In the following subsections, we present the mathematical derivation of the Bayesian regression from simple linear regression as used in Enoch, our date prediction model.

E.6.3 Linear regression

Given a set of training data $\{(\mathbf{x}_n, t_n)\}_{n=1}^{N}$ comprising $N$ observations of dimensionality $M$, where $\mathbf{x}_n \in \mathbb{R}^M$ and $t_n \in \mathbb{R}$, the goal of a regression model is to find a linear mapping $f: \mathbb{R}^M \rightarrow \mathbb{R}$ that approximates $t_n$ given $\mathbf{x}_n$ as closely as possible. Furthermore, the mapping should generalize to values outside the training data. From a probabilistic perspective, the aim is to model the predictive distribution $p(t_n \mid \mathbf{x}_n)$. In a linear regression model, the assumption is that the target variable $t_n$ is given by a deterministic function $f(\mathbf{x}_n, \mathbf{w})$ with added Gaussian noise, such that:

\[ t_n = f(\mathbf{x}_n, \mathbf{w}) + \epsilon \tag{2} \]

where $\epsilon$ is a Gaussian random variable with mean 0 and inverse variance $\beta$, also called the precision. In a linear model, where $f(\mathbf{x}_n, \mathbf{w}) = \mathbf{w}^T\mathbf{x}_n$, the predictive distribution takes the form

\[ p(t_n \mid \mathbf{x}_n) = \mathcal{N}\left(t_n \mid \mathbf{w}^T\mathbf{x}_n, \beta^{-1}\right). \tag{3} \]

Even though the predictive distribution is used indirectly to optimize the parameters of the linear regression model, we do not model this distribution explicitly. As we will see in the next section, the Bayesian interpretation of linear regression instead stays within a probabilistic framework and models the full predictive distribution, which gives several advantages over the standard linear regression model.

We first turn to parameter estimation for a linear model, i.e., estimating a value of the weight vector $\mathbf{w}$ that fits the data well. Most commonly, the least-squares criterion is used to estimate $\mathbf{w}$:

\[ \mathbf{w}^{*} = \operatorname*{arg\,min}_{\mathbf{w}} \sum_{n=1}^{N} \left(t_n - \mathbf{w}^T\mathbf{x}_n\right)^2 \tag{4} \]
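Equation (4) can be solved directly with NumPy’s least-squares solver. The sketch below does so and also evaluates the Gaussian log-likelihood derived next in the text, so that one can check numerically that the least-squares solution also maximizes the likelihood for any fixed β. The function names are hypothetical.

```python
import numpy as np


def least_squares_weights(X, t):
    """Solve Eq. (4): w* = argmin_w sum_n (t_n - w^T x_n)^2."""
    w, *_ = np.linalg.lstsq(np.asarray(X, float), np.asarray(t, float), rcond=None)
    return w


def gaussian_log_likelihood(X, t, w, beta):
    """ln p(t | X, w, beta) for i.i.d. targets with Gaussian noise of precision beta."""
    r = np.asarray(t, float) - np.asarray(X, float) @ w  # residuals t_n - w^T x_n
    n = len(r)
    return 0.5 * n * np.log(beta) - 0.5 * n * np.log(2 * np.pi) - 0.5 * beta * r @ r
```

Only the last term depends on $\mathbf{w}$, and it is $-\beta/2$ times the sum of squared residuals, which is why minimizing Eq. (4) and maximizing the log-likelihood give the same $\mathbf{w}^{*}$.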

This choice can be justified by maximum likelihood estimation if we assume that the training data are independent and identically distributed (i.i.d.), as follows. Let $\mathbf{X} = (\mathbf{x}_1, \dots, \mathbf{x}_N)^T$ and $\mathbf{t} = (t_1, \dots, t_N)^T$. The log-likelihood of the training data can then be written as

\begin{align}
\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta)
&= \ln \prod_{n=1}^{N} p(t_n \mid \mathbf{x}_n, \mathbf{w}, \beta) \nonumber\\
&= \ln \prod_{n=1}^{N} \mathcal{N}(t_n \mid \mathbf{w}^T\mathbf{x}_n, \beta^{-1}) \nonumber\\
&= \sum_{n=1}^{N} \ln \mathcal{N}(t_n \mid \mathbf{w}^T\mathbf{x}_n, \beta^{-1}) \nonumber\\
&= \sum_{n=1}^{N} \ln \left\{ (2\pi)^{-1/2} \beta^{1/2} \exp\left( -\frac{\beta}{2} (t_n - \mathbf{w}^T\mathbf{x}_n)^2 \right) \right\} \nonumber\\
&= \frac{N}{2} \ln \beta - \frac{N}{2} \ln 2\pi - \beta E_D(\mathbf{w}), \tag{5}
\end{align}

where we make use of (3). The term $E_D(\mathbf{w})$ is a sum-of-squares error function, defined as

\[ E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (t_n - \mathbf{w}^T\mathbf{x}_n)^2. \tag{6} \]

Since only the term $-\beta E_D(\mathbf{w})$ in (5) depends on $\mathbf{w}$, and $\beta > 0$, maximizing (5) with respect to $\mathbf{w}$ is equivalent to maximizing $-E_D(\mathbf{w})$, or equivalently, minimizing $E_D(\mathbf{w})$. This is exactly the least-squares objective in (4).
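As a quick numerical check of the derivation in (5) and (6), the sketch below evaluates the log-likelihood both as a sum of per-sample Gaussian log densities and via the closed form $\frac{N}{2}\ln\beta - \frac{N}{2}\ln 2\pi - \beta E_D(\mathbf{w})$. The toy data and all function names are illustrative assumptions, not part of the Enoch implementation.

```python
import math


def dot(w, x):
    """Inner product w^T x of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def gaussian_log_pdf(t, mean, precision):
    """ln N(t | mean, 1/precision) for a univariate Gaussian."""
    return (0.5 * math.log(precision) - 0.5 * math.log(2 * math.pi)
            - 0.5 * precision * (t - mean) ** 2)


def sum_of_squares(w, X, t):
    """E_D(w) as in Eq. (6)."""
    return 0.5 * sum((tn - dot(w, xn)) ** 2 for xn, tn in zip(X, t))


def log_likelihood_direct(w, X, t, beta):
    """Sum of per-sample Gaussian log densities (third line of Eq. (5))."""
    return sum(gaussian_log_pdf(tn, dot(w, xn), beta) for xn, tn in zip(X, t))


def log_likelihood_closed_form(w, X, t, beta):
    """Last line of Eq. (5): N/2 ln(beta) - N/2 ln(2 pi) - beta * E_D(w)."""
    N = len(t)
    return (0.5 * N * math.log(beta) - 0.5 * N * math.log(2 * math.pi)
            - beta * sum_of_squares(w, X, t))


# Toy data: a bias column plus one feature (purely illustrative).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 5.2]
print(log_likelihood_direct([0.9, 2.1], X, t, 2.0))
print(log_likelihood_closed_form([0.9, 2.1], X, t, 2.0))
```

The two printed values agree up to floating-point rounding. For any fixed $\beta > 0$, a weight vector with a smaller $E_D(\mathbf{w})$ has a higher log-likelihood.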

E.6.4 Ridge regression

We now turn to the ridge regression model, an extension of the linear regression model with more desirable properties, such as reduced over-fitting. Concretely, we add a prior distribution over the weights $\mathbf{w}$, which leads to the following objective, the sum of the log-likelihood and the log-prior:

\[ \ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) + \ln p(\mathbf{w} \mid \alpha). \tag{7} \]

By Bayes' rule, adding the log-prior relates this objective to a posterior distribution over $\mathbf{w}$:

\[ p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w}), \tag{8} \]

where we omit $\mathbf{X}$, $\alpha$, and $\beta$ to keep the notation uncluttered. In other words, maximizing (7) corresponds to maximizing the posterior distribution over $\mathbf{w}$. This raises the question of what a suitable form for the prior $p(\mathbf{w})$ is. To ensure that the posterior $p(\mathbf{w} \mid \mathbf{t})$ has the same functional form as the prior, we choose $p(\mathbf{w})$ to be a conjugate prior for the likelihood $p(\mathbf{t} \mid \mathbf{w})$, namely a zero-mean isotropic multivariate Gaussian distribution with precision parameter $\alpha$. The objective then becomes

\begin{align}
\ln p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \beta) + \ln p(\mathbf{w} \mid \alpha)
&= \sum_{n=1}^{N} \left\{ \ln \mathcal{N}(t_n \mid \mathbf{w}^T\mathbf{x}_n, \beta^{-1}) \right\} + \ln \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}) \tag{9}\\
&= \frac{N}{2} \ln \beta - \frac{N}{2} \ln 2\pi - \beta E_D(\mathbf{w}) + \frac{M}{2} \ln \alpha - \frac{M}{2} \ln 2\pi - \alpha E_W(\mathbf{w}), \tag{11}
\end{align}

where $M$ denotes the number of dimensions of the weight vector $\mathbf{w}$ and $\mathbf{I}$ denotes the identity matrix. Note that the first three summands of (11) correspond to (5). The term $E_W(\mathbf{w})$ is a regularization term, defined by

\[ E_W(\mathbf{w}) = \frac{1}{2} \mathbf{w}^T\mathbf{w}. \tag{12} \]

After dropping the terms of (11) that do not depend on $\mathbf{w}$, maximizing (11) reduces to minimizing a weighted sum of two terms, $E_D(\mathbf{w})$ and $E_W(\mathbf{w})$, which are the data-dependent error and the regularization error, respectively. The hyperparameters $\alpha$ and $\beta$ control the relative importance of the two terms. Concretely, we minimize

\[ \beta E_D(\mathbf{w}) + \alpha E_W(\mathbf{w}). \tag{13} \]

Dividing (13) by $\beta > 0$ and combining $\alpha$ and $\beta$ into a single hyperparameter $\lambda = \alpha/\beta$, we can equivalently minimize

\[ E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}), \tag{14} \]

which corresponds to

\[ \frac{1}{2} \sum_{n=1}^{N} (\mathbf{w}^T\mathbf{x}_n - t_n)^2 + \frac{\lambda}{2} \mathbf{w}^T\mathbf{w}, \tag{15} \]

the ridge-regression objective function. The hyperparameter $\lambda$ controls the degree of parameter shrinkage hastie2009elements, whereby the weights are shrunk by imposing a penalty on their size. This brings the additional task of setting $\lambda$, which is generally done using cross-validation. Setting the gradient of (15) with respect to $\mathbf{w}$ to zero and solving for $\mathbf{w}$ gives the solution in closed form:

\[ \mathbf{w} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1} \mathbf{X}^T\mathbf{t}. \tag{16} \]
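The closed-form solution (16) can be illustrated with a small, dependency-free sketch. It solves the linear system $(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})\mathbf{w} = \mathbf{X}^T\mathbf{t}$ by Gaussian elimination. The helper names and toy data are hypothetical. Following (16) literally, the penalty also applies to a bias column when one is included in $\mathbf{X}$.

```python
def solve(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, t, lam):
    """Closed-form ridge solution (X^T X + lam I)^{-1} X^T t, as in Eq. (16)."""
    m = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * tn for row, tn in zip(X, t)) for i in range(m)]
    # Solving the system avoids forming the explicit matrix inverse.
    return solve(A, b)


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 5.0]
print(ridge_fit(X, t, 0.0))   # lambda = 0: ordinary least squares
print(ridge_fit(X, t, 10.0))  # larger lambda: weights shrunk towards zero
```

Solving the system directly, rather than inverting the matrix, is the usual numerically preferable choice. The result is the same minimizer of (15).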

E.6.5 Bayesian regression

We now turn to a Bayesian treatment of the ridge regression model discussed in the previous subsection. First, recall the relationship established in (8) between the posterior over $\mathbf{w}$ and the product of the likelihood and the prior. This is where the ridge and Bayesian regression models diverge. In ridge regression, a point estimate of the weight vector $\mathbf{w}$ is obtained by maximum a posteriori (MAP) estimation, that is, by maximizing the right-hand side of (8). We now discuss the alternative, fully Bayesian treatment, which explicitly models the posterior distribution on the left-hand side of (8).

Recall that we defined the prior $p(\mathbf{w})$ as a conjugate prior for the likelihood function, namely a multivariate Gaussian distribution. As a result, the posterior $p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})$ is also Gaussian, and we can rewrite (8) as

\[ \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \propto \mathcal{N}(\mathbf{t} \mid \mathbf{X}\mathbf{w}, \beta^{-1}\mathbf{I})\, \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}), \tag{17} \]

where the posterior is a Gaussian with mean $\mathbf{m}_N$ and covariance $\mathbf{S}_N$. Using Bayes' theorem for Gaussian random variables to find $\mathbf{m}_N$ and $\mathbf{S}_N$, we obtain

\begin{align}
\mathbf{m}_N &= \beta \mathbf{S}_N \mathbf{X}^T\mathbf{t}, \tag{18}\\
\mathbf{S}_N^{-1} &= \beta \mathbf{X}^T\mathbf{X} + \alpha\mathbf{I}. \tag{19}
\end{align}
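The following sketch computes (18) and (19) for a small design matrix without external libraries. The `inverse` and `posterior` helpers and the toy values of $\alpha$ and $\beta$ are illustrative assumptions, and the sketch is not the implementation used for Enoch.

```python
def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def posterior(X, t, alpha, beta):
    """Posterior mean m_N and covariance S_N from Eqs. (18) and (19)."""
    m = len(X[0])
    # Eq. (19): S_N^{-1} = beta X^T X + alpha I
    S_inv = [[beta * sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
              for j in range(m)] for i in range(m)]
    S_N = inverse(S_inv)
    # Eq. (18): m_N = beta S_N X^T t
    Xt_t = [sum(row[i] * tn for row, tn in zip(X, t)) for i in range(m)]
    m_N = [beta * sum(S_N[i][j] * Xt_t[j] for j in range(m)) for i in range(m)]
    return m_N, S_N


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
t = [1.0, 3.0, 5.0]
m_N, S_N = posterior(X, t, alpha=1.0, beta=2.0)
print(m_N)
print(S_N)
```

With $\alpha = 1$ and $\beta = 2$, the printed mean coincides with the ridge solution (16) for $\lambda = \alpha/\beta = 0.5$.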

Note the correspondence between the ridge regression point estimate of $\mathbf{w}$ in (16) and the posterior mean $\mathbf{m}_N$. Writing out $\mathbf{m}_N$ in full and using $\alpha = \lambda\beta$, we see that

\begin{align*}
\mathbf{m}_N &= \beta(\beta\mathbf{X}^T\mathbf{X} + \alpha\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t}\\
&= \beta\left(\beta(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})\right)^{-1}\mathbf{X}^T\mathbf{t}\\
&= \beta\left(\beta^{-1}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\right)\mathbf{X}^T\mathbf{t}\\
&= (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t},
\end{align*}

which corresponds to (16). Hence the mode of the posterior distribution, which for a Gaussian coincides with its mean, is the ridge regression solution. In the Bayesian regression approach, however, we use the full posterior distribution over $\mathbf{w}$ rather than taking the mean $\mathbf{m}_N$ as a point estimate. This works as follows. Once we have obtained the posterior distribution over $\mathbf{w}$, the predictive distribution informed by the training data can be written as

\[ p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X}) = \int p(t \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t}, \mathbf{X})\, d\mathbf{w} \tag{20} \]

for an input $\mathbf{x}$, where we again omit $\alpha$ and $\beta$ for readability. Since (20) is a marginal distribution, namely a convolution of two Gaussians, we can once more apply Bayes' theorem for Gaussian variables. This yields the predictive distribution

\[ p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X}, \alpha, \beta) = \mathcal{N}\!\left(t \mid \mathbf{m}_N^T\mathbf{x},\, \beta^{-1} + \mathbf{x}^T\mathbf{S}_N\mathbf{x}\right). \tag{21} \]

The mean of this distribution is the inner product of the posterior mean and the input vector, $\mathbf{m}_N^T\mathbf{x}$. As the variance in (21) shows, the predictive variance for an input $\mathbf{x}$ is the sum of two terms. The first term, $\beta^{-1}$, expresses the noise on the target values in the training data. The second term, $\mathbf{x}^T\mathbf{S}_N\mathbf{x}$, describes the uncertainty associated with $\mathbf{w}$ and varies with the input $\mathbf{x}$.
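The two variance terms in (21) can be made concrete with a short sketch that takes a posterior mean $\mathbf{m}_N$ and covariance $\mathbf{S}_N$ as given. The toy posterior values and the `predictive` helper are hypothetical, chosen only to show the mechanics.

```python
def predictive(x, m_N, S_N, beta):
    """Mean and variance of the predictive distribution in Eq. (21)."""
    mean = sum(mi * xi for mi, xi in zip(m_N, x))  # m_N^T x
    # x^T S_N x: uncertainty about w, propagated to this particular input.
    weight_term = sum(x[i] * S_N[i][j] * x[j]
                      for i in range(len(x)) for j in range(len(x)))
    noise_term = 1.0 / beta  # noise on the target values
    return mean, noise_term + weight_term


# Toy posterior for a model with a bias column and one feature.
m_N = [42 / 41, 74 / 41]
S_N = [[11 / 41, -6 / 41], [-6 / 41, 7 / 41]]
near = predictive([1.0, 1.0], m_N, S_N, beta=2.0)
far = predictive([1.0, 10.0], m_N, S_N, beta=2.0)
print(near)
print(far)
```

The noise term $\beta^{-1}$ is the same for every input. The weight term is much larger for the second input, which reflects the greater uncertainty about $\mathbf{w}$ in that direction.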

Given this predictive distribution, we can make predictions for new input values by calculating the conditional expectation,

\[ \mathbb{E}[t \mid \mathbf{x}, \mathbf{t}, \mathbf{X}] = \int t\, p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X})\, dt. \tag{22} \]

Because (21) is Gaussian, this expectation equals $\mathbf{m}_N^T\mathbf{x}$. Taking the posterior mean $\mathbf{m}_N$ directly as an estimate for $\mathbf{w}$, as some implementations of the Bayesian regression model do scikit-learn-bayesian-ridge, therefore yields the same predicted value. The full predictive distribution additionally provides the variance in (21).

E.6.6 Hyperparameter selection

In a Bayesian framework, the hyperparameters can also be determined without cross-validation by defining a prior distribution over one or both of them, known as a hyperprior. We can then marginalize over all the parameters, which leads to a predictive distribution of the form

\[ p(t \mid \mathbf{x}, \mathbf{t}, \mathbf{X}) = \iiint p(t \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t}, \mathbf{X}, \alpha, \beta)\, p(\alpha, \beta \mid \mathbf{t}, \mathbf{X})\, d\mathbf{w}\, d\alpha\, d\beta. \tag{23} \]

Unfortunately, this expression is analytically intractable. Nevertheless, an approximation framework known as the evidence approximation bishop2006pattern can compute estimates for $\alpha$ and $\beta$. This framework is also referred to as type-II maximum likelihood: it maximizes the marginal likelihood $p(\mathbf{t} \mid \alpha, \beta)$, in which $\mathbf{w}$ has been integrated out. We do not go into the details of the evidence approximation here; for a more extensive treatment, the reader is referred to Bishop's textbook bishop2006pattern. Note that this approach, which makes the hyperparameters part of the training process by treating them as random variables, does not necessarily yield better estimates than cross-validation. Nevertheless, finding the hyperparameters automatically during training can be helpful in certain situations, for example, when cross-validation is not feasible.
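For readers who want to see the evidence approximation in action, the sketch below implements the standard fixed-point re-estimation of $\alpha$ and $\beta$ described in Bishop's textbook (Section 3.5). It places no hyperpriors on $\alpha$ and $\beta$, and it is an illustration rather than the exact routine used in this study. The function names, starting values, and toy data are assumptions. The effective number of parameters is computed as $\gamma = M - \alpha\,\mathrm{Tr}(\mathbf{S}_N)$, which equals $\sum_i \lambda_i/(\alpha + \lambda_i)$ for the eigenvalues $\lambda_i$ of $\beta\mathbf{X}^T\mathbf{X}$.

```python
def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def evidence_approximation(X, t, alpha=1.0, beta=1.0, max_iter=500, tol=1e-10):
    """Fixed-point updates of alpha and beta maximizing p(t | alpha, beta)."""
    N, m = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xt_t = [sum(row[i] * tn for row, tn in zip(X, t)) for i in range(m)]
    for _ in range(max_iter):
        # Posterior for the current hyperparameters, Eqs. (18)-(19).
        S_N = inverse([[beta * XtX[i][j] + (alpha if i == j else 0.0)
                        for j in range(m)] for i in range(m)])
        m_N = [beta * sum(S_N[i][j] * Xt_t[j] for j in range(m)) for i in range(m)]
        # Effective number of well-determined parameters.
        gamma = m - alpha * sum(S_N[i][i] for i in range(m))
        residual = sum((tn - sum(w * x for w, x in zip(m_N, row))) ** 2
                       for row, tn in zip(X, t))
        new_alpha = gamma / sum(w * w for w in m_N)
        new_beta = (N - gamma) / residual
        converged = abs(new_alpha - alpha) < tol and abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta, gamma


noise = [0.3, -0.2, 0.1, -0.4, 0.25, -0.1, 0.05, 0.3, -0.35, 0.2]
X = [[1.0, float(x)] for x in range(10)]
t = [2.0 + 0.5 * x + e for x, e in zip(range(10), noise)]
alpha, beta, gamma = evidence_approximation(X, t)
print(alpha, beta, gamma)
```

`sklearn.linear_model.BayesianRidge` follows a similar iterative scheme but also places Gamma hyperpriors on both precisions, so its estimates can differ slightly.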

The output of the date-prediction model is a probability estimate for each 10-year bin of our timeline, together with error margins that quantify its uncertainty. Within the Bayesian regression, we apply parameter constraints that restrict the uncertainties to non-negative values, as negative values do not affect the model's probability estimates, the final results, or their interpretation. Future research could explore the feasibility of an asymmetric error estimate above the x-axis. The 10-year bin width was chosen empirically; it can be changed to 5 or 15 years to produce finer or coarser plots.

E.7 Data balancing

In addition to our original training data and the date-prediction model described in the previous sections, we also employ two data-balancing techniques to reduce the time-axis bias in the training data. As Figures 17 and 18 show, the training data are biased: there are many more high probabilities in the region from −200 to −150, which creates much higher priors in that region. This bias, however, merely reflects which samples were chosen for radiocarbon dating and is not representative of the actual prior probability for the Dead Sea Scrolls collection as a whole. To make the predictions less dependent on the priors within the training data, we implemented two data-balancing strategies.

The first method balances using weights: the output probabilities of the model are dampened or boosted according to weights derived from the overall accumulated distribution shown in Figures 17 and 18.

The second method balances through augmentation: underrepresented training data are oversampled, based on the overall accumulated distribution, to compensate. The technical details of both implementations are described below.

Figure 17: Distributions in the training data (orange) and total accumulated distribution of the C14 training data (blue), including all minor peaks of the (accepted) 2-sigma ranges.
Figure 18: Distributions in the training data (orange) and total accumulated distribution of the C14 training data (blue), excluding minor peaks.

Note that, in addition to cross-validation and the leave-one-out statistical test (see Section E.9.1), we also checked the sensitivity of the model to the inclusion or exclusion of minor peaks in the (accepted) $2\sigma$ ranges. Figures 17 and 18 already show minimal changes in the overall probability distribution, and Figure 19 visualizes this more clearly. Computed over the whole range, sampled every five years, the Euclidean distance between the two (accepted) distributions, with and without minor peaks, is 0.104. The chi-square and Bhattacharyya distances are 0.124 and 0.044, respectively, indicating no substantial change in the overall probability distribution. The predicted test results also remain unchanged. Importantly, incorporating the minor peaks did not shift any of the existing high-probability peaks along the time axis (see Figure 19).
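The three distances reported above can be computed from two binned distributions as in the sketch below. The paper does not state which variant of each distance was used, so the definitions here are assumptions: distributions are normalized to sum to one, the chi-square distance is the symmetric form $\frac{1}{2}\sum_i (p_i - q_i)^2/(p_i + q_i)$, and the Bhattacharyya distance is $-\ln\sum_i\sqrt{p_i q_i}$. The toy inputs are not the study's data.

```python
import math


def normalize(p):
    """Scale a non-negative histogram so that it sums to one."""
    total = sum(p)
    return [v / total for v in p]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chi_square(p, q):
    # Symmetric variant; bins where both distributions are zero are skipped.
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def bhattacharyya(p, q):
    # Assumes the distributions overlap somewhere (coefficient > 0).
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(coefficient)


# Toy histograms sampled every five years (values are illustrative only).
with_minor = normalize([1.0, 4.0, 3.0, 0.5])
without_minor = normalize([1.0, 4.0, 3.0, 0.0])
print(euclidean(with_minor, without_minor),
      chi_square(with_minor, without_minor),
      bhattacharyya(with_minor, without_minor))
```

Because the absolute values depend on the chosen variant and normalization, comparisons with the numbers reported above only make sense under the same conventions.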

Figure 19: Comparison between overall probability distributions of (accepted) ranges with the inclusion and exclusion of minor peaks, as mentioned in the procedure in Section E.6.2.

E.7.1 Balance using weights

Let $p_i$ be the predicted probability, where $i$ indexes the year (bin). Let $T$ be a threshold, $\mathit{cum\_c14}$ the (binned) accumulated C14 probability, $M = \max(\mathit{cum\_c14})$ its maximum, and $n_{\mathit{cum\_c14}_i}$ the number of summations that produced the accumulated probability in bin $i$. The weighted probability $w_{p_i}$ of each bin is then calculated as:

\[
w_{p_i} =
\begin{cases}
\dfrac{p_i}{\mathit{cum\_c14}_i} & \text{if } (p_i > T \cdot M) \text{ and } (n_{\mathit{cum\_c14}_i} > 2),\\
p_i & \text{otherwise.}
\end{cases}
\tag{24}
\]
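A direct transcription of (24) is sketched below. The function and argument names are hypothetical. The sketch assumes that every bin satisfying the condition has a non-zero accumulated probability, which is implied when more than two summations contributed to it.

```python
def weight_probabilities(p, cum_c14, n_cum_c14, T):
    """Apply the bin-wise weighting of Eq. (24).

    p: predicted probability per bin
    cum_c14: accumulated C14 probability per bin
    n_cum_c14: number of summations behind each accumulated value
    T: threshold, relative to the maximum accumulated probability M
    """
    M = max(cum_c14)
    weighted = []
    for p_i, c_i, n_i in zip(p, cum_c14, n_cum_c14):
        # Divide by the accumulated C14 probability only for bins whose
        # prediction exceeds T * M and that rest on more than two summations.
        if p_i > T * M and n_i > 2:
            weighted.append(p_i / c_i)
        else:
            weighted.append(p_i)
    return weighted


print(weight_probabilities([0.2, 0.6, 0.9], [1.0, 2.0, 3.0], [1, 3, 5], T=0.1))
```

In this toy call, $T \cdot M = 0.3$. The first bin is left unchanged, and the two heavily represented bins are divided by their accumulated probabilities.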

The weighted probabilities are then normalized to ensure that the scale of the weighted predictions is consistent with the original predictions. The process is described below.

Given the global maximum probability values in the original predictions, $\mathit{max\_p}$, and in the weighted predictions, $\mathit{max\_weighted\_p}$, each probability value $w_{p_i}$ in the weighted predictions is normalized as follows. The normalized probability $w_{p\_norm_i}$ is calculated as

\[ w_{p\_norm_i} = \frac{w_{p_i}}{\mathit{max\_weighted\_p}} \cdot \mathit{max\_p}. \tag{25} \]
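The normalization in (25) rescales the weighted predictions so that their maximum matches the maximum of the original predictions. A minimal sketch with a hypothetical function name:

```python
def rescale_to_original_max(weighted, original):
    """Eq. (25): rescale weighted probabilities to the original maximum."""
    max_p = max(original)
    max_weighted_p = max(weighted)
    return [w / max_weighted_p * max_p for w in weighted]


print(rescale_to_original_max([0.2, 0.3, 0.3], [0.2, 0.6, 0.9]))
```

Rescaling preserves the shape of the weighted distribution and changes only its overall scale, so weighted and unweighted plots can be compared on the same axis.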

E.8 Balance using augmentation

For augmentation, the scrolls listed in Table 9 were duplicated in the training data to boost the underrepresented prior probabilities. Table 9 gives, for each of these scrolls, the number of fragments before and after augmentation, and Figure 20 shows the effect on the overall accumulated probabilities.

Table 9: Number of fragments per scroll before and after the augmentation procedure

\begin{tabular}{lcc}
\hline
Scroll & Fragments originally in the training data & Fragments after augmentation \\
\hline
4Q2 & 2 & 12 \\
4Q161 & 2 & 12 \\
5\_6Hev1b & 1 & 6 \\
11Q5 & 9 & 54 \\
Mas1k & 2 & 12 \\
XHev\_Se2 & 1 & 6 \\
\hline
\end{tabular}
Figure 20: Distributions in the training data (orange) and overall accumulated distribution (blue) of the C14 training data after augmentation.
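In Table 9, every selected scroll ends up with six times its original number of fragments. The sketch below reproduces that count by replicating each fragment of a selected scroll. The fragment record layout `(scroll, features, target)`, the function name, and the toy values are illustrative assumptions.

```python
def augment(fragments, scroll_ids, factor):
    """Repeat every fragment of the selected scrolls `factor` times in total.

    fragments: list of (scroll_id, feature_vector, target_date) tuples
    scroll_ids: set of scroll identifiers to oversample
    factor: total number of copies per selected fragment (6 in Table 9)
    """
    augmented = []
    for fragment in fragments:
        copies = factor if fragment[0] in scroll_ids else 1
        augmented.extend([fragment] * copies)
    return augmented


fragments = [("4Q2", (0.1, 0.4), -150), ("4Q2", (0.2, 0.3), -150),
             ("4Q52", (0.3, 0.1), -100)]
augmented = augment(fragments, {"4Q2"}, factor=6)
print(sum(1 for f in augmented if f[0] == "4Q2"))  # 2 fragments become 12
```

Exact duplicates add no new style information. For a squared-error model such as ridge regression, they act like a per-sample weight of `factor` on those fragments' error terms.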

E.9 Training options

For training, we create three main pools of data:

  • 14C-dated manuscripts

  • 14C-dated manuscripts with the addition of the older 14C datings from the 1990s

  • 14C-dated manuscripts with augmentation

Within each of these pools, the following subsets can optionally be included or excluded (the 14C-dated manuscripts are always included):

  • 4Q52

  • Internally dated scrolls

  • The Maresha Ostracon
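Taken together, the three pools and three optional subsets span $3 \times 2^3 = 24$ possible training configurations. The sketch below enumerates them. The labels are shorthand chosen here for illustration, and the enumeration does not imply that every combination was actually trained.

```python
from itertools import product

POOLS = ("14C-dated", "14C-dated + 1990s 14C", "14C-dated + augmentation")
OPTIONAL_SUBSETS = ("4Q52", "internally dated scrolls", "Maresha Ostracon")


def training_configurations():
    """Return (pool, included optional subsets) for every combination."""
    configs = []
    for pool in POOLS:
        # Each optional subset is either excluded (False) or included (True).
        for flags in product((False, True), repeat=len(OPTIONAL_SUBSETS)):
            included = tuple(name for name, on in zip(OPTIONAL_SUBSETS, flags) if on)
            configs.append((pool, included))
    return configs


configs = training_configurations()
print(len(configs))
```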

E.9.1 Leave-one-out statistical test

To test whether the model predictions are robust, we performed a leave-one-out (LOO) statistical test. LOO is a resampling technique used to evaluate the performance or robustness of a statistical model. Each sample in the training data is removed (left out) in turn, the model is trained on the remaining data points, and the left-out observation is then used to evaluate the model's performance or to make a prediction. This process is repeated for every observation in the training data. The goal is to obtain an indication of the variability of the model predictions, by checking whether the model overfits the training data and how strongly outliers affect its performance.

LOO is commonly employed when the dataset is small, as ours is, or when it is important to assess the model's performance on each individual observation. Because it iteratively evaluates the model on subsets of the training data while systematically excluding each sample, LOO is a valuable tool for assessing the reliability and generalizability of a statistical model. Given a training set of size $N$, we train $N$ models, each leaving out a different data point and training on the remaining $N-1$ data points. We then make predictions on the test set with all $N$ models and overlay the resulting predictions. This gives a visual representation of the variation between the predictions of the different models.
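The LOO procedure can be sketched generically as follows. As a stand-in for Enoch's Bayesian ridge model on style features, the sketch fits a one-dimensional ridge line with an unpenalized intercept. That substitution, the function names, and the toy data are assumptions made for brevity. The sketch returns the $N$ fitted models, so their predictions for unseen inputs can be overlaid, together with the mean absolute error on the left-out points.

```python
def fit_line(xs, ts, lam=0.0):
    """Ridge fit t ~ a + b * x, penalizing only the slope b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(ts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, ts))
    b = sxt / (sxx + lam)
    a = mean_t - b * mean_x
    return a, b


def leave_one_out(xs, ts, lam=0.0):
    """Train N models, each without one sample; return them and the LOO MAE."""
    models, errors = [], []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_t = ts[:i] + ts[i + 1:]
        a, b = fit_line(train_x, train_t, lam)
        models.append((a, b))
        # Evaluate on the single left-out sample.
        errors.append(abs(ts[i] - (a + b * xs[i])))
    return models, sum(errors) / len(errors)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ts = [1.2, 2.9, 5.1, 7.0, 8.8]
models, mae = leave_one_out(xs, ts, lam=0.1)
# Overlaying the N models' predictions for an unseen input shows their spread.
spread = [a + b * 2.5 for a, b in models]
print(mae, min(spread), max(spread))
```

A wide spread or a single model that deviates strongly points to influential samples, which is the kind of variability the test above is meant to reveal.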

E.9.2 Gaussian of Gaussian

From the date-prediction model, we obtain a probability and an error margin for each 10-year bin. This yields a vertical Gaussian for each 10-year bin across the entire timeline. To convert these into a single Gaussian along the horizontal time axis, we apply a procedure dubbed ‘Gaussian of Gaussian’ to the predicted dates.

This procedure performs $1000$ iterations, each randomly drawing a wave-shape instance from the $n$-bins distribution (over the entire timeline), assuming a Gaussian $(\mu,\sigma)$ per bin. The position of the maximum $y$-peak of each wave shape is then detected along the $x$-axis, where, for our manuscript-dating problem, $x$ represents the year value. In this manner, it becomes possible to estimate the uncertainty of peak detection in the style-based OxCal approximation and its effect on the date estimation. This addition to the Enoch method makes it possible to obtain an estimate of date variability, similar to the output of OxCal itself. We explored whether smoothed distribution shapes were needed for this, but a detailed analysis fortunately revealed that the method could be kept simple: smoothing the shape often led to a shift along the $x$-axis and an increase in the $x$ variability. The asymmetry of the (assumed) peak shape causes this time bias. Hence, we avoided any smoothing and used the raw, unfiltered generated histograms. The implicit assumption is that the ‘maximum’ co-occurs with a peak. Comparative plots for different information sources are obtained using the ‘Gaussian of Gaussian’ procedure (see Appendix H).
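A minimal sketch of the ‘Gaussian of Gaussian’ sampling follows. It assumes that the model output is available as per-bin means `mus` and standard deviations `sigmas`, together with the year of each bin (`bin_years`, signed, negative = BCE); these names, the fixed seed and the toy values are hypothetical. Each iteration draws one value per bin and records the year of the raw, unsmoothed maximum; the collected peak years are then summarized by their mean, spread and histogram.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def gaussian_of_gaussian(mus, sigmas, bin_years, n_iter=1000, seed=0):
    """Sample n_iter wave shapes (one Gaussian draw per bin) and return
    the year at which each sampled shape reaches its maximum."""
    rng = random.Random(seed)
    peak_years = []
    for _ in range(n_iter):
        wave = [rng.gauss(m, s) for m, s in zip(mus, sigmas)]
        # Raw maximum along the x (year) axis; deliberately no smoothing.
        k = max(range(len(wave)), key=wave.__getitem__)
        peak_years.append(bin_years[k])
    return peak_years


def summarize(peak_years):
    """Mean and spread of the peak position, plus its histogram."""
    return mean(peak_years), pstdev(peak_years), Counter(peak_years)


# Toy example: four hypothetical bins labelled by signed year (negative = BCE).
years = [-200, -190, -180, -170]
peaks = gaussian_of_gaussian([0.2, 0.6, 0.5, 0.1], [0.05, 0.2, 0.05, 0.05], years)
mu_year, sd_year, hist = summarize(peaks)
```

Because the bin at -190 has a high mean but a large error, its peak competes with the stable neighbouring bin, so the peak position varies between iterations; `sd_year` expresses that date variability.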

Appendix F On the use of pre-trained deep learning methods for image-based dating

F.1 Considerations on training deep neural networks for a problem with only 24 examples

Since Hornik’s mathematical proof of universal approximation Hornik1989 , deep multilayer neural networks have, over time, come to excel in many applications, especially since the advent of large data sets and the increase in computational power. However, as observed in the introduction of this article, the likelihood of success is low when a deep network with too many parameters is trained on a tiny data set. There is a serious risk of ‘overfitting’, i.e., a computed solution that appears to perform well on a training data set but fails to generalize (interpolate) properly when presented with unseen data Vapnik2000 . We have looked at a list of 44 modern deep-learning vision models that were published since 2010 and were cited at least 100 times. On average, such a model has 454 million weights (coefficients), computed from 715 million training data points. A ‘data point’ is a tuple of an image and its corresponding desired model output vector for a classification, regression or generative task. The meta-analysis table was kindly provided by villalobos2022run ; epochMachineLearningData2022 . The most recent, transformer-based models even have billions of parameters and data points. Evidently, such large modern models cannot be trained from scratch on a data set of only a few dozen data points, i.e., the 24 radiocarbon-dated images available for our problem.

An alternative approach would be deep transfer learning TransferZhuang2021 ; ribani2019survey , in which an existing deep neural network, trained on a sufficiently large image data set Krizhevsky2017 , is fine-tuned on a smaller set of 14C-dated images. In such a case, a hidden layer of the frozen pre-trained network is chosen as the shape-feature vector, and a new post-processing multilayer perceptron or dense layer is trained to transform that feature vector, for a given input image sample, into the output vector required by the actual task. We see five objections to the use of deep transfer learning for the date-prediction task.

Point 1. It is questionable whether common networks that are designed and trained for classifying natural full-colour RGB photographs will deliver, in their penultimate layer, a shape-feature vector that is optimal for writer identification in bitonal manuscript images. Bitonal manuscript images have a flat white background, and the relevant patterns are in the ink traces only. Such material is rarely present in generic photographic image collections. At the very least, there are serious concerns regarding efficiency: since a bitonal image carries its information in a single channel rather than in three RGB colour channels, about two-thirds of the connection weights are likely to be superfluous.

Point 2. Even if the colour-channel argument is dismissed, end users may argue that an opaque neural network Krizhevsky2017 or vision transformer ViTpaperDosivitskiy2020 pretrained on non-representative image material (‘photos of cats, dogs and urban scenes, etc.’) is not acceptable for answering scholarly questions. By comparison, current deep foundation models are not yet considered a sound basis for serious application in radiological diagnostics, and massive data would need to be collected to achieve such a status WilleminkFoundationRadiology2022 .

Point 3. Current deep-learning methods rely on images that are often very small, i.e., 224×224 or 512×512 pixels. Only recently, with the increased memory capacity of GPUs, have images of 768×768 pixels become usable. This leads to many problems in real-world applications of deep learning, e.g., in a medical context Thambawita2021 . Our manuscript images are large and of variable aspect ratio, with widths of $\mu_w = 3871$ ($\pm\sigma_w = 1069$) pixels and heights of $\mu_h = 3857$ ($\pm\sigma_h = 740$) pixels. On top of the other restrictions mentioned here, using current deep learning would require downscaling the high-quality manuscript images by a factor of 5 to 7, with considerable loss of information. Although recent vision transformers ViTpaperDosivitskiy2020 are better suited to dealing with large images, they are pretrained on very large, extraneous image and photographic collections (cf. Point 2).

Point 4. Alternatively, tiling Haja2021 would unnecessarily complicate the analyses because of the likely imbalance of character content between the tiles and the damage to the original character appearance at the tile margins. In spite of the success and allure of deep learning, the problem of dealing with large, variable-sized images has not been fundamentally solved. This limits its applicability in several scientific domains: in microscopy Campanella2019 and astronomy ivezic2019lsst , multi-gigapixel, terabyte images are already common. As in our case, current convolutional neural networks cannot, per se, process an original whole image in its unscaled entirety without information loss, e.g., for a prediction task.

Point 5. Regardless of the methodological problems posed by sparse data, an end-to-end deep-learning approach, i.e., transforming image pixels directly into a date prediction, has the disadvantage of limited explainability. If explainable, dedicated features can be used and a regression model can be trained with limited data, such a modular approach has a distinct advantage. Still, it will be worthwhile to explore a deep-learned variant for date prediction once more (radiocarbon-)dated samples become available.
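The downscaling factor mentioned in Point 3 can be checked with simple arithmetic. The snippet below divides the longer mean image side by several square input sizes; the list of input sizes is an illustrative assumption (224, 512 and 768 pixels as mentioned in Point 3, plus the 331 pixels used for PNASNet in Section F.2).

```python
# Mean manuscript-image size in pixels (from Point 3).
MEAN_WIDTH, MEAN_HEIGHT = 3871, 3857

# Square input sizes of pretrained vision models (illustrative list).
INPUT_SIDES = [224, 331, 512, 768]


def downscale_factors(width, height, sides):
    """Linear downscaling factor needed to fit the longer image side
    into each square input size, rounded to two decimals."""
    longest = max(width, height)
    return {side: round(longest / side, 2) for side in sides}


factors = downscale_factors(MEAN_WIDTH, MEAN_HEIGHT, INPUT_SIDES)
for side, factor in factors.items():
    print(f"{side}x{side}: downscale by about {factor}")
```

For 768- and 512-pixel inputs, the factors come out at about 5.0 and 7.6, roughly the range of 5 to 7 stated in Point 3; a 224-pixel input would require a factor above 17.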

F.2 An attempt at using state-of-the-art deep-learning methods: PNASNet

However, to illustrate empirically the problems with current deep learning, even in a transfer-learning setup, we used a common foundation model (PNASNet) and used its output to estimate a date probability distribution. Using a pretrained PNASNet PNASnet5-Liu2018 , we rescaled each high-resolution manuscript image to the ‘passport-photo’ size that is customary for these models, i.e., 331×331 pixels. We then used the penultimate layer ($N_{hidden}=4320$) of this existing network as a pretrained feature vector for a date-estimation output layer with an ‘OxCal’-format probability target function. Figures 21 and 23 show the evolution of the loss curve in a typical training session. Although some variance is present, the model seems to converge, more or less, after the presentation of 32,000 batches. However, the validation curves (Figures 22 and 24) show that the validation loss remains highly irregular. The most likely reason for this behavior is that, in spite of the frozen pre-trained bulk of PNASNet, the number of trainable weights that need to be estimated for the transfer task is still of the order of 457k ($N_{hidden} \times N_{OxCal} = 4320 \times 106$). This number is far too high in relation to the small number of images in the data set. The poor validation loss compared with the training loss, together with the flat and irregular validation loss curves, strongly supports the decision to avoid this pathway.
At the very least, it can be concluded that considerable additional research would be needed to improve the DL-transfer performance from this point. Given the particular conditions of this study, we have avoided the use of deep learning for the regression task until the data sparsity is resolved.
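The weight count quoted above is the size of a single dense layer that maps the 4320-dimensional PNASNet feature vector onto the 106 OxCal bins. The short check below reproduces it and relates it to the 24 radiocarbon-dated images; whether bias terms were included in the original output layer is not stated, so the bias count is an assumption.

```python
# Sizes from the text: PNASNet penultimate layer, OxCal output bins, dated images.
N_HIDDEN = 4320
N_OXCAL = 106
N_IMAGES = 24

weights = N_HIDDEN * N_OXCAL            # dense weight matrix of the output layer
weights_with_bias = weights + N_OXCAL   # assumption: one bias per output unit
weights_per_image = weights / N_IMAGES  # trainable weights per dated image

print(f"{weights:,} weights ({weights_with_bias:,} with biases)")
print(f"about {weights_per_image:,.0f} weights per dated image")
```

Roughly 19,000 free weights per training image makes the irregular validation loss unsurprising.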

Figure 21: PNASNet training loss per epoch for 4-fold cross-validation (log scale on the right side).
Figure 22: PNASNet validation loss per epoch for 4-fold cross-validation (log scale on the right side).
Figure 23: PNASNet training loss per batch for 4-fold cross-validation (log scale on the right side).
Figure 24: PNASNet validation loss per batch for 4-fold cross-validation. Validation is performed after every fourth epoch, so the curve is aligned with Figure 23 (log scale on the right side).

Appendix G Enoch’s date predictions for 135 previously undated manuscripts

Before we discuss the results of the palaeographic post-hoc evaluation of the 135 unseen samples (Section G.4), we describe the physical and image quality of the data, explain how to read Enoch’s prediction plots, and elaborate on how Enoch differs from traditional palaeography.

G.1 On the physical and image quality of the data

To appreciate how the Enoch model works, one should be aware of the challenges the data pose, both physically and in terms of image quality.

As we have mentioned before, the Dead Sea Scrolls are extremely delicate material (see Section 1 in the main article and Section B.2 in Appendix B). In a few cases, the physical evidence consists of largely intact bookrolls of several meters in length, such as the Great Isaiah Scroll (see Section 5 in the main article). But in most cases, what were once large and small bookrolls are now only extant as fragmentary, deteriorated remains of various sizes and shapes. This means that the Enoch model has to deal with very diverse material remains that are available as digital images (see Section 2.1 in the main article).

The physical state of the data affects the image quality in various ways. For example, papyrus fragments often have damage patterns that affect the ink remains of the letters differently from those of animal-skin fragments. Moreover, some manuscripts are represented by large, relatively well-preserved fragments, whereas others have only one small, badly damaged fragment left. Our binarized images for Enoch sometimes combine different fragments of a manuscript that are available on separate IAA image plates (e.g., 4Q86). Thus, the data for Enoch consist of diverse image types. To elaborate on what was briefly mentioned in Section 6.2 of the main article, image preparation is important for further improving Enoch’s prediction results. For a fixed set of training and test data, the model’s predictions do not change, but they can change (i.e., improve) when better-cleaned images are used.

This also means that two or more predictions for the same manuscript can yield different results, because the underlying data consist of diverse image types, which warrants a diverse spread in the plots. Unlike MPS He2016 (or other historical manuscripts), the Dead Sea Scrolls images all differ in shape, orientation, number of characters, ink thickness, etc. For example, for 4Q57 there are nine “curves” (plots), which the palaeographers used in the first evaluation, because there are nine images. These are nine different individual images of 4Q57, often of different fragments, resulting from improved or updated preprocessing over time. As the model receives different images (features), it produces different curves (plots). Each image represents a different set of evidence (from character shapes and features) for each bin. If all nine plots of 4Q57 were exactly the same, that would be problematic, because each image fragment is different even though the fragments stem from the same manuscript.

Figure 25: Enoch’s prediction for 4Q544 as an example. Left: four fragment images from IAA plate 431. These fragment images are preprocessed using multispectral image fusion, neural-network-based binarization (BiNet), noise reduction, and alignment correction to obtain the image on the top right. The features extracted from the preprocessed image are passed through the trained Enoch model to produce the prediction on the bottom right. This is a simple, unimodal prediction. The green box indicates the probable date range with high mean values (in this case, 200–100 BCE), and the red box indicates areas without a significant mean and with high uncertainty.

G.2 How to read a prediction plot

Each prediction plot produced by Enoch presents the output of the Bayesian regression model for an individual manuscript test sample. In the plot, the x-axis represents the chronological timeline, partitioned into 10-year bins, while the y-axis shows probability values in the form of means with error bars. This representation captures the model’s attempt to infer the probable date of the manuscript, within 10-year intervals, over the period from 310 BCE to 200 CE.

Within each 10-year bin, a pair of values is obtained: the mean and its corresponding error, usually expressed as a standard deviation. The mean is a point estimate indicating the central tendency of the predicted probability for the given bin, i.e., how plausible it is that the manuscript dates to that specific timeframe. The associated error, or standard deviation, shows the magnitude of variability in the prediction and thus serves as a measure of uncertainty. An ‘ideal’ date prediction has a high probability and a low error.
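To illustrate where a per-bin mean and standard deviation can come from, the following is a minimal sketch of the posterior predictive distribution of Bayesian linear regression with a single input feature, fitted separately for each bin. It is not Enoch’s implementation: Enoch applies Bayesian ridge regression to high-dimensional style-feature vectors, and common implementations (e.g., scikit-learn’s `BayesianRidge`) estimate the hyperparameters from the data, whereas here the prior precision `alpha` and noise precision `beta` are fixed, hypothetical values, and the toy features and bin targets are invented.

```python
import math


def bayes_linear_predict(xs, ts, x_new, alpha=1.0, beta=25.0):
    """Posterior predictive mean and standard deviation for t = w0 + w1*x,
    with prior w ~ N(0, I/alpha) and Gaussian noise of precision beta.
    alpha and beta are fixed, hypothetical hyperparameters."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Posterior precision A = alpha*I + beta*Phi^T Phi, with rows phi(x) = [1, x].
    a00, a01, a11 = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a00 * a11 - a01 * a01
    # Posterior covariance S = A^{-1} (closed-form inverse of a symmetric 2x2 matrix).
    s00, s01, s11 = a11 / det, -a01 / det, a00 / det
    # Posterior mean m = beta * S Phi^T t.
    st = sum(ts)
    sxt = sum(x * t for x, t in zip(xs, ts))
    m0 = beta * (s00 * st + s01 * sxt)
    m1 = beta * (s01 * st + s11 * sxt)
    # Predictive distribution: mean m^T phi, variance 1/beta + phi^T S phi.
    pred_mean = m0 + m1 * x_new
    pred_var = 1.0 / beta + s00 + 2.0 * s01 * x_new + s11 * x_new * x_new
    return pred_mean, math.sqrt(pred_var)


# Toy data: one style feature per training image and the target probability
# of two hypothetical 10-year bins (invented values, illustration only).
train_x = [0.1, 0.3, 0.5, 0.7, 0.9]
bin_targets = {
    "200-191 BCE": [0.9, 0.7, 0.5, 0.3, 0.1],
    "190-181 BCE": [0.1, 0.3, 0.5, 0.7, 0.9],
}
for label, ts in bin_targets.items():
    m, s = bayes_linear_predict(train_x, ts, x_new=0.6)
    print(f"{label}: mean {m:.3f} +/- {s:.3f}")
```

The predictive standard deviation never drops below the noise level `1/sqrt(beta)` and grows for inputs far from the training features, which is the behaviour an error bar in a prediction plot is meant to convey.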

The plot looks like a series of bars, similar to a histogram. From these bars, we look for patterns in the dates over time and gauge how confident or uncertain the estimates are. For individual plots, we thus consider the level of the mean value and the size of the error bars around it to decide the most probable date or date range for that manuscript.

Figure 26: Enoch’s prediction for 4Q58 as an example. Left: Brill scan 431. The top-right fragment (marked in blue) is preprocessed using neural-network-based binarization (BiNet), noise reduction, and alignment correction to obtain the image on the top right. The features extracted from the preprocessed image are passed through the trained Enoch model to produce the prediction on the bottom right. This is a bimodal prediction. In this plot, the blue and green boxes both indicate probable date ranges with high mean values, and the red boxes indicate areas without a significant mean and with high uncertainty. If the reader needs to choose between the blue and the green range, the green range (in this case, 30–120 CE) is the more probable one, because its error bars are smaller than those of the blue range.

The discrete prediction bars can be mathematically smoothed into continuous curves, yielding a Gaussian Mixture Model (GMM) representation. This transformation allows for a more nuanced, probabilistic portrayal of the underlying distribution of predicted manuscript dates. If the smoothed prediction shows a unimodal distribution, choosing the probable date range is easy (see Figure 25). A bimodal prediction, however, requires more attention: the reader then needs to look closely at the error bars and the means for each 10-year bin (see Figure 26). This cannot easily be solved with an algorithm: a high probability value is ‘good’, but not if it is accompanied by a large uncertainty. In that case, choosing a stable estimate with a slightly lower mean probability may be advisable.
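As stated above, choosing between the modes of a bimodal plot is left to the reader, and the design philosophy described in Section G.4 deliberately avoids an ‘oracle’ that outputs a hard date range. Purely as an illustration of the trade-off between mean and error bar, the hypothetical helper below groups contiguous bins whose mean reaches a user-chosen threshold into candidate ranges and lists them with their average mean and average error, most stable range first. The threshold, the signed-year convention (bin start years, negative = BCE) and the toy values are assumptions; the resulting table is meant for inspection, not as a final date.

```python
def candidate_ranges(bin_starts, means, errors, threshold, bin_width=10):
    """Group contiguous bins whose mean probability reaches `threshold`
    into candidate ranges (hypothetical helper, for inspection only)."""
    runs, current = [], []
    for start, m, e in zip(bin_starts, means, errors):
        if m >= threshold:
            current.append((start, m, e))
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)

    summary = []
    for run in runs:
        summary.append({
            "start": run[0][0],
            "end": run[-1][0] + bin_width,
            "mean_prob": sum(m for _, m, _ in run) / len(run),
            "mean_error": sum(e for _, _, e in run) / len(run),
        })
    # List the most stable range (smallest average error bar) first.
    return sorted(summary, key=lambda r: r["mean_error"])


# Toy bimodal output: bin start years (negative = BCE), means, error bars.
starts = [-100, -90, -80, -70, -60, -50]
means = [0.6, 0.7, 0.1, 0.1, 0.5, 0.55]
errors = [0.3, 0.3, 0.1, 0.1, 0.05, 0.05]
for r in candidate_ranges(starts, means, errors, threshold=0.5):
    print(r)
```

In this toy case, the later range has a slightly lower mean but much smaller error bars and is therefore listed first, mirroring the reasoning used for Figure 26.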

Figure 27: 4Q27 as a training sample for Enoch’s date prediction (4Q27 has six images in training and two in testing; this is one of the six training images). Left: fragment 1 from IAA plate 1082. This fragment image is preprocessed using multispectral image fusion, neural-network-based binarization (BiNet), noise reduction, and alignment correction to obtain the image in the middle. Right: the OxCal data from radiocarbon dating for 4Q27, which is the target output for training Enoch. The green area indicates the accepted part of the 2σ calibrated bimodal data.

G.3 On shared characteristics and finding matches elsewhere

To appreciate how the Enoch model differs from traditional palaeographic approaches, we elaborate on what was briefly mentioned in Section 6.2 of the main article, namely that Enoch emphasizes shared characteristics and similarity matching, whereas traditional palaeography focuses on dissimilarities that are assumed to be indicative of style development.

Enoch’s Bayesian regression model performs a quantitative analysis of textural and allographic feature vectors. These feature vectors encapsulate various handwriting characteristics and offer a systematic representation for predictive modeling. Unlike human palaeographers, who traditionally often seek dissimilarities, this model employs a similarity-based strategy. It strives to uncover patterns and relationships within the feature space by quantifying the resemblance of test images to the training set with known 14C date distributions. Leveraging a Bayesian framework, the model offers a probabilistic, data-driven means of attributing dates to unseen manuscripts. It thus complements the qualitative expertise of human palaeographers with a quantitative approach that can reveal subtle patterns and associations within the data. 4Q27 provides an excellent example of this approach, with six images used in training Enoch and two in test prediction. Figure 27 shows one of the training images, and Figure 28 shows the prediction plot for one of the two test images.

Another example is 4Q319 (see Section 5 and Figure 2 in the main article). Here, the AI experts’ preferred reading of the prediction plot (see Table LABEL:tab:expert-AI-135undated) is the younger range, rather than the so-called ‘biased range’ of 200–100 BCE. The occurrence of ‘young’ peaks can be seen as shape information suggesting an alternative to the bias present for the 200–100 BCE range. The overall shape on the left-hand side is likely due to the OxCal-based training. However, in spite of the smaller number of younger fragments in the training data, the right-hand part shows that the style-based analysis makes a younger date possible, given the stable results (with small error bars) there.

Figure 28: Enoch’s prediction for 4Q27 (4Q27 has six images in training and two in testing; this is one of the two test images): Fragment 6 from IAA plate 1080 is on the left side. This fragment image is preprocessed using multispectral image fusion, neural network-based binarization (BiNet), noise reduction, and alignment correction to obtain the test image in the middle. The features extracted from the preprocessed image are passed through the trained Enoch model to produce the prediction on the right. Here, the orange area depicts a similar shape to the target OxCal for 4Q27 in training (see Figure 27), as expected from a regression model. In addition, Enoch also looks for similarity over all the training samples and finds an additional probability range (in green). Due to the low error values, i.e., smaller error bars, the green area is more probable (here, 30 BCE to 20 CE) than the orange area in this prediction plot.

G.4 Palaeographic post-hoc evaluation

First, the 135 unseen samples were chosen for various palaeographic and historical reasons, such as a diachronic cross-section of a biblical book (Psalms) or manuscripts that share the same writing style, or for no particular reason at all.

Second, for the evaluation, we took the best image for each manuscript in terms of image quality, since better-cleaned images give better results (see Section G.1). The AI experts among the article’s authors, M.D. and L.S., visually evaluated the scans to ensure that the data were of sufficient quality and that each sample contained a sufficient number of characters. The specific image used in the evaluation for each manuscript sample is listed in Table LABEL:tab:135-images. Still, for a number of scrolls only very poor and difficult images are available; we kept these in the test, as we did not want to tweak the data. So, where the best available image is still very poor, we worked with it (see Table LABEL:tab:bad25-images for some examples illustrating this very poor quality). We have kept all the images so that the reader can see the implications of, and the improvements obtainable from, careful preprocessing of the images. In the Zenodo data repository (https://doi.org/10.5281/zenodo.10629480), the images are organized in three directories: the first contains all 359 images for the 135 manuscripts, the second the 135 selected images, and the third the 25 images that illustrate the poor image quality.

Third, we performed different balancing tests (see Section E.7 in Appendix E) and thus produced different prediction plots. In the final evaluation, however, we only use the balanced 0.05 plots, which are also indicated in the Zenodo repository (https://doi.org/10.5281/zenodo.10629480), in the description under Organization of the data. All other prediction plots are also available, so that readers can see the different balancing tests we performed (see the Enoch-predictions.tar.gz file in the Zenodo data repository).

Fourth, the AI experts performed a blind reading of the balanced 0.05 prediction plots. Without any knowledge of the manuscript dates, they read only the prediction plots and gave estimated minimum and maximum dates (see Table LABEL:tab:expert-AI-135undated; for more details on how to read a plot, see Sections G.2 and G.3). These estimated date ranges were then passed on to the palaeographers, who assessed the outcomes as “realistic” or “unrealistic”.

It would even be possible to provide an algorithm that reads the plots, but the design philosophy of our date-prediction model is that it is better to stay close to the known systematics of dealing with OxCal curves of date probabilities than to resort to an ‘oracle’ approach in which an algorithm proposes a hard date range. The user can inspect the output of our Enoch model in the same way as an OxCal curve.

Fifth, in their qualitative post-hoc evaluation, the palaeography experts among the article’s authors, M.P. and E.T., regarded a date prediction as “realistic” if it corresponds (partly or wholly) with their palaeographic estimates, the basis of which was explained in detail in Appendix A and Section D.1, and as “unrealistic” if it does not. In other words, if there is an overlap between our palaeographic estimates and the machine-learning-based dating, even a minimal one, we regard the model’s date prediction as “realistic”, and as “unrealistic” when there is no overlap, i.e., when the prediction is older or younger (“too old” or “too young”). We provide our palaeographic date estimates for each of the 135 manuscripts (see Table LABEL:tab:expert-AI-135undated), following the general principle that we work with a 50-year range, allowing ±25 years on either side. Sometimes, e.g., in the case of quite idiosyncratic handwriting, we allow for an even broader range of 100 years. It should also be noted that palaeographic estimates are more difficult to make if the data are of poor quality, especially if little material, and therefore few characters, is left to inspect. In other words, the palaeographic dates are not hard date ranges but expert estimates. Still, for the sake of clarity, we used the 50-year range in a strict sense in the evaluation, so that even a gap of only 5 or 10 years led us to deem the prediction “too old” or “too young”.
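The overlap criterion above can be written down compactly. This sketch assumes that dates are given as signed years (negative = BCE) and ignores the absence of a year 0; the function names are hypothetical, and a palaeographic estimate is modelled as a central date ±25 years (±50 years for the broader 100-year case).

```python
def palaeographic_window(centre, half_width=25):
    """Palaeographic estimate as (centre - half_width, centre + half_width);
    half_width=25 gives the usual 50-year range, 50 the broader 100-year range."""
    return centre - half_width, centre + half_width


def assess(pred_min, pred_max, palaeo_min, palaeo_max):
    """Any overlap, however small, counts as realistic; otherwise the
    prediction is too old or too young. Signed years, negative = BCE."""
    if pred_max < palaeo_min:
        return "unrealistic (too old)"
    if pred_min > palaeo_max:
        return "unrealistic (too young)"
    return "realistic"


# Palaeographic estimate centred on 75 BCE, i.e. 100-50 BCE (hypothetical case).
window = palaeographic_window(-75)
print(assess(-150, -110, *window))  # 10-year gap before the window
print(assess(-120, -95, *window))   # overlaps the window by 5 years
print(assess(-40, 0, *window))      # 10-year gap after the window
```

Applying the 50-year window strictly means that the first and third predictions are rejected despite gaps of only 10 years, as described above.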

Summarized, our post-hoc palaeographic assessment is based on the following considerations:

  1.

    In line with Cross and all other palaeographers, we distinguish between Hasmonaean- and Herodian-style writing;

  2.

    Our palaeographic date estimates of these styles relative to one another are informed by the traditional view of the Hasmonaean script as older than the Herodian script. Our 14C results confirm, for most manuscripts, the basic distinction between older Hasmonaean-type manuscripts and younger Herodian-type manuscripts. Yet our 14C results indicate that Herodian-type script was present earlier than previously thought. Our evaluation of the implications of the 14C data provides evidence for dates of Hasmonaean-type script in the second century BCE and also allows for the late third century BCE; it further allows for Herodian-type script to have existed already in the second century BCE, side by side with Hasmonaean-type script (see Section D.1.3). Thus, we took into account the general tendency in the 14C results to date both individual manuscripts and the emergence of the ‘Hasmonaean’ and ‘Herodian’ scripts about 50–75 years earlier than traditional palaeography does;

  3.

    Linear typological developments within both Hasmonaean- and Herodian-type script have been asserted by scholars rather than substantiated with external date-bearing evidence. This makes traditional assumptions about linear typological development “within script” problematic, in our view even more so for ‘Herodian’ than for ‘Hasmonaean’. Especially for script generally seen as Late Herodian, we would not exclude a date around the turn of the era or somewhat earlier. We reckon with the possibility that script types persisted longer than traditionally assumed. Cross assumed a rapid development of the script from the Hasmonaean period onward. He suggested chronological ranges of 50 years, and sometimes even shorter ranges of 25–50 years, for typological developments, but these assumptions remain unsubstantiated.

Other researchers can follow our evaluation by taking the range estimates (see Table LABEL:tab:expert-AI-135undated) and/or the prediction plots (from the Zenodo repository, https://doi.org/10.5281/zenodo.8168210), then considering the specific images of the manuscripts in question in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb and/or our binarized images (from the same Zenodo repository), and taking into account our considerations (see Appendix A and Section D.1). Alternatively, instead of following our reasoning for a “realistic” or “unrealistic” assessment, they can make their own palaeographic post-hoc assessment and justify their reasoning.

Please also note that each 10-year bin within these minimum and maximum ranges has its own probability value. The AI experts’ minimum and maximum values thus bound a probable range, but this range is not the final estimated date. One needs to read the probability plots to estimate the date more precisely within the minimum–maximum range. Hence, although the range can sometimes be wide, a reader can, if desired, narrow it down to a more precise date range by reading the probability values together with the uncertainty estimates (error bars).

Table 10 shows the distribution of the year ranges obtained in the blind range estimation by the AI experts.

Table 10: Spread estimation (blind test) by the AI experts
Range Count Percentage
280 years 2 1.48%
240 years 1 0.74%
210 years 4 2.96%
190 years 2 1.48%
170 years 6 4.44%
160 years 5 3.70%
150 years 5 3.70%
140 years 4 2.96%
130 years 8 5.93%
120 years 5 3.70%
110 years 8 5.93%
100 years 7 5.19%
90 years 18 13.33%
80 years 9 6.67%
70 years 4 2.96%
60 years 11 8.15%
50 years 22 16.30%
40 years 8 5.93%
30 years 5 3.70%
20 years 1 0.74%
Total: 135 100.00%

Some year ranges are so wide that the date prediction loses its value of offering a limited number of quantified probability options within the time period under consideration. Fortunately, wide prediction ranges are limited in number among the 135 test samples. The definition of a “wide range” is informed by the accepted 2σ calibrated ranges, which are the training data for the Enoch model and are on average 135 years long including the so-called minor peaks, or 110 years excluding them (see Figures 17 and 18, and Table LABEL:tab:2-sigma-accepted). Twenty-nine of the 135 test samples (21%) have a date range of more than 130 years, and 42 of the 135 test samples (31%) have a date range of more than 110 years.

In most cases, the date prediction range is well below 135 or 110 years; a range of ca. 50 years is the most frequent (22 samples, 16%; see Table 10).

The current average range is 69.35 years when ranges above 110 years are excluded, and 76.32 years when ranges above 135 years are excluded. If all ranges are included indiscriminately, the average range is 98.07 years. The median is 90 years. These averages, as well as the median, may change if more manuscripts are tested in the future. They may also be affected, and improved, by further improvements in image quality (see below).
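The summary statistics above can be recomputed directly from the counts in Table 10. The snippet below is a simple check of that arithmetic, assuming that each manuscript’s range width equals the value of its row in Table 10.

```python
from statistics import median

# Range width (years) -> number of manuscripts, transcribed from Table 10.
TABLE_10 = {
    280: 2, 240: 1, 210: 4, 190: 2, 170: 6, 160: 5, 150: 5, 140: 4,
    130: 8, 120: 5, 110: 8, 100: 7, 90: 18, 80: 9, 70: 4, 60: 11,
    50: 22, 40: 8, 30: 5, 20: 1,
}

# One entry per manuscript.
widths = [w for w, n in TABLE_10.items() for _ in range(n)]


def mean_width(limit=None):
    """Mean range width (rounded) and sample count, optionally excluding
    ranges above `limit` years."""
    kept = [w for w in widths if limit is None or w <= limit]
    return round(sum(kept) / len(kept), 2), len(kept)


print(len(widths))                        # number of manuscripts
print(mean_width(110), mean_width(135))   # excluding wide ranges
print(mean_width(), median(widths))       # all ranges, and the median
print(sum(w > 130 for w in widths), sum(w > 110 for w in widths))
```

This gives 135 manuscripts; means of 69.35 years (93 manuscripts with ranges up to 110 years), 76.32 years (106 manuscripts with ranges up to 130 years) and 98.07 years (all 135); a median of 90 years; and 29 and 42 ranges wider than 130 and 110 years, respectively.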

Most date ranges are indeed at most 90 years wide (78 out of 135 test samples). It has been claimed that the traditional palaeographic model can fix a characteristic bookhand or the copying of a manuscript within 50 years or even 25–50 years, but this claim was not substantiated with external date-bearing evidence (see Section D.1). Our Enoch model, by contrast, can produce prediction ranges of 50 years that are empirically based on physical evidence derived from 14C and on geometric evidence from shape-based analysis. In the period 300–50 BCE, Enoch even improves on the 14C results: its predictions are narrower than the 14C date ranges and provide a more fine-grained distribution (as mentioned in Section 4 in the main article).

As can be seen in Table 2 in the main article, 107 (79%) of the 135 undated manuscripts were judged to have obtained a realistic date prediction. Naturally, the wider the range of a prediction plot, the more likely it is to overlap with our palaeographic estimate. If we disregard the 42 date predictions with a spread wider than 110 years, the percentage of realistic predictions drops to 50% (68 out of 135), or to 73% (68 out of the remaining 93). Thus, even with a stricter selection rule that only admits narrow-range estimates, a decent percentage of palaeographically realistic evaluations is still obtained from the undated material. Moreover, if we also take the image quality of the samples into account and exclude data of very poor quality, the Enoch model performs even better. Twenty-five images are of poor quality (see Table 13), leaving 110 images and samples in the test, of which 91% have a realistic prediction. If, of these 110, we again ignore the 36 date predictions with a spread wider than 110 years, the percentage of realistic predictions amounts to 61% out of 110, or 89% out of the remaining 74.
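The selection rules behind these percentages (drop predictions wider than 110 years, drop poor-quality images, and report the share of realistic evaluations against two denominators) can be expressed as a small filter. This Python sketch is a hypothetical illustration, not the project’s code: the Prediction class and realistic_share function are invented names, the eight records are copied from Table 11 with the poor_image flag set from Table 13, and only the label “realistic” counts as realistic, so “see comment” cases count against the share (consistent with 107 of 135).

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    qnum: str
    est_min: int       # years, BCE negative
    est_max: int
    pal_eval: str      # "realistic", "too_old", "too_young" or "see_comment"
    poor_image: bool = False

    @property
    def width(self) -> int:
        return self.est_max - self.est_min


def realistic_share(preds, max_width=None, drop_poor=False):
    """Return (n_realistic, n_pool, n_selected).

    n_pool counts predictions surviving the image-quality filter
    (cf. "out of 135" / "out of 110"); n_selected also applies the width
    filter (cf. "out of 93" / "out of 74"). n_realistic is counted among
    the selected predictions.
    """
    pool = [p for p in preds if not (drop_poor and p.poor_image)]
    selected = [p for p in pool if max_width is None or p.width <= max_width]
    n_realistic = sum(p.pal_eval == "realistic" for p in selected)
    return n_realistic, len(pool), len(selected)


# Illustrative subset of Table 11; poor_image flags taken from Table 13.
SAMPLE = [
    Prediction("4Q76", -190, -150, "realistic"),
    Prediction("4Q88", -190, 20, "realistic", poor_image=True),
    Prediction("4Q98", -190, -150, "too_old", poor_image=True),
    Prediction("4Q109", -300, -240, "realistic"),
    Prediction("4Q163", -190, -1, "realistic"),
    Prediction("4Q215a", -30, 20, "realistic"),
    Prediction("4Q492", 30, 120, "too_young", poor_image=True),
    Prediction("11Q8", -30, 20, "too_old"),
]

if __name__ == "__main__":
    for rule in ({}, {"max_width": 110}, {"drop_poor": True},
                 {"max_width": 110, "drop_poor": True}):
        n_real, n_pool, n_sel = realistic_share(SAMPLE, **rule)
        print(rule, f"{n_real}/{n_pool} = {n_real / n_pool:.0%},",
              f"{n_real}/{n_sel} = {n_real / n_sel:.0%}")
```

Returning both denominators mirrors the text, which reports each filtered share both against the full pool and against the narrow-range subset.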

In the post-hoc evaluation, the palaeographers refrained from a decision in four cases (“see comment 1–4” in Table 11). The comments are as follows:

  1. 4Q73: we consider this test sample borderline, because in view of our considerations, especially the 14C results for Hasmonaean manuscripts (Section G.4), we would expect an older date, ca. 100 BCE or ca. 75 BCE. The traditional palaeographic date estimate, the middle of the first century BCE DJD15, comes close to Enoch’s date prediction;

  2. 4Q379: we consider the semicursive script of this manuscript difficult to date. Although some semicursive manuscripts are easier to date, this one is difficult; even according to the traditional palaeographic model, there is too little to go on. We therefore refrain from a decision: 4Q379 could date to around 100 BCE, in which case the prediction is realistic, but it could also be later. Cf. also DJD22: the general indication “Hasmonaean semicursive” (263) reflects the difficulty of dating it;

  3. 4Q398: this is again a manuscript in semicursive script that is difficult to date. Other palaeography experts have given the following dates: Puech, second quarter of the first century BCE Puech2015b; Yardeni, 50–1 BCE DJD10. The prediction plot would be compatible with the latter date;

  4. 4Q522: typologically, we would characterize the script as late Hasmonaean, but the predicted date seems slightly too old to us. In view of our considerations (Section G.4), we would expect a slightly younger date, ca. 100–75 BCE. The traditional palaeographic date estimate by Puech is late Hasmonaean, second third of the first century BCE Puech1998B.

These comments lead to two observations:

  1. Outside of well-formed formal bookhands, ordering Dead Sea Scrolls manuscripts typologically can be difficult for palaeographers, especially in the case of semicursive script. In addition to the physical and image quality of the data (see Section G.1), script diversity can also pose a challenge for the Enoch model. More specifically, Enoch handles formal and semiformal scripts well when predicting their age range, but manuscripts written in semicursive script are harder to date at the current stage. This can be explained by the scarcity of semicursive training data: only two 14C samples, 4Q114 and 4Q255/4Q433a, are in semicursive script;

  2. The range 100–50 BCE is underrepresented in Enoch’s date predictions. This can be explained by the distribution of the 14C samples along the timeline, which provides little evidence securely fixed within this period: 4Q201, 4Q255/4Q433a, 4Q27, and 4Q2 cover (part of) the range 100–50 BCE, but all of them extend beyond it as well. Roughly speaking, Enoch predicts Hasmonaean-type manuscripts before 100 BCE and Herodian-type manuscripts after 50 BCE. Still, the range 100–50 BCE is not entirely devoid of Enoch’s prediction plots, as the plots for 4Q185, 4Q554, and 11Q13 show, albeit with a wide range of 150 years for 4Q554.

From the machine-learning perspective, these problems can be addressed by adding more samples from the critical time periods to the training data.

G.5 6 July 2021 test

Earlier in the project, a test was conducted on 6 July 2021. The AI experts were given manuscripts whose 14C results they had not seen, to check whether Enoch’s date predictions would match those results. At the start of the test, the AI experts did not know that these samples had been chosen because 14C results existed against which the predictions could afterwards be compared.

The 14C results were taken from the 1990s 14C dating of the Dead Sea Scrolls Bonani1992; Jull1995. The chosen manuscripts were assumed not to be contaminated with castor oil, because they had not been handled by the original team of editors in the 1950s Doudna1998; carmi200214c; rasmussen2009effects. This applies to 1QIsaa, 1QpHab, 1QapGen, 1QS, 1QHa, 11Q19, and Mas1l.

Two more manuscripts were added for other reasons: 4Q53, because scholars assume that it was written by the same scribe as 1QS; and 4Q319, because it is actually the same manuscript as 4Q259 Hempel2020, which was 14C-dated in our own project.

The test was filmed; the recording captures the whole process, which was conducted in one session, and can be accessed at https://doi.org/10.5281/zenodo.8167946

Table 11: AI experts’ (blind) range estimation (est_min and est_max) and palaeography experts’ evaluation (pal_eval) with year range (pal_min and pal_max). Years are given as BCE negative and CE positive.
Q-num est_min est_max pal_eval pal_min pal_max
1QapGen -50 -1 realistic -50 -1
1QHa -140 -1 realistic -50 -1
1QIsaa -200 -100 realistic -175 -125
1QpHab -40 10 realistic -25 25
1QS -190 -100 realistic -150 -100
2Q3 -40 130 realistic -50 -1
2Q14 -310 -100 too_old -75 -25
2Q24 -40 10 realistic -50 -1
3Q6 -10 120 realistic -25 25
4Q13 -40 20 realistic -50 -1
4Q27 -30 20 realistic -75 -25
4Q28 30 120 too_young -200 -150
4Q38 -30 10 realistic -50 -1
4Q38a -190 -60 too_old -50 -1
4Q53 -40 10 too_young -150 -100
4Q57 -80 120 realistic -1 50
4Q58 30 120 realistic -1 50
4Q73 -40 10 see_comment 1 -100 -50
4Q76 -190 -150 realistic -175 -125
4Q83 -210 -150 realistic -175 -125
4Q84 -200 -50 realistic -50 -1
4Q85 -140 70 realistic 25 75
4Q86 30 120 too_young -75 -25
4Q87 -40 90 realistic -25 25
4Q88 -190 20 realistic -100 -50
4Q89 -50 120 realistic 25 75
4Q90 -30 -10 realistic -75 -25
4Q91 -40 10 realistic 1 50
4Q92 -190 -100 realistic -100 -50
4Q93 -50 30 realistic -75 -25
4Q94 -30 20 realistic -50 -1
4Q95 -30 90 realistic -50 -1
4Q96 -30 10 realistic -50 -1
4Q97 -50 10 realistic -50 -1
4Q98 -190 -150 too_old -50 -1
4Q98a -40 -10 realistic -75 -25
4Q98b -10 120 realistic 25 75
4Q98c 10 120 realistic 25 75
4Q98f -30 60 too_young -100 -50
4Q98g -200 -150 realistic -175 -125
4Q109 -300 -240 realistic -250 -150
4Q160 -200 -110 realistic -175 -125
4Q161 -40 80 realistic -50 -1
4Q163 -190 -1 realistic -125 -75
4Q166 -40 70 realistic -50 -1
4Q167 -50 120 realistic -50 -1
4Q171 -50 70 realistic -50 -1
4Q175 -150 -1 realistic -150 -100
4Q184 -40 80 realistic -50 -1
4Q185 -190 -80 realistic -100 -50
4Q196 -190 -110 too_old -100 -50
4Q203 -170 -60 too_old -50 -1
4Q212 -100 30 realistic -75 -25
4Q215 -30 70 realistic -50 -1
4Q215a -30 20 realistic -50 -1
4Q216 -190 -110 too_old -75 -25
4Q227 -40 30 realistic -50 -1
4Q252 -40 20 realistic -50 -1
4Q256 -40 10 realistic -75 -25
4Q258 -50 20 realistic -50 -1
4Q266 -190 -100 realistic -100 -50
4Q267 -170 20 realistic -50 -1
4Q271 -40 20 realistic -75 -25
4Q272 -50 10 realistic -75 -25
4Q274 -170 -40 realistic -75 -25
4Q276 -30 60 realistic -75 -25
4Q277 -30 110 realistic -75 -25
4Q284a -160 80 realistic -50 -1
4Q301 -40 -1 realistic -75 -25
4Q303 -30 110 realistic -50 -1
4Q319 -40 -1 realistic -125 -25
4Q325 -40 40 realistic -50 -1
4Q373 -300 -240 too_old -100 -50
4Q375 -30 20 realistic -50 -1
4Q379 -190 -100 see_comment 2 -125 -75
4Q390 -30 70 realistic -75 -25
4Q391 -200 -110 realistic -125 -75
4Q394 -200 -110 too_old -100 -1
4Q397 -40 60 realistic -50 -1
4Q398 -30 10 see_comment 3 -75 -25
4Q409 -30 20 realistic -50 -1
4Q410 -40 80 realistic -50 -1
4Q422 -190 -110 realistic -150 -100
4Q426 -190 -50 realistic -100 -50
4Q431 -30 120 realistic -50 -1
4Q432 -160 120 realistic -50 -1
4Q434 -30 20 realistic -75 -25
4Q436 -30 20 realistic -50 -1
4Q437 -40 20 realistic -50 -1
4Q439 -40 70 realistic -25 25
4Q442 -10 120 realistic -50 -1
4Q448 -30 10 too_young -100 -50
4Q457 -40 -10 too_young -150 -100
4Q471a -160 20 realistic -50 -1
4Q473 -30 20 realistic -50 -1
4Q474 -30 20 realistic -50 -1
4Q475 -30 120 realistic -50 -1
4Q476 -30 120 realistic -50 -1
4Q492 30 120 too_young -75 -25
4Q493 -40 70 realistic -75 -25
4Q494 -30 20 realistic -50 -1
4Q501 -30 20 realistic -75 -25
4Q508 -190 -110 too_old -50 -1
4Q511 -30 80 realistic -50 -1
4Q522 -200 -110 see_comment 4 -100 -50
4Q525 -50 10 realistic -50 -1
4Q530 -30 10 too_young -125 -75
4Q531 -30 20 realistic -50 -1
4Q540 -40 30 too_young -150 -100
4Q542 -50 20 too_young -125 -75
4Q544 -200 -110 realistic -150 -100
4Q545 -200 -100 realistic -125 -75
4Q547 -200 -120 realistic -125 -75
4Q554 -200 -50 realistic -75 -25
4Q557 -200 -120 realistic -150 -100
4Q577 -50 10 too_young -125 -75
5-6Hev1b -40 120 realistic 50 100
5-6Hev45 -40 120 too_old 134 134
5Q5 -200 -150 too_old -25 25
6Q18 -160 120 realistic -50 -1
11Q5 10 120 realistic 25 75
11Q6 30 120 realistic -1 50
11Q7 30 120 realistic -1 50
11Q8 -30 20 too_old 25 75
11Q13 -80 20 realistic -50 -1
11Q14 -40 120 realistic -1 50
11Q18 -40 120 realistic -50 -1
11Q19 -40 -1 realistic -25 25
11Q20 -90 120 realistic -25 25
Mas1e -50 30 realistic -1 50
Mas1f -200 -100 too_old 25 75
MasJosh -30 20 realistic -50 -1
Mur88 -50 120 realistic 25 75
Nash -200 -110 realistic -175 -125
Sdeir1 -40 120 realistic 75 125
Table 12: List of images for each manuscript sample used in the post-hoc evaluation
Q-number Image-name Q-number Image-name
1QapGen 1QapGen_4_crpcln 4Q301 4Q301_2_processed
1QHa 1QHa_QIrug-1668_cln…Lotte 4Q303 4Q303_processed
1QIsaa 1QIsaa_col02_cleaned 4Q319 4Q319_1_crpcln
1QpHab 1QpHab_Brill2307_cleaned_MD 4Q325 4Q325_processed
1QS 1Qs_QIrug-2153_cln…Lotte 4Q373 4Q373_processed
2Q3 2Q3_processed 4Q375 4Q375_processed
2Q14 2Q14_1_processed 4Q379 4Q379_1_processed
2Q24 2Q24_processed 4Q390 4Q390_processed
3Q6 3Q6_processed 4Q391 4Q391_4_processed
4Q13 4Q13_processed 4Q394 4Q394_cleaned
4Q27 4Q27_processed_MDnew 4Q397 4Q397_1_processed
4Q28 4Q28_256-1_cleaned 4Q398 4Q398_1_processed
4Q38 4Q38_processed 4Q409 4Q409_processed
4Q38a 4Q38a_processed 4Q410 4Q410_processed
4Q53 4Q53_405_part2_cleaned 4Q422 4Q422_2_processed
4Q57 4Q57_363_part1_cleaned 4Q426 4Q426_processed
4Q58 4Q58_Brill0458_cleaned 4Q431 4Q431_processed
4Q73 4Q73_1112-1_cleaned 4Q432 4Q432_processed
4Q76 4Q76_cleaned 4Q434 4Q434_processed
4Q83 4Q83_1148_part1_cleaned 4Q436 4Q436_processed
4Q84 4Q84_3_processed 4Q437 4Q437_processed
4Q85 4Q85_2_processed 4Q439 4Q439_processed
4Q86 4Q86_processed 4Q442 4Q442_processed
4Q87 4Q87_processed 4Q448 4Q448_processed
4Q88 4Q88_3_processed 4Q457 4Q457_processed
4Q89 4Q89_processed_MDnew 4Q471a 4Q471a_processed
4Q90 4Q90_processed 4Q473 4Q473_processed
4Q91 4Q91_processed 4Q474 4Q474_processed
4Q92 4Q92_processed 4Q475 4Q475_processed
4Q93 4Q93_processed 4Q476 4Q476_processed
4Q94 4Q94_processed 4Q492 4Q492_processed
4Q95 4Q95_processed 4Q493 4Q493_processed
4Q96 4Q96_processed 4Q494 4Q494_processed
4Q97 4Q97_processed 4Q501 4Q501_processed
4Q98 4Q98_processed 4Q508 4Q508_processed
4Q98a 4Q98a_processed 4Q511 4Q511_2_processed
4Q98b 4Q98b_processed 4Q522 4Q522_cleaned_2
4Q98c 4Q98c_processed 4Q525 4Q525_2_processed
4Q98f 4Q98f_processed 4Q530 4Q530_processed
4Q98g 4Q98g_processed 4Q531 4Q531_processed
4Q109 4Q109_cleaned_MDnew 4Q540 4Q540_processed
4Q160 4Q160_137plate_cleaned 4Q542 4Q542_cleaned
4Q161 4Q161_583_part2_cleaned 4Q544 4Q544_cleaned
4Q163 4Q163_584_599_cleaned 4Q545 4Q545_processed
4Q166 4Q166_4_processed 4Q547 4Q547_processed
4Q167 4Q167_processed 4Q554 4Q554_cleaned_MDcrpcln1
4Q171 4Q171_2_processed 4Q557 4Q557_processed
4Q175 4Q175_cleaned 4Q577 4Q577_processed
4Q184 4Q184_287_cleaned 5-6Hev1b 5-6Hev1b_891_cleaned
4Q185 4Q185_160_part2_cleaned 5-6Hev45 5-6Hev45_part2_cleaned
4Q196 4Q196_cleaned 5Q5 5Q5_processed
4Q203 4Q203_906_cleaned 6Q18 6Q18_processed
4Q212 4Q212_227_cleaned 11Q5 11Q5_2_processed
4Q215 4Q215_processed 11Q6 11Q6_2_processed
4Q215a 4Q215a_processed 11Q7 11Q7_2_processed
4Q216 4Q216_cleaned_1 11Q8 11Q8_processed
4Q227 4Q227_processed 11Q13 11Q13_579_part2_cleaned
4Q252 4Q252_processed 11Q14 11Q14_cleaned_1
4Q256 4Q256_907_cleaned 11Q18 11Q18_processed
4Q258 4Q258_140_part1_cleaned 11Q19 11Q19_Brill2293_cleaned
4Q266 4Q266_cleaned_MDcrp1 11Q20 11Q20_5-new
4Q267 4Q267_2_processed Mas1e Mas1e_cleaned
4Q271 4Q271_processed Mas1f Mas1f_processed
4Q272 4Q272_processed MasJosh MasJosh_cleaned
4Q274 4Q274_processed Mur88 Mur88_2_crp-cln-prcsd_cleaned
4Q276 4Q276_processed Nash Nash-MS-OR-…-line9to15
4Q277 4Q277_processed Sdeir1 Sedir1_984_cleaned
4Q284a 4Q284a_processed
Table 13: List of twenty-five images of poor quality
Image-name Image-name Image-name
4Q540_processed 4Q53_405_part2_cleaned 4Q577_processed
4Q196_cleaned 4Q457_processed 4Q508_processed
4Q86_processed 4Q98_processed 4Q98f_processed
4Q492_processed 4Q448_processed Mas1f_processed
4Q88_3_processed 4Q530_processed 4Q98c_processed
2Q14_1_processed 4Q98g_processed 4Q98b_processed
4Q284a_processed 5Q5_processed 4Q398_1_processed
4Q432_processed 4Q373_processed 4Q379_1_processed
6Q18_processed
Table 14: A few examples of poor quality images (after binarization and cleaning).

Appendix H Comparative plots for different information sources

Figure 29: Overview of date estimations by three information sources and a calendar date: accepted 2σ calibrated 14C ranges without minor peaks (blue), Enoch (green), palaeography (red), and historical (black). The vertical axis contains the manuscript numbers, and the horizontal axis contains dates: BCE in negative and CE in positive.
Figure 30: Overview of date estimations by three information sources and a calendar date: accepted 2σ calibrated 14C ranges with minor peaks (blue), Enoch (green), palaeography (red), and historical (black). This is the same as Figure 1 in the main article, except that here the minor peaks are taken as part of a continuous range within the 2σ calibrated range.
Figure 31: Overview of date estimations by three information sources and a calendar date: accepted 1σ calibrated 14C ranges (blue), Enoch (green), palaeography (red), and historical (black).

Appendix I List of images for different tests

Table 15: Complete list of 64 training images (including 4Q52; for the date prediction model) from the radiocarbon-dated manuscripts.
Q-numbers Name Plate Fragment Q-numbers Name Plate Fragment
4Q2 4Q2_1 215 1,4 4Q255 4Q255_4Q433a_1 177 3R
4Q2_2 215 3 4Q255_4Q433a_2 177 4R
4Q3 4Q3_1 393 5 4Q259 4Q259_1 683 6
4Q23 4Q23_1 271 1,4 4Q259_2 683 7
4Q23_2 272 5, 6, 18, 7 4Q259_3 695 1,3
4Q23_3 272 19 696 6
401 3, 4, 5 4Q259_4 810 3R, 5R, 7R, 8R
4Q27 4Q27_1 1080 1, 2, 6 4Q267 4Q267_1 106 2,6,9,11
4Q27_2 1081A 2 4Q267_2 107 2,9,12
4Q27_3 1082 1 4Q375 4Q375_1 122A 1
4Q27_4 1082 4 4Q375_2 122A 1,2
4Q27_5 1084B 1,7,9 4Q416 4Q416_1 180 1,2
4Q27_6 1086B 2, 8 4Q416_2 181 1
4Q30 4Q30_1 237 7 4Q416_3 181 1
4Q30_2 238 1 4Q504 4Q504_1 421 3, 4,5
4Q47 4Q47_1 1092 1 4Q504_2 982 1
4Q47_2 1092 3, 5 4Q504_3 982 2
4Q52 4Q52_1 42599 4Q504_4 982 2
4Q52_3 206 1,3 4Q521 4Q521_1 330-1 1
4Q70 4Q70_1 1109 7, 11 4Q521_2 330-1 1
4Q70_2 1110 2 4Q541 4Q541_1 147 1,19
4Q70_3 1110 3 5_6Hev1b 5_6Hev1b_1 890 2
1111 1 11Q5 11Q5_1 974 1
4Q70_4 1111 3 11Q5_2 974 1
4Q114 4Q114_1 224 1 11Q5_3 975 1
4Q161 4Q161_1 583 2,3 11Q5_4 975 1
4Q161_2 585 2,5 11Q5_5 976 1
4Q176 4Q176_1 285 1 11Q5_6 976 3
4Q176_2 285 2 11Q5_7 977 2
4Q201 4Q201_1 821 2 11Q5_8 978 1
904 1 11Q5_9 979 1
4Q201_2 821 1 Mas1k Mas1k_1 X232 1
4Q206 4Q206_1 358 1,6 Mas1k_2 X232 1
4Q206_2 359 1,3 Xhev_Se2 Xhev_Se2_1 534 2
Table 16: List of 135 manuscripts used for making date predictions. Please note that one manuscript may contain several images in the test dataset.
Q-numbers Q-numbers Q-numbers Q-numbers
1QapGen 4Q98 4Q301 4Q508
1QHa 4Q98a 4Q303 4Q511
1QIsaa 4Q98b 4Q319 4Q522
1QpHab 4Q98c 4Q325 4Q525
1QS 4Q98f 4Q373 4Q530
2Q3 4Q98g 4Q375 4Q531
2Q14 4Q109 4Q379 4Q540
2Q24 4Q160 4Q390 4Q542
3Q6 4Q161 4Q391 4Q544
4Q13 4Q163 4Q394 4Q545
4Q27 4Q166 4Q397 4Q547
4Q28 4Q167 4Q398 4Q554
4Q38 4Q171 4Q409 4Q557
4Q38a 4Q175 4Q410 4Q577
4Q53 4Q184 4Q422 5/6Hev1b
4Q57 4Q185 4Q426 5/6Hev45
4Q58 4Q196 4Q431 5Q5
4Q73 4Q203 4Q432 6Q18
4Q76 4Q212 4Q434 11Q5
4Q83 4Q215 4Q436 11Q6
4Q84 4Q215a 4Q437 11Q7
4Q85 4Q216 4Q439 11Q8
4Q86 4Q227 4Q442 11Q13
4Q87 4Q252 4Q448 11Q14
4Q88 4Q256 4Q457 11Q18
4Q89 4Q258 4Q471a 11Q19
4Q90 4Q266 4Q473 11Q20
4Q91 4Q267 4Q474 Mas1e
4Q92 4Q271 4Q475 Mas1f
4Q93 4Q272 4Q476 Mas1l
4Q94 4Q274 4Q492 Mur88
4Q95 4Q276 4Q493 Nash
4Q96 4Q277 4Q494 Sdeir1
4Q97 4Q284a 4Q501
Table 17: List of all 13 images that are split from training manuscripts and added to test.
Q-number Number of images
4Q27 2
4Q161 2
4Q267 2
4Q375 1
5-6Hev1b 1
11Q5 5
Table 18: Complete list of 23 training images from a selection of previously 14C-tested manuscripts Bonani1992 ; Jull1995 .
Manuscript Radiocarbon (BP) Image IDs
Mas1l 2086,28 MasJosh.png
1QIsaa 2141,32 1QIsaa_col01_cleaned.png
1QIsaa_col02_cleaned.png
1QIsaa_col03_cleaned.png
1QIsaa_col34_cleaned.png
1QIsaa_col35_cleaned.png
1QpHab 2054,22 QIrug-Qumran_extr09_2305_1QpHab_crpcln.png
QIrug-Qumran_extr09_2306_1QpHab_crpcln.png
QIrug-Qumran_extr09_2307_1QpHab_crpcln.png
QIrug-Qumran_extr09_2308_1QpHab_crpcln.png
QIrug-Qumran_extr09_2309_1QpHab_crpcln.png
QIrug-Qumran_extr09_2310_1QpHab_crpcln.png
QIrug-Qumran_extr09_2311_1QpHab_crpcln.png
11Q19 2030,40 QIrug-Qumran_extr09_2293_11Q19_crpcln.png
QIrug-Qumran_extr09_2294_11Q19_crpcln.png
QIrug-Qumran_extr09_2295_11Q19_crpcln.png
QIrug-Qumran_extr09_2296_11Q19_crpcln.png
QIrug-Qumran_extr09_2297_11Q19_crpcln.png
QIrug-Qumran_extr09_2298_11Q19_crpcln.png
QIrug-Qumran_extr09_2299_11Q19_crpcln.png
QIrug-Qumran_extr09_2300_11Q19_crpcln.png
1QS 2041,68 QIrug-Qumran_extr09_2151_1Qs_1_crpcln.png
QIrug-Qumran_extr09_2151_1Qs_2_crpcln.png
Table 19: Complete list of 30 images for date-bearing manuscripts from the fifth–fourth centuries BCE and the second century CE.
Manuscript Date Manuscript Date
A6_11R -411 IA06 -353
A6_12R -411 IA17 -324
A6_13R -411 IA21 -330
A6_14 -411 MareshaOstracon -176
A6_15 -411 Mur24_1 133
A6_16 -411 Mur24_2 133
A6_3 -411 NS_A1r -353
A6_4 -411 NS_A2r -351
A6_5 -411 NS_A4r -348
A6_7 -411 NS_A5r -348
A6_8 -411 NS_A6r -349
B3_1 -456 NS_C1r -330
IA01 -348 NS_C4r -324
IA03 -348 WDSP1_1 -335
IA04 -351 WDSP2 -352

Appendix J Radiocarbon sample information

Table 20: Information of physical samples for radiocarbon dating
IAA plate Sample Work Plate-fragment Info from IAA on DJD references, places in fragments where samples were taken, previous treatments from the 1950s onward, and sample weights
206 4Q52 4QSamb P206-Fr003 DJD 17: pl XXIV, fr 7; Bottom left; Maybe glues? Japanese Tissue Paper + Methylcellulose glue (2001)
421 4Q504 4QDibHama P421-Fr004 DJD 7: pl LII, fr 7; Bottom; Maybe castor oil?
285 4Q176 4QTanh P285-Fr002 DJD 5: pl XXII, fr 10; Upper left; Scotch tape; Rice paper + Perspex glue + trichlorethylene
224 4Q114 4QDanc P224-Fr001 DJD 16: pl XXXV, fr 3; Bottom right; Japanese Tissue Paper + Methylcellulose glue (2009)
385 4Q216 4QJuba P385-Fr011 DJD 13: pl I, fr 12ii; Left sheet: upper margin; Scotch tape; Rice paper + Perspex glue + trichlorethylene; Hinge of Japanese Tissue Paper + Methylcellulose glue (1992); Magen Broshi sampled the right sheet and the thread in 2003, therefore it was decided not to sample it again
801 4Q185 4QSapiential Work P801-Fr003 Not in DJD 5. Strugnell, RevQ 7 (1970) p.257 pl I, fr h. Bottom right; Japanese Tissue Paper + Methylcellulose glue (2012); Fragment 1 is sewn and encapsulated for exhibition, therefore it was decided to sample fr 3 instead
577 11Q20 11QTb P577-Fr014 DJD 23: pl XLIII, fr 10b; Bottom left; Plate 608 is sewn and encapsulated for exhibition, therefore it was decided to sample plate 577 instead
64 Mur88 MurXII P64-Fr001 DJD 2: pl LX, col X; Left sheet: bottom right; Rice paper + Perspex glue + trichlorethylene
891 5/6Hev1b 5/6HevPsalms P891-Fr003 DJD 38: pl XXVII, fr 10; Bottom right
585 4Q161 4QpIsaa P585-Fr001 DJD 5: pl IV, fr 2; Middle right; Scotch tape? Rice paper + Perspex glue + trichlorethylene
206 4Q52 4QSamb P206-Fr003 (b) DJD 17: pl XXIV, fr 7; Bottom left; Batch 1: additional material (4 mg)
285 4Q176 4QTanh P285-Fr002 (b) DJD 5: pl XXII, fr 10; Upper left; Batch 1: additional material (3mg)
224 4Q114 4QDanc P224-Fr001 (b) DJD 16: pl XXXV, fr 3; Bottom right; Batch 1: additional material (c.1 mg)
385 4Q216 4QJuba P385-Fr011 (b) DJD 13: pl I, fr 12ii; Left sheet: upper margin; Batch 1: additional material (4 mg)
577 11Q20 11QTb P577-Fr014 (b) DJD 23: pl XLIII, fr 10b; Bottom left; Batch 1: additional material (4 mg)
64 Mur88 MurXII P64-Fr001 (b) DJD 2: pl LX, col X; NB IAA did not list this one in their Excel list so no additional information
891 5/6Hev1b 5/6HevPsalms P891-Fr003 (b) DJD 38: pl XXVII, fr 10; Bottom right; Batch 1: additional material (6 mg)
585 4Q161 4QpIsaa P585-Fr001 (b) DJD 5: pl IX, fr 2; Middle right; Batch 1: additional material (4 mg)
1111 4Q70 4QJera P1111-Fr010 DJD 15: pl XXIX, fr 37; Left margin, middle; IAA measurement: 7 mg
1093 4Q47 4QJosha P1093-Fr005 DJD 14: pl XXXIV, fr 20; Up right diagonal margin; IAA measurement: 7 mg in two pieces
271 4Q23 4QLevNuma P271-Fr002 DJD 12: pl XXIII, fr 1; Bottom margin, middle; IAA measurement: 8.50 mg (one piece, broke down into 3 after weighing)
177 4Q255/4Q433a 4QpapSa/4QpapHodayot-like Text B P177 recto-Fr001 DJD 29: pl XV, fr. 1; Bottom margin, middle; IAA measurement: 6 mg in three pieces
977 11Q5 11QPsa P977-Fr004 DJD 4: pl III, fr A,B,C I; Middle-left, the sample was taken from the delaminated area adjacent to the Tetragrammaton and ⟨kwl⟩; IAA measurement: 7 mg in two pieces
393 4Q3 4QGenc P393-Fr005 DJD 12: pl IX; Bottom margin, right side; IAA measurement: 8-9 mg in two pieces
1081A 4Q27 4QNumb P1081A-Fr002 DJD 12: pl XXXIX, fr 12; Lateral margin, bottom right; IAA measurement: 10 mg in one piece
x232 Mas1k MasShirShabb Px232-Fr001 Masada 6: ill 15; Bottom margin, right side; IAA measurement: 8 mg in two pieces
386 4Q206 4QEne ar P386-Fr001 Milik, Books of Enoch: pl XX, fr b; Bottom, center-left, below last ⟨’⟩; IAA measurements: 7 mg in two pieces
237 4Q30 4QDeutc P237-Fr007 DJD 14: pl V, fr 32; Bottom, center; IAA measurements: 8 mg in two pieces
904 4Q201/4Q338 4QEna ar/4QGenealogical List P904-Fr009 DJD 36: pl I, fr 2; Bottom, right; IAA measurements: 7-8 mg in two pieces
810 4Q259 4QSe P810-Fr011 DJD 26: pl XV, fr 2b; Bottom; IAA measurements: 9 mg in two pieces
180 4Q416 4QInstructionb P180-Fr004 DJD 34: pl VI, fr 4; Bottom, left corner; IAA measurements: 8-9 mg in two pieces
215 4Q2 4QGenb P215-Fr004 DJD 12: pl VI; Right blank margin, bottom left; IAA measurements: 9 mg in one piece
122A 4Q375 4QapocrMosesa P122A-Fr001 DJD 19: pl XIV, fr 1; Bottom left, middle of 2nd column; IAA measurements: 8-9 mg in one piece
534 XHev/Se2 XHev/Se Numb P534-Fr002 DJD 38: pl XXIX, fr 2; Bottom right corner; IAA measurements: 9 mg in two pieces
147 4Q541 4QapocrLevib P147-Frag019 DJD 31: pl XIV, fr 24; Bottom left corner; IAA measurements: 8-9 mg in one piece
330 4Q521 4QMessianic Apocalypse P330-Fr004 DJD 25: pl III, fr 10; Bottom left corner; IAA measurements: 8 mg in two pieces
107 4Q267 4QDamascusb P107-Fr010 DJD 18: pl XX, fr 9; Top left corner; IAA measurements: 7 mg in three pieces and some dust
879 Mur19 Mur pap WrDiv P879-Fr001 DJD 2: pl XXX, fr 19; Top left corner; IAA measurements: 8 mg in one piece and some dust

Appendix K Data-sheet radiocarbon runs

The samples were dated on two different AMS machines, identified by the codes GrA and GrM. For the GrA machine, the table shows δ13C values measured by IRMS; for GrM, the δ13C values are those measured by the AMS.

Table 21: Data-sheet 14C runs
fragment KLR samplenr GrA GrM 14a % σ δ13C ‰ C% <14a> <σ> age BP σ remarks used calibrated 1σ calibrated 2σ
P206-Fr003 11089 65369 69793 75.69 0.39 -21.22 39.5
67017 10677 74.38 0.42 -21.20
67017 10678 75.05 0.43 -21.40
75.69 0.39 2237 41 1 GrA only
74.71 0.30 2342 33 2 GrM averaged
75.07 0.24 2303 26 3 averaged X 401–369 BCE
407–356,
281–232 BCE
P421-Fr004 11090 65370 68446 76.20 0.32 -17.85 48.8
65370 68447 76.43 0.33 -18.30 46.3
65370 68446 76.68 0.31
65370 68447 76.19 0.31
76.38 0.16 2164 16 4 averaged X
342–321,
201–175 BCE
352–287,
228–219,
211–151 BCE
P285-Fr002 11091 65371 69794 77.43 0.39 -22.19 44.2
65371 69794 77.06 0.36
65371 69794 76.20 0.43
67018 10679 75.68 0.42 -21.70
67018 10680 75.74 0.41 -22.10
76.95 0.23 2104 23 3 GrA averaged
75.71 0.29 2235 31 2 GrM averaged
76.49 0.18 2153 19 5 averaged X
343–321,
202–166 BCE
351–304,
209–102,
67–60 BCE
P224-Fr001 11092 65372 69795 76.04 0.39 -20.50 42.7
65372 69795 76.39 0.36
65372 69795 76.14 0.43
76.21 0.23 2182 24 3 GrA averaged
P224-Fr001 69074 13252 76.49 0.44 -19.80 2nd run, cleaned;
69074 13253 76.73 0.39 -19.70 after soxhlet;
69074 13254 76.20 0.36 -20.30 no glue;
69074 13255 76.40 0.35 -21.00 no black spot.
76.44 0.19 2158 20 4 GrM averaged
76.34 0.15 2168 15 all 7 averaged X
343–320,
202–176 BCE
352–287,
228–219,
211–160 BCE
P385-Fr011 11093 65373 69799 74.71 0.48 -21.61 46.0 74.71 0.48 2342 51 questionable ??
67020 10675 68.75 0.40 -21.70
67020 10676 69.24 0.38 -22.10
69.01 0.28 2979 32 2 GrM averaged
P801-Fr003 11094 65374 68448 77.17 0.33 -20.03 43.6
65374 68449 77.29 0.33 -20.27 46.2
65374 68448 76.93 0.31
65374 68449 77.43 0.31
77.20 0.16 2078 17 4 averaged 107–46 BCE
159–42,
7–5 BCE
P577-Fr014 11095 65357 69800 77.60 0.40 -20.93 44.1
65357 69800 77.71 0.36
65357 69800 77.79 0.44
67021 10681 76.51 0.43 -22.00
67021 10682 75.90 0.42 -21.00
77.15 0.18 2084 19 5 averaged
77.70 0.23 2027 24 3 GrA averaged
76.20 0.30 2183 32 2 GrM averaged
67021 18827 76.42 0.33 -20.70
67021 18828 75.61 0.35 -20.60
76.02 0.24 2202 26 2 new GrM aver
76.08 0.19 2195 19 4 GrM averaged
P64-Fr001 11096 65376 69806 78.22 0.39 -21.80 42.2
65376 69806 78.73 0.36
65376 69806 78.31 0.44
67022 10663 77.66 0.41 -23.00 77.66 0.41 2030 42
67022 10664 79.25 0.42 -21.60 79.25 0.42 1868 43
78.44 0.18 1950 18 5 averaged
78.44 0.23 1950 23 3 GrA averaged
78.43 0.29 1951 30 2 GrM averaged
67022 18829 76.90 0.33 -21.60
67022 18830 77.93 0.33 -20.90
77.44 0.24 2053 25 2 new GrM aver
77.83 0.18 2013 19 4 GrM averaged
P891-Fr003 11097 65377 69807 70.89 0.37 -21.23 41.8 questionable;
65377 69807 66.83 0.33 inhomogeneity?
65377 69807 72.95 0.43 reject GrA
67023 10659 78.07 0.38 -21.40
67023 10660 79.06 0.40 -21.20
78.54 0.28 1940 28 2 GrM averaged X
30–42,
59–124 CE
10–204 CE
P585-Fr001 11098 65378 69810 77.96 0.38 -21.02 40.8
65378 69810 77.61 0.36
65378 69810 78.12 0.44
67024 10661 77.92 0.39 -22.80
67024 10662 76.95 0.38 -21.40
77.86 0.22 2010 23 3 GrA averaged
77.42 0.27 2055 28 2 GrM averaged
77.69 0.17 2028 18 5 averaged X 45 BCE–8 CE
89–80 BCE,
54 BCE–27 CE,
48–57 CE
P1111-Fr010 11567 67025 11151 75.92 0.35 -19.20
67025 11152 75.37 0.30 -17.90
67025 11170 75.99 0.34 -18.80
67025 11171 76.00 0.33 -18.50
75.79 0.16 2226 17 4 averaged X
362–351,
295–272,
266–209 BCE
375–346,
317–203 BCE
P1093-Fr005 11568 67026 11153 76.27 0.32 -22.00
67026 11154 76.41 0.32 -23.60
67026 11172 76.72 0.33 -21.60
67026 11181 77.00 0.36 -21.70 statistic failure
76.46 0.19 2155 19 3 averaged X
343–320,
202–167 BCE
351–291,
209–104 BCE
P271-Fr002 11569 67027 11155 76.46 0.35 -24.60
67027 11156 76.52 0.32 -24.10
67027 11182 77.72 0.38 -22.80 statistic failure
67027 11183 77.79 0.38 -23.50 statistic failure
76.49 0.24 2152 24 2 averaged X
346–316,
204–151,
129–123 BCE
351–289,
227–220,
210–97,
71–58 BCE
P177-Fr001 11570 67028 11166 77.10 0.33 -10.80
67028 11167 76.42 0.32 -10.10
67028 11184 77.70 0.38 -10.60
67028 11185 76.93 0.35 -10.60
76.99 0.17 2100 17 4 averaged X
152–94,
74–56 BCE
167–51 BCE
P977-Fr004 11571 67029 11168 77.75 0.31 -21.80
67029 11169 78.25 0.33 -21.60
67029 11186 78.70 0.38 -21.00
67029 11187 78.91 0.49 -20.90
78.27 0.18 1967 18 4 averaged X
23–78,
101–107 CE
31–16 BCE,
7–120 CE
P393-Fr005 11924 69725 14380 77.10 0.40 -19.00
69725 14381 76.79 0.40 -18.90
69725 14228 76.73 0.40 -19.10
69725 14229 76.45 0.40 -18.60
76.77 0.20 2123 21 4 averaged X
174–102,
67–60 BCE
339–326,
199–89,
81–53 BCE
P1081a-Fr002 11925 69228 13385 76.69 0.36 -20.90 39.9
69228 13386 76.98 0.35 -21.20 39.7
69228 14170 77.90 0.27 -21.90 statistic failure
69228 14171 77.67 0.29 -22.20 statistic failure
76.84 0.25 2115 26 2 averaged X
171–97,
72–57 BCE
336–330,
198–50 BCE
Px232-Fr001 11926 69229 13387 77.66 0.37 -20.70 40.3
69229 13388 77.70 0.36 -20.70 41.3
69229 14175 77.98 0.31 -22.40
69229 14223 78.20 0.39 -20.30
77.89 0.18 2007 18 4 averaged X
41–9 BCE,
1 BCE–25 CE
46 BCE–62 CE
P386-Fr001 11927 69726 14382 76.52 0.40 -20.80
69726 14383 76.54 0.41 -21.10
69726 14230 76.12 0.40 -20.50
69726 14241 76.20 0.37 -21.30
76.34 0.20 2169 21 4 averaged X
348–312,
206–171 BCE
356–281,
232–150,
131–121 BCE
P237-Fr007 11928 69727 14565 75.91 0.44 -20.70
69727 14566 76.47 0.44 -20.90
69727 14395 76.82 0.35 -19.90
69727 14242 75.61 0.38 -20.90
69727 14243 76.11 0.37 -20.00
69727 14384 77.38 0.39 -20.00 statistic failure
76.21 0.17 2182 18 5 averaged X
351–295,
209–176 BCE
356–279,
256–248,
233–169 BCE
P904-Fr009 11929 69230 13389 76.31 0.39 -21.10
69230 13390 77.33 0.37 -21.10
69230 14173 77.37 0.30 -22.10
69230 14174 77.53 0.31 -22.20
69230 14172 77.82 0.29 -22.80 statistic failure
69230 77.21 0.17 2077 18 4 averaged X 107–46 BCE
162–41,
9–2 BCE
P810-Fr011 11930 69728 14396 76.85 0.36 -18.70
69728 14397 76.64 0.36 -18.80
69728 14244 75.99 0.37 -19.20
69728 14245 76.63 0.38 -19.10
76.53 0.18 2148 19 4 averaged X
343–322,
201–155 BCE
349–311,
206–100,
68–59 BCE
P180-Fr004 11931 69729 14398 76.60 0.38 -20.30
69729 14399 76.99 0.37 -21.10
69729 14246 76.55 0.39 -21.60
69729 14359 76.63 0.54 -20.30
76.71 0.20 2130 22 4 averaged X
196–184,
179–104 BCE
343–321,
202–91,
79–54 BCE
P215-Fr004 11932 69730 14400 77.22 0.38 -20.40
69730 14401 77.69 0.36 -19.60
69730 14360 76.67 0.46 -19.90
69730 14361 77.74 0.42 -19.80
77.38 0.20 2059 20 4 averaged X
97–71,
58–39,
11 BCE–2 CE
152–131 BCE,
121 BCE–10 CE
P122A-Fr001 11933 69731 14567 76.55 0.45 -20.80
69731 14568 76.20 0.44 -20.30
69731 14362 77.23 0.42 -19.00
69731 14363 76.93 0.43 -19.60
76.74 0.22 2126 23 4 averaged X
193–189,
176–101,
67–60 BCE
342–323,
201–88,
82–53 BCE
P534-Fr002 11934 69231 13391 77.75 0.40 -19.80
69231 13392 78.14 0.36 -19.80
69231 14224 78.10 0.39 -19.40
69231 14225 77.88 0.40 -18.70
77.98 0.19 1998 20 4 averaged X
38–13 BCE,
3–29,
43–59 CE
45 BCE–75 CE
P147-Fr019 11935 69732 14569 76.20 0.43 -18.80
69732 14570 76.55 0.43 -18.80
69732 14364 76.72 0.42 -18.20
69732 14365 76.64 0.42 -17.60
76.53 0.21 2148 22 4 averaged X
343–320,
202–152 BCE
351–305,
209–95,
73–57 BCE
P330-Fr004 11936 69733 14571 76.26 0.44 -21.20
69733 14572 76.39 0.45 -20.90
69733 14377 76.00 0.38 -20.90
69733 14366 77.16 0.43 -20.80
76.43 0.21 2159 22 4 averaged X
346–316, 204–165 BCE
353–286, 229–217, 211–104 BCE
P107-Fr010 11937 69232 13393 76.99 0.36 -20.40 47.6
69232 13394 76.62 0.39 -20.30 38.3
69232 14226 76.14 0.40 -19.60
69232 14227 76.16 0.40 -20.30
76.51 0.19 2151 21 4 averaged X
344–320, 202–157 BCE
351–294, 209–98, 70–58 BCE
P879-Fr001 11938 69734 14573 77.80 0.44
69734 14574 78.15 0.44
69734 14378 78.11 0.40
69734 14379 78.23 0.39
78.08 0.21 1987 21 4 averaged X
32–17 BCE, 7–64 CE
41–9 BCE, 1 BCE–81 CE, 98–110 CE
67806 11762 0.08000 0.030 -29.60 background age
67460 11306 78.24 0.10 -24.92 1971 15
Netherlands
Roman age
68084 11761 77.96 0.12 -27.31 2000 15
P224 69073 13256 74.25 0.54 -20.60 2390 60
P1081a 69093 13262 110.5 0.56 -25.71 modern age
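Each "averaged" row in the sample blocks above appears consistent with an inverse-variance weighted mean of the runs not flagged "statistic failure" (P237-Fr007 lists six runs against a count of 5), converted to a conventional radiocarbon age t = −8033 ln(pMC/100) BP. The Python sketch below tests that reading on P237-Fr007 and P904-Fr009. It is an assumption, not the laboratory's documented procedure: it treats the tabulated pMC values as already δ13C-normalised, reports only the internal (counting) error, and the helper names are hypothetical.

```python
import math

# Libby mean-life in years; conventional 14C ages are defined with this value.
LIBBY_MEAN_LIFE = 8033.0


def weighted_mean(values, errors):
    """Inverse-variance weighted mean and its internal 1-sigma error."""
    weights = [1.0 / err ** 2 for err in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


def conventional_age(pmc, pmc_err):
    """Conventional radiocarbon age (yr BP) and error from percent modern carbon.

    Assumes pmc is already normalised to delta13C = -25 per mil.
    """
    fraction = pmc / 100.0
    age = -LIBBY_MEAN_LIFE * math.log(fraction)
    # First-order error propagation: d(age) = 8033 * d(pMC) / pMC
    age_err = LIBBY_MEAN_LIFE * pmc_err / pmc
    return age, age_err


def average_runs(runs):
    """Hypothetical helper: average (pMC, error, failed) runs, skipping failures."""
    kept = [(v, e) for v, e, failed in runs if not failed]
    pmc, pmc_err = weighted_mean([v for v, _ in kept], [e for _, e in kept])
    return pmc, pmc_err, len(kept)


# P237-Fr007: six runs, the last flagged "statistic failure" in the table
p237_runs = [
    (75.91, 0.44, False),
    (76.47, 0.44, False),
    (76.82, 0.35, False),
    (75.61, 0.38, False),
    (76.11, 0.37, False),
    (77.38, 0.39, True),
]

pmc, pmc_err, n = average_runs(p237_runs)
age, age_err = conventional_age(pmc, pmc_err)
print(f"n={n}: {pmc:.2f} ± {pmc_err:.2f} pMC -> {age:.0f} ± {age_err:.0f} BP")
```

For P237-Fr007 the sketch gives 76.21 ± 0.17 pMC from five runs and an age within a year of the tabulated 2182 ± 18 BP; an offset of that size could come from rounding in the published values.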

Appendix L Worksheet of comparative data for 2σ calibrated ranges and traditional palaeographic estimates

Whole or partial overlap between 2σ calibrated ranges and palaeographic estimates in 17 of the 26 accepted samples: 4Q23, 4Q47, 4Q52, 4Q70, 4Q161, 4Q176, 4Q201/4Q338, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, 11Q5, Mas1k, Mur19, 5/6Hev1b, XHev/Se2 (see Appendix D.1.1).

  1. 4Q23 (4QLevNuma)
     • 355–285 BCE (29.8%), 230–220 BCE (0.8%), 210–95 BCE (62.8%), 75–55 BCE (2.1%)
     • DJD 12:154 (Ulrich): early Hasmonaean formal script, dating from approximately the middle or latter half of the second century BCE (150–100 BCE).

  2. 4Q47 (4QJosha)
     • 355–290 BCE (33.8%), 210–100 BCE (61.6%)
     • DJD 14:143 (Ulrich), referring to Cross: Hasmonaean formal bookhand, second half of the second century or first half of the first century BCE (150–50 BCE).
     • Puech, Revue Biblique 122/4 (2015), 482: Hasmonaean, at best in the first half of the first century BCE (100–50 BCE).

  3. 4Q52 (4QSamb)
     • 410–355 BCE (78.9%), 285–230 BCE (16.6%)
     • DJD 17:220 (Cross, Parry, and Saley) (ca. 250 BCE).

  4. 4Q70 (4QJera)
     • 375–345 BCE (16.3%), 320–200 BCE (79.2%)
     • DJD 15:150 (Tov): citing Yardeni 1990 and Cross 1985 (Cross shifted between earlier and later dates before settling on an earlier one), late third or early second century BCE (225–175 BCE).

  5. 4Q161 (4QpIsaa)
     • 90–80 BCE (1.7%), 55 BCE–30 CE (92.1%), 45–60 CE (1.7%)
     • Strugnell 1970 groups this manuscript with others such as 4Q166 and 4Q171 and characterises the script generally as developed rustic semiformal Herodian (see also DJD 19:112). Yardeni 2007 also lists it among the manuscripts copied by the prolific scribe she identified and dates it to the late first century BCE to the beginning of the first century CE (30 BCE–20 CE).

  6. 4Q176 (4QTanh)
     • 355–300 BCE (30.5%), 210–100 BCE (64.2%), 70–60 BCE (0.7%)
     • Strugnell 1970:229 and Tigchelaar RevQ 2019: “middle Hasmonaean” (ca. 125–75 BCE).

  7. 4Q201/4Q338 (4QEna ar/4QGenealogical List)
     • 165–40 BCE (93.6%), 10–1 BCE (1.9%)
     • Milik 1976:140: first half of the second century BCE. Mixed evidence: archaic features and connections with the semicursive scripts of the third and second centuries BCE, perhaps dependent upon the Aramaic scripts and scribal customs of northern Syria or Mesopotamia.
     • Puech 2017:99: ca. 200 BCE.
     • Langlois, Le premier manuscrit du Livre d’Hénoch, 62–68: ca. 150 BCE.
     • 200–150 BCE

  8. 4Q255/4Q433a (4QpapSa/4QpapHodayot-like Text B)
     • 170–50 BCE (95.4%)
     • DJD 26:8, 20, 24, 29 (Alexander and Vermes, following Cross): 125–100 BCE.

  9. 4Q259 (4QSe)
     • 350–310 BCE (24.3%), 210–100 BCE (69.7%), 70–55 BCE (1.4%)
     • DJD 26:8, 20, 24, 133 (Alexander and Vermes, also referring to Cross): 50–25 BCE. Late Hasmonaean/early Herodian semicursive, with mixed semicursive and semiformal features. However, 4Q259 is difficult to date palaeographically: suggestions range from 50–25 BCE (Cross), through the second half of the second century BCE, 150–100 BCE (Milik), to the first half of the first century BCE, preferably shortly after 100 BCE, 100–75 BCE (Puech).

  10. 4Q504 (4QDibHama)
      • 355–285 BCE (45.4%), 230–150 BCE (50.1%)
      • DJD 7:137 (Baillet): “The script is a Hasmonaean calligraphy, which may date from around 150 BCE.” Cross: 175–150 BCE.

  11. 4Q521 (4QMessianic Apocalypse)
      • 355–285 BCE (38.0%), 230–100 BCE (57.5%)
      • DJD 25:3–5 (Puech): formal Hasmonaean script, following Cross; first quarter of the first century BCE (100–80 BCE).

  12. 4Q541 (4QapocrLevib ar)
      • 355–300 BCE (24.6%), 210–95 BCE (68.2%), 75–55 BCE (2.7%)
      • DJD 31:227 (Puech): Hasmonaean, towards the end of the second century BCE, ca. 100 BCE; the writing is of the type of 1QS, 1QIsaa, and 4Q175, but later than 4Q504 (125–100 BCE).

  13. 11Q5 (11QPsa)
      • 35–15 BCE (3.3%), 5–120 CE (92.2%)
      • DJD 4:6–9 (Sanders): first half of the first century CE (1–50 CE).

  14. Mas1k (ShirShabb)
      • 50 BCE–65 CE (95.4%)
      • Masada 6:120 (Newsom and Yadin; Newsom HSS 27:168): developed Herodian formal hand, late Herodian formal hand, ca. 50 CE. Also: DJD 11:239.

  15. Mur19 (pap WrDiv)
      • 45 BCE–85 CE (91.5%), 95–110 CE (3.9%)
      • Cursive script with an internal date of 71/72 CE, which falls within and thus validates the radiocarbon range; the text refers to “year 6 of Masada”. See Section B.4 in Appendix B.

  16. 5/6Hev1b (Ps)
      • 10–205 CE (95.4%)
      • DJD 38:143: late Herodian, understood as 50–68 CE. Cross: 75–100 CE.

  17. XHev/Se2 (XHev/Se Numa)
      • 45 BCE–75 CE (95.4%)
      • DJD 38:174 (Flint): late Herodian, 50–68 CE.

2σ calibrated ranges older than the palaeographic estimates in 9 of the 26 accepted samples: 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q206, 4Q267, 4Q375, 4Q416 (see Appendix D.1.2).

  1. 4Q2 (4QGenb)
     • 155–130 BCE (5.2%), 125 BCE–10 CE (90.3%)
     • DJD 12:31 (Davila): late Herodian or even post-Herodian formal hand, ca. 50–68+ CE.

  2. 4Q3 (4QGenc)
     • 340–325 BCE (3.5%), 200–50 BCE (92.0%)
     • DJD 12:39 (Davila): Herodian formal hand, dating from the middle to the end of that period, ca. 20–68 CE.

  3. 4Q27 (4QNumb)
     • 340–330 BCE (1.3%), 200–50 BCE (94.2%)
     • DJD 12:211 (Jastram): following Cross, early Herodian semiformal, 30 BCE–20 CE, earlier in that range.

  4. 4Q30 (4QDeutc)
     • 360–275 BCE (57.4%), 260–245 BCE (1.4%), 235–165 BCE (36.7%)
     • DJD 14:15 (White Crawford): following Cross, typical Hasmonaean bookhand, 150–100 BCE. However, Cross 2003 gives a narrower date of 125–100 BCE.

  5. 4Q114 (4QDanc)
     • 355–285 BCE (49.5%), 230–160 BCE (45.9%)
     • DJD 16:270 (Ulrich, following Cross): late second century BCE, no more than about a half century younger than the autograph, 125–100 BCE.

  6. 4Q206 (4QEne ar)
     • 360–280 BCE (48.6%), 235–145 BCE (45.8%), 135–120 BCE (1.1%)
     • Milik 1976:225: Hasmonaean, probably first half of the first century BCE, also referring to Cross 1961: p. 138, fig. 2, lines 2 (4Q30) and 3 (4Q51), and p. 149, fig. 4, lines 2 (4Q114) and 4 (4Q398), 100–50 BCE.

  7. 4Q267 (4QDamascusb)
     • 355–290 BCE (28.6%), 210–95 BCE (65.3%), 70–55 BCE (1.6%)
     • DJD 18:1, 96 (Yardeni): formal early Herodian, Cross’s round semiformal; connects this manuscript with 4Q397 as possibly by the same scribe, 30 BCE–20 CE.

  8. 4Q375 (4QapocrMosesa)
     • 345–320 BCE (6.0%), 205–50 BCE (89.5%)
     • DJD 19:112 (Strugnell): early Herodian, rustic semiformal, 30 BCE–20 CE. Compare 4Q27 and 4Q161, for both radiocarbon and palaeographic dates.

  9. 4Q416 (4QInstructionb)
     • 345–320 BCE (8.0%), 205–90 BCE (78.1%), 80–50 BCE (9.4%)
     • DJD 34:74–76 (Strugnell and Harrington): Herodian, between 4Q51 and 1QM, hence “in a date transitional between the late Hasmonaean and the earliest Herodian hands”; Strugnell judged the hand of 4Q416 to be earlier than those of 4Q415, 4Q417, and 4Q418 by some twenty-five years (76), 50–25 BCE.
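Appendix L sorts the 26 accepted samples into two groups: those whose 2σ calibrated ranges overlap the palaeographic estimate wholly or partly, and those whose ranges are older. The appendix does not spell out a procedure, so the Python sketch below is only one plausible way to make the comparison mechanical. `parse_range` and `classify` are hypothetical names; the sketch assumes inclusive year bounds, maps BCE/CE onto astronomical years because there is no year zero, and drops open-ended qualifiers such as "68+".

```python
import re

# Matches "355–285 BCE", "45 BCE–75 CE", "5–120 CE" (en dash or hyphen);
# any trailing probability such as "(29.8%)" is ignored.
RANGE_RE = re.compile(r"(\d+)\s*(BCE|CE)?\s*[–-]\s*(\d+)\s*(BCE|CE)")


def to_astronomical(year, era):
    """Map BCE/CE years onto a continuous axis: 1 BCE -> 0, 2 BCE -> -1, 1 CE -> 1."""
    return 1 - year if era == "BCE" else year


def parse_range(text):
    """Return (start, end) in astronomical years for a BCE/CE range string."""
    match = RANGE_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised range: {text!r}")
    start, start_era, end, end_era = match.groups()
    # "355–285 BCE" writes the era once, so the start inherits it
    start_era = start_era or end_era
    return to_astronomical(int(start), start_era), to_astronomical(int(end), end_era)


def classify(c14_ranges, palaeo_range):
    """Compare 2-sigma calibrated ranges with a palaeographic estimate.

    'overlap' if any calibrated range shares at least one year with the estimate,
    'older' if every calibrated range ends before the estimate begins,
    otherwise 'younger'.
    """
    p_start, p_end = parse_range(palaeo_range)
    spans = [parse_range(r) for r in c14_ranges]
    if any(start <= p_end and end >= p_start for start, end in spans):
        return "overlap"
    if all(end < p_start for _, end in spans):
        return "older"
    return "younger"


# 4Q23 against DJD 12:154 (150–100 BCE)
print(classify(["355–285 BCE (29.8%)", "230–220 BCE (0.8%)",
                "210–95 BCE (62.8%)", "75–55 BCE (2.1%)"], "150–100 BCE"))  # overlap
# 4Q2 against DJD 12:31 (ca. 50–68+ CE, open end dropped)
print(classify(["155–130 BCE (5.2%)", "125 BCE–10 CE (90.3%)"], "50–68 CE"))  # older
```

With the worksheet's own figures, 4Q23 (its 210–95 BCE range meets 150–100 BCE) falls in the overlap group and 4Q2 (its latest range ends at 10 CE, before 50 CE) in the older group, matching the two lists above. A point estimate such as "ca. 250 BCE" for 4Q52 would have to be passed as a degenerate range, "250–250 BCE".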