Dating ancient manuscripts using radiocarbon and AI-based writing style analysis

Mladen Popović [1], Maruf A. Dhali, Lambert Schomaker, Johannes van der Plicht, Kaare Lund Rasmussen, Jacopo La Nasa, Ilaria Degano, Maria Perla Colombini, Eibert Tigchelaar

[1] Qumran Institute, University of Groningen, The Netherlands

[2] Artificial Intelligence, University of Groningen, The Netherlands

[3] Center for Isotope Research, University of Groningen, The Netherlands

[4] Department of Physics, Chemistry, and Pharmacy, University of Southern Denmark, Denmark

[5] Department of Chemistry and Industrial Chemistry, University of Pisa, Italy

[6] Faculty of Theology and Religious Studies, KU Leuven, Belgium

2024

Abstract

Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. For the Dead Sea Scrolls, this is particularly important. However, there is an almost complete lack of date-bearing manuscripts evenly distributed across the timeline and written in similar scripts available for palaeographic comparison. Here, we present Enoch, a state-of-the-art AI-based date-prediction model, trained on the basis of new ¹⁴C dated samples of the scrolls. Enoch uses established handwriting-style descriptors and applies Bayesian ridge regression. The challenge of this study is that the number of radiocarbon-dated manuscripts is small, while current machine learning requires an abundance of training data. We show that by using combined angular and allographic writing style feature vectors and applying Bayesian ridge regression, Enoch could predict the ¹⁴C-based dates from style, supported by leave-one-out validation, with varied MAEs of 27.9 to 30.7 years relative to the ¹⁴C dating. Enoch was then used to estimate the dates of 135 unseen manuscripts, revealing that 79% of the samples were considered ‘realistic’ upon palaeographic post-hoc evaluation. We present a new chronology of the scrolls. The ¹⁴C ranges and Enoch’s style-based predictions are often older than the traditionally assumed palaeographic estimates. In the range of 300–50 BCE, Enoch’s date prediction provides an improved granularity. The study is in line with current developments in multimodal machine-learning techniques, and the methods can be used for date prediction in other partially-dated manuscript collections. This research shows how Enoch’s quantitative, probability-based approach can be a tool for palaeographers and historians, re-dating ancient Jewish key texts and contributing to current debates on Jewish and Christian origins.

Keywords: Palaeography, Artificial Intelligence, Radiocarbon Dating, Dead Sea Scrolls

One of the main aims of palaeography—the study of ancient handwriting—is the dating of manuscripts on the basis of their handwriting Nongbri2019 ; Orsini2018-edit ; OrsiniClarysse2012 . Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. This is particularly important for the Dead Sea Scrolls from ancient Judaea. These contain the oldest manuscripts of the Hebrew Bible and many previously unknown ancient Jewish texts, mostly written in Aramaic/Hebrew script. The discovery of the scrolls in the 1940s–1960s fundamentally transformed our knowledge of Jewish and Christian origins Brooke2018-kl .

Aramaic/Hebrew script in Judaea evolved from the imperial Aramaic script of the fifth and fourth centuries BCE to the Jewish square script in the first and second centuries CE. For the centuries in between, palaeographers have constructed a model of successive developmental stages, each characterized by distinct features of the script. This model was used to date manuscripts and thus affected the study of religious, cultural, and historical developments Tigchelaar2020 ; Puech2017 ; Cross2003 ; Avigad1965 . However, these palaeographic distinctions are not reliably grounded (see Appendix A).

For palaeographic comparison, one requires enough date-bearing manuscripts that are evenly distributed across the timeline and written in similar script. Yet, only some of the very oldest, fourth-century BCE, and the very youngest, first- and second-century CE, manuscripts have calendar dates (see Section A.1 in Appendix A). To compensate for this scarcity for the centuries in between, palaeographers have turned to inscriptions on other surfaces that would be historically datable Puech2017 ; Cross2003 ; yardeni2000-mq , but these, too, have no absolute dates (see Section A.2.1 in Appendix A). Historical hypotheses of a slow development of the Aramaic/Hebrew script in the third century BCE and the emergence and rapid development of a national script around the mid-second century BCE Puech2017 ; Cross2003 ; Yardeni1990 remain unsubstantiated. Neither inscriptions nor historical hypotheses enable us to reliably date the Dead Sea Scrolls (see Section A.2.2 in Appendix A).

In this study, we bridge the palaeographic gap between the fourth century BCE and the second century CE, and advance palaeography in general, by combining new radiocarbon (¹⁴C) dates with state-of-the-art artificial intelligence (AI)-based writing style analysis. A straightforward approach is to use machine-learning algorithms that are able to learn from a small set of labeled, i.e., dated, examples. This requirement is in conflict with the need for labeled data in current supervised deep learning, typically a thousand examples per class Krizhevsky2017 . An abundance of data points is needed to warrant the stable estimation of, e.g., a neural-network model with millions of coefficients, in order to minimize the risk of arriving at a seemingly ‘good’ model Vapnik2000 . A simple example is a linear function that has two coefficients and consequently needs a minimum of two data points to be determined. A fundamental problem therefore arises if the number of manuscript-date reference points is on the order of two dozen, while a computational model requires hundreds of thousands of coefficients or more. We address this challenge by applying methods that (a) can operate under sparse data conditions, (b) are explainable, and (c) do not require (pre)training from an extraneous, alien image collection. So, while it is tempting to use modern methods of deep learning, as we have done before He2019 ; He2020 ; He2021 ; Zhang2022 ; Ameryan2023 , we will present several arguments for not using such approaches for the proposed style-based date prediction on a very small data set, i.e., at the current stage of Dead Sea Scrolls research on handwriting-style based manuscript dating (Appendix F).
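The point about coefficients versus data points can be made concrete with a toy sketch. The two reference points, the variable names, and the quadratic model below are hypothetical and chosen only for illustration; they are not taken from the scrolls data.

```python
# Toy illustration (hypothetical numbers): two data points determine a line,
# but they leave a three-coefficient model (a quadratic) undetermined.

x0, y0 = 0.0, -200.0   # hypothetical style score -> date (negative = BCE)
x1, y1 = 1.0, -100.0


def line(x):
    """The unique straight line through the two reference points."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


def quadratic(x, c):
    """A quadratic through the same two points, for ANY choice of c.

    The extra term c*(x - x0)*(x - x1) vanishes at both reference points,
    so the training data cannot tell different values of c apart.
    """
    return line(x) + c * (x - x0) * (x - x1)


x_new = 2.0  # an unseen sample
fits = {c: quadratic(x_new, c) for c in (-50.0, 0.0, 50.0)}

for c, prediction in fits.items():
    # Every model reproduces both training points exactly ...
    assert quadratic(x0, c) == y0 and quadratic(x1, c) == y1
    # ... yet the predictions for the unseen sample differ widely.
    print(f"c = {c:+.0f}: predicted date at x_new = {prediction:.0f}")
```

All three quadratics fit the training data perfectly, so a zero training error says nothing about which one to trust. This is the sense in which a model with more coefficients than reference points can look ‘good’ without being so.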

Since a general, large, representative, and labeled data set is not available for the period of the scrolls, we apply dedicated pattern recognition and machine-learning models, only using the relevant scrolls data for training a date-prediction model. Given the importance of the topic, it is expected that the use of pretrained deep transfer learning on the basis of extraneous material would elicit valid concerns among palaeographers about the relation between the scrolls’ target data and training data from a (very) different origin and period. Like the Ithaca approach assael2022restoring , a deep neural network making chronological attributions of ancient Greek inscriptions based on the totality of textual content, we focus on predicting chronological development, but unlike Ithaca, we use shape-style information from handwritten manuscript images instead.

We present Enoch, a machine-learning-based date-prediction model using established handwriting-style descriptors and applying Bayesian ridge regression. Enoch, named after the ancient Jewish science hero, was trained on the basis of new ¹⁴C-dated samples of the scrolls, providing reliable, absolute time markers that can bridge the palaeographic gap. Because of possible castor oil contamination issues with previous ¹⁴C datings of the scrolls Bonani1992 ; Jull1995 , which would give a misleading ¹⁴C age that was ‘younger’ than the true age of the samples, new ¹⁴C dating was necessary for this study rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects .

Enoch integrates multiple dating methods, using both physical, material-based evidence from ¹⁴C dating and geometric, writing style-based evidence from AI methods. With a new set of ¹⁴C-dated scrolls for temporal reference, the corresponding handwritten style features in those tested manuscripts are used for date estimation for undated manuscripts from the collection by applying machine-learning-based writing style analysis. Subsequently, interpolation of writing style features over time allows Enoch to make estimates for samples that do not have a ¹⁴C date and are only available as a digital image. Thus, Enoch offers date predictions as probability-based options that can aid palaeographers and historians in their decision-making and contribute to historical debates.

1 Radiocarbon dating

We performed ¹⁴C dating on 30 undated manuscripts from 4 sites, spanning an estimated 5 centuries: 25 from the Qumran caves, 1 from Masada, 2 from the Murabbaʿat caves, and 2 from the Naḥal Ḥever caves. Twenty-eight manuscripts were made of animal skin or parchment, and 2 of papyrus.

Samples were selected because of their script and presumed period, the manuscripts having a sufficient number of characters for Enoch to be trained, and also on the basis of practical and conservational considerations (see Section B.1 in Appendix B). One date-bearing document, Mur19, was used as a validation test for ¹⁴C, but did not go into the training of Enoch because of its cursive script.

The scrolls are extremely delicate material. As in the previous attempts made at dating the scrolls Bonani1992 ; Jull1995 , we, too, had to adjust the standard chemical AAA (Acid-Alkali-Acid) pretreatment (see Section B.2 in Appendix B). Also, many fragments are contaminated with castor oil, which scholars in the 1950s used to improve the readability of the scrolls’ text rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects . This study is the first to apply, prior to ¹⁴C dating, a chemical treatment specifically designed for removing fatty materials, employing solvent extraction (see Sections B.2 and B.7.1 in Appendix B). Further specialized analytical chemistry methods were applied before and after the sample pretreatment to demonstrate that the total amount of lipid materials is below a threshold that does not significantly skew the ¹⁴C date (see Sections B.7.2–B.7.5 in Appendix B).

The samples were dated by two Accelerator Mass Spectrometry (AMS) machines (see Section B.3 in Appendix B).

2 Integration of multiple dating methods

We used 24 manuscripts from the ¹⁴C samples with accepted dates as labeled data for the primary training set for Enoch (see Sections B.4–B.6 in Appendix B and Section D.2 in Appendix D). For the data labels, we used OxCal v4.4.2 Oxcal ; Oxcal2 to obtain the raw data points for the probability distributions. This is because the ¹⁴C results are not single dates, as with date-bearing documents, but represent date ranges with probability distributions. The ¹⁴C data input for training Enoch consists of the probability distributions of accepted 2-sigma ($2\sigma$) calibrated ranges (see Section E.6 in Appendix E).
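To illustrate why a calibrated ¹⁴C result is a distribution rather than a single date, the following sketch builds a hypothetical two-peaked probability curve on a yearly grid and extracts the ranges that hold 95.4% of the probability. It assumes a highest-probability-density convention for the $2\sigma$ ranges. The function `hpd_ranges`, the grid, and the curve are illustrative assumptions, not OxCal's interface or output.

```python
import math


def hpd_ranges(years, probs, mass=0.954):
    """Return (ranges, covered): contiguous year ranges that together hold at
    least `mass` of the probability, filled from the most probable years down."""
    total = sum(probs)
    norm = [p / total for p in probs]
    order = sorted(range(len(years)), key=lambda i: norm[i], reverse=True)
    chosen, covered = set(), 0.0
    for i in order:
        if covered >= mass:
            break
        chosen.add(i)
        covered += norm[i]
    # Merge the selected years into contiguous (start, end) ranges.
    ranges, start = [], None
    for i in range(len(years)):
        if i in chosen and start is None:
            start = years[i]
        if start is not None and (i + 1 == len(years) or i + 1 not in chosen):
            ranges.append((start, years[i]))
            start = None
    return ranges, covered


def gauss(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / sd


# Hypothetical calibrated curve: a major peak near 250 BCE, a minor one near 120 BCE.
years = list(range(-400, 1))  # negative = BCE
probs = [0.7 * gauss(y, -250, 20) + 0.3 * gauss(y, -120, 15) for y in years]

ranges, covered = hpd_ranges(years, probs)
for start, end in ranges:
    print(f"{-start}-{-end} BCE")
print(f"probability covered: {covered:.3f}")
```

A multi-peaked curve yields several disjoint ranges, which is why the full probability vector carries more information as a training label than a midpoint would.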

In addition to the primary training set, we created different combinations of training data to perform comparative analyses and further check the robustness of the model. These combinations include the tentative addition or omission of 4Q52, some previously tested ¹⁴C samples Bonani1992 ; Jull1995 , date-bearing documents from the fifth–fourth centuries BCE and the second century CE (see Tables 18 and 19 in Appendix I for complete lists of manuscripts), the Maresha ostracon from 176 BCE (see Section A.1 in Appendix A), and leave-one-out of the training data points.

2.1 Deep neural networks for detection of handwritten ink-trace patterns

The 24 ¹⁴C-dated manuscripts are available as many individual images in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb . We also use images from Brill Publishers lim1995dead , especially in cases where the manuscript is unavailable in the IAA collection. For this study, these images underwent multiple preprocessing steps to become suitable for pattern recognition-based techniques. It should be noted that the images are extremely difficult to work with (some examples can be seen in Figure 12 in Appendix E and Figure 25 in Appendix G; see also Section G.1 in Appendix G). We are not dealing with digitally encoded text but with pixel images of highly degraded manuscripts as input.

We utilize multispectral band images of each fragment and employ an in-house image fusion technique dhali2019binet to generate a three-channel image. The resulting image representation enhances ink-vs-background contrast and therefore facilitates the effective separation of ink from backgrounds, commonly called binarization. For this purpose, we employ BiNet dhali2019binet , an artificial neural network based on an encoder-decoder U-net architecture designed to binarize the diverse range of scroll images. The resulting binarized images consist solely of black foreground pixels (ink) against a white background, ensuring that subsequent analyses focus exclusively on the handwritten patterns while minimizing inadvertent matches due to material-texture attributes. We further correct the rotation of the binarized images and divide them into multiple parts to maintain a balanced distribution of handwritten characters within each new image. No extraneous image material was used to train this binarization method.

Thus, we obtained a data set of 75 images from the 24 ¹⁴C-dated manuscripts. We used 62 of these images to train Enoch. The remaining 13 images, chosen deliberately and randomly, were passed as unseen test data to cross-validate the robustness and reliability of Enoch’s performance. Enoch’s predictions for these 13 images overlap by 85.14% with the original ¹⁴C probability distributions (see Table 17 in Appendix I). The image samples typically contain 150–200 characters, which has been shown to be sufficient for the comparable task of writer identification Brink2008 .

2.2 Extracting features for style attribution

It should be noted that in this context, ‘style’ is not related to textual content or wording. In fact, for characterizing the handwritten shapes, small shapes along the ink trace are used, largely uncoupled from the textual content, because we want to avoid spurious matches or date predictions on the basis of textual content. Once the training images were available, we could perform feature extraction techniques to translate handwriting patterns into feature vectors. The feature vectors relate directly to the shape-based evidence of the ink traces in the manuscripts and have a solid basis in writer identification Bulacu2003 ; Schomaker2004 ; Bulacu2007 and document dating Dhali2020 ; He2016 . We extract features from both the allographic and textural levels of characters PlosOne . An overview of machine-learning methods can be found in Sommerschield2023 .

The first, allographic, method uses a self-organized character map obtained using a Kohonen neural network. As an example, this allographic codebook feature allows for a 93% ($\pm\,\sigma = 2.3$%) accuracy classification of the scripts ‘Hasmonaean’ vs. ‘Herodian’, using PCA, on 590 labeled manuscripts, results averaged over 32 random odd/even splits for training/testing monknet . The second, textural, method uses statistical pattern recognition on angular information. The ‘hinge’ method for estimating the curvature distribution has been used extensively in writer verification and dating studies Bulacu2007 ; Adam2018 ; Dhali2020 . Whereas the allographic feature addresses stylistic elements at the character level, the ‘hinge’ method concerns a micro-level feature directly related to the original writing activity that yielded the curvature of the ink trace. Therefore, we make a weighted combination of textural and allographic features to obtain an adjoined feature vector for each manuscript image. Such a feature vector constitutes the input data to Enoch.

2.3 Bayesian ridge regression

Due to the limited size of the data set, we cannot employ high-parametric models like period-specific temporal codebooks He2016 . Instead, we utilize conditional modeling with Bayesian ridge regression Hoerl2000 . This approach applies Bayesian inference to estimate model parameters for date prediction. By placing a prior distribution on the parameters and updating it with observed data using Bayes’ rule, we obtain the posterior distribution of the parameters and predicted dates. The Bayesian approach is chosen because our target output data represents probability curves for ¹⁴C dates (i.e., a vector) containing the accepted 2 $\sigma$ calibrated ‘OxCal’ ranges. This probabilistic approach enables us to incorporate all available information while maintaining interpretability. Moreover, instead of producing a single number for the estimated date of a sample, it provides a comprehensive posterior distribution that allows us to assess the uncertainty associated with the estimated dates. Additionally, Enoch can utilize the Bayesian approach to provide error margins for predictions on unseen data.
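For orientation, the standard textbook form of Bayesian ridge regression with a scalar target can be written as follows. This is a generic sketch, not necessarily the exact parametrization used for Enoch. The symbols are assumptions introduced here: $\mathbf{X}$ holds one style feature vector per row, $\mathbf{y}$ the corresponding dates, $\alpha$ is the noise precision, and $\lambda$ is the precision of the Gaussian prior on the weights $\mathbf{w}$.

```latex
% Prior on the weights and Gaussian likelihood of the observed dates
p(\mathbf{w} \mid \lambda) = \mathcal{N}\!\left(\mathbf{w} \mid \mathbf{0},\, \lambda^{-1}\mathbf{I}\right),
\qquad
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \alpha) = \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{X}\mathbf{w},\, \alpha^{-1}\mathbf{I}\right)

% Posterior over the weights obtained with Bayes' rule
p(\mathbf{w} \mid \mathbf{X}, \mathbf{y}) = \mathcal{N}\!\left(\mathbf{w} \mid \mathbf{m},\, \mathbf{S}\right),
\qquad
\mathbf{S} = \left(\lambda \mathbf{I} + \alpha\, \mathbf{X}^{\top}\mathbf{X}\right)^{-1},
\qquad
\mathbf{m} = \alpha\, \mathbf{S}\, \mathbf{X}^{\top}\mathbf{y}

% Predictive distribution for the style vector x_* of an unseen manuscript image
p(y_* \mid \mathbf{x}_*, \mathbf{X}, \mathbf{y}) = \mathcal{N}\!\left(y_* \mid \mathbf{m}^{\top}\mathbf{x}_*,\; \sigma_*^{2}\right),
\qquad
\sigma_*^{2} = \alpha^{-1} + \mathbf{x}_*^{\top}\mathbf{S}\,\mathbf{x}_*
```

The predictive variance $\sigma_*^{2}$ has two parts: the noise term $\alpha^{-1}$ and a term that grows when $\mathbf{x}_*$ lies far from the training samples. The second term is what lets such a model report wider error margins for unfamiliar handwriting.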

2.4 Testing Enoch

Once Enoch was trained, we performed the validation by leave-one-out tests to check its performance. At this point, we took the calibrated style-based date estimation method of Enoch and applied it to a collection of 135 unseen manuscripts from the Dead Sea Scrolls as test data (see Table 16 in Appendix I).

We use two types of data-balancing techniques to compensate for the imbalanced distribution of the training data over different periods. One balancing technique involves data augmentation using random elastic morphing bulacu2009morph to create a balanced training data distribution. The second balancing is done on the output date predictions. This post-data-balancing uses accumulated training probabilities and training data point counts with 5%, 10%, and 20% threshold values to avoid under-sampled time regions.

The general recipe for Enoch’s analysis of manuscript images is presented in Table 1. Before applying this to scrolls manuscripts, we tried out a known mediaeval, dated benchmark data set of charters, MPS He2016 , with success Koopmans2023 .

Table 1: Style-based date prediction recipe for Enoch

1. Select and crop the relevant manuscript images based on scholarly identification criteria;

2. In the images, perform a separation of the ink trace from the material background texture by using a deep-learning-based U-net variant for multispectral image-intensity binarization dhali2019binet ;

3. For each manuscript, compute two shape descriptors: a histogram of allographic fraglet occurrence and a histogram of angular co-occurrence along the ink-trace edges Bulacu2007 ; Schomaker2004 ;

4. Adjoin the two feature vectors, properly weighted, to a single handwriting-style vector bulacu2009morph ;

5. In order to decorrelate the features, avoid collinearity, and minimize the necessary number of parameters in the next stage, perform a strong dimensionality reduction (PCA, 20 dimensions);

6. Take the ¹⁴C-dated manuscript-image samples for training Enoch as a style-based Bayesian ridge-regression model with a scalar date estimate as the target output. In this training, augment the image data set by using random elastic morphing to obtain a sufficient and balanced number of examples per ¹⁴C-dated reference. This step is an essential, new contribution that allows a merger of ¹⁴C-based and style-based information in the date estimation. For validating Enoch, use the leave-one-out approach: each sample that is under evaluation does not occur in the training data;

7. Harvesting: estimate style-based dates for undated manuscripts.
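Steps 4 to 6 of the recipe can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the Enoch implementation. The histograms are synthetic stand-ins generated by the hypothetical helper `fake_histograms`. The feature weight `w` and the precisions `alpha` and `lam` are fixed by hand rather than estimated. Elastic-morphing augmentation is omitted, and the PCA scores are additionally rescaled to unit variance for numerical convenience.

```python
import numpy as np

rng = np.random.default_rng(0)


def fake_histograms(dates, n_bins):
    """Synthetic stand-in for a style histogram that drifts with the date."""
    t = (dates - dates.min()) / (dates.max() - dates.min())
    early, late = rng.random(n_bins) + 0.5, rng.random(n_bins) + 0.5
    h = (1 - t)[:, None] * early + t[:, None] * late
    h = np.clip(h + 0.02 * rng.standard_normal(h.shape), 1e-6, None)
    return h / h.sum(axis=1, keepdims=True)


def fit_pca(X, n_components):
    """Step 5: PCA via SVD; returns mean, axes and per-axis standard deviation."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components], s[:n_components] / np.sqrt(len(X) - 1)


def apply_pca(X, mean, axes, scale):
    return (X - mean) @ axes.T / scale  # scores rescaled to unit variance


def fit_bayes_ridge(Z, y, alpha, lam):
    """Step 6: Gaussian posterior over weights for fixed precisions alpha, lam."""
    S = np.linalg.inv(lam * np.eye(Z.shape[1]) + alpha * Z.T @ Z)
    return alpha * S @ Z.T @ (y - y.mean()), S, y.mean()


def predict(Z, m, S, offset, alpha):
    mean = Z @ m + offset
    std = np.sqrt(1.0 / alpha + np.einsum("ij,jk,ik->i", Z, S, Z))
    return mean, std


# Hypothetical data: 60 samples dated between 300 BCE (-300) and 100 CE (+100).
dates = rng.uniform(-300, 100, 60)
allographic = fake_histograms(dates, 40)
hinge = fake_histograms(dates, 30)

# Step 4: adjoin the two descriptors with an (assumed) weight of 0.5 each.
w = 0.5
style = np.hstack([w * allographic, (1 - w) * hinge])

# Leave-one-out: the sample under evaluation never enters PCA or regression.
alpha, lam = 1e-3, 1e-4  # assumed noise and weight precisions
pred, pred_std = np.zeros(60), np.zeros(60)
for i in range(60):
    train = np.arange(60) != i
    mean, axes, scale = fit_pca(style[train], 20)
    Z_train = apply_pca(style[train], mean, axes, scale)
    m, S, offset = fit_bayes_ridge(Z_train, dates[train], alpha, lam)
    Z_test = apply_pca(style[i:i + 1], mean, axes, scale)
    mu, sd = predict(Z_test, m, S, offset, alpha)
    pred[i], pred_std[i] = mu[0], sd[0]

mae = np.mean(np.abs(pred - dates))
print(f"leave-one-out MAE on synthetic data: {mae:.1f} years")
```

Refitting the PCA inside the loop matters: if the projection were computed once on all samples, the held-out image would still influence the features used to predict it.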

3 ¹⁴C dates and palaeographic estimates

The AMS results yielded 26 accepted ¹⁴C dates (see Sections B.4–B.6 in Appendix B), which are shown in the summary table of ¹⁴C results in Appendix B. The historical date preserved in Mur19 is consistent with the calibrated age range obtained by ¹⁴C (see Section B.4 in Appendix B). Overall, we improved and extended the existing series of ¹⁴C-dated Dead Sea Scrolls Bonani1992 ; Jull1995 .

Figure 1 shows the comparison between the 2 $\sigma$ calibrated ranges and traditional palaeographic estimates (in blue and red). This demonstrates that 17 of the 26 sampled manuscripts have whole or partial overlap, and 9 out of 26 samples yield calibrated ages that do not overlap with previous palaeographic estimates (see Appendix D).

Overall, the ¹⁴C results indicate older date ranges for individual manuscripts as well as for the emergence of the so-called ‘Hasmonaean’ and ‘Herodian’ scripts. Only two manuscripts have date ranges that go in the direction of a younger possible range. The ¹⁴C results for most manuscripts confirm the basic distinction between Hasmonaean-type manuscripts that are older, and Herodian-style manuscripts that are younger, and also between so-called ‘Archaic’ and Hasmonaean-type manuscripts.

However, the ¹⁴C date ranges for manuscripts that are traditionally considered Hasmonaean and Herodian are quite differently distributed across the timeline. As can be seen in Figure 1 (in blue), Hasmonaean-type manuscripts are all grouped together in a narrower part of the timeline, but Herodian-type manuscripts are more spread out across the timeline, extending from the second century CE all the way back to the second century BCE (see Sections D.1.1–D.1.3 in Appendix D).

Sample 4Q114 is one of the most significant findings of the ¹⁴C results. The manuscript preserves Daniel 8–11, which scholars date on literary-historical grounds to the 160s BCE SchmidSchroeter ; zenger9 . The accepted 2 $\sigma$ calibrated peak for 4Q114, 230–160 BCE, overlaps with the period in which the final part of the biblical book of Daniel was presumably authored (see Section D.1.2 in Appendix D).

4 Validation of Enoch

Figure 1 (in green) also shows the results of cross-validation and leave-one-out tests for training Enoch. The choice for the bandwidths (2 $\sigma$ date ranges for ¹⁴C, 1 $\sigma$ uncertainties of the ridge regression for style-based predictions) is based on the intrinsic reliability of the two information sources. ¹⁴C date ranges are evidently superior to style-based predictions.

Enoch’s style-based predictions largely follow the ¹⁴C results, even though the validation samples (rows) are in no way present in the training data. In the range 300–50 BCE, Enoch’s estimates provide a more fine-grained distribution than the ¹⁴C results. For samples 5/6Hev1b, Mas1k, and XHev/Se2, the style-based estimate is earlier and more uncertain. However, 11Q5 shows that in this late date range, a fairly certain style-based date estimate above 100 CE can also be achieved. This may go against historical reconstructions according to which the scrolls were hidden in the Qumran caves before the summer of 68 CE Popovic2012 . Yet, we did not impose here a chronological limit on the model, because of the ¹⁴C result for 11Q5, and in order to examine the possibility of style continuation after 70 CE.

Regarding the differences between the ¹⁴C date ranges and Enoch’s script style-based estimates, the mean absolute error (MAE) is 30.7 years. The MAE drops to 27.9 years when minor peaks (less than 4% in all cases except for 5.2% in 4Q2 and 9.4% in 4Q416) are ignored (see Figure 29 in Appendix H). In manuscript dating, MAE is commonly used HamidMAE2019 for evaluation of a regression method. The difference with RMS error is limited HodsonMAE202 . With the chosen $2\sigma$ (¹⁴C) and $1\sigma$ (AI) bandwidths, the error for the leftmost margin is 6.4 years while for the rightmost margin it equals $-38.4$ years, indicating that Enoch’s style-based estimate range ends earlier than the ¹⁴C range. For each sample, the date ranges of the two information sources have partial to full overlap with an average of 88.8%. For Ithaca assael2022restoring , AI and epigraphy were used as two information sources to predict dates for ancient Greek inscriptions. Their prediction provides an average distance of 29.3 years from the target dating brackets, with a median distance of 3 years based on the totality of texts. We also aim for date prediction tasks, but, unlike Ithaca, we utilize three information sources: ¹⁴C, shape-based writing style analysis (AI), and palaeography.
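As a small worked example of these error measures, the sketch below computes MAE, RMS error, and the signed left and right margin differences. The four pairs of date ranges are hypothetical. Comparing range midpoints is a simplifying assumption made here; the exact procedure behind the reported figures is described in the appendices.

```python
import math

# Hypothetical (start, end) date ranges in years; negative = BCE.
c14_ranges = [(-230, -160), (-200, -100), (-50, 60), (20, 120)]
enoch_ranges = [(-220, -190), (-180, -150), (-60, 10), (40, 90)]


def midpoint(date_range):
    return (date_range[0] + date_range[1]) / 2


def mean(values):
    return sum(values) / len(values)


# Signed error of the style-based estimate relative to the 14C reference.
errors = [midpoint(e) - midpoint(c) for e, c in zip(enoch_ranges, c14_ranges)]

mae = mean([abs(err) for err in errors])
rmse = math.sqrt(mean([err * err for err in errors]))

# Margin errors: positive left = estimate starts later than the 14C range,
# negative right = estimate ends earlier than the 14C range.
left_margin = mean([e[0] - c[0] for e, c in zip(enoch_ranges, c14_ranges)])
right_margin = mean([e[1] - c[1] for e, c in zip(enoch_ranges, c14_ranges)])

print(f"MAE  = {mae:.1f} years")
print(f"RMSE = {rmse:.1f} years")
print(f"left margin = {left_margin:+.1f}, right margin = {right_margin:+.1f}")
```

Because squaring weights large deviations more heavily, the RMS error is never smaller than the MAE. A small gap between the two suggests that no single outlier dominates the errors.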

Figure 1: Overview of date estimations by three information sources and a calendar date: (accepted) $2\sigma$ calibrated ¹⁴C ranges (blue), Enoch (green), palaeography (red), and historical (black). The vertical axis lists the manuscript numbers, and the horizontal axis shows dates, with BCE as negative and CE as positive values.

Figure 1 shows the general result that, on average, ¹⁴C date ranges and Enoch’s predictions indicate older dates than palaeography. Only 4Q201 and 11Q5 have older palaeographic date estimates, although there is an overlap with the ¹⁴C results (see Section D.1.1 in Appendix D).

5 Harvest of Enoch’s date predictions for previously undated manuscripts

Table 2: Expert validation of Enoch’s date predictions

Prediction is:	Subcategory	Manuscript count	Category total	Percentage
Realistic			107	79.26%
Unrealistic	Indecisive	4	28	20.74%
	Too old	13		
	Too young	11		
Total manuscripts			135	100.00%

Table 2 summarizes Enoch’s date predictions for 135 previously undated manuscripts. Expert palaeographers evaluated the style-based date predictions, condensing them into two main categories, ‘realistic’ and ‘unrealistic’, the latter subdivided into ‘indecisive’, ‘too old’, and ‘too young’ (see Appendix G).

As can be seen in Table 2, 107 (79%) of the undated manuscripts were dated realistically, according to the palaeographers. Enoch’s date prediction is not a 50/50 binary decision task but a regression task, with many possible years in the interval 300 BCE–200 CE. Assuming a coarseness of 25 years, as in the MPS project He2016 , the date range would consist of 20 bins, with a 5% prior-probability hit rate. Therefore, a success rate of 79% is unlikely to be accidental. For 21% of the manuscripts, the palaeographers judged Enoch’s date predictions to be unrealistic. Of these 28 unrealistic predictions, 13 (46%) were too old, 11 (39%) were too young, and the remaining 4 (14%) were indecisive.
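The chance-level argument can be checked with a short calculation. The sketch treats each ‘realistic’ verdict as a hit in one of 20 equally likely 25-year bins and assumes the 135 manuscripts are independent trials. Both are simplifying assumptions made here for illustration, as is the helper name `binomial_tail`.

```python
from math import comb

# Interval 300 BCE to 200 CE in 25-year bins (negative = BCE).
start_year, end_year, bin_width = -300, 200, 25
n_bins = (end_year - start_year) // bin_width
p_chance = 1 / n_bins  # prior-probability hit rate of a uniform random guess

n_manuscripts, n_realistic = 135, 107


def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


tail = binomial_tail(n_manuscripts, n_realistic, p_chance)
print(f"bins: {n_bins}, chance hit rate: {p_chance:.0%}")
print(f"P(at least {n_realistic} of {n_manuscripts} hits by chance) = {tail:.3g}")
```

Under these assumptions, random guessing would produce about 7 hits on average (135 × 0.05), so 107 realistic predictions lie far outside what chance alone could plausibly deliver.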

Samples 4Q259 and 4Q319 show that Enoch can accurately find the same date estimate for the same writing styles. The accepted 2 $\sigma$ calibrated range of 4Q259 was used to train Enoch. Images of 4Q319 were part of a test set already in 2021. 4Q259 contains text that is part of the so-called Rule of the Community. 4Q319 contains a calendrical text. Because of perceived generic differences, 4Q319 received a separate classification number but is materially actually part of the same manuscript as 4Q259 Hempel2020 . At the time of the test, 6 July 2021, this identity was not known to the AI experts. Figure 2 shows that Enoch was able to give a date prediction estimate for 4Q319 that matches the accepted 2 $\sigma$ calibrated range of 4Q259 (see Section G.5 in Appendix G).

Previously, we demonstrated that two scribes were at work in the Great Isaiah Scroll PlosOne . Now, Enoch shows that there is no temporal difference between the two halves of the manuscript, as there would be if one part had been written significantly later than the other. On the contrary, both scribes are estimated to have worked on their respective parts of the scroll 1QIsaᵃ in the same period. Figure 3 shows that Enoch dates the two halves consistently to 180–100 BCE.

6 Discussion and conclusions

6.1 Aramaic/Hebrew script development in ancient Judaea

This study in style-based date prediction using the Enoch approach is a first step. The ¹⁴C data generated in this study, in combination with machine-learning-based writing style analysis, enabled us to examine Aramaic/Hebrew script in individual manuscripts with an empirically based precision that was not possible before. We combined palaeography, AI, and ¹⁴C to create a date-prediction model that leads to a new chronology of the scrolls from the third century BCE to the second century CE. We give four novel insights into Aramaic/Hebrew script development during this period and the date of individual manuscripts.

First, ¹⁴C date ranges and Enoch’s style-based estimates are overall older than previous palaeographic estimates. These older dates for the scrolls are realistic. Hasmonaean-type manuscripts have accepted 2 $\sigma$ calibrated ranges that allow for older dates in the first half of the second century BCE, and sometimes slightly earlier, instead of only circa 150–50 BCE. There are no compelling palaeographic or historical reasons that preclude these older dates as reliable time markers for the ‘Hasmonaean’ script. This also applies to the accepted 2 $\sigma$ calibrated range for 4Q70 and its ‘Archaic’ script.

Second, ‘Herodian’ script emerged earlier than previously thought. This suggests that the ‘Hasmonaean’ and ‘Herodian’ scripts were not transitioning from the mid-first century BCE onward, but that they existed next to each other at a considerably earlier date.

Third, this novel approach of palaeography leads to a new chronology of the scrolls that impacts our understanding of the history of ancient Judaea and the people behind the scrolls. Hypotheses about whether the movement behind the scrolls originated in the second or first century BCE will need to be reconsidered in light of Enoch’s second-century BCE date predictions for Hasmonaean-type manuscripts such as 1QS and 4Q163 (see Appendix G), bearing texts that are regarded typical for the movement. Scholars often assume that the rise and expansion of the Hasmonaean kingdom from the mid-second century BCE onward caused a rise in literacy and gave a push to scribal and intellectual culture. Yet, the results of this study attest to the copying of multiple literary manuscripts before this period. One example is 4Q109, a copy of the biblical book of Ecclesiastes, a book which scholars tentatively date to the end of the third century BCE SchmidSchroeter , for which Enoch gives a third-century BCE date prediction (see Appendix G), close to Archaic-type manuscripts such as 4Q52 and 4Q70—copies of the biblical books of Samuel and Jeremiah.

Fourth, this study’s ¹⁴C result for 4Q114 and Enoch’s date prediction for 4Q109 now establish these to be the first known fragments of a biblical book from the time of their presumed authors SchmidSchroeter . Also, Enoch’s integration of multiple dating methods substantially increases the value of the sources of evidence and allows for a mutual confirmation of evidence from the two sources: physical (material) and geometric (shape-based).

The results of this study thus dismantle unsubstantiated historical suppositions and chronological limitations, and call into question the validity of the default model’s relative typology. This relative typology can only be maintained with restrictions. The spread of the Hasmonaean-type manuscripts over the timeline does not affect the default relative typology in a major way, but the older, second-century BCE date ranges of the Herodian-type manuscripts challenge the relative typology. More research is needed to solve this issue.

6.2 The Enoch approach to dating ancient manuscripts

To our knowledge, Enoch is the first complete machine-learning-based model that takes raw images as input and delivers probabilistic date predictions for handwritten manuscripts, using the entire probability distribution of the ¹⁴C output. It is complemented by palaeographic input, and its explainable design ensures transparency and interpretability. Palaeographers and historians may now use Enoch’s quantitative, probability-based approach to palaeography as a tool to examine date predictions. The probability-based options can support decision-making and help to make qualitative palaeographic reasoning explicit. The methods underpinning Enoch can also be used for date prediction in other partially dated manuscript collections.

It could be argued that the style-based predictions are influenced by the ¹⁴C-based training of the model. However, the leave-one-out validation results indicate that unseen samples obtain their interpolated position on the time axis based on the detected handwriting style in the images. The placement of an unseen sample on the time axis is not fundamentally constrained. Any date in the time range of 300 BCE to 200 CE could have been reached, looking at all style-based dates empirically covered by the model.
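The leave-one-out logic can be illustrated with a minimal sketch. It is not Enoch’s implementation: it assumes a single hypothetical style score per manuscript, made-up target dates, and a plain ridge regression with a fixed penalty in place of Bayesian ridge regression on full feature vectors. The function names `fit_ridge_1d` and `leave_one_out_mae` are illustrative only.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, penalty=1.0):
    """Fit y = intercept + slope * x with an L2 penalty on the slope only."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / (sxx + penalty)  # penalty > 0 shrinks the slope towards zero
    intercept = y_bar - slope * x_bar
    return intercept, slope


def leave_one_out_mae(xs, ys, penalty=1.0):
    """Hold out each sample once, fit on the rest, and average the absolute errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_ridge_1d(train_x, train_y, penalty)
        prediction = intercept + slope * xs[i]  # the held-out sample is unseen here
        errors.append(abs(prediction - ys[i]))
    return mean(errors)


# Hypothetical toy data (not from the study): one style score per manuscript
# and a target date in calendar years, where negative values mean BCE.
style_scores = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
target_dates = [-250.0, -200.0, -150.0, -100.0, -50.0, 0.0]

mae_years = leave_one_out_mae(style_scores, target_dates, penalty=1.0)
print(f"Leave-one-out MAE: {mae_years:.1f} years")
```

Because the held-out sample never enters the fit, its predicted position on the time axis is determined only by its style score and by the relation learned from the remaining samples.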

In this study, we have avoided using palaeographic estimates as target values for machine learning because our goal is to provide physical (material) and geometric (shape-based) evidence for manuscript dating. While the use of palaeographic estimates as target values for machine learning is technically possible, we consider it too risky, given the existing uncertainties and lack of consensus associated with the precise dating of individual manuscripts.

It becomes apparent that a broader time axis, with a sufficient number of samples at both the BCE and CE tails, will allow for a larger time range of predictions. It would be very valuable if new manuscript samples could be added to the current collection. The consequences of adding each new manuscript sample to the Dead Sea Scrolls ¹⁴C reference collection can now easily be computed using the Enoch approach.

Enoch’s 79% success rate in date prediction is notable given that the manuscripts were fully undated before the analysis was performed. Moreover, the images for the test data were not treated with the same care as those used for training Enoch. All the training images underwent rotation and alignment correction, followed by a clean arrangement of smaller fragments within each manuscript, to obtain accurate feature extractions for the style periods represented by those manuscripts. The unrealistic date predictions that 28 manuscripts (21%) received in the current test may be due to image quality issues (see Figure 13 in Appendix E). If the same image preparation, including more accurate manual cropping and rotation correction, were applied to every image of the test data, the percentage of realistic date predictions would be expected to exceed 79%.
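The reported percentages follow from the counts given in the text (135 test manuscripts, 28 of them with an unrealistic prediction). The short check below reproduces them; the variable names are illustrative only.

```python
# Counts taken from the text: 135 test manuscripts, 28 with unrealistic predictions.
total_manuscripts = 135
unrealistic = 28

realistic = total_manuscripts - unrealistic
realistic_pct = 100 * realistic / total_manuscripts
unrealistic_pct = 100 * unrealistic / total_manuscripts

print(f"{realistic} realistic ({realistic_pct:.1f}%), "
      f"{unrealistic} unrealistic ({unrealistic_pct:.1f}%)")
```

Rounded to whole percentages, this gives the 79% and 21% quoted above.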

In its manuscript analysis, Enoch differs from traditional palaeographic approaches. Enoch emphasizes shared characteristics and similarity matching between training and test manuscripts, whereas traditional palaeography focuses on subtle differences that are assumed to be indicative of style development. Combining dissimilarity matching and adaptive reinforcement learning can uncover hidden patterns. This interdisciplinary fusion may enrich our understanding of textual content, material properties, and historical context, leading to enhanced interpretations of the past. This remains a task for the future. New ¹⁴C evidence or, with new discoveries, a whole range of date-bearing manuscripts can be added to Enoch’s training data for further refinement, continuously improving accuracy and precision. The limited data were insufficient for a full deployment of deep learning in the prediction task (see Appendix F), so future research needs to address the problems of sparse labeling and high dimensionality. New solutions can be expected here, because these problems are encountered in many application domains. If palaeographers are willing to accept the use of ‘black box’, pre-trained deep-learning models that are based on completely extraneous large image and photograph collections, future research may be directed at adapting the output of such models to the vectorial regression-based date-prediction task that is proposed in the current article.

7 Online content

All data, code, and test film associated with this article are publicly available on Zenodo with the following DOIs:
Data and prediction plots (v3): https://doi.org/10.5281/zenodo.10998958
Code and feature files (v5): https://doi.org/10.5281/zenodo.11371749
Film (see details in Appendix G): https://doi.org/10.5281/zenodo.8167946

8 Supplementary information

This article has twelve supplementary materials:

• Appendix A: The dating problem of the Dead Sea Scrolls
• Appendix B: Radiocarbon dating of the Dead Sea Scrolls
• Appendix C: ¹⁴C determinations and calibrated date plots
• Appendix D: Palaeography and radiocarbon dating of the Dead Sea Scrolls
• Appendix E: Artificial intelligence (AI) in dating the scrolls
• Appendix F: On the use of pre-trained deep learning methods for image-based dating
• Appendix G: Enoch’s date predictions for 135 previously undated manuscripts
• Appendix H: Comparative plots for different information sources
• Appendix I: List of images for different tests
• Appendix J: Radiocarbon sample information
• Appendix K: Data-sheet radiocarbon runs
• Appendix L: Worksheet of comparative data for 2$\sigma$ ¹⁴C dates and traditional palaeographic estimates

9 Acknowledgments

The authors thank P. Shor, J. Uziel, T. Bitler, H. Libman, B. Riestra, O. Rosengarten, and S. Halevi at the Dead Sea Scrolls Unit of the Israel Antiquities Authority (IAA) and E. Boaretto (advisor to the IAA from the Weizmann Institute of Science, Jerusalem) for providing physical samples and multispectral images of the scrolls, courtesy of the Leon Levy Dead Sea Scrolls Digital Library; Brill Publishers for the Dead Sea Scrolls images from the Brill Collection; A. Aerts-Bijma and D. Paul for handling and measuring the ¹⁴C samples at the Center for Isotope Research (Groningen); S. Legnaioli for the Raman analyses performed at the CNR-ICCOM (Pisa); A. Krauss and T. van der Werff for their contributions to developing and testing Enoch; L. Bouma for cleaning images; D. Longacre, G. Hayes, A.W. Aksu, H. van der Schoor, C. van der Veer, and M. van Dijk for their contributions to preparing images for training Enoch; M.W. Dee for advising on and inspecting the code and data acquisition process from OxCal to the Enoch model at the Center for Isotope Research (Groningen). This project has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 640497 (HandsandBible). M.P. and E.T. were also supported by NWO, the Netherlands Organisation for Scientific Research, and FWO, the Research Foundation - Flanders (SV-15-29).

Declarations


• Funding: The work has been supported by an ERC Starting Grant of the European Research Council (EU Horizon 2020): The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls (HandsandBible # 640497).
• Competing interests: The authors have no conflict of interest.
• Availability of data, materials, and code: see Section 7.
• Authors’ contributions: all the authors contributed equally to the article.

References

(1) Nongbri, B.: Palaeographic analysis of codices from the Early Christian period: A point of method. Journal for the Study of the New Testament 42, 84–97 (2019). https://doi.org/10.1177/0142064x19855582
(2) Orsini, P.: Introduction. In: Studies on Greek and Coptic Majuscule Scripts and Books, p.VII–XVI. De Gruyter, Berlin (2018). https://doi.org/10.1515/9783110575446-000
(3) Orsini, P., Clarysse, W.: Early New Testament manuscripts and their dates: A critique of theological palaeography. Ephemerides Theologicae Lovanienses 88, 443–474 (2012)
(4) Brooke, G.J., Hempel, C. (eds.): T&T Clark Companion to the Dead Sea Scrolls. T&T Clark, London (2018)
(5) Tigchelaar, E.: Seventy years of palaeographic dating of the Dead Sea Scrolls. In: Drawnel, H. (ed.) Sacred Texts and Disparate Interpretations: Qumran Manuscripts Seventy Years Later, pp. 258–278. Brill, Leiden (2020). https://doi.org/10.1163/9789004432796_014
(6) Puech, E.: La paléographie des manuscrits de la mer Morte. In: Fidanzio, M. (ed.) The Caves of Qumran, pp. 96–105. Brill, Leiden (2017). https://doi.org/10.1163/9789004316508_008
(7) Cross, F.M.: The development of the Jewish scripts. In: Leaves from an Epigrapher’s Notebook: Collected Papers in Hebrew and West Semitic Palaeography and Epigraphy, pp. 1–43. Eisenbrauns, Winona Lake, IN (2003). https://doi.org/10.1163/9789004369887_002 (originally published in 1961)
(8) Avigad, N.: The palaeography of the Dead Sea Scrolls and related documents. In: Scripta Hierosolymitana, Volume IV: Aspects of the Dead Sea Scrolls, pp. 56–87. Magnes Press, Jerusalem (1965)
(9) Yardeni, A.: Textbook of Aramaic, Hebrew and Nabataean Documentary Texts from the Judaean Desert and Related Material, 2 Vols. The Hebrew University, Jerusalem (2000)
(10) Yardeni, A.: The palaeography of 4QJer^a – a comparative study. Textus 15, 233–268 (1990). https://doi.org/10.1163/2589255x-01501012
(11) Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. Communications of the ACM 60, 84–90 (2017). https://doi.org/10.1145/3065386
(12) Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (2000). https://doi.org/10.1007/978-1-4757-3264-1
(13) He, S., Schomaker, L.: Deep adaptive learning for writer identification based on single handwritten word images. Pattern Recognition 88, 64–74 (2019). https://doi.org/10.1016/j.patcog.2018.11.003
(14) He, S., Schomaker, L.: FragNet: Writer identification using deep fragment networks. IEEE Transactions on Information Forensics and Security 15, 3013–3022 (2020). https://doi.org/10.1109/tifs.2020.2981236
(15) He, S., Schomaker, L.: GR-RNN: Global-context residual recurrent neural networks for writer identification. Pattern Recognition 117, 107975 (2021). https://doi.org/10.1016/j.patcog.2021.107975
(16) Zhang, Z., Schomaker, L.: DiverGAN: An efficient and effective single-stage framework for diverse text-to-image generation. Neurocomputing 473, 182–198 (2022). https://doi.org/10.1016/j.neucom.2021.12.005
(17) Ameryan, M., Schomaker, L.: How to limit label dissipation in neural-network validation: Exploring label-free early-stopping heuristics. Journal on Computing and Cultural Heritage 16(1), 1–20 (2023). https://doi.org/10.1145/3587168
(18) Assael, Y., Sommerschield, T., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., de Freitas, N.: Restoring and attributing ancient texts using deep neural networks. Nature 603, 280–283 (2022). https://doi.org/10.1038/s41586-022-04448-z
(19) Bonani, G., Ivy, S., Wölfli, W., Broshi, M., Carmi, I., Strugnell, J.: Radiocarbon dating of fourteen Dead Sea Scrolls. Radiocarbon 34, 843–849 (1992). https://doi.org/10.1017/s0033822200064158
(20) Jull, A.J.T., Donahue, D.J., Broshi, M., Tov, E.: Radiocarbon dating of scrolls and linen fragments from the Judean Desert. Radiocarbon 37, 11–19 (1995). https://doi.org/10.1017/s0033822200014740
(21) Rasmussen, K.L., van der Plicht, J., Cryer, F.H., Doudna, G., Cross, F.M., Strugnell, J.: The effects of possible contamination on the radiocarbon dating of the Dead Sea Scrolls I: castor oil. Radiocarbon 43, 127–132 (2001). https://doi.org/10.1017/S0033822200031702
(22) Rasmussen, K.L., van der Plicht, J., Doudna, G., Cross, F.M., Strugnell, J.: Reply to Israel Carmi (2002): “Are the 14C dates of the Dead Sea Scrolls affected by castor oil contamination?”. Radiocarbon 45, 497–499 (2003). https://doi.org/10.1017/S0033822200032847
(23) Rasmussen, K.L., van der Plicht, J., Doudna, G., Nielsen, F., Højrup, P., Stenby, E.H., Pedersen, C.T.: The effects of possible contamination on the radiocarbon dating of the Dead Sea Scrolls II: empirical methods to remove castor oil and suggestions for redating. Radiocarbon 51, 1005–1022 (2009). https://doi.org/10.1017/S0033822200034081
(24) Ramsey, C.B.: Development of the radiocarbon calibration program. Radiocarbon 43, 355–363 (2001). https://doi.org/10.1017/s0033822200038212
(25) Ramsey, C.B., van der Plicht, J., Weninger, B.: ‘Wiggle matching’ radiocarbon dates. Radiocarbon 43, 381–389 (2001). https://doi.org/10.1017/s0033822200038248
(26) Israel Antiquities Authority: The Leon Levy Dead Sea Scrolls Digital Library. https://www.deadseascrolls.org.il/explore-the-archive. Accessed: 2023-04-10
(27) Lim, T., Alexander, P.: The Dead Sea Scrolls Electronic Library (Volume 1). Brill Publishers (1995)
(28) Dhali, M.A., de Wit, J.W., Schomaker, L.: Binet: Degraded-manuscript binarization in diverse document textures and layouts using deep encoder-decoder networks. arXiv preprint (2019). https://doi.org/10.48550/arXiv.1911.07930
(29) Brink, A., Bulacu, M., Schomaker, L.: How much handwritten text is needed for text-independent writer verification and identification. In: 2008 19th International Conference on Pattern Recognition (ICPR). IEEE, Piscataway (2008). https://doi.org/10.1109/icpr.2008.4761908
(30) Bulacu, M., Schomaker, L., Vuurpijl, L.: Writer identification using edge-based directional features. In: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings. IEEE Comput. Soc. https://doi.org/10.1109/icdar.2003.1227797
(31) Schomaker, L., Bulacu, M.: Automatic writer identification using connected-component contours and edge-based features of uppercase western script. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 787–798 (2004). https://doi.org/10.1109/tpami.2004.18
(32) Bulacu, M., Schomaker, L.: Text-independent writer identification and verification using textural and allographic features. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 701–717 (2007). https://doi.org/10.1109/tpami.2007.1009
(33) Dhali, M.A., Jansen, C.N., de Wit, J.W., Schomaker, L.: Feature-extraction methods for historical manuscript dating based on writing style development. Pattern Recognition Letters 131, 413–420 (2020). https://doi.org/10.1016/j.patrec.2020.01.027
(34) He, S., Samara, P., Burgers, J., Schomaker, L.: Image-based historical manuscript dating using contour and stroke fragments. Pattern Recognition 58, 159–171 (2016). https://doi.org/10.1016/j.patcog.2016.03.032
(35) Popović, M., Dhali, M.A., Schomaker, L.: Artificial intelligence based writer identification generates new evidence for the unknown scribes of the Dead Sea Scrolls exemplified by the Great Isaiah Scroll (1QIsa^a). PLOS ONE 16, e0249769 (2021). https://doi.org/10.1371/journal.pone.0249769
(36) Sommerschield, T., Assael, Y., Pavlopoulos, J., Stefanak, V., Senior, A., Dyer, C., Bodel, J., Prag, J., Androutsopoulos, I., de Freitas, N.: Machine learning for ancient languages: A survey. Computational Linguistics, 1–45 (2023). https://doi.org/10.1162/coli_a_00481
(37) Schomaker, L.: Monk - Search and annotation tools for handwritten manuscripts. http://monk.hpc.rug.nl/. Accessed: 2023-07-17 (2023)
(38) Adam, K., Baig, A., Al-Maadeed, S., Bouridane, A., El-Menshawy, S.: KERTAS: dataset for automatic dating of ancient Arabic manuscripts. International Journal on Document Analysis and Recognition (IJDAR) 21, 283–290 (2018). https://doi.org/10.1007/s10032-018-0312-3
(39) Hoerl, A.E., Kennard, R.W.: Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 42, 80–86 (2000). https://doi.org/10.1080/00401706.2000.10485983
(40) Bulacu, M., Brink, A., van der Zant, T., Schomaker, L.: Recognition of handwritten numerical fields in a large single-writer historical collection. In: 2009 10th International Conference on Document Analysis and Recognition, pp. 808–812 (2009). https://doi.org/10.1109/ICDAR.2009.8
(41) Koopmans, L., Dhali, M., Schomaker, L.: The effects of character-level data augmentation on style-based dating of historical manuscripts. In: Proceedings of the 12th International Conference on Pattern Recognition Applications and Methods, pp. 124–135. SCITEPRESS - Science and Technology Publications, Setubal, Portugal (2023). https://doi.org/10.5220/0011699500003411
(42) Schmid, K., Schröter, J.: The Making of the Bible: From the First Fragments to Sacred Scripture. Belknap Press, Cambridge, MA (2021)
(43) Zenger, E., Frevel, C.: Einleitung in das Alte Testament. Neunte, aktualisierte Auflage. Kohlhammer, Stuttgart (2016)
(44) Popović, M.: Qumran as scroll storehouse in times of crisis? A comparative perspective on Judaean Desert manuscript collections. Journal for the Study of Judaism 43, 551–594 (2012). https://doi.org/10.1163/15700631-12341239
(45) Hamid, A., Bibi, M., Moetesum, M., Siddiqi, I.: Deep learning based approach for historical manuscript dating. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 967–972 (2019). https://doi.org/10.1109/ICDAR.2019.00159
(46) Hodson, T.O.: Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not. Geoscientific Model Development 15(14), 5481–5487 (2022). https://doi.org/10.5194/gmd-15-5481-2022
(47) Hempel, C.: The Community Rules from Qumran: A Commentary. Mohr Siebeck, Tübingen (2020). https://doi.org/10.1628/978-3-16-157027-8
(48) Gropp, D.M.: Discoveries in the Judaean Desert: Volume XXVIII. Wadi Daliyeh II and Qumran Miscellanea, Part 2. Clarendon Press, Oxford (2001)
(49) Naveh, J., Shaked, S.: Aramaic Documents from Ancient Bactria (Fourth Century BCE.). The Khalili Family Trust, London (2012)
(50) Porten, B., Yardeni, A.: Textbook of Aramaic Documents from Ancient Egypt, 4 Vols. The Hebrew University, Jerusalem (1986–1999)
(51) Benoit, P., Milik, J.T., de Vaux, R.: Discoveries in the Judaean Desert: Volume II. Les grottes de Murabbaʿât, 2 Vols. Clarendon Press, Oxford (1961)
(52) Cotton, H.M., Yardeni, A.: Discoveries in the Judaean Desert: Volume XXVII. Aramaic, Hebrew and Greek Documentary Texts from Naḥal Ḥever and Other Sites. With an Appendix Containing Alleged Qumran Texts. Clarendon Press, Oxford (1997)
(53) Eshel, E., Kloner, A.: An Aramaic ostracon of an Edomite marriage contract from Maresha, dated 176 BCE. Israel Exploration Journal 46, 1–22 (1996)
(54) Geraty, L.T.: The Khirbet el-Kôm bilingual ostracon. Bulletin of the American Schools of Oriental Research 220, 55–61 (1975)
(55) Eck, W., Cotton, H.M., Di Segni, L.: Corpus Inscriptionum Iudaeae/Palaestinae: Volume 1, Part 1 vol. 1, pp. 414–416. De Gruyter, Berlin (2010)
(56) Avigad, N.: Ancient Monuments in the Kidron Valley. Bialik Institute, Jerusalem (1954)
(57) Barag, D.: The 2000-2001 exploration of the tombs of Benei Ḥezir and Zechariah. Israel Exploration Journal 53, 78–110 (2003)
(58) Naveh, J.: Dated coins of Alexander Janneus. Israel Exploration Journal 18, 20–26 (1968)
(59) Baillet, M., Milik, J.T., de Vaux, R.: Discoveries in the Judaean Desert of Jordan: Volume III. Les ‘petites grottes’ de Qumrân, 2 Vols. Clarendon Press, Oxford (1962)
(60) Puech, E.: Discoveries in the Judaean Desert: Volume XXXI. Qumrân Grotte 4.XXII. Textes araméens, première partie: 4Q529-549. Clarendon Press, Oxford (2001)
(61) Puech, E.: Inscriptions funéraires palestiniennes: Tombeau de Jason et ossuaires. Revue biblique 90, 481–533 (1983)
(62) Cross, F.M.: The oldest manuscripts from Qumran. Journal of Biblical Literature 74, 147–172 (1955)
(63) Magness, J.: Ossuaries and the burials of Jesus and James. Journal of Biblical Literature 124, 121–154 (2005)
(64) Cross, F.M.: The papyri and their historical implications. In: Lapp, P.W., Lapp, N.L. (eds.) Discoveries in the Wâdī ed-Dâliyeh, pp. 17–29. American Schools of Oriental Research, Cambridge, MA (1974)
(65) Cross, F.M.: The development of the Jewish scripts. In: Wright, G.E. (ed.) The Bible and the Ancient Near East, pp. 133–202. Doubleday, Garden City, NY (1961)
(66) Cross, F.M.: Palaeography and the Dead Sea Scrolls. In: Flint, P.W., VanderKam, J.C. (eds.) The Dead Sea Scrolls After Fifty Years: A Comprehensive Assessment, Volume One, pp. 379–402. Brill, Leiden (1998)
(67) Bonani, G., Broshi, M., Carmi, I., Ivy, S., Strugnell, J., Wölfli, W.: Radiocarbon dating of the Dead Sea Scrolls. Atiqot 20, 27–32 (1991)
(68) Popović, M.: Book production and circulation in ancient Judaea: Evidenced by writing quality and skills in the Dead Sea Scrolls Isaiah and Serekh manuscripts. In: Williams, T.B., Keith, C., Stuckenbruck, L. (eds.) The Dead Sea Scrolls in Ancient Media Culture, pp. 199–265. Brill, Leiden (2023). https://doi.org/10.1163/9789004537804_007
(69) Longacre, D.: Disambiguating the concept of formality in palaeographic descriptions: Stylistic classification and the ancient Jewish Hebrew/Aramaic scripts. Comparative Oriental Manuscript Studies Bulletin 5, 101–128 (2019). https://doi.org/10.25592/UHHFDM.739
(70) Queffelec, A., Bertran, P., Bos, T., Lemée, L.: Mineralogical and organic study of bat and chough guano: implications for guano identification in ancient context. Journal of Cave and Karst Studies 80, 1–17 (2018). https://doi.org/10.4311/2017es0102
(71) Dee, M.W., Palstra, S.W.L., Aerts-Bijma, A.T., Bleeker, M.O., de Bruijn, S., Ghebru, F., Jansen, H.G., Kuitems, M., Paul, D., Richie, R.R., Spriensma, J.J., Scifo, A., van Zonneveld, D., Verstappen-Dumoulin, B.M.A.A., Wietzes-Land, P., Meijer, H.A.J.: Radiocarbon dating at Groningen: New and updated chemical pretreatment procedures. Radiocarbon 62, 63–74 (2019). https://doi.org/10.1017/rdc.2019.101
(72) Van der Plicht, J., Wijma, S., Aerts, A., Pertuisot, M., Meijer, H.: Status report: the Groningen AMS facility. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 172, 58–65 (2000). https://doi.org/10.1016/S0168-583X(00)00284-6
(73) Synal, H.-A., Stocker, M., Suter, M.: MICADAS: A new compact radiocarbon AMS system. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms 259, 7–13 (2007). https://doi.org/10.1016/j.nimb.2007.01.138
(74) Mook, W.G., van der Plicht, J.: Reporting 14C activities and concentrations. Radiocarbon 41, 227–239 (1999). https://doi.org/10.1017/S0033822200057106
(75) Reimer, P.J., Austin, W.E., Bard, E., Bayliss, A., Blackwell, P.G., Ramsey, C.B., Butzin, M., Cheng, H., Edwards, R.L., Friedrich, M., et al.: The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 62, 725–757 (2020). https://doi.org/10.1017/RDC.2020.41
(76) van der Plicht, J.: Variations in atmospheric ¹⁴C. In: Reference Module in Earth Systems and Environmental Sciences. Encyclopedia of Quaternary Science, 3rd Edition, pp. 1–10. Elsevier, Amsterdam (2022). https://doi.org/10.1016/b978-0-323-99931-1.00014-3
(77) Tigchelaar, E.: Identification and reidentification of some fragments of 4Q70 (4QJer^a). Textus 29, 193–200 (2020). https://doi.org/10.1163/2589255x-02901006
(78) Koffmahn, E.: Zur Datierung der aramäisch/hebräischen Vertragsurkunden von Murabbaʿat. Wiener Zeitschrift für die Kunde des Morgenlandes 59, 119–136 (1963)
(79) Yadin, Y.: The excavation of Masada—1963/64: preliminary report. Israel Exploration Journal 15, 1–120 (1965)
(80) Goodblatt, D.: Dating documents in Provincia Iudaea: A note on papyri Murabbaʿat 19 and 20. Israel Exploration Journal 49, 249–259 (1999)
(81) Eshel, H.: Documents of the first Jewish revolt from the Judean Desert. In: Berlin, A.M., Overman, J.A. (eds.) The First Jewish Revolt: Archaeology, History, and Ideology, pp. 171–177. Routledge, London (2003)
(82) Eshel, H., Broshi, M., Jull, T.A.J.: Four Murabbaʿat papyri and the alleged capture of Jerusalem by Bar Kokhba. In: Katzoff, R., Schaps, D. (eds.) Law in the Documents of the Judaean Desert, pp. 45–50. Brill, Leiden (2005). https://doi.org/10.1163/9789047403999_006
(83) Wise, M.O.: Language and Literacy in Roman Judaea. Yale University Press, New Haven, CT (2015)
(84) Pajunen, M.: 4QSapiential Admonitions B (4Q185): Unsolved challenges of the Hebrew text. In: Brooke, G., Høgenhaven, J. (eds.) The Mermaid and the Partridge, pp. 191–220. Brill, Leiden (2011). https://doi.org/10.1163/ej.9789004194304.i-310.41
(85) Degano, I., Modugno, F., Bonaduce, I., Ribechini, E., Colombini, M.P.: Recent advances in analytical pyrolysis to investigate organic materials in heritage science. Angewandte Chemie International Edition 57, 7313–7323 (2018). https://doi.org/10.1002/anie.201713404
(86) La Nasa, J., Modugno, F., Degano, I.: Liquid chromatography and mass spectrometry for the analysis of acylglycerols in art and archeology. Mass Spectrometry Reviews 40, 381–407 (2021). https://doi.org/10.1002/mas.21644
(87) La Nasa, J., Biale, G., Sabatini, F., Degano, I., Colombini, M.P., Modugno, F.: Synthetic materials in art: a new comprehensive approach for the characterization of multi-material artworks by analytical pyrolysis. Heritage Science 7, 1–14 (2019). https://doi.org/10.1186/s40494-019-0251-4
(88) Colombini, M.P., Modugno, F.: Organic Mass Spectrometry in Art and Archaeology. John Wiley & Sons, Hoboken, NJ (2009)
(89) La Nasa, J., Ghelardi, E., Degano, I., Modugno, F., Colombini, M.P.: Core shell stationary phases for a novel separation of triglycerides in plant oils by high performance liquid chromatography with electrospray-quadrupole-time of flight mass spectrometer. Journal of Chromatography A 1308, 114–124 (2013). https://doi.org/10.1016/j.chroma.2013.08.015
(90) Ghioni, C., Hiller, J.C., Kennedy, C.J., Aliev, A., Odlyha, M., Boulton, M., Wess, T.J.: Evidence of a distinct lipid fraction in historical parchments: a potential role in degradation? Journal of lipid research 46, 2726–2734 (2005). https://doi.org/10.1194/jlr.M500331-JLR200
(91) Charlesworth, J., Cotton, H., Flint, P.: Discoveries in the Judaean Desert: Volume XXXVIII. Miscellaneous Texts from the Judaean Desert. Clarendon Press, Oxford (2000)
(92) Tigchelaar, E.: 4Q1 (4QGen-Exod^a): Identification of fragments and comments. Textus 32, 19–38 (2023). https://doi.org/10.1163/2589255X-bja10028
(93) Ulrich, E., Cross, F.M.: Discoveries in the Judaean Desert: Volume XIV. Qumran Cave 4.IX. Deuteronomy, Joshua, Judges, Kings. Clarendon Press, Oxford (1995)
(94) Langlois, M.: Le premier manuscrit du Livre d’Hénoch: Étude épigraphique et philologique des fragments araméens de 4Q201 à Qumrân. Cerf, Paris (2011)
(95) Puech, E.: Les copies du livre de Josué dans les manuscrits de la mer Morte: 4Q47, 4Q48, 4Q123 et XJosué. Revue biblique 122, 481–506 (2015). https://doi.org/10.2143/RBI.122.4.3149591
(96) Cross, F.M., Parry, D.W., Saley, R.J., Ulrich, E.: Discoveries in the Judaean Desert: Volume XVII. Qumran Cave 4.XII. 1–2 Samuel. Clarendon Press, Oxford (2005)
(97) Strugnell, J.: Notes en marge du volume V des “Discoveries in the Judaean Desert of Jordan”. Revue de Qumran 7, 163–276 (1970)
(98) Tigchelaar, E.: Lamentations 4:21-22 as another word of consolation in 4Q176. Revue de Qumran 31, 3–9 (2019). https://doi.org/10.2143/RQ.31.1.3286503
(99) Milik, J.T.: The Books of Enoch: Aramaic Fragments of Qumrân Cave 4. Clarendon Press, London (1976)
(100) Sanders, J.A.: Discoveries in the Judaean Desert: Volume IV. The Psalms Scroll of Qumran Cave XI. Clarendon Press, Oxford (1965)
(101) Charlesworth, J.H., Milgrom, J., Qimron, E., Schiffmann, L.H., Stuckenbruck, L.T., Whitaker, R.E. (eds.): The Dead Sea Scrolls. Hebrew, Aramaic, and Greek Texts with English Translations, Volume 1: Rule of the Community and Related Documents. JCB Mohr (Paul Siebeck), Tübingen (1994)
(102) Puech, E.: L’alphabet cryptique A en 4QS^e (4Q259). Revue de Qumran 18, 429–435 (1998)
(103) Puech, E.: Discoveries in the Judaean Desert: Volume XXV. Qumrân Grotte 4.XVIII. Textes hébreux: 4Q521-4Q528, 4Q576-4Q579. Clarendon Press, Oxford (1998)
(104) Ulrich, E., Cross, F.M., Davila, J.R.: Discoveries in the Judaean Desert: Volume XII. Qumran Cave 4.VII. Genesis to Numbers. Clarendon Press, Oxford (1995)
(105) Baumgarten, J.M.: Discoveries in the Judaean Desert: Volume XVIII. Qumran Cave 4.XIII. The Damascus Document (4Q266-273). Clarendon Press, Oxford (1996)
(106) Sirat, C.: Les manuscrits en caractères hébraïques: Réalités d’hier et histoire d’aujourd’hui. Scrittura e civiltà 10, 239–288 (1986)
(107) Ulrich, E.: Discoveries in the Judaean Desert: Volume XVI. Qumran Cave 4.XI. Psalms to Chronicles. Clarendon Press, Oxford (2000)
(108) Cross, F.M.: The Ancient Library of Qumran, 2nd edn. Doubleday, Garden City, NY (1961)
(109) Drawnel, H.: Qumran Cave 4: The Aramaic Books of Enoch, 4Q201, 4Q202, 4Q204, 4Q205, 4Q206, 4Q207, 4Q212. Oxford University Press, Oxford (2019)
(110) Broshi, M., Eshel, E., Fitzmyer, J.: Discoveries in the Judaean Desert: Volume XIX. Qumran Cave 4.XIV. Parabiblical Texts, Part 2. Clarendon Press, Oxford (1996)
(111) Strugnell, J., Harrington, D., Elgvin, T.: Discoveries in the Judaean Desert: Volume XXXIV. Qumran Cave 4.XXIV. Sapiential Texts, Part 2: 4QInstruction (Mûsār lĕ Mēvîn): 4Q415 ff. Clarendon Press, Oxford (1999)
(112) Kimball, S., Mattis, P.: GNU Image Manipulation Program - GIMP (version 2.8.6). https://www.gimp.org/ (2023)
(113) Mumuni, A., Mumuni, F.: Data augmentation: A comprehensive survey of modern approaches. Array 16, 100258 (2022). https://doi.org/10.1016/j.array.2022.100258
(114) Kohonen, T.: Self-organized formation of topologically correct feature maps. Biological Cybernetics 43, 59–69 (1982). https://doi.org/10.1007/bf00337288
(115) Joachims, T.: Learning to Classify Text Using Support Vector Machines, 2002 edn. The Springer International Series in Engineering and Computer Science. Springer, Dordrecht (2002). https://doi.org/10.1007/978-1-4615-0907-3
(116) Bulacu, M., Schomaker, L.: Combining Multiple Features for Text-Independent Writer Identification and Verification. In: Lorette, G. (ed.) Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, La Baule (France) (2006). Université de Rennes 1. http://www.suvisoft.com. https://hal.inria.fr/inria-00104189
(117) Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Berlin/Heidelberg (2009)
(118) scikit-learn developers: scikit-learn: Bayesian Ridge Regression. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.BayesianRidge.html. Accessed: 2021-04-15
(119) Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Berlin/Heidelberg (2006)
(120) Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366 (1989). https://doi.org/10.1016/0893-6080(89)90020-8
(121) Villalobos, P., Sevilla, J., Heim, L., Besiroglu, T., Hobbhahn, M., Ho, A.: Will we run out of data? an analysis of the limits of scaling datasets in machine learning. arXiv preprint arXiv:2211.04325 (2022)
(122) Epoch: Parameter, Compute and Data Trends in Machine Learning. https://epochai.org/data/pcd (2022)
(123) Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q.: A comprehensive survey on transfer learning. Proceedings of the IEEE 109, 43–76 (2021). https://doi.org/10.1109/jproc.2020.3004555
(124) Ribani, R., Marengoni, M.: A survey of transfer learning for convolutional neural networks. In: 2019 32nd SIBGRAPI Conference on Graphics, Patterns and Images Tutorials (SIBGRAPI-T), pp. 47–57 (2019). https://doi.org/10.1109/sibgrapi-t.2019.00010. IEEE
(125) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv (2020). https://doi.org/10.48550/ARXIV.2010.11929
(126) Willemink, M.J., Roth, H.R., Sandfort, V.: Toward foundational deep learning models for medical imaging in the new era of transformer networks. Radiology: Artificial Intelligence 4(6), 210284 (2022). https://doi.org/10.1148/ryai.210284
(127) Thambawita, V., Strümke, I., Hicks, S.A., Halvorsen, P., Parasa, S., Riegler, M.A.: Impact of image resolution on deep learning performance in endoscopy image classification: An experimental study using a large dataset of endoscopic images. Diagnostics 11(12), 2183 (2021). https://doi.org/10.3390/diagnostics11122183
(128) Haja, A., Schomaker, L.R.B.: A fully automated end-to-end process for fluorescence microscopy images of yeast cells: From segmentation to detection and classification. In: Lecture Notes in Electrical Engineering, pp. 37–46. Springer (2021). https://doi.org/10.1007/978-981-16-3880-0_5
(129) Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Silva, V.W.K., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J.: Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature Medicine 25(8), 1301–1309 (2019). https://doi.org/10.1038/s41591-019-0508-1
(130) Ivezić, Ž., Kahn, S.M., Tyson, J.A., Abel, B., Acosta, E., Allsman, R., Alonso, D., AlSayyad, Y., Anderson, S.F., Andrew, J., et al.: Lsst: from science drivers to reference design and anticipated data products. The Astrophysical Journal 873(2), 111 (2019)
(131) Liu, C., Zoph, B., Neumann, M., Shlens, J., Hua, W., Li, L.-J., Fei-Fei, L., Yuille, A., Huang, J., Murphy, K.: Progressive neural architecture search. In: Computer Vision – ECCV 2018, pp. 19–35. Springer (2018). https://doi.org/10.1007/978-3-030-01246-5_2
(132) Ulrich, E.C., Cross, F.M., Fuller, R.E.: Discoveries in the Judaean Desert XV. Qumran Cave 4.X, The Prophets. Clarendon Press, Oxford (1997)
(133) Brooke, G., Collins, J., Elgvin, T., Flint, P., Greenfield, J., Larson, E., Newsom, C., Puech, E., Schiffman, L., Stone, M., Trebolle Barrera, J., VanderKam, J.: Discoveries in the Judaean Desert XXII: Qumran Cave 4. XVII, Parabiblical Texts, Part 3. Clarendon Press, Oxford (1997)
(134) Puech, E.: La Lettre essénienne MMT dans le manuscrit 4Q397 et les parallèles. Revue de Qumran 27, 99–135 (2015). https://doi.org/10.2143/RQ.27.1.3129665
(135) Qimron, E., Strugnell, J.: Discoveries in the Judaean Desert: Volume X. Qumran Cave 4.V, Miqṣat Maʿaśe Ha-Torah. Clarendon Press, Oxford (1994)
(136) Doudna, G.: Dating the scrolls on the basis of radiocarbon analysis. In: Flint, P.W., VanderKam, J.C. (eds.) The Dead Sea Scrolls After Fifty Years: A Comprehensive Assessment, Volume One, pp. 430–471. Brill, Leiden (1998)
(137) Carmi, I.: Are the ¹⁴C dates of the Dead Sea Scrolls affected by castor oil contamination? Radiocarbon 44, 213–216 (2002). https://doi.org/10.1017/s0033822200064808

Appendix A The dating problem of the Dead Sea Scrolls

There is broad agreement in scholarship about the long-term lines of development of Aramaic and Hebrew script in Judaea from the fourth century BCE until the second century CE as an evolution from imperial Aramaic chancery script of the fourth century BCE to what became the dominant Jewish square script in the first and second centuries CE. However, when we zoom into the specifics of the centuries in between, the finer typological and chronological distinctions—misleadingly connected with historical-political eras—are not reliably grounded in the data; rather, they rely on so-called absolute pegs that are not absolute at all and on unsubstantiated suppositions about historical processes that would have influenced palaeographic developments.

The main problem is that there is a palaeographic gap between the third century BCE and the second century CE. There is a lack of absolute dates across the time period of the scrolls.

A.1 Too few date-bearing manuscripts to compare with

Palaeographic comparison of undated manuscripts with dated manuscripts in a similar script is not possible. Only a few date-bearing manuscripts have survived, and those are at the outer limits of the date range. The oldest, from fourth-century BCE Wadi Daliyeh Gropp2001-dy and fourth-century BCE Bactria Naveh2012-vv , have script comparable to only one or two manuscripts, 4Q52 and 4Q70 (see also Section A.2.2 in Appendix A and Section D.1.1 in Appendix D), but not to the vast majority of the scrolls. The manuscripts from fifth-century BCE Elephantine are even further removed in time PortenYardeni19861999 .

The youngest, from first- and second-century CE Murabbaʿat yardeni2000-mq ; Benoit1961 and Naḥal Ḥever yardeni2000-mq ; CottonYardeni1997 , are mostly in cursive script and cannot be used to compare and date the vast majority of the Hebrew and Aramaic scrolls written in more formal scripts. Those dated manuscripts include about 30 documentary texts, mainly from Murabbaʿat and Naḥal Ḥever. From the same period are 15 undated but datable letters, mostly in cursive script, to and from Simon bar Kokhba, the leader of the revolt against the Romans in 132–135 CE. Dated documents written in formal or bookhand script are limited to a farming contract from Murabbaʿat (Mur24) and three leases of land from Naḥal Ḥever (5/6Ḥev 44, 45, 46), from 133 and 134 CE.

Only one dated ostracon, from Maresha and dated to 176 BCE eshel1996aramaic , is known from the crucial period between the third century BCE and the first century CE. Another ostracon, from Khirbet el-Qom, is partially dated, and could date from 277, 241, or 217/6 BCE geraty1975khirbet . Yet, these can hardly be used for dating formal hands, and cannot even serve as an indicative time marker to tie in manuscripts with a semicursive handwriting.

A.2 Weak workarounds

The approach taken by Cross and others to work around the lack of date-bearing documents in formal, semiformal, or semicursive script from the third century BCE until the first century CE does not solve the problem. The relative development and absolute chronology of the scrolls’ palaeography were determined by recourse to a combination of (a) supposed absolute pegs and (b) unsubstantiated palaeographic and historical suppositions:

A.2.1 Not so absolute time markers

Cross Cross2003 claimed that his model was pegged by a series of absolute datings, in scores if not hundreds of documents inscribed on a variety of materials, especially from the late first century BCE and first century CE. Puech Puech2017 provided additional pegs, specifically for the less formal Hasmonaean hands. Cross and Puech relied on inscriptions on other surfaces such as stone and metal, but here too there are no absolute dates, not even for the most important pegs, such as the Benei Ḥezir tomb and the Jason’s tomb inscriptions. Avigad Avigad1965 acknowledged this, but his caution seems to have been forgotten.

A telling example is the estimated date of the Benei Ḥezir tomb inscription in Jerusalem’s Kidron Valley (CIIP 137 eck2010corpus ), which, according to Cross, had been dated securely, on the basis of archaeological and historical evidence, to the end of the first century BCE. Based on the architectural typology of the Hellenistic-style façade and Josephus’s description of the Maccabees’ family tomb in Modiʿin, Avigad avigad1954 initially suggested dating the tomb to the mid-second century BCE. He then estimated that the inscription, which lists eight priests from two generations who had been interred in the tomb, was made on the façade one or two generations after the construction of the tomb, in the first half of the first century BCE. Later, he dated the inscription palaeographically to the second half of the first century BCE, or to the Herodian period, and on that basis redated the tomb to the end of the Hasmonaean period Avigad1965 . The precise length of time between the construction of the façade and the writing of the inscription (how many years are one to two generations?) is a conjecture.

After the 2000–2001 exploration of the Benei Ḥezir and Zechariah tombs, Barag Barag2003 put forward new data and interpretations which would indicate that the tomb dated to the period of flourishing in Jerusalem between ca. 132/1 and 63 BCE, most likely in the first century BCE. For example, it features the new type of tombs typical of the Hasmonaean period, which became common in the first century BCE. Correspondences with Nabataean tomb architecture (undated but supposed to go back to the first century BCE), which, Barag argued, likely inspired the Benei Ḥezir tomb, point in the same direction. As for the inscription, which he conjectured to be 50–100 years younger than the construction of the tomb, he compared its writing to that of the bronze coins of the 25th year of Alexander Jannaeus (79/8 BCE), and posited that the script of the Benei Ḥezir inscription would seem to be slightly later, from the late Hasmonaean or early Herodian period.

Without mentioning the Benei Ḥezir inscription, Naveh Naveh1986 had identified the script on the Alexander Jannaeus coinage as ‘vulgar semiformal’ and saw its closest parallels to the letters found on ossuaries. Cross Baillet1962 had described this style as a “crude simplified derivative” of the formal Herodian hand. Naveh’s aligning of the letters of the coins with those of the ossuaries might suggest that this type of Herodian hand was already anticipated by the Jannaeus coins. Naveh therefore referred to the palaeographical significance of these coins. One should note, however, that neither Naveh nor Barag carefully analysed the letters of the coins.

Summarizing, all scholars associate the Benei Ḥezir tomb with the Maccabees or the Hasmonaean period (either early or late), and date its inscription to the first century BCE. Yet Cross’s claim that a late first-century BCE date is secure and constitutes an absolute peg cannot be sustained. The date estimates of the tomb and its inscriptions are based not only on architectural typology but also on palaeographic typology. None of the evidence argues against a mid-first-century BCE or even earlier date of the inscription.

This is one example demonstrating that inscriptions in Hebrew and Aramaic on other surfaces, such as stone and metal, cannot fill the void of absolute dating pegs between the third century BCE and the first century CE. In addition to the Benei Ḥezir burial inscription, this applies also, for example, to the so-called Queen Helena inscription (CIIP 123 eck2010corpus ) and Uzziah plaque (CIIP 602 eck2010corpus ) from the first century CE. Strictly speaking, these are not absolutely dated. The same applies to the Jason’s tomb inscriptions (CIIP 392–397 eck2010corpus ), the date of which is not fixed either. Puech Puech2001 ; Puech1983 had initially argued, on the basis of his reconstruction of the historical background of the inscriptions, that the main one in Aramaic (CIIP 392) must be dated to 82/1 BCE, but more recently he stated that the Aramaic inscription dates palaeographically to about the middle of the first century BCE or slightly earlier Puech2017 . Yet, Yardeni dated the inscription shortly before the destruction of the tomb by an earthquake in 31 BCE eck2010corpus .

Another example is the hundreds of ossuary inscriptions, virtually all of which Cross Cross1955 assigned to the Herodian era. A post-20/15 BCE date for the ossuaries may be archaeologically correct Magness2005 , but the political and historical framing within the Herodian period does not limit the emergence of the script exhibited on the ossuaries to that period. The question of when the so-called Herodian script came into being is decided somewhat arbitrarily. Cross took 30 BCE, Milik and Baillet 50 BCE Tigchelaar2020 . Avigad also took 50 BCE or slightly earlier. Furthermore, Avigad already acknowledged that scrolls referred to as ‘Herodian’ may easily be earlier than this period Avigad1965 . In other words, even for the ‘Herodian’ script, just as for the ‘Hasmonaean’ script (see below), the emergence is difficult to establish. In terms of typological development, we have to reckon with the possibility of longer, broader time frames for both scripts.

A.2.2 Unsubstantiated palaeographic and historical premises

Even if one were to accept Cross’s recourse to a series of absolute datings, these would support mainly late first-century BCE and first-century CE comparisons. They do not help to establish the beginnings of the ‘Hasmonaean’ script. Lacking dated material from the third and second centuries BCE, Cross had to take further recourse to two premises to attempt to establish the upper range of the oldest scripts, ‘Archaic’ and ‘Hasmonaean’, from the scrolls, and to limit the earliest dating of the scrolls mainly to the second century BCE, with only a few exceptions for older ‘Archaic’ manuscripts such as 4Q52 and 4Q70.

In addition to a lack of time markers, two palaeographic and historical premises by Cross, Yardeni and others stand out: a slow development of the Aramaic/Hebrew script in the early Hellenistic period (third century BCE); and the emergence of a national script as a watershed around 200–150 BCE.

The presumed slow development of the Aramaic/Hebrew script in the early Hellenistic period is not supported by any dated evidence of that period. The assumption was in part based on a few undated cursive Aramaic papyri from Egypt containing Greek names (hence assumed to be from the third century BCE), but the later discovery of the dated Wadi Daliyeh papyri showed that there were different lines of development, some having taken place much earlier Cross1974 ; Yardeni1990 , thus challenging the premise of the slow development, and reducing the importance of those Hellenistic Egyptian Aramaic papyri for establishing the evolution of the Aramaic/Hebrew script. For Judaea, Cross Cross1955 also referred in passing to a conservative palaeography for the copying of sacred texts, but without further explanation or supporting evidence.

Cross initially dated 4Q52 (4QSam^b) to “the last quarter of the third century B.C.” Cross1955 , “no doubt late in the century” Cross1961 , but after the discovery of the Wadi Daliyeh manuscripts, simply to “ca. 250 B.C.” Cross1974 or “the mid-third century BCE” Cross1998 ; Cross2003 . He seems to have been reluctant to date 4Q52 and also 4Q70 (4QJer^a) earlier, and therefore assumed a very slow evolution of the script, so as not to have a large time gap with the manuscripts written in what he called the “early Hasmonaean” script and which he dated to ca. 150 BCE.

Yardeni, too, regarded 4Q52 and 4Q70 as examples of a transitional stage from the fourth and third century BCE Aramaic script in the direction of the ‘Hasmonaean’ script Yardeni1990 . Her conclusion that these two manuscripts could therefore be dated to the late third or early second century BCE seems to be based rather on the supposed proximity to this national script than on the correspondences with the earlier Aramaic scripts.

However, the palaeographic principle is to date an undated manuscript by comparing its script to that of dated writings with a similar script. This means that the oldest manuscript of the scrolls, 4Q52, must be compared to the Aramaic evidence from Wadi Daliyeh from the fourth century BCE. 4Q52 should then be chronologically closer to those manuscripts, especially WDSP 1 (335 BCE).

The hypothesis of the emergence of a national script around 200–150 BCE and the supposition that the ‘Hasmonaean’ script was a development of the Hasmonaean period after 150 BCE are not supported by any dated evidence but based on historical assumptions, given in passing, about a “nationalistic expansion and resurgent Orientalism” Cross2003 after the death of the Seleucid king Antiochus IV Epiphanes (164 BCE). These unfounded assumptions were then imposed as an interpretative framework on the manuscript evidence. But, given the absence of dated material from the third and second century BCE, there are no historical, typological or other palaeographic reasons for limiting the rise of the script which Cross called ‘Hasmonaean’ to the mid-second century BCE.

This means that manuscripts written in ‘Hasmonaean’ script may date already from the early second century or from the third century BCE. This older dating is also realistic when manuscripts written in so-called ‘Archaic’ script, such as 4Q52 or 4Q70, can be dated earlier in the third century BCE or, for 4Q52, even perhaps in the late fourth century BCE. Furthermore, this older dating can be independently supported by the ¹⁴C dating results in this study (see Section D.1 in Appendix D).

A.3 The way out of the gap

Summarizing, the dating problem of the Dead Sea Scrolls, due to the absence of calendar dates, is further confounded by the fact that there are no other date-bearing manuscripts in similar script available for palaeographic comparison. This lack of date-bearing documents cannot be overcome by using inscriptions on other surfaces instead because these, too, have no absolute dates. Also, datable inscriptions mainly date from the first century BCE and first century CE and thus cannot shed light on script developments in the third and second centuries BCE. Historical premises and assumptions remain unsubstantiated and devoid of factual support, and they fail to support a chronological framework for the palaeography and the manuscript evidence. These assumptions cannot determine or sufficiently constrain the dates connected to the writing of the scrolls.

Therefore, ¹⁴C dates derived from manuscript samples are needed as absolute time markers to lead the way out of the palaeographic gap. In the absence of an abundance of date-bearing manuscripts written in similar script available for palaeographic comparison, ¹⁴C dating, a scientific measurement (“yardstick of time”), provides more reliable time markers, and in combination with our style-based date-prediction model Enoch even more precise time markers.

Appendix B Radiocarbon dating of the Dead Sea Scrolls

Two series of Dead Sea Scrolls were radiocarbon dated in the 1990s, in Zurich and in Tucson, Arizona bonani1991radiocarbon ; Bonani1992 ; Jull1995 . In addition, three samples were submitted to Oxford but in all three cases the chemistry is recorded as having “failed,” i.e., no sample to measure; probably the samples completely dissolved during the pretreatment phase (communication from R. Hedges, Research Laboratory for Archaeology, Oxford, 7 January 2005).

Although scrolls were radiocarbon dated in the 1990s, new radiocarbon dating was necessary because of castor oil contamination issues with these previous dates. Furthermore, since then, radiocarbon dating methods and procedures have improved significantly in terms of better calibration, higher precision obtained by more modern methods and instruments, and also more effective cleaning procedures for dealing with contaminated samples.

In this study, we have taken the following analytical steps for the samples:

1. They were precleaned by a Soxhlet procedure in Odense (see Sections B.2 and B.7.1);
2. Subsequently, they were further pretreated by standard methods in Groningen (see Section B.2);
3. The cleaned samples were dated by Accelerator Mass Spectrometry (AMS) in Groningen (see Sections B.3–B.6);
4. During the study, the residual lipids in the extracts of all 30 samples after the Soxhlet cleaning were analysed, and 17 samples have been further investigated by specialized analytical chemistry methods in Pisa regarding the nature of the contamination (see Sections B.7.2–B.7.5).

B.1 Selection of Samples

The 30 samples we received from the Israel Antiquities Authority (IAA) were selected on the basis of script and presumed period so as to obtain reliable time markers in the palaeographic gap between the fourth century BCE and the second century CE. We made this selection at the start of the project on the basis of the default model in the field (see Appendix A). The dates associated with the manuscripts according to this traditional model provided balanced coverage of the timeline under investigation (as can be seen in Figure 4). Also, because the ¹⁴C dates are needed to go into the date-prediction model, we selected manuscripts that contain a sufficient number of characters in their extant material, 150–200 Brink2008 . The manuscript identity and presumed palaeographic periods of the samples were not known to the staff of the laboratories in Groningen, Odense, and Pisa at the time of the measurement. One of the 30 samples, from a date-bearing document (Mur19), was added as a control text. Its identity and date were also unknown to the laboratories at the time of measurement. Furthermore, in consultation with the IAA, the final selection of samples was determined also on the basis of practical and conservational considerations regarding specific manuscript remains. The IAA provided general indications concerning where the physical samples were taken from (see Appendix J). In our sample set, we have 28 manuscripts of animal skin, and 2 of papyrus (4Q255/4Q433a and Mur19).

From the first century CE onward, a clear distinction appears in the manuscript evidence between the square bookhand script and the standard cursive style yardeni2000-mq , but such a distinction is less pronounced in the manuscript evidence of earlier periods. This also applies to the distinctions made between formal, semiformal, and semicursive styles. Across the continuum of the chronological range covered by the scrolls, exemplary specimens for some styles are lacking Cross2003 . Often manuscripts exhibit a mixture of these presumed styles Popovic2023 ; Tigchelaar2020 ; https://doi.org/10.25592/uhhfdm.739 . Therefore, our sampled manuscripts cover all three categories and their mixtures. The cursive style has been excluded from our sampling, except for Mur19 which was used as a validation test for ¹⁴C.

B.2 Soxhlet Treatment and AAA Pretreatment

Castor oil was used in the 1950s by the original team of scholars reconstructing and editing the Dead Sea Scrolls to clean the manuscripts and to improve readability of the text. The castor oil needs to be removed because it would otherwise yield a misleading ¹⁴C age that is “younger” than the true age of the sample. Later testing showed that not all castor oil is removed even by the standard AAA (Acid-Alkali-Acid) protocol, let alone by the reduced form of the standard protocol used in the 1990s in Zurich and Tucson rasmussen2001effects ; rasmussen2003reply ; rasmussen2009effects .

Before the actual start of the project, we received 2 test samples from the IAA which were relatively large (tens of milligrams). These were materials without context but of scrolls origin according to the IAA. Both samples were subjected to the standard AAA treatment, but the material immediately started dissolving before our eyes during the first acid step. This meant we could not apply the standard method, all the more so because the test samples were much larger than the identified manuscript samples we were to receive.

In our project, the first step was to pre-clean the samples by liquid extraction with suitable solvents, performed inside a Soxhlet apparatus, to remove the castor oil contamination. The Soxhlet treatment was carried out in Odense, initially in three but subsequently in four extraction steps. The fourth step was added for redundancy, not because of any suspicion that the three-step procedure was insufficient within the given dating uncertainty; it further ensured that even more potential contaminants were removed. Castor oil is a plant product that consists of several triglycerides and free fatty acids. The Soxhlet treatment is designed to remove lipid material to a high extent, and the analyses done in Pisa by HPLC-MS and Py-GC/MS were performed to demonstrate that the amount of remaining lipid material, including fatty acids, is below a threshold at which it would significantly skew the radiocarbon date (see Section B.7).

The scrolls are extremely delicate material. Their fragility is an issue for the chemical cleaning of samples for radiocarbon dating. In the 1990s, the Zurich and Tucson laboratories had to adjust or stop the AAA pretreatment because the samples were dissolving Bonani1992 ; Jull1995 . Most of these samples were much larger than the samples in the current study. With much smaller sample materials, we also had to adjust the standard chemical pretreatment.

Following the Soxhlet treatment in Odense, the samples were further prepared for dating in Groningen. The pretreatment was adapted to Acid only in a “soft” form: 0.5–1% HCl, refrigerator temperature (ca. 4°C) and only for 10 minutes. Next, we dried the sample in an oven at a temperature of 80°C overnight. Using diluted HCl and skipping the Alkali step is necessary because of the delicate nature of the samples. This is justified because of the conditions the scrolls were kept in. No significant amounts of foreign materials that could cause errors larger than the measurement uncertainties were observed. Our procedure is proven correct because the sample with a known historical date (Mur19) was ¹⁴C dated correctly. Combined with the Soxhlet treatment, this is the optimum treatment for this delicate material, and generally effective.

The scrolls were stored in caves in the Judaean desert in the absence of humic acids and constant groundwater. Humic acids in particular constitute a problem for many other archaeological excavations worldwide, and they are the main reason for the alkaline bath in the standard pretreatment protocol (the second A in AAA). The environment in the caves can be characterized as limestone, gypsum and marls, none of which has the potential to introduce alkali-soluble compounds into the parchments. Similarly, bat guano and excretions from other small animals that may have found their way into the caves over the centuries are unlikely to contain humic acids, and their deposits are therefore likely to dissolve either in the more polar solvents of the Soxhlet treatment (i.e., the ethanol) or in the acidic bath of the pretreatment in the radiocarbon laboratory. Furthermore, the pyrolysis-gas-chromatography measurements did not reveal any compounds unaccounted for (see Section B.7.4); that includes the alkanes that can be considered markers for bat guano Queffelec2018 .

B.3 AMS Measurements

After cleaning, the samples were combusted into CO₂ gas. For the GrA dates, the gas was subsequently reduced to graphite using H₂, and the ¹⁴C content was measured in this graphite. This method was also applied by the GrM machine for routine dating. However, this machine also has the option to measure the ¹⁴C content directly in CO₂, skipping the graphite production step. This is very useful for small samples, as is the case for many scroll samples. Therefore, for scroll samples measured by the Micadas, the gas source was used. For more details on measurement procedures see Dee2019 .

For the 30 samples in this study, there is a grand total of 131 individual AMS runs. This total number includes duplicate samples and multiple runs. In most cases a solid date can be calculated for the separate runs done for a particular scroll, based on averaging. The numbers reported reflect the measurements by AMS. In addition, there are aspects of sample integrity and pretreatment which are hard or even impossible to quantify. We have rejected 10 AMS runs for technical reasons, resulting in a final number of 121 accepted runs.

The ¹⁴C content in the sample is measured by AMS. The original AMS was a 2.5 MV Tandetron accelerator van2000status . It was decommissioned in 2017, and replaced by a Micadas system synal2007micadas . This took place during the project, so that both machines have been used to date the scroll samples. This allows for internal intercomparison (see Table LABEL:tab:summarized-c14). The Tandetron dates have laboratory code GrA; for the Micadas, this is GrM.

B.4 AMS Dating Results

Radiocarbon dates are reported by convention in BP, using a defined half-life and reference radioactivity for ¹⁴C, and a correction for isotopic fractionation using the stable isotope ¹³C mook1999reporting . The BP dates are converted to calendar dates, using the IntCal20 calibration curve (reimer2020intcal20 ) and OxCal software Oxcal . The calibration results in a non-Gaussian probability distribution of calendar dates. This distribution is given in 1 $\sigma$ (68.3% confidence) and 2 $\sigma$ (95.4% confidence) date ranges.
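The BP convention can be illustrated with a minimal sketch, assuming the standard definition of the conventional radiocarbon age: the Libby half-life of 5568 years (mean life about 8033 years) and a measured ¹⁴C activity ratio that has already been normalized for isotopic fractionation. The function names and the quantity `fraction_modern` are illustrative choices and are not taken from the laboratory software used in this study.

```python
import math

# Conventional (Libby) half-life in years; by convention this value is used
# instead of the physical half-life of about 5730 years.
LIBBY_HALF_LIFE = 5568.0
LIBBY_MEAN_LIFE = LIBBY_HALF_LIFE / math.log(2)  # about 8033 years


def conventional_age_bp(fraction_modern: float) -> float:
    """Conventional 14C age (years BP) from the normalized fraction modern.

    Assumption: `fraction_modern` is the sample activity relative to the
    reference activity, already corrected for isotopic fractionation.
    """
    if fraction_modern <= 0:
        raise ValueError("fraction modern must be positive")
    return -LIBBY_MEAN_LIFE * math.log(fraction_modern)


def fraction_modern_from_age(age_bp: float) -> float:
    """Inverse relation: normalized fraction modern for a given age in BP."""
    return math.exp(-age_bp / LIBBY_MEAN_LIFE)


# Round trip for an age of the same order as those in this study.
print(round(conventional_age_bp(fraction_modern_from_age(2164.0)), 6))
```

The resulting BP value is a measurement quantity on a defined scale, not a calendar date; the conversion to calendar dates requires the calibration step described above.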

Of the 30 samples, 27 yielded accepted dates; only 3 samples yielded inconsistent results and had to be technically rejected (4Q216, 11Q20, and Mur88; see Section B.6). Also, it appeared that the sample received for 4Q185 could not be ascertained as belonging to that particular manuscript. This sample is not used in our analysis (see Section B.5), which leaves 26 accepted dates.

The resulting ¹⁴C dates for the 26 samples are shown in Table LABEL:tab:summarized-c14. Each individual ¹⁴C sample receives a unique laboratory number. As the table shows, each scroll is dated at least twice. In addition, many measurement batches were repeated (thus yielding two dates per graphite sample). The resulting ¹⁴C age shown is the averaged number for all accepted runs. Overall, the logistics are complex. For example, the sample 4Q114 (4QDaniel^c) has been dated in 7 runs. Two samples were received from the IAA. Graphite was prepared from all material of the first sample, and it was dated by the GrA machine. There were 3 runs from the same graphite (to increase the ¹⁴C statistics), so all have the same GrA number; the 3 runs are triplicates and can be taken together as 1 GrA date. An additional second sample was received later. From this sample we dated 4 subsamples in 4 runs by the GrM machine. Hence there are 4 GrM numbers.
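How several accepted runs for one scroll might be combined into a single ¹⁴C age can be sketched as follows. This is only an illustration under an assumption: the text states that the reported age is an average over all accepted runs but does not specify the weighting, so the sketch uses the common inverse-variance weighted mean, together with a chi-square statistic as a simple consistency check. The run values are hypothetical and are not measurements from this study.

```python
import math


def combine_runs(runs):
    """Inverse-variance weighted mean of radiocarbon runs.

    runs: list of (age_bp, sigma) tuples for one scroll.
    Returns (mean_age, sigma_of_mean, chi2), where chi2 measures the
    scatter of the runs around the mean relative to their uncertainties.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in runs]
    total = sum(weights)
    mean = sum(w * age for w, (age, _) in zip(weights, runs)) / total
    sigma_mean = math.sqrt(1.0 / total)
    chi2 = sum(((age - mean) / sigma) ** 2 for age, sigma in runs)
    return mean, sigma_mean, chi2


# Hypothetical runs (age in BP, 1-sigma uncertainty); NOT data from the paper.
example_runs = [(2170.0, 30.0), (2150.0, 30.0), (2165.0, 40.0)]
mean, sigma_mean, chi2 = combine_runs(example_runs)
print(f"{mean:.1f} +/- {sigma_mean:.1f} BP, chi2 = {chi2:.2f} "
      f"for {len(example_runs) - 1} degrees of freedom")
```

A chi-square value that is large relative to the number of degrees of freedom would indicate runs that disagree by more than their stated uncertainties, which is the kind of inconsistency that leads to a technical rejection.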

The resulting BP dates are very precise, with 1 $\sigma$ uncertainties of only 15–28 years. For the full results of all runs with more details (in particular Carbon yield and $\delta$ ¹³C value), see Appendix K.

Table LABEL:tab:summarized-c14 shows the summarized results of 26 accepted ¹⁴C dates: laboratory code, sample identification, ¹⁴C age (BP), its sigma (BP), and calibrated dates (both 1 $\sigma$ and 2 $\sigma$ ranges). The OxCal plots can be seen in Appendix C.

Although the most recent calibration curve, IntCal20, has a resolution of 1 calendar year, that does not mean that a 1-year resolution is significant. The measurement precision for the ¹⁴C dates is, at best, 15 ¹⁴C years, and often a few decades. Moreover, OxCal can calculate at a 1-year resolution, but its default resolution is 5 years without any interpolation. If the resolution is set to less than 5 years, the curve is interpolated by a cubic function, that is, a polynomial of degree 3, which OxCal uses to interpolate between data points to obtain intermediate points. This is a mathematical construction and not a calibration at 1-year resolution. Hence, we do not take a 1-year interpolated resolution but present the raw 5-year resolution data from OxCal. For more details, we refer to https://c14.arch.ox.ac.uk/oxcalhelp/hlp_analysis_inform.html.

Furthermore, for the time range relevant for the scrolls our calibrated results are often bimodal, especially for 2 $\sigma$ distributions which we use for our further analyses for firmer grounding of our date-prediction model. The calibrated results from the 1990s were also often bimodal bonani1991radiocarbon ; Bonani1992 ; Jull1995 . This bimodality is an effect of the calibration curve not being linear, showing peaks and other irregularities caused by variations in the cosmic ray flux which produces ¹⁴C in the earth’s atmosphere vanderPlicht2022 .
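Why a non-monotonic calibration curve produces bimodal calendar ranges can be shown with a minimal sketch. It rests on several assumptions: the curve is a hypothetical piecewise-linear toy curve with one reversal on a 5-year grid (it is not IntCal20), the curve uncertainty is a constant 10 years, the measurement is a hypothetical 2200 ± 15 BP, and the ranges are simple grid-based highest-density sets rather than OxCal's own implementation. Negative years stand for BCE in this toy example.

```python
import math


def toy_curve(step=5):
    """Hypothetical calibration curve: (cal_year, curve_age_bp, curve_sigma)."""
    anchors = [(-450, 2350.0), (-350, 2100.0), (-250, 2200.0), (-100, 1975.0)]
    curve = []
    for (t0, a0), (t1, a1) in zip(anchors, anchors[1:]):
        for t in range(t0, t1, step):
            curve.append((t, a0 + (a1 - a0) * (t - t0) / (t1 - t0), 10.0))
    t_last, a_last = anchors[-1]
    curve.append((t_last, a_last, 10.0))
    return curve


def calibrate(age_bp, sigma, curve):
    """Normalized probability per grid year for a measured age +/- sigma."""
    dens = []
    for _, mu, curve_sigma in curve:
        var = sigma ** 2 + curve_sigma ** 2  # measurement + curve variance
        dens.append(math.exp(-(age_bp - mu) ** 2 / (2 * var)) / math.sqrt(var))
    total = sum(dens)
    return [d / total for d in dens]


def hpd_ranges(years, probs, level):
    """Highest-density grid years holding `level` probability, as ranges."""
    order = sorted(range(len(years)), key=lambda i: probs[i], reverse=True)
    chosen, acc = set(), 0.0
    for i in order:
        chosen.add(i)
        acc += probs[i]
        if acc >= level:
            break
    ranges, start = [], None
    for i in range(len(years)):
        if i in chosen and start is None:
            start = i
        if start is not None and (i not in chosen or i == len(years) - 1):
            end = i if i in chosen else i - 1
            ranges.append((years[start], years[end]))
            start = None
    return ranges


curve = toy_curve()
years = [t for t, _, _ in curve]
probs = calibrate(2200.0, 15.0, curve)  # hypothetical measurement
for level in (0.683, 0.954):
    print(level, hpd_ranges(years, probs, level))
```

In this toy setting the measured age matches the curve both on the early steep slope and around the later local maximum, while the calendar years in between correspond to much younger curve ages, so the probability splits into two separate intervals.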

Table LABEL:tab:result-4q185 shows the valid and acceptable radiocarbon date of the sample received for 4Q185 but the date cannot be used (see Section B.5).

Samples of the 3 scrolls 4Q216, 11Q20 and Mur88 did not produce acceptable ¹⁴C dates; these are summarized in Table 4 (see Section B.6).

Table 3: Summarized results of 26 accepted ¹⁴C dates: laboratory code, sample identification, ¹⁴C age (BP), sigma (BP), calibrated ranges (1 $\sigma$ and 2 $\sigma$ ranges) in 5-year resolution.

lab code	scroll	age (BP)	$\sigma$	calibrated ranges (1 $\sigma$)	calibrated ranges (2 $\sigma$)
GrA-68446, GrA-68447	P421-Fr004, 4Q504 (4QDibHam^a)	2164		345–320, 205–170 BCE	355–285, 230–150 BCE
GrA-69793, GrM-10677, GrM-10678	P206-Fr003, 4Q52 (4QSam^b)	2303		405–365 BCE	410–355, 285–230 BCE
GrA-69794, GrM-10679, GrM-10680	P285-Fr002, 4Q176 (4QTanh)	2153		345–320, 205–165 BCE	355–300, 210–100, 70–60 BCE
GrA-69795, GrM-13252, GrM-13253, GrM-13254, GrM-13255	P224-Fr001, 4Q114 (4QDan^c)	2168		345–315, 205–175 BCE	355–285, 230–160 BCE
GrM-10659, GrM-10660	P891-Fr003, 5/6Hev1b (Ps)	1940		25–45, 55–125 CE	10–205 CE
GrA-69810, GrM-10661, GrM-10662	P585-Fr001, 4Q161 (4QpIsa^a)	2028		45 BCE–10 CE	90–80 BCE, 55 BCE–30 CE, 45–60 CE
GrM-11151, GrM-11152, GrM-11170, GrM-11171	P1111-Fr010, 4Q70 (4QJer^a)¹	2226		365–350, 295–205 BCE	375–345, 320–200 BCE
GrM-11153, GrM-11154, GrM-11172	P1093-Fr005, 4Q47 (4QJosh^a)	2155		345–320, 200–165 BCE	355–290, 210–100 BCE
GrM-11155, GrM-11156	P271-Fr002, 4Q23 (4QLevNum^a)	2152		350–315, 205–150, 130–120 BCE	355–285, 230–220, 210–95, 75–55 BCE
GrM-11166, GrM-11167, GrM-11184, GrM-11185	P177-Fr001, 4Q255/4Q433a (4QpapS^a/4Qpap Hodayot-like Text B)	2100		155–90, 75–55 BCE	170–50 BCE
GrM-11168, GrM-11169, GrM-11186, GrM-11187	P977-Fr004, 11Q5 (11QPs^a)	1967		20–80, 100–110 CE	35–15 BCE, 5–120 CE
GrM-14380, GrM-14381, GrM-14228, GrM-14229	P393-Fr005, 4Q3 (4QGen^c)	2123		175–100, 70–60 BCE	340–325, 200–50 BCE
GrM-13385, GrM-13386	P1081a-Fr002, 4Q27 (4QNum^b)	2115		175–95, 75–55 BCE	340–330, 200–50 BCE
GrM-13387, GrM-13388, GrM-14175, GrM-14223	Px232-Fr001, Mas1k (MasShirShabb)	2007		45 BCE–25 CE	50 BCE–65 CE
GrM-14382, GrM-14383, GrM-14230, GrM-14241	P386-Fr001, 4Q206 (4QEn^e ar)	2169		350–310, 210–170 BCE	360–280, 235–145, 135–120 BCE
GrM-14565, GrM-14566, GrM-14395, GrM-14242, GrM-14243	P237-Fr007, 4Q30 (4QDeut^c)	2182		355–290, 210–175 BCE	360–275, 260–245, 235–165 BCE
GrM-13389, GrM-13390, GrM-14173, GrM-14174	P904-Fr009, 4Q201/4Q338 (4QEn^a ar/4QGenealogical List)	2077		110–45 BCE	165–40, 10–1 BCE
GrM-14396, GrM-14397, GrM-14244, GrM-14245	P810-Fr011, 4Q259 (4QS^e)	2148		345–320, 205–150 BCE	350–310, 210–100, 70–55 BCE
GrM-14398, GrM-14399, GrM-14246, GrM-14359	P180-Fr004, 4Q416 (4QInstruction^b)	2130		200–100 BCE	345–320, 205–90, 80–50 BCE
GrM-14400, GrM-14401, GrM-14360, GrM-14361	P215-Fr004, 4Q2 (4QGen^b)	2059		100–70, 60–35, 15 BCE–5 CE	155–130 BCE, 125 BCE–10 CE
GrM-14567, GrM-14568, GrM-14362, GrM-14363	P122A-Fr001, 4Q375 (4QapocrMoses^a)	2126		195–185, 180–100, 70–60 BCE	345–320, 205–50 BCE
GrM-13391, GrM-13392, GrM-14224, GrM-14225	P534-Fr002, XHev/Se2 (XHev/Se Num^a)	1998		40–10 BCE, 1–30, 40–60 CE	45 BCE–75 CE
GrM-14569, GrM-14570, GrM-14364, GrM-14365	P147-Fr019, 4Q541 (4QapocrLevi^b)	2148		345–320, 205–150 BCE	355–300, 210–95, 75–55 BCE
GrM-14571, GrM-14572, GrM-14377, GrM-14366	P330-Fr004, 4Q521 (4QMessianic Apocalypse)	2159		350–315, 205–165 BCE	355–285, 230–100 BCE
GrM-13393, GrM-13394, GrM-14226, GrM-14227	P107-Fr010, 4Q267 (4QDamascus^b)	2151		345–315, 205–155 BCE	355–290, 210–95, 70–55 BCE
GrM-14573, GrM-14574, GrM-14378, GrM-14379	P879-Fr001, Mur19 pap WrDiv	1987		35–15 BCE, 5–65 CE	45 BCE–85 CE, 95–110 CE

¹ This fragment was previously unidentified, but see now for a positive identification Tigchelaar2020B .

As was done in the 1990s Bonani1992 ; Jull1995 , we also tested our procedure by dating a date-bearing manuscript, Mur19. The text of Mur19 refers to “year 6 of Masada”, which is now understood as a reference, from the time of the first Jewish revolt against Rome, to 71/72 CE Benoit1961 ; koffmahn1963dating ; yadin1965excavation ; goodblatt1999dating ; eshel2003documents ; Eshel2005 ; Wise2015 . The 2 $\sigma$ calibrated range is 45 BCE–85 CE (91.5%), 95–110 CE (3.9%). The historical date, 71/72 CE, falls within the main 2 $\sigma$ range, so the ¹⁴C date is clearly consistent with it.
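The consistency check for Mur19 amounts to asking whether the historical date falls inside one of the calibrated sub-ranges. A minimal sketch, using the 2 $\sigma$ sub-ranges and probabilities quoted above; the helper name `covering_range` and the convention of writing BCE years as negative integers are illustrative choices, not part of the study's workflow.

```python
# 2-sigma calibrated sub-ranges for Mur19 as quoted in the text:
# (start, end, probability in %). BCE years are written as negative integers
# (an illustrative convention; calendar-era subtleties are ignored here).
MUR19_2SIGMA = [(-45, 85, 91.5), (95, 110, 3.9)]

def covering_range(year, ranges):
    """Return the first (start, end, probability) sub-range containing year, or None."""
    for start, end, prob in ranges:
        if start <= year <= end:
            return (start, end, prob)
    return None

# The historical date of Mur19 is 71/72 CE.
for year in (71, 72):
    print(year, covering_range(year, MUR19_2SIGMA))
```

Both years fall in the 45 BCE–85 CE sub-range, which carries 91.5% of the probability.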

B.5 Result not to be used for palaeography: 4Q185

From a radiocarbon point of view, the dating of the sample is a valid and acceptable result. However, because the sample fragment cannot be attributed to a larger manuscript, the date cannot be used for our palaeographic analysis.

For 4Q185 (4QSapiential Work), we had requested Plate 801 fragment 1. Because Plate 801 fragment 1 was sewn and encapsulated for exhibition, the IAA sent a sample from Plate 801 fragment 3 instead. Unfortunately, it is very uncertain whether this sampled fragment is part of manuscript 4Q185. From a palaeographic perspective, identification with 4Q185 is doubtful. For example, the letter ayin is different from other occurrences in the manuscript (see also Pajunen2011 ). For that reason, the measurement results cannot be used for our palaeographic purposes.

B.6 Technically rejected results: 4Q216, 11Q20, and Mur88

The various AMS runs for scrolls 4Q216, Mur88, and 11Q20 resulted in internally inconsistent results. No valid ¹⁴C date could be deduced. Therefore, the results are rejected for technical reasons.

For all three scrolls, different samples were received from the IAA in subsequent batches. The first samples were measured by the GrA machine; the subsequent samples were measured later in the project by the GrM machine.

For 4Q216 (4QJub^a), the first sample was measured for graphite (GrA-69799). For the second sample, two gas samples were measured (GrM-10675, 10676). The GrA and GrM measurements do not provide mutually consistent dates. In other words, the two samples received from the IAA do not give consistent results. In addition, the measurements yield ¹⁴C dates which are impossibly old. We conclude that the sample material may not be homogeneous.

For 11Q20 (11QTemple^b), the first sample was measured for graphite in triplicate (GrA-69800). For the second sample, two different parts of the scroll sample were taken, and two gas samples measured for each (GrM-10681, 10682, 18827, 18828). The 3 GrA measurements are internally consistent, as are the 4 GrM results. However, GrA and GrM do not provide mutually consistent dates. Here, too, the two samples received from the IAA do not give consistent results. We conclude that the sample material may not be homogeneous.

For Mur88 (MurXII), the first sample was measured for graphite in triplicate (GrA-69806). For the second sample, two different parts of the scroll sample were taken, and two gas samples measured for each of them (GrM-10663, 10664, 18829, 18830). The resulting GrA and GrM measurements yield three different ¹⁴C dates. Here, too, the sample material may not be homogeneous.

For the full results of these runs with more details (in particular Carbon yield and $\delta$ ¹³C value) see Appendix K.

Table 4: Technically rejected results: 4Q216, 11Q20, and Mur88; laboratory code, sample identification, ¹⁴C age (BP), sigma (BP).

lab code	scroll	age (BP)	$\sigma$
GrA-69799	P385-Fr011, 4Q216 (4QJub^a)	2342	
GrM-10675, GrM-10676	P385-Fr011, 4Q216 (4QJub^a)	2979	
GrA-69800	P577-Fr014, 11Q20 (11QTemple^b)	2027	
GrM-10681, GrM-10682	P577-Fr014, 11Q20 (11QTemple^b)	2183	
GrM-18827, GrM-18828	P577-Fr014, 11Q20 (11QTemple^b)	2202	
GrA-69806	P64-Fr001, Mur88 (MurXII)	1950	
GrM-10663, GrM-10664	P64-Fr001, Mur88 (MurXII)	1951	
GrM-18829, GrM-18830	P64-Fr001, Mur88 (MurXII)	2053	

B.7 Analytical Chemistry

B.7.1 Soxhlet extraction

Upon arrival of the samples in Odense, they were photographed, if this had not already been done in Groningen. Expanding on section B.2, the chemical cleaning procedure developed to remove later contamination, such as castor oil, was as follows. Three Soxhlet apparatuses were operated in parallel, with three samples mounted simultaneously, one in each chamber. The Soxhlet apparatuses had different volumes: the first one operated with 100 mL of solvent, the second with 70 mL and the third with 50 mL of solvent. All solvents were of the highest quality available (LC-grade for Liquid Chromatography).

The cleaning procedure was initiated by running the whole set of solvents with no sample mounted, in order to clean the apparatus, the stainless-steel cage and glass utensils. Then a sample was placed in the stainless-steel cage mounted in a Soxhlet apparatus chamber, and the first solvent was added to the lower flask. The first solvent was LC-grade ethanol LiChrosolv (1.11727.2500 from Merck). This was operated for one hour, corresponding to ca. 50 turnovers of the solvent over the sample. The second solvent was LC-grade n-hexane LiChrosolv (1.03701.2500 from Merck), which was operated for four hours, corresponding to ca. 240 turnovers of the solvent over the sample. The third solvent applied was LC-grade ethanol LiChrosolv (1.11727.2500 from Merck), operated for one hour, corresponding to ca. 50 turnovers. After each step in the cleaning procedure, an 8 mL sample of the solvent used was transferred to a pre-cleaned glass vial. That is, three 8 mL samples (ethanol, hexane, and ethanol) were collected, one after each step. They were placed in a heating apparatus operating at 80°C, which evaporated the solvents in the glass vials to dryness, after which the glass vials were sealed with a lid. The dry residue was later re-dissolved and analyzed by HPLC-MS in Pisa (see section 7.2). After cleaning, the samples were removed from the stainless-steel cages and dried overnight at 60°C at zero humidity in a Memmert HCP 108 Climate chamber. Following this, the samples were weighed, packed, and shipped to Groningen, to undergo pretreatment and dating following ¹⁴C protocols.

This three-step Soxhlet protocol, which was developed by rasmussen2009effects , was applied to the first batch of 10 samples (4Q52, 4Q114, 4Q161, 4Q176, 4Q185, 4Q216, 4Q504, 11Q20, Mur88, 5/6Hev1b) which were analyzed in the project. Following the chromatographic-mass spectrometric analyses in Pisa of this first set of solvents, it was decided that a fourth cleaning step should be added to the procedure for the remaining 20 samples. This was done for redundancy, and not because of proof or suspicion that the three-step procedure was insufficient within the given dating uncertainty. The fourth step was added to further ensure that castor oil and many other contaminants were removed even in the worst-case scenario. The fourth Soxhlet step was performed using a 30:70 mixture of dichloromethane:hexane, both of LC-grade purity (dichloromethane CHROMASOLV 34856 by Sigma-Aldrich, and n-hexane as described above), operated for one hour, corresponding to ca. 60 turnovers of the solvent over the sample.

B.7.2 Raman spectroscopy, optical microscopy, Py-GC/MS, and HPLC-MS analysis

The study of the materials constituting the scrolls was performed in Pisa using a multi-analytical approach based on chromatographic and spectroscopic analytical techniques. The use of these complementary approaches allowed us to characterize both the original materials of the parchments and to evaluate the possible occurrence of modern materials used for consolidating/restoring the scrolls. These results were used to define the best cleaning strategy to remove from the scrolls the modern materials that could affect the dating, and to evaluate the efficiency of the purification steps. In detail:

• Raman spectroscopy and optical microscopy (OM) were used as non-invasive and non-destructive methods to evaluate the general appearance of the parchments and to characterize the possible occurrence of inorganic materials.

• Analytical pyrolysis coupled with gas chromatography and mass spectrometry (Py-GC/MS) analyses were performed on small (ca. 100 µg) sub-samples of the samples, before these underwent cleaning by Soxhlet and AAA, to characterize the organic material constituting the scrolls and to evaluate the possible presence of modern synthetic materials used as consolidating materials. This technique represents one of the best methods to obtain a complete picture of the organic materials in a sample degano2018recent . Pyrolysis consists of a thermal decomposition of organic materials in the absence of oxygen. This process leads to the formation of low molecular weight species that can be separated by gas chromatography and identified by mass spectrometry. This analytical approach yields specific molecular markers that can be used to identify the source of organic materials.

• Liquid chromatography coupled with mass spectrometry (HPLC-MS) was applied to evaluate the content of lipid materials present in Soxhlet extracts from parchments during the cleaning steps. This is among the best approaches for the separation and characterization of complex mixtures of lipid materials, such as castor oil. The use of mass spectrometry as the detection system provides information on the glyceride chemical structure la2021liquid . This information cannot be obtained using more conventional analytical approaches such as GC/MS. Moreover, this method can detect very low amounts of analytes.

B.7.3 Results of the optical microscopy and Raman spectroscopy analyses performed on 17 samples

The microscopy observations and micro-Raman analyses were performed on samples 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q201/4Q338, 4Q206, 4Q216, 4Q259, 4Q267, 4Q375, 4Q416, 4Q521, 4Q541, Mas1k, Mur19, XHev/Se2. All these samples were characterized by similar appearance, except for sample Mur19 that showed a different morphology, suggesting the use of a different material as a writing support.

Several samples featured microscopic black spots with diameters in the range of 10–200 µm, except for sample 4Q114, which was characterized by one black spot of approximately 600 µm. Raman spectroscopy was applied to investigate the chemical composition of the spots. For several samples, the Raman spectra featured peaks at 1350 and 1580 cm$^{-1}$, the wavenumbers typical of C-C bonds in amorphous carbon (signals not detected in the background). For example, Figure 6 reports the spectrum obtained from one spot on the sample 4Q216, and Table 5 presents the OM photographs along with a description of the observed surface and summarizes the relevant information obtained by Raman spectroscopy.

The biggest black spot from the scroll 4Q114 was sampled separately, and radiocarbon dated to 2390±60 BP (GrM-13256). The size of the black spot was ca. 600 µm in diameter, with an observed thickness of ca. 50 µm, translating into a calculated mass of ca. 14 µg. Thus, with a sample mass of 6.2 mg for the sample radiocarbon dated for 4Q114, the contamination mass fraction from the black spot would be ca. 0.2% and thus, the effect of such contamination is negligible, whatever its age.
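The mass estimate for the black spot can be reproduced with a short calculation. The sketch below treats the spot as a flat disc and assumes a density of 1.0 g/cm³; that density is not stated in the text and was chosen here because it gives ca. 14 µg. The function name `disc_mass_ug` is hypothetical.

```python
import math

def disc_mass_ug(diameter_um, thickness_um, density_g_cm3=1.0):
    """Mass in µg of a flat disc; the default density is an assumption, not from the text."""
    radius_cm = diameter_um / 2 * 1e-4      # µm -> cm
    thickness_cm = thickness_um * 1e-4      # µm -> cm
    volume_cm3 = math.pi * radius_cm**2 * thickness_cm
    return volume_cm3 * density_g_cm3 * 1e6  # g -> µg

spot_ug = disc_mass_ug(600, 50)             # 600 µm diameter, 50 µm thick
fraction = spot_ug / (6.2 * 1000)           # dated sample mass: 6.2 mg = 6200 µg
print(f"{spot_ug:.1f} µg, {fraction:.2%}")  # 14.1 µg, 0.23%
```

Under these assumptions the spot accounts for roughly 0.2% of the dated sample mass, in line with the figure given above.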

Table 5: Optical microscope pictures and observations

B.7.4 Results of the Py-GC/MS analysis performed on 17 samples

Py-GC/MS was used in order to evaluate the possible presence of synthetic materials used as consolidating materials on the scrolls and to characterize the original parchment material: the 17 samples (4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q201/4Q338, 4Q206, 4Q216, 4Q259, 4Q267, 4Q375, 4Q416, 4Q521, 4Q541, Mas1k, Mur19, XHev/Se2) were directly analyzed without any prior sample pretreatment using a multi-shot pyrolyzer EGA/PY-3030D (Frontier Lab, Japan) coupled with a 6890 N gas chromatography system with a split/splitless injection port, and with a 5973 mass selective single quadrupole mass spectrometer (Agilent Technologies). The complete instrumental conditions are reported in la2019synthetic .

The pyrolytic profile of all these 17 samples featured molecular markers that can be related to the pyrolysis of animal hide or scroll (pyrrole and diketopiperazines), except for sample Mur19 that was instead characterized by the presence of anhydro sugars and levoglucosan, typical of a cellulose-based material colombini2009organic . This is consistent with the observation that Mur19 is a papyrus fragment. Figure 7 reports the chromatogram obtained for the sample from the parchment of 4Q521.

Samples 4Q3 and 4Q206 showed the presence of the markers of polyethylene glycol. The pyrograms of samples 4Q3 and 4Q30 also contain the peaks due to hexadecanonitrile and octadecanonitrile, which are the Py-GC/MS markers characteristic for egg. Samples 4Q521 and Mur19 were characterized by the presence of an acrylic resin. Finally, samples 4Q2, 4Q267, 4Q541, and Mur19 showed the presence of retene: this molecule is a marker characteristic of the combustion of resinous wood and can be indicative of the exposure of the scrolls to a fire in the space where writing took place or could be due to residues related to the illumination with torches. Table 6 summarizes the materials detected in the different parchment samples.

Pyrolysis allowed us to pinpoint the presence of exogenous materials, such as synthetic consolidation materials (acrylic resin) or lipids. Having established the nature of the contamination, we were able to design appropriate cleaning procedures to remove any unwanted consolidant.

The use of a further cleaning step using dichloromethane ensured the total removal of all the synthetic materials, as proven by pyrolysis analyses performed on a subsection of the samples after cleaning and prior to ¹⁴C dating.

Table 6: Summary of the materials detected in the different parchment samples.

Samples	Identified organic materials
4Q216	proteinaceous material, lipid material
4Q3	proteinaceous material, lipid material, egg
4Q27	proteinaceous material, lipid material
Mas1k	proteinaceous material, lipid material
4Q206	proteinaceous material, lipid material
4Q30	proteinaceous material, lipid material, egg
4Q201/4Q338	proteinaceous material, lipid material
4Q259	proteinaceous material, lipid material
4Q416	proteinaceous material, lipid material
4Q2	proteinaceous material, lipid material, retene
4Q375	proteinaceous material, lipid material
XHev/Se2	proteinaceous material, lipid material
4Q541	proteinaceous material, lipid material, retene
4Q521	proteinaceous material, lipid material, acrylic resin
4Q267	proteinaceous material, lipid material, retene
Mur19	lignocellulose material, acrylic resin, retene
4Q114	proteinaceous material, lipid material

B.7.5 Liquid chromatography-mass spectrometry results of the analysis of residual lipids in the extracts from the 30 samples after cleaning

HPLC-MS was applied to evaluate the presence of lipid materials. The dried extracts were reconstituted in 150 µL of iso-propanol/methanol, 10:90, filtered (PTFE syringe, 0.45 µm pore size) and analyzed. HPLC-ESI-Q-ToF analyses were carried out using a 1200 Infinity HPLC, coupled with a Quadrupole-Time of Flight tandem mass spectrometer 6530 Infinity Q-ToF detector by a Jet Stream ESI interface (Agilent Technologies, USA). The complete instrumental conditions are reported in la2013core .

The analyses were performed on the extracts from the two different sample pretreatments by Soxhlet, i.e., the three-step and the four-step extraction.

The comparison of the results obtained on the extracts with reference blanks allowed us to confirm the effectiveness of the cleaning procedures, showing that the glyceride content after the last step did not exceed 7.0 µg for either approach. The cleaning procedure proved to be effective for removing the lipid materials from the scroll samples, since all the solutions obtained after the last extraction step were characterized by the presence of triglycerides and fatty acids at or below blank level. Figure 8 shows a comparison of all final cleaning steps with the respective blanks for both the fatty acids and the triacylglycerols.

In particular, the worst case encountered in the entire data set was 7.0 µg of acylglycerols detected in the fourth cleaning step of 4Q3. These triglycerides can originate from the original parchment, or from later contamination such as castor oil. There is no way to determine the origin; it can also be a mixture of ancient and recent materials. If we assume, as a worst-case scenario, that all the triglycerides detected in 4Q3 were modern contamination, then this would skew the date of a 2000-year-old parchment sample by only 12.6 years.

As stated, this is a worst-case scenario that depends on all triglycerides being modern, which is an unlikely assumption because triglycerides are a normal constituent of animal skin ghioni2005evidence . Furthermore, all other samples are well below the 7.0 µg level.
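How a small admixture of modern carbon shifts a conventional ¹⁴C age can be illustrated with a simple two-component mixing model. This is a sketch under stated assumptions, not the calculation used in the study: the contaminant is given a ¹⁴C activity equal to the modern standard, the contamination is expressed as a fraction of the total carbon, and the fractions in the loop are hypothetical because the sample mass of 4Q3 is not given in this section. The function name `apparent_age_shift` is illustrative.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; mean life used for conventional 14C ages

def apparent_age_shift(true_age_bp, modern_fraction):
    """Years by which a sample appears too young when a fraction of its carbon is modern.

    Assumption: the contaminant has the activity of the modern standard (F = 1).
    """
    f_sample = math.exp(-true_age_bp / LIBBY_MEAN_LIFE)
    f_mixed = (1.0 - modern_fraction) * f_sample + modern_fraction * 1.0
    measured_age = -LIBBY_MEAN_LIFE * math.log(f_mixed)
    return true_age_bp - measured_age

# Hypothetical modern-carbon fractions for a 2000-year-old sample.
for fraction in (0.001, 0.005, 0.01):
    shift = apparent_age_shift(2000, fraction)
    print(f"{fraction:.1%} modern carbon: about {shift:.1f} years too young")
```

In this model, contamination at the sub-percent level moves the age of a 2000-year-old sample by a couple of decades at most, which is of the same order as the measurement uncertainties of 15–28 years reported above.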

Appendix C OxCal plots: ¹⁴C determinations and calibrated date plots

Here, we present the OxCal plots for the 26 accepted samples. No plots were produced for the 3 technically rejected samples (see Section B.6), nor for the 1 sample of which the identity could not be ascertained (see Section B.5).

Appendix D Palaeography and radiocarbon dating of the Dead Sea Scrolls

D.1 Comparing radiocarbon results and palaeographic estimates

We compare the radiocarbon dates (Table 3 in Appendix B) with previous palaeographic estimates on the basis of the estimates given in the official publication series, Discoveries in the Judaean Desert (DJD), as these are considered the standard in the field. Where relevant, we also refer to estimates by other scholars.

However, we also critically assess previous palaeographic estimates. We do that on two levels. First, we reason according to the relative typology of the so-called Cross model and assess its application to individual manuscripts. This leads occasionally to palaeographic assessments that correct previous ones. Second, we desist from translating a relative typology to an absolute chronology. Because of the lack of date-bearing documents for the time-period one cannot impose the traditional framework’s unsubstantiated chronological limitations on when the so-called Hasmonaean and Herodian script features would have started to develop (see Section A.2.2). This also applies to chronological distinctions within the general indications of Hasmonaean-type and Herodian-type scripts. Cross suggested chronological ranges of 50 years, and sometimes even shorter ranges of 25–50 years, as he assumed a rapid development of the script from the Hasmonaean period onward, contrary to the presumed slow development in the third century BCE. However, this assumption of a rapid evolution remains unsubstantiated, too (see below). As more reliable time markers, our study’s ¹⁴C calibrated ranges demonstrate older date ranges than previously thought for individual manuscripts as well as for the beginnings of the Hasmonaean/Herodian scripts.

We can compare the radiocarbon dates with previous palaeographic estimates only in a general sense, not as a rigid application of these estimates. The early 1990s guideline that editors of manuscripts in the DJD series would date according to the typological specimens of Cross’s 1961 article has proved unfortunate. One problem is that many of the palaeographic estimates offered in the DJD series since the 1990s suffer from an insufficient understanding of Cross’s model, producing unreliable estimates Tigchelaar2020 . This unreliability is further exacerbated by the problems within Cross’s palaeographic model that conflates supposed historical and political developments with palaeographic style developments.

Cross Cross2003 presented few specimens for other scholars to work with. This makes it difficult to substantiate style developments within, for example, Hasmonaean formal script, and to account for the complexity of script in individual manuscripts. Moreover, Cross also suggested mutual influences between formal, semiformal, and semicursive in such a way that sometimes a typological development of an individual letter is thought to have occurred earlier in, e.g., semicursive than in formal script (e.g., for samek).

For the Hasmonaean formal script Cross Cross2003 ; Cross1998 singled out only three manuscripts, assuming absolute dates between ca. 175–30 BCE: 4Q28, 4Q30, and 4Q51 (for a number of individual letters, Cross also referred to 1QIsa^a and 4Q1, as respectively middle and early Hasmonaean formal, as well as to 4Q109 and 4Q504 as early Hasmonaean semiformal). Thus, 4Q30 is said to be “a typical Hasmonaean” script, without explanation why that is so, from the middle of the period, 125–100 BCE (Cross might have used 1QIsa^a instead but because he deemed it to have more idiosyncratic forms he gave preference to 4Q30 which he understood to have been copied by a more conventional scribe; no further substantiation is provided for these claims). The other two manuscripts are at the outer ends of the Hasmonaean formal script spectrum, apparently for having script style elements in common with earlier and later periods. So 4Q28 is presented as transitional between Archaic and the beginning of the Hasmonaean development (175–150 BCE) and 4Q51 as a late transitional script from the end of the Hasmonaean period or the beginning of the Herodian period (50–25 BCE).

For the Herodian formal script Cross singled out seven manuscripts, assuming absolute dates between 30 BCE and 70 CE: 1QM, 4Q27, 4Q37, 4Q85, 4Q113, 5/6Hev1b, and Mur24. In fact, only four manuscripts are singled out for the Herodian formal script. 1QM is presented as “a typical early Herodian formal script” (ca. 30–1 BCE), while 4Q113 would represent “a developed Herodian formal script” (20–50 CE) and 4Q37 and 4Q85 late Herodian formal scripts from respectively ca. 50 CE and ca. 50–68 CE. 4Q27 is said to be “a typical exemplar of the extremely popular Round semiformal style” (also called rustic, and considered distinct from the Vulgar semiformal) from the early Herodian period, ca. 30 BCE–20 CE. The final two manuscripts are actually considered so-called post-Herodian, assuming absolute dates between 70 and 135 CE: 5/6Hev1b was dated by Cross to 75–100 CE (Flint suggested 50–68 CE DJD38 ) and Mur24 is a date-bearing document from 133 CE. Cross saw in some Herodian scripts the types of individual letters mixed so that semiformal can “invade” formal or Vulgar semiformal “makes its way into the formal character”, e.g., mem.

Apart from evaluating individual letters, there is no method in the field for dating an entire manuscript on the basis of mixed evidence of ‘older’ and ‘later’ forms of individual letters. Perhaps some scholars apply a form of quantification, weighing the instances of ‘older’ and ‘later’ forms, but this is never explicated. Rather, the assumption generally seems to be that ‘later’ forms cannot have developed earlier but ‘older’ forms can still have been in use at a later time, whether or not as a case of ‘archaizing’. While it may certainly be true that ‘older’ forms can have been in use for a long time, the claim that ‘later’ forms cannot have developed earlier remains unproven for lack of dated evidence. This means that what are perceived as, e.g., late Hasmonaean or early Herodian letter forms may have developed earlier than currently thought.

Even if one adopts Cross’s typological development, the issue of the absolute dating or calibration of the types remains Tigchelaar2020 . A mixture of ‘older’ (more ancient) and ‘later’ (more developed) forms can appear in one and the same manuscript. A focus on individual letters alone cannot be indicative for earlier or later chronology, whether relative or absolute. The study of individual manuscripts demonstrates a more complex development (see, e.g., for 4Q1 Tigchelaar2023 ). There are examples of experienced palaeographers coming up with widely diverging dates for the same scrolls. Thus, a range of individual manuscripts cannot be fitted precisely in a sequence on the basis of traditional palaeography.

The radiocarbon dates and the palaeographic estimates are two independent information sources about history, based on two different methodologies: one is a physically measured “yardstick of time”, the other is a cultural and qualitative assessment. At present, in the absence of an abundance of date-bearing manuscripts between the third century BCE and the first century CE, radiocarbon dates (¹⁴C) derived from manuscript samples are more reliable time markers. The palaeographic estimates do not provide absolute or fixed dates.

With these caveats in mind, Figure 1 in the main article shows the comparison between the (accepted) 2 $\sigma$ calibrated ranges and previous palaeographic estimates (see the worksheet in Appendix L for the specific data and information). Additional plots can be found in Appendix H, where Figures 29 and 30 present the effect of including or excluding minor peaks in the 2 $\sigma$ calibrated ranges and Figure 31 presents the outcome of selecting the 1 $\sigma$ calibrated range.

D.1.1 Whole or partial overlap

Comparing previously given palaeographic estimates with our ¹⁴C 2 $\sigma$ results shows that 17 of the 26 sampled manuscripts in our project have whole or partial overlap. This applies to: 4Q23, 4Q47, 4Q52, 4Q70, 4Q161, 4Q176, 4Q201/4Q338, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, 11Q5, Mas1k, Mur19, 5/6Hev1b, XHev/Se2.

4Q47 is a good example of how palaeographic estimates cannot be precise or clearly substantiated. Ulrich DJD14 reports that Cross had identified its script as Hasmonaean—thus dating it probably in the second half of the second century or the first half of the first century BCE—but refrained from offering a more precise estimate within the Hasmonaean period. Langlois Langlois2011 and Puech Puech2015 favoured the first half of the first century BCE. Langlois referred to some letters showing a typologically older form (bet, dalet, vav, khet, nun), while others would have a form more in line with those seen in late Hasmonaean or early Herodian periods (aleph, he, tet, samek, pe). However, considering, e.g., aleph one can see two forms, one of them being a typologically older form where the left leg often connects to the middle of the diagonal instead of more toward the top of it; the same for samek that appears in both closed (younger) and open (older) form. Also, ayin is often small, considered an older form, while yod shows the triangular head, seen as typical for the late Hasmonaean period. Instead of trying to fit this manuscript overall into a linear date estimate, the mixed typological evidence can be better explained as demonstrating overlapping or partly adjacent style developments.

In Section B.4 in Appendix B we noted that our calibrated results are often bimodal, especially for 2 $\sigma$ distributions. 4Q47 is an example of such bimodal calibrated results, also for the 1 $\sigma$ distribution. The ¹⁴C 2 $\sigma$ calibrated range of 210–100 BCE (61.6% probability) overlaps with the broad palaeographic estimate ‘Hasmonaean’—but less with the more specific ones of Puech and Langlois—and also allows for dating the script style of 4Q47 to the first half of the second century BCE.

The older 2 $\sigma$ calibrated range of 355–290 BCE (33.8%) is far removed on the timeline from previous palaeographic estimates. Although the older 2 $\sigma$ peak represents a mathematically valid solution of the dating process, the younger calibrated peak must be preferred for 4Q47 over the older calibrated peak. Following the palaeographic principle to compare the script of an undated manuscript to that of dated writings with a similar script (see Section A.2.2 in Appendix A), it should be noted that 4Q47 does not compare to the extant typological evidence from date-bearing Aramaic manuscripts from the fourth century BCE. Typologically, the script of 4Q47 does not correspond to that of the script in date-bearing documents from the Persian period such as those from Bactria or from Wadi Daliyeh from the same region as the Dead Sea Scrolls. So, from a palaeographic perspective, 4Q47 is clearly younger than where the older calibrated peaks appear on the timeline.
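The comparison of a bimodal 2 $\sigma$ result with a palaeographic estimate can be made explicit with a small overlap calculation. The sketch below uses the two 4Q47 sub-ranges and probabilities quoted above and reads the broad 'Hasmonaean' estimate as 150–50 BCE (second half of the second century to first half of the first century BCE). The helper `overlap_years` and the use of negative integers for BCE years are illustrative choices; the preference for the younger peak in the text rests on palaeographic comparison, not on this arithmetic.

```python
# 2-sigma sub-ranges for 4Q47 as quoted in the text: (start, end, probability in %).
# BCE years are written as negative integers (illustrative convention).
PEAKS_4Q47 = [(-355, -290, 33.8), (-210, -100, 61.6)]

# Broad 'Hasmonaean' palaeographic estimate, read here as 150-50 BCE.
HASMONAEAN = (-150, -50)

def overlap_years(a, b):
    """Length in years of the overlap between two (start, end) ranges; 0 if disjoint."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

total = sum(prob for _, _, prob in PEAKS_4Q47)
print(f"total probability: {total:.1f}%")
for start, end, prob in PEAKS_4Q47:
    years = overlap_years((start, end), HASMONAEAN)
    print(f"{-start}-{-end} BCE ({prob}%): {years} years of overlap")
```

The two sub-ranges together account for the 95.4% of a 2 $\sigma$ interval; only the younger one overlaps the palaeographic window.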

Prior to the discovery of the Wadi Daliyeh documents, 4Q52 was argued by Cross to be the oldest manuscript among the Dead Sea Scrolls, and it certainly has the best cards for being the oldest biblical manuscript. In the official publication, Cross et al. estimated 4Q52 to ca. 250 BCE DJD17 .

The ¹⁴C evidence is bimodal for the 2 $\sigma$ distribution. The younger peak of 285–230 BCE (16.6% probability) agrees well with the palaeographic estimate. The older calibrated range is 410–355 BCE (78.9%). This older date cannot be ruled out completely from a palaeographic perspective, but it seems typologically slightly too early for the script in 4Q52 in comparison to date-bearing documents from Elephantine from the late fifth century BCE and date-bearing documents from Bactria from 353 to 324 BCE, although it is difficult to factor in consequences of geographical variance for script variations. A date range in the second half of the fourth century BCE would seem more suitable for 4Q52. Following the palaeographic principle, 4Q52 would have to be dated chronologically nearer to the Wadi Daliyeh manuscripts, especially WDSP 1 from 335 BCE (see Section A.2.2). But for that date range there is no ¹⁴C result.

Hence, from a palaeographic perspective a clear preference for one of the two peaks in the probability distribution cannot be substantiated. The 2 $\sigma$ range of 410–355 BCE is perhaps only a few decades too old, not one to two centuries too old as for most other bimodal results of our ¹⁴C measurements. So, in the case of 4Q52, the older peak cannot be rejected as a possible solution with as much confidence as for most other ¹⁴C samples with bimodal evidence.

4Q176 has two script styles (plausibly from two scribes): the script of fragment 1–2 i looks entirely different from that of 1–2 ii. The ¹⁴C sample in this study was taken from fragment 1–2 ii. Strugnell Strugnell1970 and Tigchelaar Tigchelaar2019 characterized its script as ‘middle Hasmonaean’, i.e., ca. 125–75 BCE. Strugnell’s palaeographic analysis can be easily misunderstood. He explains that many of the letter forms of the second script style seem to be Herodian, such as bet, tet, mem, and qoph. Yet, because the script is not formal but semiformal, these forms must be dated to the middle Hasmonaean period. Fragment 1–2 ii shows less uniformity in letter size than fragment 1–2 i, e.g., in kaph or medial mem. This can be understood as a typologically older feature where kaph and medial mem are still larger than other letters. The ideal of a base line seems not yet well developed. The three-stroke he and the small-sized ayin seem archaic. On the other hand, the bet has a broad base stroke and protrudes to the right, which in formal script is generally typologically connected to late Hasmonaean or early Herodian. But if the distinction between formal and semiformal cannot be clearly made, 4Q176 is another example of mixed evidence.

4Q176 is another example of bimodal calibrated results, having in addition also minor peaks of low probability. The 2 $\sigma$ calibrated range of 210–100 BCE (64.2% probability) and the minor peak of 70–60 BCE (0.7%) are consistent with previous palaeographic assessments. The 2 $\sigma$ calibrated range of 210–100 BCE also makes an older dating of the script style possible.

These assessments for 4Q47, 4Q52, and 4Q176 also apply to 4Q23, 4Q70, 4Q161, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, Mas1k, and XHev/Se2. Only in the case of 4Q201 and 11Q5 do the ¹⁴C results indicate a date range that goes in the direction of a younger possible date, whereas in almost all cases the direction is toward an older possible date range.

Regarding 4Q201, Milik’s edition Milik1976 suggested the first half of the second century BCE, and most scholars have accepted this estimate. He considered its script to be quite archaic and connected to the third and second-century BCE semicursive or semiformal scripts, perhaps more dependent on the Aramaic writing of northern Syria or Mesopotamia than on those of Judaea or Egypt. Similar comparisons with northern Syria have been made for 4Q17 and 4Q109, but concrete connections cannot be substantiated. Puech Puech2017 also saw the script as semiformal/semicursive, dating from ca. 200 BCE, while Langlois Langlois2011 gave an estimate of ca. 150 BCE.

4Q201 has a 2 $\sigma$ calibrated range of 165–40 BCE (93.6%) and a minor peak of 10–1 BCE (1.9%). This overlaps with the palaeographic estimates, but in this case the ¹⁴C result allows for a younger, rather than an older, date than previously considered.

The script of 4Q201 is hard to assess, in part because the scribe used a pen with a thick, worn nib to write small letters, which may account for the atypical aleph. Yet, apart from archaic forms of samek and shin nothing is typologically incongruent with the early Hasmonaean script.

As for 11Q5, Sanders DJD4 understood its script as transitional from early to late Herodian, comparing it to 4Q113 and also 1QM, 4Q27, 4Q37, and 4Q51. He estimated its script to the first half of the first century CE, possibly slightly earlier than 4Q113, Cross’s specimen for “a developed Herodian formal script”. However, clear typological distinctions on the level of individual letters between ‘early’, ‘developed’, and ‘late’ Herodian according to Cross’s specimens are not that easily made. For example, one may consider aleph, which from early to late Herodian would advance to an inverted “v” form of the left leg and oblique axis, or dalet, where the horizontal stroke breaks through the right leg, and see that there is no difference here between the ‘developed’ and ‘late’ Herodian specimens of 4Q113, 4Q37, and 4Q85. On the other hand, the sharp bend in the right leg of ayin and sin/shin may be seen in early as well as developed and late Herodian exemplars, whereas in some manuscripts that are considered to be late Herodian the sharp bend is not clearly shown, e.g., Mur88 and 5/6Hev1b. As to more general features of the Herodian formal script, one may consider a generally uniform letter size, a base line, ligatures, and the development of keraiai or serifs. But beyond a general impression, these features are difficult to use for a clear typological differentiation of manuscripts within the Herodian formal script.

11Q5 has a 2 $\sigma$ calibrated range of 5–120 CE (92.2%) and a minor peak of 35–15 BCE (3.3%), showing clear overlap with the different presumed Herodian palaeographic periods, even post-Herodian. The measurement has a standard deviation of only 18 in ¹⁴C years (BP). The length of the 2 $\sigma$ calibrated range, 35 BCE–120 CE, is caused by the shape of the calibration curve in this period when converting the BP dates to calendar dates. Scholars of the Dead Sea Scrolls may consider a date later than 70 CE for 11Q5 unlikely because the scrolls found in the Qumran caves are assumed to have been hidden in the summer of 68 CE Popovic2012 .
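The effect of the calibration curve's shape can be made explicit with the standard textbook form of radiocarbon calibration. The notation below is ours and is not taken from Appendix B, and it assumes a uniform prior over calendar years. For a measurement $r \pm \sigma$ in ¹⁴C years (BP) and a calibration curve with mean $\mu(t)$ and uncertainty $\sigma_c(t)$ at calendar year $t$, the calibrated probability density is

\[
p(t \mid r) \;\propto\; \frac{1}{\sqrt{\sigma^2 + \sigma_c(t)^2}} \exp\!\left( -\frac{\bigl(r - \mu(t)\bigr)^2}{2\,\bigl(\sigma^2 + \sigma_c(t)^2\bigr)} \right).
\]

Where $\mu(t)$ is nearly flat or wiggles, many calendar years $t$ have almost the same curve value $\mu(t)$ and therefore receive similar probability. The 2 $\sigma$ calibrated range, which collects 95.4% of this density, then becomes long or splits into several peaks even when $\sigma$ itself is small, as with the 18 ¹⁴C years measured for 11Q5.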

4Q259 is notorious for its widely varying palaeographic estimates in the second-first centuries BCE. Cross Charlesworth1994 described 4Q259 as written in an unusual semicursive with mixed semicursive and semiformal script features. He gave 50–25 BCE as a date estimate. Earlier, Milik Milik1976 had suggested the second half of the second century BCE (Milik used the older reference number 4Q260), while later Puech Puech1998 suggested the first half of the first century BCE, preferably shortly after 100 BCE. Puech argued for this date on a combined basis of a palaeographic analysis of Cryptic A script (compared to 4Q298 and especially to 4Q249 and 4Q317) and the ¹⁴C dating of 4Q317 Jull1995 .

4Q259 has a 2 $\sigma$ calibrated range of 210–100 BCE (69.7%) and a minor peak of 70–55 BCE (1.4%). The 2 $\sigma$ calibrated range of 210–100 BCE agrees with the two older palaeographic estimates of Milik and Puech, whereas the minor peak of 70–55 BCE is nearer to Cross’s estimate. The bimodal evidence for 4Q259 shows an older 2 $\sigma$ calibrated range of 350–310 BCE (24.3%), but, as for 4Q47, this older peak can be rejected as a possible solution based on typological comparison with date-bearing Aramaic manuscripts from the fourth century BCE.

Following Cross’s typology, Puech Puech1998B analysed 4Q521 as a Hasmonaean formal script and estimated it between 100–80 BCE. This manuscript was also radiocarbon dated in the 1990s Jull1995 . That BP date (1984 ± 33) now has to be recalibrated according to the IntCal20 calibration curve (reimer2020intcal20 ), which results in a 2 $\sigma$ date range of 45 BCE–120 CE. According to the bimodal evidence of our study, the younger 2 $\sigma$ calibrated range is 230–100 BCE (57.5%), while the older peak in the 2 $\sigma$ range of 355–285 BCE (38.0%) can be rejected as a possible solution due to comparative typological evidence from date-bearing Aramaic manuscripts from that period. The difference in age between the two radiocarbon tests may be due to the Soxhlet procedure cleaning castor oil from the sample, but it is not possible to quantify or ascertain that. The palaeographic estimate of 100–80 BCE and our 2 $\sigma$ calibrated range of 230–100 BCE connect in the year 100 BCE. So, considering measurement uncertainties, 4Q521 can be taken as a partial overlap.
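The distinction between overlap, partial overlap, and no overlap used in this appendix can be illustrated with a small sketch. It is not part of the study's method: the function names `bce` and `classify_overlap` and the signed-year convention (BCE as negative numbers) are our own assumptions for illustration, and only date ranges quoted in this appendix are used.

```python
from typing import Tuple

# A closed year range (start, end); BCE years are negative (assumed convention).
Range = Tuple[int, int]


def bce(older: int, younger: int) -> Range:
    """Build a range such as 230-100 BCE as (-230, -100)."""
    return (-older, -younger)


def classify_overlap(c14: Range, palaeo: Range) -> str:
    """Return 'overlap', 'touching', or 'none' for two closed year ranges."""
    latest_start = max(c14[0], palaeo[0])
    earliest_end = min(c14[1], palaeo[1])
    if latest_start < earliest_end:
        return "overlap"
    if latest_start == earliest_end:
        return "touching"  # the ranges share exactly one boundary year
    return "none"


# 4Q521: accepted 2-sigma range 230-100 BCE vs. palaeographic 100-80 BCE.
q521 = classify_overlap(bce(230, 100), bce(100, 80))
# 4Q47: 210-100 BCE vs. the broad 'Hasmonaean' estimate (ca. 150-50 BCE).
q47 = classify_overlap(bce(210, 100), bce(150, 50))
# 4Q30: 235-165 BCE vs. Cross's estimate of 125-100 BCE.
q30 = classify_overlap(bce(235, 165), bce(125, 100))

print(q521, q47, q30)  # touching overlap none
```

The 'touching' outcome corresponds to the case of 4Q521, where the two ranges connect only in the year 100 BCE and measurement uncertainty is what justifies counting it as a partial overlap.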

The script of 5/6Hev1b was considered by Cross Cross2003 to be a post-Herodian formal, estimated from 75–100 CE (Flint DJD38 suggested 50–68 CE). This sample was the least precise ¹⁴C result in our study, with a standard deviation of 28 years in BP, and calibrated in 2 $\sigma$ to 10–205 CE. The large calibrated date range, caused by the shape of the calibration curve in this period, clearly encompasses previous palaeographic estimates, but also moves in both a much older and a much younger direction of possible dates.

D.1.2 No overlap

Nine out of 26 samples yield (accepted) 2 $\sigma$ calibrated ages that do not overlap with previous palaeographic estimates. In all 9 cases, the ¹⁴C results give calibrated age ranges that are older than previous palaeographic estimates. Yet, in light of our critical assessment, the older ¹⁴C age ranges are in most cases also palaeographically possible and realistic. This applies to: 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q206, 4Q267, 4Q375, 4Q416.

4Q30 was Cross’s “typical Hasmonaean” script specimen from the middle of the period, 125–100 BCE, like 1QIsa^a Cross2003 . The calibrated result for this sample in our study is bimodal. According to the ¹⁴C measurement, 4Q30 has a 2 $\sigma$ calibrated range of 235–165 BCE (36.7%) and a minor peak of 260–245 BCE (1.4%). The older 2 $\sigma$ peak of 360–275 BCE (57.4%) can be rejected as a possible solution based on palaeographic comparison with date-bearing manuscripts in Aramaic script from the period. Though Cross gave the narrower estimate of 125–100 BCE, White Crawford estimated more broadly 150–100 BCE DJD14 . An earlier date range, say in the first half of the second century BCE, as indicated by ¹⁴C, is realistic and possible. In general, there is no reason to chronologically limit the script identified as Hasmonaean to the upper range of the political-historical period of the same name in the mid-second century BCE (see Section A.2.2). The sequence of relative typology can easily be moved chronologically to an older age range. Though in general 4Q30 shows a more uniform letter size, at the level of individual letters, the often not yet ‘standard’ letter size of aleph and the often small ayin point to earlier typology in the Hasmonaean script. As we also argued for 4Q47 and 4Q176, among others (Section D.1.1), 4Q30 shows mixed typological evidence.

4Q27 was Cross’s “typical exemplar of the extremely popular Round semiformal style”, initially estimated by him to be early Herodian (ca. 30 BCE–20 CE) but later slightly revised by Jastram and Cross to the latter half of the first century BCE DJD12 . 4Q27 has a 2 $\sigma$ calibrated range of 200–50 BCE (94.2%) and a minor peak of 340–330 BCE (1.3%) that can be rejected as a possible solution for palaeographic reasons. The 2 $\sigma$ calibrated range of 200–50 BCE comes near the revised palaeographic estimate. The calibrated date has a large range. This is caused by the measurement’s standard deviation of 26 ¹⁴C years (BP) in combination with the shape of the calibration curve in this period.

Interestingly, the 2 $\sigma$ calibrated range for another specimen of the Herodian round semiformal, 4Q161, is 55 BCE–30 CE (92.1%), with two minor peaks of 90–80 BCE (1.7%) and 45–60 CE (1.7%). This may suggest a longer and somewhat older age range for this Herodian-type script than only the latter half of the first century BCE. Palaeographically, there are also many differences between 4Q27 and 4Q161. In 4Q161 the long extending base strokes of kaph, broad dalet, ligatures, and strikingly penned tet and shin stand out, whereas 4Q27 shows less tendency to broadening of letters. Although possible from a ¹⁴C perspective, a date in the first half of the second century BCE for 4Q27 seems unlikely from a typological perspective in comparison with other manuscripts. Nonetheless, there are four more Herodian-type manuscripts dated to that range by ¹⁴C in our study: 4Q3, 4Q267, 4Q375, and 4Q416.

4Q267 is another example of Cross’s early Herodian round semiformal. Yardeni related 4Q267 to 4Q397 as possibly written by the same scribe and estimated it from 30 BCE–20 CE DJD18 (Yardeni did not take over Cross’s round semiformal categorization and understood its script as formal). 4Q267 was also radiocarbon dated in the 1990s Jull1995 . That BP date (2094 ± 29) now has to be recalibrated according to the IntCal20 calibration curve (reimer2020intcal20 ), which results in a 2 $\sigma$ calibrated range of 200–40 BCE (94.0%) and a minor peak of 10 BCE–5 CE (1.5%). According to our study, 4Q267 has a 2 $\sigma$ calibrated range of 210–95 BCE (65.3%) and a minor peak of 70–55 BCE (1.6%), whereas the older 2 $\sigma$ range of 355–290 BCE (28.6%) can be rejected as a possible solution due to comparative typological evidence from date-bearing Aramaic manuscripts from that period. The difference in age between the two radiocarbon tests may be due to the Soxhlet procedure cleaning castor oil from the sample, but it is not possible to quantify or ascertain that.

From a typological perspective it is difficult to understand the script of 4Q267 being chronologically so near to quite different typological specimens in the second century BCE. We may have to reckon with overlapping or partly adjacent style developments but in this case it would severely impact the relative typology dominant in the field, not just moving it chronologically and keeping the relative typology intact.

4Q267 might be an outlier, yet this ¹⁴C result raises the fundamental issue of how the absolute, chronological dating of typological differences in a linear sequence has been substantiated. Cross assumed a slow development of the Aramaic/Hebrew script in the third century BCE and he assumed a rapid evolution of the script in the Hasmonaean and Herodian eras, but he could not substantiate either assumption, due to the lack of date-bearing documents. He assumed but did not demonstrate that the finer typological distinctions had to be chronologically sequenced one after the other instead of existing partially next to each other (see, similarly, Sirat1986 ).

Cross wavered in his palaeographic estimate of the ‘semicursive’ 4Q114, moving from the late second century BCE (125–100 BCE) to ca. 100–50 BCE and, under the influence of the Wadi Daliyeh finds, back to the late second century BCE DJD16 , “no more than about a half century younger than the autograph”, as Cross put it Cross1961B . Interestingly, Cross dated 4Q114 as contemporary with the formal hand of 4Q30. 4Q114 preserves Daniel 8–11, a part of the book which scholars argue on literary-historical grounds to have been composed in the 160s BCE. 4Q114 has a 2 $\sigma$ calibrated range of 230–160 BCE (45.9%) and an older 2 $\sigma$ range of 355–285 BCE (49.5%) that can be rejected as a possible solution based on comparative typological evidence from date-bearing Aramaic manuscripts from that period.

Because of its scribal errors, it is unlikely that the scribe of 4Q114 was the author. But the early date and low scribal quality of 4Q114 shed new light on the production and circulation of literature in ancient Judaea: its date is indicative of the speed of the text’s spread, and the low quality of the manuscript may indicate it originated in a social context close to the original author Popovic2023 ; future research may further validate this. 4Q114 would then have been copied very soon after the assumed composition of Daniel 8–11. The ¹⁴C 2 $\sigma$ date of 230–160 BCE for 4Q114 is matched by the closely comparable ¹⁴C date of 235–165 BCE for 4Q30.

For 4Q206, Milik Milik1976 gave an estimate from the first half of the first century BCE, and simply referred to four of the exemplary Hasmonaean manuscripts given by Cross (4Q30, 4Q51, 4Q114, 4Q398), apparently with no concern for their differences in style and for Cross dating these quite differently. In his recent edition in consultation with Puech, Drawnel Drawnel2019 estimated 4Q206 to be from the middle of the first century BCE. It is interesting that two of Milik’s typological comparanda, 4Q30 and 4Q114, have ¹⁴C results in our study similar to 4Q206: the 2 $\sigma$ calibrated range for 4Q206 is 235–145 BCE (45.8%) with a minor peak of 135–120 BCE (1.1%); the older 2 $\sigma$ range of 360–280 BCE (48.6%) can be rejected as a possible solution for palaeographic reasons. In each of these cases the ¹⁴C results indicate an earlier chronological date than the palaeographic estimates. But typologically some letters are slightly different and commonly seen as a later development of the letter form, e.g., bet, mem, and ayin. Yet, other letters show varied forms within 4Q206 and some compare well with instances from 4Q30, e.g., aleph, he. So 4Q206 may be another example of mixed typological evidence.

Then there are four Herodian-type manuscripts whose ¹⁴C dates extend into the second century BCE: 4Q2, 4Q3, 4Q375, and 4Q416.

The script of 4Q2 has been described as late Herodian or even post-Herodian (ca. 50–68+ CE), in part because of the increasing use of keraiai DJD12 . While the script is typologically certainly Herodian, the assumption that calligraphic features are typical for its latest period cannot be substantiated. 4Q2 has a 2 $\sigma$ calibrated range of 125 BCE–10 CE (90.3%) with a minor peak of 155–130 BCE (5.2%), providing a date range up to 10 CE, which seems realistic to us.

The case of 4Q3 is more difficult. Its script was tersely described as “an Herodian formal hand dating from the middle to end of that period (c. 20-68 CE)” DJD12 . Indeed, the script of 4Q3 features several letters and elements generally regarded as developed Herodian, like the small tick above the crossbar of the final mem. Yet, some letters have older shapes which are uncommon in those developed Herodian formal hands, such as the ‘horned’ dalet. 4Q3 has a 2 $\sigma$ calibrated range of 200–50 BCE (92.0%) and a minor peak of 340–325 BCE (3.5%) that can be rejected as a possible solution for palaeographic reasons. The palaeographic rule of thumb that the latest forms in a manuscript are indicative of its age would militate against the 2 $\sigma$ range of 200–50 BCE. Yet, the exact moment when those latest forms arose has not been substantiated in the field. Future evidence may further validate this.

Strugnell provided a judicious analysis of the palaeography of 4Q375, comparing its style to that of the round or rustic semiformal series which is generally associated with early Herodian, but also arguing that, typologically, it must be an early exemplar since some letters do not yet have the typically Herodian forms DJD19 . True to his custom, he did not translate this typological assessment into a calendar date, but in Cross’s correspondence between hand and style this would amount to ca. 50–25 BCE, which would nearly agree with the 2 $\sigma$ calibrated range of 205–50 BCE (89.5%). Considering the uncertainties in the palaeographic estimate, this is acceptable. The 2 $\sigma$ peak of 340–320 BCE (6.0%) can be rejected as a possible solution for palaeographic reasons.

Also for 4Q416, Strugnell carefully analysed its individual letters, arguing that in most cases these should be placed between 4Q51 and 1QM, hence “in a date transitional between the late Hasmonaean and the earliest Herodian hands” DJD34 . He judged the script of 4Q416 to be earlier than those of 4Q415, 4Q417, and 4Q418 by some twenty-five years so that a palaeographic estimate of 50–25 BCE presents itself. 4Q416 has a 2 $\sigma$ calibrated range of 205–90 BCE (78.1%) and a smaller peak of 80–50 BCE (9.4%). Considering the uncertainties in the palaeographic estimate, this is acceptable. The 2 $\sigma$ peak of 345–320 BCE (8.0%) can be rejected as a possible solution for palaeographic reasons.

D.1.3 Concluding the comparison between radiocarbon results and palaeographic estimates

Based on this comparison between (accepted) 2 $\sigma$ calibrated dates and previous palaeographic estimates we make the following concluding observations.

Overall, the ¹⁴C results indicate older date ranges for individual manuscripts. Only two manuscripts, 4Q201 and 11Q5, have date ranges that go in the direction of a younger possible range (5/6Hev1b has a range both a bit older and much younger). Thus, Hasmonaean-type manuscripts have ¹⁴C date ranges that allow for older dates in the first half of the second century BCE, and sometimes also up to the latter part of the third century BCE, instead of the late second century or early first century BCE. There are no compelling palaeographic or historical reasons that preclude these older dates as reliable time markers for the Hasmonaean script (this also applies to the solid third-century BCE range for 4Q70 and its Archaic-type script).

The ¹⁴C results for most manuscripts confirm the basic distinction between Hasmonaean-type manuscripts that are older, and Herodian-style manuscripts that are younger, and, for that matter, also between Archaic-type (4Q52 and 4Q70) and Hasmonaean-type manuscripts. However, the ¹⁴C date ranges for manuscripts that are traditionally considered Hasmonaean and Herodian are quite differently distributed across the timeline.

As can be seen in Figure 1 in the main article, the twelve Hasmonaean-type manuscripts in our sample set have (accepted) 2 $\sigma$ calibrated date ranges from the second and first century BCE, as expected, and most extend also into the late third century BCE. Three Herodian-type manuscripts (4Q161, Mas1k, XHev/Se2) have (accepted) 2 $\sigma$ calibrated date ranges from the latter half of the first century BCE and the first century CE, as expected. Two Herodian-type manuscripts have date ranges in the first century CE, as expected, but also extend into the second century CE (11Q5 and 5/6Hev1b, the latter even into the early third century CE). 4Q2 has a date range extending from the early first century CE back to the early second century BCE. And five Herodian-type manuscripts have (accepted) 2 $\sigma$ calibrated date ranges in the second century BCE (4Q3, 4Q27, 4Q267, 4Q375, and 4Q416), though 4Q27 extends into the first century BCE.

This adds a third component to our critical assessment. In addition to critiquing the application of traditional typology to individual manuscripts and dismantling unsubstantiated historical suppositions and chronological limitations, the results of this study also question the validity of the relative typology as such. The traditional relative typology can be maintained but not in all cases. The spread of the Hasmonaean-type manuscripts over the timeline does not affect Cross’s relative typology in a major way but the older, second-century BCE date ranges of the Herodian-type manuscripts do affect the relative typology potentially in a major way.

Individual manuscripts frequently show mixed typological evidence: a manuscript can have different forms of individual letters that are considered ‘older’ or ‘younger’ according to the traditional palaeographic framework. There is, however, not a good method to assess what this means in terms of relative placement of the individual manuscript, nor, for that matter, what it means for relative typology in general. The rule of thumb that the typologically latest forms should determine one’s palaeographical estimate of a manuscript, presupposes existing palaeographical date markers and a decision which features are typologically important or indicative. The dated Wadi Daliyeh discoveries showed that features that were supposed to be significantly later, already appeared many decades earlier than previously expected. Moreover, it is assumed but not substantiated that typological differences must be translated to chronological linear sequences. Instead of a linear development, as Cross and others have assumed, the possibility of overlapping or partly adjacent style developments must be considered.

So, the so-called Hasmonaean script can indeed be regarded older than the so-called Herodian script but the ¹⁴C results of this study indicate that the Herodian script was present earlier than previously thought. This suggests that these scripts were not transitioning from the mid-first century BCE onward (the so-called late Hasmonaean/early Herodian category of manuscripts) but that much earlier they already existed partially next to each other.

This study shows that there are no cogent reasons for limiting the palaeographic dating of style developments to political-historical periods such as Hasmonaean or Herodian. The terms ‘Hasmonaean’ and ‘Herodian’ might still be employed for types of script, but these cannot be converted to specific date ranges. For date estimations of individual manuscripts, one should rather use concrete age ranges.

D.2 Combining palaeography and radiocarbon data to train the artificial intelligence-based date-prediction model

In this study, we combine palaeography and radiocarbon dating to train our date-prediction model. It should be stressed that this means a combination of qualitative and quantitative approaches and methods. Palaeography is a qualitative approach, based on expert knowledge, which is similar to the role of, for example, epigraphy as expert knowledge in assael2022restoring . This is different from, for example, an archaeological or geological stratigraphy that often can be quantified on the timeline. From a palaeographic approach alone it is not possible to pinpoint an exact date, date range, or date limit on the timeline of the period of the Dead Sea Scrolls, because palaeographic dates are estimates and do not provide absolute or fixed dates. For example, in our research, palaeography tells us that most scrolls cannot date to the fourth century BCE but we cannot assign a limiting number like 300 BCE. Moreover, typological script development is a gradual process, not necessarily following a linear time trajectory.

Most Dead Sea Scrolls are typologically younger when compared to the script in Aramaic date-bearing documents from the fourth century BCE. Therefore, when any knowledgeable palaeographer is presented with the bimodal evidence in the 2 $\sigma$ range as is the case with our study, then they certainly will reject the older peak as a possible solution, as we have already explained (Sections D.1.1 and D.1.2), a principle also tacitly applied by Bonani1992 ; Jull1995 . Hence, in these cases of bimodal evidence typologically younger is also chronologically younger.

However, it is an open research question whether some of the oldest Dead Sea Scrolls might be older than their palaeographic estimated date in the mid-third century BCE. Following the palaeographic principle to compare the script of an undated manuscript to that of dated writings with a similar script (see Section A.2.2 in Appendix A), we have already argued that 4Q52 would have to be dated chronologically nearer to the Wadi Daliyeh manuscripts, especially WDSP 1 from 335 BCE (see Section D.1.1). This may also apply to 4Q70, which has 2 $\sigma$ calibrated ranges of 320–200 BCE (79.2%) and 375–345 BCE (16.3%). The older 2 $\sigma$ peak can be rejected as a possible solution based on typological comparison but the younger 2 $\sigma$ peak’s extending into the fourth century BCE cannot be completely ruled out, although 4Q70 is typologically further removed from the date-bearing Aramaic manuscripts from the fourth century BCE than 4Q52 and a date in the third century BCE for 4Q70 is more likely from a palaeographic perspective. However, such a qualitative assessment cannot be characterised as a specific quantitative prior (an expected date with mean and standard deviation) in the timeline. Therefore, palaeographers cannot give an exact date such as 280, 300, or 320 BCE as the year before which none of the Dead Sea Scrolls can be dated.

In order to train our artificial intelligence-based date-prediction model, we use the accepted 2 $\sigma$ calibrated data from 24 of the 26 accepted ¹⁴C results (Table LABEL:tab:summarized-c14). For the training of Enoch, the data from two manuscript samples are not used: Mur19 and 4Q52. Because of its cursive script, the papyrus fragment Mur19 is, at the moment, not relevant for Enoch. We also leave 4Q52 out of consideration because, in this case, we cannot decide between the two peaks in the probability distribution (see Section D.1.1). This is why we work with the tentative addition or deletion of 4Q52 in the training of our algorithm. This is how we get from 26 accepted ¹⁴C results to 24 manuscripts used as the primary training set for Enoch.

Appendix E Artificial intelligence (AI) in dating the scrolls

In this project, we use deep learning for image processing (binarization) but have refrained from using it for the date prediction itself. Appendix F explains the objections to the use of deep learning for date prediction and includes an analysis of an experiment we conducted using transfer learning, starting from a state-of-the-art foundational deep-learning model.

E.1 Data preparation

Our first step is to collect and prepare the data for the date-prediction model. We collect the images of the manuscripts for each of the ¹⁴C samples with accepted dates. We have used 24 manuscripts as the primary training set for our date-prediction model (a complete list can be found in Appendix I in the supplementary materials). The 24 radiocarbon-dated manuscripts are physically fragmentary and are therefore spread across many individual fragment images in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb . In addition to this primary training set, we have created different combinations of training data to perform comparative analyses and further check the robustness of the model (see Subsection E.9 for details). We obtained a data set of 75 images from the 24 radiocarbon-dated manuscripts. We use 62 of these images to train our model (Figure 9 shows the size distribution of the training images after the preprocessing steps). The remaining 13 images, chosen deliberately and randomly, are passed as unseen test data to validate the robustness and reliability of the model’s performance. We also select a large number of images to perform tests on the date-prediction model. Once the images are selected, we start with the preprocessing task, where we use BiNet, a neural network architecture, to extract the characters. The binarization, along with alignment correction and fragment arrangement, provides better-quality images (see Subsection E.1.1). It is extremely important to obtain the highest quality of binarized images, because the image quality determines the success of the feature computation and, ultimately, of the date regression model.
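The partition of the 75 images into 62 training and 13 held-out test images can be sketched as follows. This is only an illustration of a reproducible random hold-out: the image identifiers, the function name `split_images`, and the seed are hypothetical, and the sketch does not capture the deliberate element of the selection described above.

```python
import random
from typing import List, Sequence, Tuple


def split_images(image_ids: Sequence[str], n_test: int,
                 seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly hold out n_test images; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    test = sorted(shuffled[:n_test])
    train = sorted(shuffled[n_test:])
    return train, test


# Hypothetical identifiers standing in for the 75 fragment images.
image_ids = [f"img_{i:02d}" for i in range(75)]
train_ids, test_ids = split_images(image_ids, n_test=13)

print(len(train_ids), len(test_ids))  # 62 13
```

Because several images can come from the same manuscript, a stricter variant would hold out images per manuscript; whether that was done here is not stated in this section.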

E.1.1 Preprocessing: binarization, alignment and arrangement correction

We start with the multispectral band images for each fragment to use an image fusion technique designed in-house dhali2019binet to create pseudo-colour images from a weighted combination of the band images (see Figure 10). The resultant pseudo-colour images offer high contrast and facilitate better separation of ink from backgrounds, a task commonly known as binarization. Both training and test images go through the same preprocessing techniques. It is important to note that although many modern deep-learning methods can be trained directly using the colour/grayscale without binarization, this approach is not suitable for dating the scrolls. Direct end-to-end solutions, i.e., classification or clustering of the training images with testing images, may seem feasible, but there is a risk of obtaining completely inaccurate results. Artificial neural networks, for instance, may make decisions based on superficial correlations with the texture of the parchment, leading to erroneous outcomes. Therefore, isolating only the ink traces (foreground) and excluding any other material features in the images (background) is crucial. BiNet is a deep-learning-based method specially designed to binarize scroll images. Instead of using a simple filtering technique, BiNet uses a neural network architecture for the binarization task and therefore yields better output dhali2019binet (see Figure 11).

Once the binarization process is complete, additional cleaning of the images is performed. This cleaning step aims to remove any extra noise or speckles that were not completely removed by the binarization technique. This is a crucial procedure to ensure that features are extracted only from the characters of each image. Subsequently, rotation and alignment correction are also performed. If the images are rotated at some angle to the horizontal axis, it can affect feature calculations that rely on rotation invariance. Therefore, rotation correction is applied to align the text lines horizontally. In some cases, a minor affine transformation and stretching correction are executed in a selective manner. These corrections are specifically intended to align the twisted text lines caused by the degradation of the parchment. In many cases, one manuscript contains multiple fragments. In these cases, we put the fragments together and arrange them into a single image (see Figure 12). The GIMP tool, a free and open-source graphics editor, is used for rotation and arrangement correction gimp . It is important to note that the alignment and arrangement corrections are mostly done with the training images to obtain accurate feature extractions for the style periods represented by those images. However, these corrections are only done for some of the test images due to the limitation of time and resources. Most test images are used directly after binarization, sometimes leading to an unrealistic prediction due to damaged and deformed characters (see Figure 13). If any test images need special attention in the future, extra steps can be performed to obtain a better image for a better prediction from Enoch.

E.2 Data augmentation

We have a very limited number of radiocarbon-dated manuscripts from which we derive the training images. During the writing process of any document, writers naturally introduce variability even within the same time period. In order to address both issues of data scarcity and writing variations within a period, we perform data augmentation by introducing acceptable variation to the data. The small random shape perturbations will, on the one hand, ensure the system’s robustness and, on the other hand, consider variations of writing styles within a particular period. In machine learning, augmentation is an often-used method to counteract the effects of lack of data and imbalance in sampling MUMUNI2022augmentation . We augment training and testing data by generating synthetic images using random geometric distortions bulacu2009morph .

We perform data augmentation by applying random elastic ‘rubber-sheet’ transforms. For each pixel $(i,j)$ of the column images, a random displacement vector $(\Delta x,\Delta y)$ is generated. The complete image’s displacement field is smoothed using a Gaussian convolution kernel with a standard deviation $\sigma$ . We then rescale the field to an average amplitude $A$ . The new morphed image $(i^{\prime},j^{\prime})$ is generated using the displacement field and bilinear interpolation:

i^{\prime}=i+\Delta x,j^{\prime}=j+\Delta y.

(1)

Two parameters control this morphing process: the smoothing radius $\sigma$ and the average pixel displacement $A$ . Both parameters are measured in units of pixels. In our experiment, we empirically chose a displacement value of $1.0$ and a smoothing radius of $8.0$ (see Figure 14).
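The morphing step in Eq. (1) can be illustrated with a minimal NumPy sketch. It is not the implementation used for Enoch: the function names, the uniform random initial field, the truncation of the Gaussian kernel at three standard deviations, and the backward-mapping form of the bilinear interpolation (each output pixel samples the input at its displaced position) are all assumptions made for illustration.

```python
import numpy as np

def _gaussian_smooth(field, sigma):
    """Separable Gaussian smoothing (kernel truncated at 3*sigma, edge padding)."""
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()

    def smooth_1d(values):
        padded = np.pad(values, radius, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    field = np.apply_along_axis(smooth_1d, 1, field)   # along rows
    return np.apply_along_axis(smooth_1d, 0, field)    # along columns

def elastic_morph(image, amplitude=1.0, sigma=8.0, rng=None):
    """Hypothetical 'rubber-sheet' morph of a 2D float image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Random displacement per pixel, smoothed with a Gaussian of width sigma.
    d_i = _gaussian_smooth(rng.uniform(-1.0, 1.0, size=(h, w)), sigma)
    d_j = _gaussian_smooth(rng.uniform(-1.0, 1.0, size=(h, w)), sigma)
    # Rescale so that the average displacement length equals the amplitude A.
    scale = amplitude / np.mean(np.hypot(d_i, d_j))
    d_i, d_j = d_i * scale, d_j * scale
    # Displaced coordinates i' = i + delta, j' = j + delta, clipped to the image.
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_i = np.clip(rows + d_i, 0, h - 1)
    src_j = np.clip(cols + d_j, 0, w - 1)
    # Bilinear interpolation between the four neighbouring pixels.
    i0, j0 = np.floor(src_i).astype(int), np.floor(src_j).astype(int)
    i1, j1 = np.minimum(i0 + 1, h - 1), np.minimum(j0 + 1, w - 1)
    fi, fj = src_i - i0, src_j - j0
    top = image[i0, j0] * (1 - fj) + image[i0, j1] * fj
    bottom = image[i1, j0] * (1 - fj) + image[i1, j1] * fj
    return top * (1 - fi) + bottom * fi

# Toy usage: a white square on a black background, morphed with the
# parameter values reported in the text (A = 1.0, sigma = 8.0).
toy_image = np.zeros((40, 60))
toy_image[10:30, 20:40] = 1.0
morphed = elastic_morph(toy_image, amplitude=1.0, sigma=8.0,
                        rng=np.random.default_rng(0))
```

Calling the function repeatedly with different random generators would yield different synthetic variants of the same image, which is the purpose of the augmentation.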

E.3 Allographic codebook with neural networks

After binarization with BiNet dhali2019binet , connected components of ink were fragmented on Y-minima, to prevent large blobs of multi-character components, yielding ‘fraglets’. For each fraglet, the contour curve was determined, running over the edge of a connected component in a counter-clockwise manner. Each contour-pixel sequence is ‘time’ normalized to 200 samples, (cosine, sine) pairs, yielding a feature vector of 400 values. Using the Kohonen Kohonen1982 self-organizing map neural network, codebooks of $70\times 70$ and $80\times 80$ prototypical contours were computed PlosOne (see Figure 15). As a proof of concept, $590$ manuscripts from the Dead Sea Scrolls collection were manually labeled as ‘Hasmonaean’ ( $N_{has}=307$ ) or ‘Herodian’ ( $N_{her}=283$ ) by a palaeographer. During training on half of the data, each codebook element obtained the counts for its occurrence in ‘Hasmonaean’ or ‘Herodian’ manuscripts, respectively. During testing, each manuscript is characterized by its relative occurrence of Herodian-like vs. Hasmonaean-like fraglets. Using the $80\times 80$ map and applying a linear SVM SVMlite on this 2D feature representation, a classification accuracy of 93% ( $\pm$ 2.3%) was obtained, computed over 20 random odd/even splits of the 590 manuscripts. Individual accuracy test results: 90.9, 89.2, 93.9, 91.2, 94.2, 90.2, 93.2, 94.2, 91.2, 95.3, 92.9, 96.3, 94.6, 92.2, 95.3, 92.5, 96.6, 88.8, 94.9, 93.2 (%). On the basis of this pilot experiment and earlier work PlosOne , the allographic feature was deemed a usable candidate for the more fine-grained manuscript-dating algorithm using the carbon-dated training samples.
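The contour normalization described above (200 samples per fraglet contour, encoded as (cosine, sine) pairs) could be sketched as follows. This is only one plausible reading: the resampling by arc length, the use of the local direction of travel as the angle, the interleaved layout of the 400 values, and the function name are assumptions, not details confirmed by the text.

```python
import numpy as np

def contour_feature(contour, n_samples=200):
    """Hypothetical fraglet descriptor: n_samples (cos, sin) direction pairs."""
    points = np.asarray(contour, dtype=float)
    closed = np.vstack([points, points[:1]])          # close the contour
    steps = np.diff(closed, axis=0)
    arc = np.concatenate([[0.0], np.cumsum(np.hypot(steps[:, 0], steps[:, 1]))])
    # Resample at n_samples + 1 equally spaced positions along the arc length.
    positions = np.linspace(0.0, arc[-1], n_samples + 1)
    x = np.interp(positions, arc, closed[:, 0])
    y = np.interp(positions, arc, closed[:, 1])
    # Direction of travel between consecutive resampled points.
    dx, dy = np.diff(x), np.diff(y)
    length = np.hypot(dx, dy)
    length[length == 0.0] = 1.0                       # avoid division by zero
    # Interleave as (cos_1, sin_1, cos_2, sin_2, ...): 2 * n_samples values.
    return np.column_stack([dx / length, dy / length]).ravel()

# Toy usage: a unit square traversed counter-clockwise.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
feature = contour_feature(square)
```

Because every contour is mapped to a vector of the same length, fraglets of different sizes become directly comparable, which is what a self-organizing map needs as input.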

E.4 Textural-level features

Similarly, the ‘Hinge’ feature Bulacu2007 ; PlosOne was chosen because of its ability to capture curvature-related differences between different samples of handwriting (see Figure 16). It addresses the occurrence of different degrees of roundness or sharpness of the path described by the edge between ink traces and paper. Its ability to discriminate between ‘Hasmonaean’ and ‘Herodian’ styles is less powerful than that of the allographic fraglets. Using the nearest mean and the Chi-square distance on a 195-dim hinge feature delivers 63.5% accuracy ( $\pm$ 2.9%). This dimensionality is still high, and collinearity problems due to feature correlation need to be avoided. Subsequently applying PCA, selecting the 15 largest eigenvectors, and applying a linear SVM for this binary classification task yields 73.1% accuracy ( $\pm$ 0.24%). Still, on the basis of the complementary nature of the allographic and textural feature methods, it was decided to include the Hinge feature for the manuscript dating problem.

E.5 Adjoined feature

As shown in bulacu2006 , the combination of a fraglet codebook and the hinge feature proved to be very effective in writer identification. The assumption in the current study is that different historical style periods are revealed by the statistical characteristics both of allographic shape fragments and of angular distributions. Consider, for instance, a manuscript with predominantly vertical and horizontal strokes (‘formal’) and a manuscript written in a more informal (‘cursive’) style, both containing their characteristic shape elements in individual characters. The use of the two feature methods together will capture the underlying shape differences. The feature combination is realized by an adjoining of the two feature vectors: the arrays of feature values are combined in a single array containing the combined descriptor. Adjoined features are the weighted combination of both Hinge and Fraglet. The adjoining results in a feature vector of $5365$ dimensions, preserving the handwriting style description from both feature levels.

E.6 Date-prediction model

We employ our date-prediction model once the features are calculated from the images. Given, for each manuscript, a style feature vector, we now address the transformation of this representation into an OxCal-type curve, i.e., a vector containing the estimated date probabilities for the sample. Because of the small size of the data set, high-parametric models such as period-specific temporal codebooks He2016 cannot be used here. We use conditional modeling using Bayesian Ridge regression Hoerl2000 that applies Bayesian inference to estimate the model parameters for date prediction. First, a prior distribution is placed on the model parameters, which expresses known constraints on the values of the parameters. The prior distribution is then updated with the observed data using Bayes’ rule to obtain the parameters’ posterior distribution and predicted dates.

We propose the Bayesian approach due to the nature of our target output data: the ¹⁴C data are not a single point on the timeline but are given as a distribution of probable dates within sigma ( $\sigma$ ) ranges. Hence, the probabilistic approach allows the use of all available information while remaining explainable. Furthermore, we can observe a full posterior distribution, which is used to assess the uncertainty of the estimated dates. Finally, the Bayesian approach also allows the model to indicate error margins for predictions on unseen data.

E.6.1 Unmodelled values from OxCal

The input of the date prediction model consists of the feature vectors of the training images, along with their probability distributions from radiocarbon dating as labels. We obtain the probability distribution as unmodelled raw values of 5-year resolution after the radiocarbon calibration was performed using OxCal v4.4.2 Oxcal ; Oxcal2 . We created a new project code for the 26 sample manuscripts using the ¹⁴C age (BP) and sigma (BP) values (see Table LABEL:tab:summarized-c14 for the BP values; the code is available in the Zenodo repository, https://doi.org/10.5281/zenodo.11371749). Using the measured BP values, the code (entirely reproducible) uses the simple format:

Plot()
{
  R_Date("Q-number", age(BP), sigma(BP));
};

OxCal generates the clean unmodelled (BCE/CE) probability values in a .csv file once the code is run. We obtain these values for each individual sample from OxCal options: View $\gg$ Raw output. Please note that we do not specify any resolution in our code. Hence, our raw data are in the default resolution of 5 years, which is the same as the resolution of the IntCal curves, so no interpolation or binning is needed. It is possible to set the resolution to less than 5. Then, the curve will be interpolated by a cubic (or linear if that option is set) function by OxCal (as explained in Section B.4 in Appendix B).

E.6.2 Calibrated dates from 2-sigma ranges

Having performed the radiocarbon dating in its entirety (Appendix B) and then the palaeographic evaluation of the calibrated dates (Appendix D), as was also done in Bonani1992 ; Jull1995 , we use the calibrated dates from the 2 $\sigma$ range for firmer grounding of our date-prediction model. In the case of bimodal evidence, palaeography determines that in most cases the younger 2 $\sigma$ peak should be used for analysis. However, palaeography cannot be characterised as a specific quantitative prior (an expected date with mean and standard deviation) on the timeline. The issue is that we cannot assume a single point-spread density (Gaussian) along the time axis. Palaeography, in this sense, does not deliver a point-wise prior. What palaeography can deliver is the identification of a point on the timeline beyond which, to the left or to the right, a range of dates is historically impossible. This identification rests on intersubjective expert knowledge (Section D.2). Therefore, the use of expert knowledge in our case is not based on the usual Bayesian-plus-Gaussian method but on a more direct use of existing, qualitative, and intersubjective palaeographic knowledge, which allows splitting the 2 $\sigma$ calibrated date range into a ‘left-half’ vs. ‘right-half’ time region of interest.

Specifically, palaeographic knowledge allows us to make a binary split in the OxCal distribution of bimodal 2 $\sigma$ ranges using the Heaviside function, with the step placed at an innocuous low-probability point on the curve, where the probability has a plateau around zero. Applying a Heaviside multiplicative bias on the empirical density function is a valid Bayesian approach to perform peak selection. Alternatively, a smooth variant of the step function, e.g., a logistic function or a cumulative Gaussian, could have been used. In either case, this would require the specification of a steepness parameter, the value of which is unknown. Similarly, from the palaeographic constraints, the standard deviation which would be needed under a Gaussian, i.e., point-localized density assumption in the Bayesian reasoning process, is not known. Note that apart from the Gaussian, many other distribution functions exist, e.g., Poisson, Weibull, Gamma, etc., for other applications in Bayesian reasoning. Such distribution functions could be used if there were reasons to assume them. We do not have any reason for a detailed distribution choice and can only choose to disregard ‘impossible time regions’. In our case, for most bimodal 2 $\sigma$ calibrated ranges we assume that the palaeographic evaluation that a (left-most or right-most) region is impossible is correct (see Section D.2 in Appendix D), leading to a collapse of the probabilities in that range. The shape of the remaining distribution reflects the likelihood of dates.

Thus, the procedure we use is as follows:

1. We perform the radiocarbon dating in its entirety, with the calibrated dates having been generated using OxCal data with only ¹⁴C (BP) and $\sigma$ (BP), see Appendix B.

2. We use the calibrated dates from the 2 $\sigma$ range for firmer grounding of our date-prediction model. Only in the case of bimodal evidence in these 2 $\sigma$ ranges do we apply the Heaviside-function at a near-zero probability point on the curve to reject older peaks and accept younger/‘right-hand’ peaks as a possible solution, on the basis of expert palaeographic knowledge (Appendix D).

3. From OxCal, we obtain the raw data of the probability densities of the 2 $\sigma$ ranges, which are used as input in our date-prediction model.

4. We work with the inclusion and exclusion of so-called minor or smaller probability peaks, which in 12 out of 14 instances have a probability of less than $4\%$ ; in the remaining two cases, it is 5.2% and 9.4%. The inclusion or exclusion of these peaks has minimal and insignificant consequences for the interpretation of the results (see Section E.7).

5. Because applying the Heaviside function for bimodal evidence leaves less than $95.4\%$ of the entire 2- $\sigma$ probability for each sample, we normalise the accepted 2 $\sigma$ calibrated probabilities. The output probability predictions of the dating model are also balanced and normalised using both weights and data augmentation (see Section E.7).

6. Within the accepted part of the 95.4% confidence range, the points in the probability distribution curve as calculated by OxCal are used as target values in the training of Enoch. The output distribution delivered by the Enoch model is a mixture of Gaussians approximating the shape of the OxCal curve. From those outputs, we select the 1 $\sigma$ range for a clear, narrow visualization of the predicted date ranges (see Figure 1 in the main article). This choice is independent of the original selection of OxCal calibration ranges because the two methods are fundamentally different.
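The peak selection and normalisation in steps 2 and 5 of the procedure above can be sketched in a few lines of Python. The function name, the `keep` argument, and all numbers in the toy example are hypothetical; they are not taken from the OxCal output of any sample.

```python
def heaviside_select(years, probs, step_year, keep="right"):
    """Multiply a calibrated density by a Heaviside step and renormalise.

    years     : calendar years of the probability bins (negative = BCE)
    probs     : unmodelled probabilities for those bins
    step_year : position of the step, ideally where the density is near zero
    keep      : "right" keeps the younger side, "left" keeps the older side
    """
    if keep not in ("left", "right"):
        raise ValueError("keep must be 'left' or 'right'")
    kept = []
    for year, prob in zip(years, probs):
        accepted = year >= step_year if keep == "right" else year < step_year
        kept.append(prob if accepted else 0.0)
    total = sum(kept)
    if total == 0.0:
        raise ValueError("no probability mass left after the step")
    # Renormalise so that the accepted part sums to one again.
    return [prob / total for prob in kept]

# Toy bimodal distribution on a 5-year grid (made-up values).
toy_years = [-330, -325, -320, -315, -310, -305, -300, -295, -290]
toy_probs = [0.05, 0.15, 0.10, 0.00, 0.00, 0.10, 0.30, 0.20, 0.05]

# Reject the older peak by placing the step in the zero-probability gap.
selected = heaviside_select(toy_years, toy_probs, step_year=-310, keep="right")
```

Placing the step where the density is already zero means that the result does not depend on the exact step position within that gap, which is why no steepness parameter is needed.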

We emphasise that we do not perform modelling within the OxCal programme as is commonly done in radiocarbon dating practice. This does not compromise the transparency and reproducibility of our procedure. Using the ¹⁴C dates in BP and their measurement uncertainties ( $\sigma$ ), all plots can be reproduced using OxCal. Our reasoning for rejecting part of the bimodal data (see Appendix D) and the exact 2 $\sigma$ probabilities we use (see Table 8) are provided. Other researchers can nevertheless use all our ¹⁴C data instead of following our reasoning for accepting part of the bimodal data, and justify their own reasoning.

Table 8: Unmodelled radiocarbon calibrated dates for 2 $\sigma$ ranges. Please note that the 2 $\sigma$ values are the same as Table LABEL:tab:summarized-c14 in Appendix B. However, in this table, the highlighted date ranges indicate each sample’s accepted 2 $\sigma$ intervals for the Enoch model.

Q-number	2 $\sigma$ range		2 $\sigma$ range		2 $\sigma$ range		2 $\sigma$ range
4Q504	-355	-285	-230	-150
4Q52	-410	-355	-285	-230
4Q176	-355	-300	-210	-100	-70	-60
4Q114	-355	-285	-230	-160
5_6Hev1b	10	205
4Q161	-90	-80	-55	30	45	60
4Q70	-375	-345	-320	-200
4Q47	-355	-290	-210	-100
4Q23	-355	-285	-230	-220	-210	-95	-75	-55
4Q255_4Q433a	-170	-50
11Q5	-35	-15	5	120
4Q3	-340	-325	-200	-50
4Q27	-340	-330	-200	-50
Mas1k	-50	65
4Q206	-360	-280	-235	-145	-135	-120
4Q30	-360	-275	-260	-245	-235	-165
4Q201_4Q338	-165	-40	-10	-1
4Q259	-350	-310	-210	-100	-70	-55
4Q416	-345	-320	-205	-90	-80	-50
4Q2	-155	-130	-125	10
4Q375	-345	-320	-205	-50
Xhev_Se2	-45	75
4Q541	-355	-300	-210	-95	-75	-55
4Q521	-355	-285	-230	-100
4Q267	-355	-290	-210	-95	-70	-55
Mur19	-45	85	95	110

In the following subsections, we present the mathematical derivation of the Bayesian regression from simple linear regression as used in Enoch, our date prediction model.

E.6.3 Linear regression

Given a set of training data $\{(\mathbf{x}_{n},t_{n})\}^{N}_{n=1}$ comprising $N$ observations of dimensionality $M$ , where $\mathbf{x}_{n}\in\mathbb{R}^{M}$ , $t_{n}\in\mathbb{R}$ , the goal in a regression model is to find a linear mapping $f:\mathbb{R}^{M}\rightarrow\mathbb{R}$ which approximates $t_{n}$ given $\mathbf{x}_{n}$ as closely as possible. Furthermore, the mapping should generalize to values outside the training data. From a probabilistic perspective, the aim is to model the predictive distribution $p(t_{n}\mid\mathbf{x}_{n})$ . In a linear regression model, the assumption is that the target variable $t$ is given by a deterministic function $f(\mathbf{x}_{n},\mathbf{w})$ with added Gaussian noise, such that:

t_{n}=f(\mathbf{x}_{n},\mathbf{w})+\epsilon

(2)

where $\epsilon$ is a Gaussian random variable with mean 0 and inverse variance parameter $\beta$ , also called the precision. In a linear model where $f(\mathbf{x}_{n},\mathbf{w})=\mathbf{w}^{T}\mathbf{x}_{n}$ , the predictive distribution takes the form

p(t_{n}\mid\mathbf{x}_{n})=\mathcal{N}(\mathbf{w}^{T}\mathbf{x}_{n},\beta^{-1}).

(3)

Even though the predictive distribution is indirectly used to optimize the parameters of the linear regression model, we do not explicitly model this distribution. We will see in the next section that in the Bayesian interpretation of linear regression, we instead stay within a probabilistic framework and model the full predictive distribution, leading to several advantages over the standard linear regression model.

We first turn to parameter estimation for a linear model. This means estimating a value for the weight vector $\mathbf{w}$ that fits the data well. Most commonly, the least-squares criterion is used to estimate the weight vector $\mathbf{w}$ :

\mathbf{w}^{*}=\operatorname*{arg\,min}_{\mathbf{w}}\sum_{n=1}^{N}(t_{n}-\mathbf{w}^{T}\mathbf{x}_{n})^{2}

(4)

This can be justified using maximum likelihood estimation if we assume that the training data is independent and identically distributed (i.i.d.). This works as follows. Let $\mathbf{X}=(\mathbf{x}_{1},\dots,\mathbf{x}_{N})^{T}$ and $\mathbf{t}=(t_{1},\dots,t_{N})^{T}$ . The log-likelihood of the training data can then be written as

\begin{align}
\ln p(\mathbf{t}\mid\mathbf{X},\mathbf{w},\beta) &= \ln\prod_{n=1}^{N}p(t_{n}\mid\mathbf{x}_{n},\mathbf{w},\beta) \nonumber\\
&= \ln\prod_{n=1}^{N}\mathcal{N}(t_{n}\mid\mathbf{w}^{T}\mathbf{x}_{n},\beta^{-1}) \nonumber\\
&= \sum_{n=1}^{N}\ln\mathcal{N}(t_{n}\mid\mathbf{w}^{T}\mathbf{x}_{n},\beta^{-1}) \nonumber\\
&= \sum_{n=1}^{N}\ln\left\{(2\pi)^{-1/2}\beta^{1/2}\exp\left(-\frac{\beta}{2}(t_{n}-\mathbf{w}^{T}\mathbf{x}_{n})^{2}\right)\right\} \nonumber\\
&= \frac{N}{2}\ln\beta-\frac{N}{2}\ln 2\pi-\beta E_{D}(\mathbf{w}), \tag{5}
\end{align}

where we make use of (3). The $E_{D}(\mathbf{w})$ term represents a sum-of-squares function, defined as

E_{D}(\mathbf{w})=\frac{1}{2}\sum_{n=1}^{N}(t_{n}-\mathbf{w}^{T}\mathbf{x}_{n})^{2}.

(6)

Considering that maximizing the likelihood function with respect to $\mathbf{w}$ only depends on $E_{D}(\mathbf{w})$ , expression (5) can be maximized by maximizing $-E_{D}(\mathbf{w})$ , or equivalently, minimizing $E_{D}(\mathbf{w})$ . This corresponds to the least-squares objective shown in (4).

E.6.4 Ridge regression

We now turn to the ridge regression model, an extension of the linear regression model with more desirable properties, such as mitigating over-fitting. Concretely, we add a prior distribution over the weights $\mathbf{w}$ , leading to the following objective, which is the log-posterior up to an additive constant:

\ln p(\mathbf{t}\mid\mathbf{X},\mathbf{w},\beta)+\ln p(\mathbf{w}\mid\alpha).

(7)

The prior distribution over the weights can be interpreted with the Bayes rule, showing the relationship to a posterior distribution over $\mathbf{w}$ :

p(\mathbf{w}\mid\mathbf{t})\propto p(\mathbf{t}\mid\mathbf{w})p(\mathbf{w}),

(8)

where we omit the $\mathbf{X}$ , $\alpha$ , and $\beta$ terms to keep the notation uncluttered. In other words, maximizing (7) corresponds to maximizing a posterior distribution over $\mathbf{w}$ . The question that now arises is what a suitable form of the prior distribution $p(\mathbf{w})$ would be. To ensure that the posterior $p(\mathbf{w}\mid\mathbf{t})$ has the same functional form as the prior, we choose $p(\mathbf{w})$ to be a conjugate prior to the likelihood $p(\mathbf{t}\mid\mathbf{w})$ , namely a multivariate isotropic Gaussian distribution, taken to be zero-centered with precision parameter $\alpha$ . The log-posterior, up to an additive constant, then becomes

\begin{align}
\ln p(\mathbf{t}\mid\mathbf{X},\mathbf{w},\beta)+\ln p(\mathbf{w}\mid\alpha) &= \sum_{n=1}^{N}\ln\mathcal{N}(t_{n}\mid\mathbf{w}^{T}\mathbf{x}_{n},\beta^{-1})+\ln\mathcal{N}(\mathbf{w}\mid\mathbf{0},\alpha^{-1}\mathbf{I}) \tag{9}\\
&= \frac{N}{2}\ln\beta-\frac{N}{2}\ln 2\pi-\beta E_{D}(\mathbf{w})+\frac{M}{2}\ln\alpha-\frac{M}{2}\ln 2\pi-\alpha E_{W}(\mathbf{w}), \tag{11}
\end{align}

where $M$ denotes the number of dimensions of the weight parameter $\mathbf{w}$ and $\mathbf{I}$ denotes the identity matrix. Note that the first three summands of (11) correspond to (5). The $E_{W}(\mathbf{w})$ term represents a regularization term, defined by

E_{W}(\mathbf{w})=\frac{1}{2}\mathbf{w}^{T}\mathbf{w}.

(12)

By removing terms from (11) that do not depend on $\mathbf{w}$ , we end up minimizing the sum of two terms, $E_{D}(\mathbf{w})$ and $E_{W}(\mathbf{w})$ , denoting the data-dependent error and the regularization error, respectively. The relative importance of both terms is controlled by the $\alpha$ and $\beta$ hyperparameters. Equivalently, we minimize:

\beta E_{D}(\mathbf{w})+\alpha E_{W}(\mathbf{w}).

(13)

If we combine $\alpha$ and $\beta$ into one hyperparameter $\lambda=\alpha/\beta$ , we can equivalently write

E_{D}(\mathbf{w})+\lambda E_{W}(\mathbf{w}),

(14)

which corresponds to

\frac{1}{2}\sum_{n=1}^{N}(\mathbf{w}^{T}\mathbf{x}_{n}-t_{n})^{2}+\frac{\lambda}{2}\mathbf{w}^{T}\mathbf{w},

(15)

which forms the ridge-regression objective function. The $\lambda$ hyperparameter controls the degree of parameter shrinkage hastie2009elements , whereby the weight parameters are shrunk by imposing a penalty on their size. This brings the additional task of setting $\lambda$ , generally done using cross-validation. By setting the gradient of (15) with respect to $\mathbf{w}$ to 0 and solving for $\mathbf{w}$ , the solution can be expressed in closed form using the standard equations:

\mathbf{w}=(\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{T}\mathbf{t}.

(16)
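As a numerical illustration of the closed-form solution (16), the following NumPy sketch fits a ridge model on random toy data and checks that the gradient of objective (15) vanishes at the solution. The data, the dimensions, and the function name `ridge_fit` are made up for illustration and are unrelated to the manuscript features.

```python
import numpy as np

def ridge_fit(X, t, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^(-1) X^T t, Eq. (16)."""
    n_features = X.shape[1]
    # Solving the linear system is numerically preferable to an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ t)

# Toy data with fewer observations than features, where the unregularised
# least-squares problem would not have a unique solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 50))
t = X @ rng.normal(size=50) + 0.1 * rng.normal(size=24)

lam = 1.0
w_ridge = ridge_fit(X, t, lam)

# Gradient of (15) with respect to w: X^T (X w - t) + lam * w.
gradient = X.T @ (X @ w_ridge - t) + lam * w_ridge
```

Increasing `lam` shrinks the norm of the weight vector, which is the parameter shrinkage referred to in the text.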

E.6.5 Bayesian regression

We now turn to a Bayesian treatment of the ridge regression model discussed in the previous subsection. First, consider the relationship we established between the posterior over $\mathbf{w}$ and the product of the likelihood and prior, as shown in (8). This is the point at which the ridge and Bayesian regression models diverge in their approach. For the ridge regression model, a point estimate for the weight vector $\mathbf{w}$ is obtained by using maximum a posteriori estimation (MAP), which involves maximizing the right-hand side of (8). We now discuss the alternative, fully Bayesian treatment, which explicitly models the posterior distribution on the left-hand side of (8).

Recall that we defined the prior distribution $p(\mathbf{w})$ as a conjugate prior to the likelihood function, leading to a multivariate Gaussian distribution. The result is that the posterior $p(\mathbf{w}\mid\mathbf{t},\mathbf{X})$ also will have a Gaussian distribution. We can thus rewrite (8) to:

\mathcal{N}(\mathbf{w}\mid\mathbf{m}_{N},\mathbf{S}_{N})\,\propto\,\mathcal{N}(\mathbf{t}\mid\mathbf{X}\mathbf{w},\beta^{-1}\mathbf{I})\,\mathcal{N}(\mathbf{w}\mid\mathbf{0},\alpha^{-1}\mathbf{I}),

(17)

where the posterior is a Gaussian with mean $\mathbf{m}_{N}$ and covariance $\mathbf{S}_{N}$ . We can use the Bayes theorem for Gaussian random variables to find $\mathbf{m}_{N}$ and $\mathbf{S}_{N}$ . From this, it follows:

\begin{align}
\mathbf{m}_{N} &= \beta\mathbf{S}_{N}\mathbf{X}^{T}\mathbf{t} \tag{18}\\
\mathbf{S}_{N}^{-1} &= \beta\mathbf{X}^{T}\mathbf{X}+\alpha\mathbf{I}. \tag{19}
\end{align}

It is worth noting the correspondence between the point estimate of $\mathbf{w}$ obtained in the ridge regression solution (16) and the mean of the posterior $\mathbf{m}_{N}$ . If we fully write out $\mathbf{m}_{N}$ , we see that

\begin{align*}
\mathbf{m}_{N} &= \beta(\beta\mathbf{X}^{T}\mathbf{X}+\alpha\mathbf{I})^{-1}\mathbf{X}^{T}\mathbf{t}\\
&= \beta(\beta(\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I}))^{-1}\mathbf{X}^{T}\mathbf{t}\\
&= \beta(\beta^{-1}(\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I})^{-1})\mathbf{X}^{T}\mathbf{t}\\
&= (\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^{T}\mathbf{t},
\end{align*}

which corresponds to (16). This means that the mean of the posterior distribution, which for a Gaussian coincides with its mode, corresponds to the ridge regression solution. However, we use the full posterior distribution over $\mathbf{w}$ in the Bayesian regression approach rather than taking the mean $\mathbf{m}_{N}$ as a point estimate. This works as follows. We first note that once we have obtained the posterior distribution over $\mathbf{w}$ , the predictive distribution informed by the training data can be written as

p(t\mid\mathbf{x},\mathbf{t},\mathbf{X})=\int p(t\mid\mathbf{x},\mathbf{w})p(\mathbf{w}\mid\mathbf{t},\mathbf{X})\,d\mathbf{w}

(20)

for an input $\mathbf{x}$ , where we once again omit the $\alpha$ and $\beta$ terms for readability. Noting that (20) is a marginal distribution and a convolution of two Gaussians, we can once again make use of Bayes theorem for Gaussian variables, resulting in the predictive distribution

p(t\mid\mathbf{x},\mathbf{t},\mathbf{X},\alpha,\beta)=\mathcal{N}(t\mid\mathbf{m}_{N}^{T}\mathbf{x},\beta^{-1}+\mathbf{x}^{T}\mathbf{S}_{N}\mathbf{x}).

(21)

The mean of this distribution is simply the mean of the posterior distribution multiplied by the input vector. As can be seen from the variance parameter of this equation, the predictive variance associated with an input $\mathbf{x}$ consists of a sum of two terms, which can be understood as follows. The first term expresses variance due to the noise in the training data. The second term describes the uncertainty associated with $\mathbf{w}$ , which varies according to the input $\mathbf{x}$ .

Given this predictive distribution, we can make predictions for new input values by calculating the conditional expectation,

\mathbb{E}[t\mid\mathbf{x},\mathbf{t},\mathbf{X}]=\int t\,p(t\mid\mathbf{x},\mathbf{t},\mathbf{X})\,dt.

(22)

An alternative is to directly take the mean $\mathbf{m}_{N}$ of the posterior as an estimate for $\mathbf{w}$ , which is used in some implementations of the Bayesian regression model scikit-learn-bayesian-ridge .
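The posterior (18)–(19) and the predictive distribution (21) can be checked numerically with a short NumPy sketch. The toy data and the fixed values chosen for $\alpha$ and $\beta$ are assumptions for illustration only; the function names are hypothetical.

```python
import numpy as np

def posterior(X, t, alpha, beta):
    """Gaussian posterior over the weights, Eqs. (18) and (19)."""
    n_features = X.shape[1]
    S_N = np.linalg.inv(beta * (X.T @ X) + alpha * np.eye(n_features))
    m_N = beta * (S_N @ (X.T @ t))
    return m_N, S_N

def predictive(x, m_N, S_N, beta):
    """Mean and variance of the predictive distribution, Eq. (21)."""
    mean = m_N @ x
    variance = 1.0 / beta + x @ S_N @ x   # noise term + weight uncertainty
    return mean, variance

# Toy data (made up): 24 observations with 10 features each.
rng = np.random.default_rng(1)
X = rng.normal(size=(24, 10))
t = X @ rng.normal(size=10) + 0.5 * rng.normal(size=24)

alpha, beta = 2.0, 4.0                    # assumed fixed hyperparameters
m_N, S_N = posterior(X, t, alpha, beta)

# The posterior mean coincides with the ridge solution for lambda = alpha / beta.
w_ridge = np.linalg.solve(X.T @ X + (alpha / beta) * np.eye(10), X.T @ t)

x_new = rng.normal(size=10)
pred_mean, pred_var = predictive(x_new, m_N, S_N, beta)
```

The predictive variance is never smaller than the noise variance $\beta^{-1}$, and it grows for inputs that lie in directions poorly covered by the training data.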

E.6.6 Hyperparameter selection

In a Bayesian framework, defining a prior distribution over one or both hyperparameters, also known as a hyperprior, can be used as an alternative to finding hyperparameters using cross-validation. We can then marginalize over all the parameters, which leads to a predictive distribution of the form

p(t\mid\mathbf{x},\mathbf{t},\mathbf{X})=\iiint p(t\mid\mathbf{x},\mathbf{w})p(\mathbf{w}\mid\mathbf{t},\mathbf{X},\alpha,\beta)p(\alpha,\beta\mid\mathbf{t},\mathbf{X})\,d\mathbf{w}\,d\alpha\,d\beta.

(23)

Unfortunately, this expression is analytically intractable. Nevertheless, an approximation framework known as the evidence approximation bishop2006pattern can compute estimates for $\alpha$ and $\beta$ . This framework is also referred to as type II maximum likelihood, which involves maximizing the marginal likelihood function $p(\mathbf{t}\mid\alpha,\beta)$ , where $\mathbf{w}$ has been integrated out. We will not go into depth on the evidence approximation framework. For a more extensive treatment, the reader is referred to the textbook bishop2006pattern . It should be noted that this approach, where we include the hyperparameters as part of the training process by regarding them as random variables, does not necessarily lead to better estimates than those obtained with cross-validation. Nevertheless, automatically finding hyperparameters as part of the training process can be helpful in certain situations, for example, if cross-validation is not feasible.
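For readers who want to see what the evidence approximation looks like in practice, the following NumPy sketch implements the standard fixed-point re-estimation of $\alpha$ and $\beta$ as presented in the textbook cited above. It is an illustration on made-up data, not the code used for Enoch; the starting values, the stopping rule, and the function name are assumptions.

```python
import numpy as np

def evidence_approximation(X, t, alpha=1.0, beta=1.0, max_iter=500, tol=1e-10):
    """Type II maximum likelihood estimates of alpha and beta (fixed-point form)."""
    n_samples, n_features = X.shape
    gram = X.T @ X
    gram_eigenvalues = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    for _ in range(max_iter):
        # Posterior mean for the current hyperparameters, Eqs. (18) and (19).
        S_N = np.linalg.inv(beta * gram + alpha * np.eye(n_features))
        m_N = beta * (S_N @ (X.T @ t))
        # gamma: effective number of well-determined parameters.
        eigenvalues = beta * gram_eigenvalues
        gamma = np.sum(eigenvalues / (alpha + eigenvalues))
        new_alpha = gamma / (m_N @ m_N)
        new_beta = (n_samples - gamma) / np.sum((t - X @ m_N) ** 2)
        converged = (abs(new_alpha - alpha) <= tol * alpha
                     and abs(new_beta - beta) <= tol * beta)
        alpha, beta = new_alpha, new_beta
        if converged:
            break
    return alpha, beta

# Toy data (made up): noise standard deviation 0.5, i.e. a true precision of 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
t = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)

alpha_hat, beta_hat = evidence_approximation(X, t)
```

On this toy problem the estimated noise precision should land near the value used to generate the data, without any cross-validation loop.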

The output of the date prediction model is a probability estimate for each 10-year bin in our timeline, along with error margins to estimate uncertainty. Within the Bayesian regression, we apply parameter constraints that restrict the uncertainties to non-negative values; these constraints do not affect the probability estimates of our model, the final results, or their interpretation. However, future research can explore the feasibility of an asymmetric error estimation above the x-axis. The choice of the 10-year bin is made empirically, and we keep the option of changing the bin width to 5 or 15 years for either thinner or thicker plots.

E.7 Data balancing

In addition to our original training data and the date prediction model described in the previous sections, we also employ two types of data balancing techniques to help reduce the time-axis bias in the training data. As can be seen in Figures 17 and 18, the training data is biased in the sense that there are many more high probabilities in the -200 to the -150 region. This creates much higher priors in that region. However, this bias is caused only by the samples that were chosen to be radiocarbon-dated and are not representative of the actual prior probability for the whole Dead Sea Scrolls collection. In order to make the predictions less dependent on the priors within the training data, two data-balancing strategies were implemented.

The first method concerns balancing using weights, where the output probabilities from the model are dampened or boosted based on the weights provided by the overall accumulated distribution seen in Figures 17 and 18.

The other data-balancing implementation was through augmentation, where underrepresented training data was compensatorily oversampled based on the overall accumulated distribution. The technical details for both implementations will be described in the following subsections.

Please note that in addition to cross-validation and leave-one-out statistical tests (see Section E.9.1), we also check the sensitivity of the model to the inclusion or exclusion of minor peaks in the (accepted) 2 $\sigma$ ranges. Figures 17 and 18 already show minimal changes in the overall probability distribution. This can be better visualized from Figure 19. The Euclidean distance between the two (accepted) ranges (with and without minor peaks), calculated over the whole range sampled every five years, is $0.104$ . The chi-square and Bhattacharyya distances are $0.124$ and $0.044$ , respectively, showing no significant changes in the overall probability distribution. The predicted test results also remain unchanged. It is important to note that incorporating the minor peaks did not lead to any horizontal time shifts in existing high-probability peaks (see Figure 19).

E.7.1 Balance using weights

Given the probability $p_{i}$ , where $i$ is a given year, a threshold $T$ , the (binned) accumulated C14 probability $cum\_c14$ , the maximum accumulated C14 probability $M=\max(cum\_c14)$ , and the number of summations that generated the accumulated probability in a bin, $n_{cum\_c14_{i}}$ , the weighted probability of each bin, $w_{p_{i}}$ , is calculated as:

$$w_{p_{i}}=\begin{cases}\dfrac{p_{i}}{cum\_c14_{i}}&\text{if }(p_{i}>T\cdot M)\text{ and }(n_{cum\_c14_{i}}>2)\\ p_{i}&\text{otherwise}\end{cases}\tag{24}$$

The weighted probabilities are then normalized to ensure that the scale of the weighted predictions is consistent with the original predictions. The process is described below.

Given the global maximum probability values in the original predictions, $max\_p$ , and in the weighted predictions, $max\_weighted\_p$ , the normalized probability $w_{p\_norm_{i}}$ for each probability value $w_{p_{i}}$ in the weighted predictions is calculated as:

$$w_{p\_norm_{i}}=\frac{w_{p_{i}}}{max\_weighted\_p}\cdot max\_p\tag{25}$$
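As a minimal sketch, Equations (24) and (25) can be written out in Python as follows. The function name, the list-based inputs, the example values, and the guard against division by zero are assumptions made for illustration; they are not taken from the Enoch code.

```python
def balance_with_weights(p, cum_c14, n_cum_c14, threshold):
    """Apply Eq. (24) to every bin, then rescale the result as in Eq. (25).

    p          -- predicted probability per bin (p_i)
    cum_c14    -- accumulated C14 probability per bin (cum_c14_i)
    n_cum_c14  -- number of summations behind each accumulated value
    threshold  -- the threshold T
    """
    if not (len(p) == len(cum_c14) == len(n_cum_c14)):
        raise ValueError("all inputs need exactly one entry per bin")

    m = max(cum_c14)  # M = max(cum_c14)

    weighted = []
    for p_i, c_i, n_i in zip(p, cum_c14, n_cum_c14):
        # Eq. (24); the c_i > 0 check is an added safeguard (assumption).
        if p_i > threshold * m and n_i > 2 and c_i > 0:
            weighted.append(p_i / c_i)
        else:
            weighted.append(p_i)

    # Eq. (25): bring the weighted values back to the original scale.
    max_p = max(p)
    max_weighted_p = max(weighted)
    if max_weighted_p == 0:
        return weighted
    return [w / max_weighted_p * max_p for w in weighted]


# Hypothetical four-bin example.
balanced = balance_with_weights(
    p=[0.1, 0.4, 0.2, 0.05],
    cum_c14=[0.5, 2.0, 1.0, 0.2],
    n_cum_c14=[3, 5, 4, 1],
    threshold=0.05,
)
print(balanced)
```

After rescaling, the largest balanced value equals the largest original value, so balanced and unbalanced plots share the same vertical scale.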

E.8 Balance using augmentation

For augmentation, five scrolls were chosen to be duplicated in the training data in order to boost the underrepresented prior probabilities within the training data. Table 9 details the scrolls and number of fragments after the augmentation was applied, and Figure 20 shows the effect this had on the overall accumulated probabilities.

Table 9: Table detailing number of duplications after augmentation procedure

Scroll	Number of fragments originally in the training data	Number of fragments after augmentation
4Q2
4Q161
5_6Hev1b
11Q5
Mas1k
XHev_Se2

E.9 Training options

For training, we create three main pools of data:

• ¹⁴C dated manuscripts
• ¹⁴C dated manuscripts with the addition of the old ¹⁴C from the 1990s
• ¹⁴C dated manuscripts with augmentation

Within each of these training datasets, there are different possible subsets of the training data that are usable (all the following subsets include the ¹⁴C dated manuscripts):

• 4Q52 can be included or excluded
• Internally dated scrolls can be included or excluded
• Maresha Ostracon can be included or excluded

E.9.1 Leave-one-out statistical test

In order to test whether the model predictions are robust, we performed a leave-one-out statistical test. The ‘leave-one-out’ (LOO) statistical test is a resampling technique used to evaluate the performance or robustness of a statistical model. In LOO, each sample in our training data is sequentially removed (left out), and the model is then trained on the remaining data points. The left-out observation is then used to evaluate the model’s performance or make predictions. This process is repeated for each observation in the training data. The goal of this statistical test is to get an indication of the variability of the model predictions by checking whether the model is overfitting the training data and how strongly outliers affect the model’s performance.

LOO is commonly employed when the dataset is small (which is the case for our dataset) or when it is important to assess the model’s performance on each individual observation. The LOO test is a valuable tool for assessing the reliability and generalizability of a statistical model by iteratively evaluating its performance on subsets of the training data while systematically excluding each sample. Given a training set of size $N$ , we train $N$ models by leaving out one different data point for each model and training with the remaining $N-1$ data points. Then, on the test set, we make predictions using all $N$ models and overlap the resulting predictions obtained from each model. This gives a visual representation of the amount of variation between the predictions of the different models.
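The procedure can be sketched generically in Python. This is an illustration under stated assumptions, not the Enoch implementation: the `fit`/`predict` interface, the helper names, and the stand-in model (which simply predicts the mean training date) are hypothetical, and the regression model described in this appendix would take the place of the stand-in.

```python
from statistics import mean


def leave_one_out(samples, fit, predict):
    """Train N models, each on N-1 samples, and predict the held-out sample.

    samples -- list of (features, target) pairs
    fit     -- callable: training list -> model
    predict -- callable: (model, features) -> prediction
    """
    models, held_out = [], []
    for i, (features, _target) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]  # leave sample i out
        model = fit(training)
        models.append(model)
        held_out.append(predict(model, features))
    return models, held_out


def mean_absolute_error(predictions, targets):
    """Average absolute difference between predictions and targets."""
    return mean(abs(p - t) for p, t in zip(predictions, targets))


def overlay_on_test_set(models, predict, test_features):
    """Predict every test item with all N models to expose their spread."""
    return [[predict(m, x) for m in models] for x in test_features]


# Stand-in model (assumption): predict the mean training date.
def fit_mean(training):
    return mean(target for _features, target in training)


def predict_mean(model, _features):
    return model


# Hypothetical toy data: (feature value, date in years, BCE negative).
toy = [(0.0, -200.0), (1.0, -100.0), (2.0, 0.0)]
toy_models, toy_predictions = leave_one_out(toy, fit_mean, predict_mean)
print(mean_absolute_error(toy_predictions, [t for _f, t in toy]))
```

The list returned by `overlay_on_test_set` holds, for each test item, one prediction per left-out model; plotting these on top of each other gives the visual impression of variation described above.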

E.9.2 Gaussian of Gaussian

We obtain the probability and error margin for each 10-year bin from the date-prediction model. This gives us a vertical Gaussian for each 10-year bin over the entire timeline. To convert them into a single Gaussian on the horizontal time axis, we apply a tool, dubbed ‘Gaussian of Gaussian’, to the predicted dates.

This program performs $1000$ iterations, each randomly drawing a wave shape instance from the $n$-bin distribution (over the entire timeline), assuming a Gaussian $(\mu,\sigma)$ per bin. In each iteration, the position of the maximum $y$ peak of the wave shape is detected along the $x$-axis; for our manuscript dating problem, $x$ represents the year value. In this manner, it becomes possible to estimate the uncertainty of peak detection in the style-based OxCal approximation, and its effect on the date estimation. This addition to the Enoch method makes it possible to obtain an estimate of date variability, similar to the output of OxCal itself. We explored whether smoothed distribution shapes were needed for this, but a detailed analysis fortunately revealed that the method could be kept simple: smoothing of the shape often led to an $x$-axis shift and an increase of the $x$ variability. The shape asymmetry of the (assumed) peak shape causes this time bias. Hence, we avoided any smoothing and used the raw, unfiltered generated histograms. The implicit assumption is that the ‘maximum’ co-occurs with a peak. Comparative plots for different information sources are obtained using the ‘Gaussian of Gaussian’ (see Appendix H).
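A minimal sketch of this sampling procedure is given below. It assumes that the ‘single Gaussian’ on the time axis is summarized by the mean and standard deviation of the sampled peak years; the function name, the seed handling, and the example bins are hypothetical and serve only to illustrate the idea.

```python
import random
import statistics


def gaussian_of_gaussian(bin_years, means, sigmas, n_draws=1000, seed=0):
    """Sample wave shapes from per-bin Gaussians and locate their peaks.

    bin_years -- year value of each bin (x-axis)
    means     -- predicted probability per bin (mu)
    sigmas    -- error margin per bin (sigma)
    Returns the mean and standard deviation of the sampled peak years.
    """
    rng = random.Random(seed)
    peak_years = []
    for _ in range(n_draws):
        # One wave shape instance: an independent Gaussian draw per bin.
        wave = [rng.gauss(mu, sigma) for mu, sigma in zip(means, sigmas)]
        # No smoothing: the raw maximum is taken as the peak position.
        peak_index = max(range(len(wave)), key=wave.__getitem__)
        peak_years.append(bin_years[peak_index])
    return statistics.mean(peak_years), statistics.pstdev(peak_years)


# Hypothetical three-bin example (years BCE as negative numbers).
peak_mean, peak_sd = gaussian_of_gaussian(
    bin_years=[-200, -190, -180],
    means=[0.30, 0.35, 0.10],
    sigmas=[0.05, 0.05, 0.02],
)
print(peak_mean, peak_sd)
```

Bins with a high mean but a large error margin win the maximum less consistently, so their contribution to the peak estimate is spread out, which is reflected in a larger standard deviation on the time axis.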

Appendix F On the use of pre-trained deep learning methods for image-based dating

F.1 Considerations on the use of training deep learning neural networks on a problem with only 24 examples

Since the mathematical proof by Hornik Hornik1989 , it took some time, but today deep multilayer neural networks excel in many applications, especially since the advent of large data sets and the increase in computational power. However, as observed in the introduction of this article, the likelihood of success is low when training a deep network with too many parameters on a tiny data set. There is a serious risk of an ‘overfit’, i.e., a computed solution that appears to be performant on a training data set but fails to generalize (interpolate) properly when presented with unseen data Vapnik2000 . We have looked at a list of 44 modern deep-learning vision models that were published since 2010 and were cited at least 100 times. Such models have, on average, 454 million weights (coefficients), which are computed from, on average, 715 million data points in training per single model. A ‘data point’ is a tuple of an image and its corresponding desired model output vector for a classification, regression, or generative task. The meta-analysis table is kindly provided by villalobos2022run ; epochMachineLearningData2022 . The most recent, transformer-based models even have billions of parameters and data points. It is evident that such large modern models can never be trained from scratch on a data set with just a few dozen, i.e., 24, radiocarbon-dated images as data points, for our problem.

An alternative approach would be the use of deep transfer learning TransferZhuang2021 ; ribani2019survey where an existing deep neural network, trained on a sufficiently large image data set Krizhevsky2017 , is fine-tuned on a smaller set of ¹⁴C-based dated images. In such a case, a hidden layer from a frozen pre-trained network is chosen as the shape-feature vector, and a new post-processing multilayer perceptron or dense network layer is trained to transform that feature vector to produce the output vector required by the actual task, for a given input image sample. We raise five objections to the use of deep transfer learning for the date prediction task.

Point 1. It is questionable whether currently common networks that are trained and designed for natural full-colour RGB photographic image classification will deliver a shape feature vector in their penultimate layer that is optimal for writer identification in bi-tonal manuscript images. Bitonal manuscript images have a flat-white background, and the interesting patterns are in the ink traces only. Such material is rarely present in generic photographic image collections. At the very least, there will be serious worries concerning efficiency, because about two-thirds of the connection weights are likely to be superfluous.

Point 2. Even if the colour-channel argument is dismissed, end users may argue that an opaque neural network Krizhevsky2017 or vision transformer ViTpaperDosivitskiy2020 method that is pretrained on non-representative image material (‘photos of cats, dogs and urban scenes, etc.’) would not be acceptable for answering scholarly questions. By way of comparison, current deep foundation models are not yet considered a good basis for serious application in medical diagnostics in radiology, and massive data would need to be collected in order to achieve such a status WilleminkFoundationRadiology2022 .

Point 3. Current deep-learning methods rely on images that are often very small, i.e., 224x224 or 512x512 pixel images. Only recently, with increased memory capacity in GPUs, images of 768x768 pixels can be used. This leads to many problems in the real-world application of deep learning, e.g., in a medical context Thambawita2021 . Our manuscript-image sizes are large and of variable aspect ratio, with widths of $\mu_{w}=3871$ ( $\pm\sigma_{w}=1069$ ) pixels and heights of $\mu_{h}=3857$ ( $\pm\sigma_{h}=740$ ) pixels. On top of the other restrictions mentioned here, using current deep learning would require a downscaling of the high-quality manuscript images with a factor of 5 to 7, with considerable loss of information. Whereas recent vision transformers ViTpaperDosivitskiy2020 are better suited to deal with large images, they are based on extraneous very large image and photographic collections (cf. Point 2).

Point 4. Alternatively, tiling Haja2021 would unnecessarily complicate the analyses because of the likely imbalance of character content between the tiles and damage to the original character appearance at the tile margins. In spite of the success and allure of deep learning, dealing with large, variable-sized images has not been fundamentally solved. This puts a limit on their applicability in several scientific domains. In microscopy Campanella2019 and astronomy ivezic2019lsst , multi-gigapixel, terabyte images are already common. As in our case, current convolutional neural networks, per se, cannot process an original whole image in its unscaled entirety without information loss, e.g., for a prediction task.

Point 5. Regardless of the methodological problems in the face of sparse data, an end-to-end deep learning approach, i.e., transforming image pixels into a date prediction directly, has the disadvantage of limited explainability. If dedicated features can be used that are explainable and a regression model can be trained that requires limited data, such a modular approach has a distinct advantage. Still, it is worthwhile to explore a deep-learned variant for date prediction, as more (radiocarbon-)dated samples will become available.

F.2 An attempt in using state-of-the-art deep-learning methods, PNASNet

To empirically illustrate the problems with current deep learning, even when used in a transfer-learning setup, we took a common foundational model (PNASNet) and used its output to estimate a date probability distribution. Using a pretrained PNASNet PNASnet5-Liu2018 , we rescaled each high-resolution manuscript image to the ‘passport-photo’ size that is customary for these models, i.e., $331\times 331$ pixels. We then used the penultimate layer ( $N_{hidden}=4320$ ) of this existing network as a pretrained feature vector for a date-estimation output layer with an ‘OxCal’ format probability target function. Figures 21 and 23 show the evolution of the loss curve in a typical training session. Although there is some variance present, the model seems to converge more or less after the presentation of 32,000 batches. However, when looking at the validation curves (Figures 22 and 24), we can see that the loss remains highly irregular. The most likely reason for this behavior is that, in spite of the frozen pre-trained mass of PNASNet, the number of fluid weights that need to be estimated for the transfer task is still in the order of 457k ( $N_{hidden}=4320$ $\times$ $N_{OxCal}=106$ ). This number is irresponsibly high in relation to the small number of images in the data set. The subpar loss level in comparison to training, together with the horizontality and irregularity of the validation loss curves, gives us very strong support for the decision to avoid this pathway. At the very least, it can be concluded that considerable additional research would be needed to improve the DL-transfer performance from here. Given the particular conditions of this study, we have avoided the use of deep learning for the regression task, waiting for the data sparsity to be solved.
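The size of the transfer layer can be checked with simple arithmetic. The sketch below ignores bias terms and takes the 24 examples named in the heading of Section F.1 as the sample count; both are assumptions made for illustration, and the actual number of training images may differ.

```python
N_HIDDEN = 4320  # size of the PNASNet penultimate layer (from the text)
N_OXCAL = 106    # length of the 'OxCal' format output vector (from the text)
N_IMAGES = 24    # examples named in the heading of Section F.1 (assumption)

# Weights of one dense layer mapping the feature vector to the output vector;
# bias terms are ignored here (assumption).
trainable_weights = N_HIDDEN * N_OXCAL
weights_per_image = trainable_weights / N_IMAGES

print(f"trainable weights: {trainable_weights:,}")
print(f"weights per training image: {weights_per_image:,.0f}")
```

Under these assumptions, each training example would have to constrain many thousands of free weights, which illustrates why the validation loss cannot be expected to settle.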

Appendix G Enoch’s date predictions for 135 previously undated manuscripts

Before we discuss the results of the palaeographic post-hoc evaluation of the 135 unseen samples (Section G.4), we elaborate on the physical and image quality of the data, as well as explain how to read Enoch’s prediction plots and elaborate on how Enoch differs from traditional palaeography.

G.1 On the physical and image quality of the data

In order to appreciate how the Enoch model works, it should be noted what challenges the data pose, physically and image-wise.

As we have mentioned before, the Dead Sea Scrolls are extremely delicate material (see Section 1 in the main article and Section B.2 in Appendix B). In a few cases, the physical evidence consists of largely intact bookrolls of several meters in length, such as the Great Isaiah Scroll (see Section 5 in the main article). But in most cases, what were once large and small bookrolls are now only extant as fragmentary, deteriorated remains of various sizes and shapes. This means that the Enoch model has to deal with very diverse material remains that are available as digital images (see Section 2.1 in the main article).

The physical state of the data affects the image quality in various manners. For example, papyrus fragments often have damage patterns that affect the ink remains of the letters differently than fragments of animal skin remains do. Or, some manuscripts are represented by large, relatively well-preserved fragments, whereas others only have one small, badly damaged fragment left. Our binarized images for Enoch sometimes combine different fragments of a manuscript that are available on separate image plates of the IAA (e.g., 4Q86). Thus, the data for Enoch consists of diverse image types. To elaborate on what was briefly mentioned in Section 6.2 in the main article, image preparation treatment is important to further improve Enoch’s prediction results. The model does not change its prediction with the same set of training and testing data, but predictions can change (read “improve”) because of better-cleaned images.

This also means that two or more predictions for the same manuscript can have different results because the underlying data consists of diverse image types that warrant a diverse spread in the plots. Unlike MPS He2016 (or other historical manuscripts), the Dead Sea Scrolls images are all different in shape, orientation, number of characters, ink thickness, etc. Considering 4Q57, for example, there are nine “curves” (plots), which the palaeographers used in the first evaluation, because there are nine images. It should be noted that these are nine different individual images, due to improved/updated preprocessing performed over time, of 4Q57, often of different fragments. As the model receives different images (features), it produces different curves (plots). Each image represents a different set of evidence (from character shapes/features) for each bin. If all nine plots of 4Q57 were exactly the same, then that would be problematic because each image fragment is different even though they are from the same manuscript.

G.2 How to read a prediction plot

Each prediction plot produced by Enoch presents the output of the Bayesian regression model and pertains to an individual manuscript test sample. In the plot, the X-axis delineates the chronological timeline, partitioned into 10-year bins, while the Y-axis conveys probability values in the form of means with error bars. This representation encapsulates the model’s endeavour to infer the probable dating of the manuscripts, each within a 10-year interval, across a temporal expanse from 310 BCE to 200 CE.

Within each 10-year bin, a pair of values is obtained: the mean and its corresponding error usually expressed as standard deviation. The mean is a point estimate, indicating the central tendency of the predicted manuscript dates for the given bin, thereby proposing an approximation of the most plausible date within the specific timeframe. The associated error, or standard deviation, serves as a critical metric showing the magnitude of variability inherent in the predicted dates and concurrently serves as a measure of uncertainty. An ‘ideal’ date prediction has a high probability and a low error.

The plot looks like a series of bars, like a histogram. By looking at these bars, we look for any patterns in the dates over time and gauge how confident or uncertain we are about these estimates. So, for individual plots, we look at the level of the mean value and the size of the error bars around it, to decide the most probable date or date range for that individual manuscript.

The discrete prediction bars can be mathematically smoothed into continuous curves, yielding Gaussian Mixture Models (GMMs) as a representation. This transformation allows for a more nuanced and probabilistic portrayal of the underlying distribution of predicted manuscript dates. If the smoothed prediction depicts a unimodal distribution, choosing the probable date range is easy (see Figure 25). However, it requires more attention when the prediction is bimodal. The reader then needs to pay more attention to the error bars and the means for each 10-year bin (see Figure 26). This cannot be easily solved with an algorithm: A high probability value is ‘good’, but not if it is accompanied by a large uncertainty. In that case, the choice of a stable estimate with a slightly lower mean probability may be advisable.

G.3 On shared characteristics and finding matches elsewhere

In order to appreciate how the Enoch model differs from traditional palaeographic approaches, we elaborate on what was briefly mentioned in Section 6.2 in the main article, namely that Enoch emphasizes shared characteristics and similarity matching, whereas traditional palaeography focuses on dissimilarities that are assumed indicative for style development.

Enoch’s Bayesian regression model performs the quantitative analysis of textural and allographic feature vectors. These feature vectors encapsulate various handwriting characteristics, offering a systematic representation for predictive modeling. Unlike the traditional approach of human palaeographers, who often seek dissimilarities, this model employs a similarity-based strategy. It strives to uncover patterns and relationships within the feature space by quantifying the resemblance of test images to the training set with known ¹⁴C date distribution. Leveraging a Bayesian framework, the model offers a probabilistic and data-driven means of attributing dates to unseen manuscripts. It thus complements the qualitative expertise of human palaeographers with a quantitative approach that can reveal subtle patterns and associations within the data. 4Q27 provides an excellent example of this approach, with six images in training Enoch and two in test prediction. Figure 27 shows one of the training images, and Figure 28 shows the prediction plot for one of the two test images.

Another example is that of 4Q319 (see Section 5 and Figure 2 in the main article). Here, the AI experts’ preferred reading of the prediction plot (see Table LABEL:tab:expert-AI-135undated) is for the younger range, against the so-called ‘biased range’ of 200–100 BCE. The occurrence of ‘young’ peaks can be seen as shape information suggesting an alternative to the bias (that is present for the 200-100 BCE range). The overall shape on the left-hand side is likely due to the OxCal-based training. However, in spite of the lesser occurrence of younger fragments in the training data, the right-hand part shows that from the style-based analysis a younger date is possible due to the stable results (with small error bars).

G.4 Palaeographic post-hoc evaluation

First, the 135 unseen samples were chosen for various palaeographic and historical reasons, such as a diachronic cross-section of a biblical book (Psalms), manuscripts that share the same writing style, or for no particular reason at all.

Second, for the evaluation, we took the best image for each manuscript in terms of image quality: better-cleaned images give better results (see Section G.1). The AI experts among the article’s authors, M.D. and L.S., made visual evaluations of the scans in order to ensure that the data is of sufficient quality, with a sufficient number of characters in the used sample. The list of which specific image for each manuscript sample was used in the evaluation can be seen in Table LABEL:tab:135-images. Still, for a number of scrolls only very poor and difficult images are available; we kept them in the test because we did not want to tweak the data. So, in cases where the best available image is still a very poor image, we worked with that (see some examples as illustrations of this very poor quality in Table LABEL:tab:bad25-images). However, we kept all the images so the reader can see all the implications and improvements that can be obtained from careful preprocessing of the images. In the Zenodo data repository (https://doi.org/10.5281/zenodo.10629480), the images are organized in three different directories: the first one with all 359 images for the 135 manuscripts, the second one with the selected 135 images, and the final one with 25 images to illustrate the poor quality of images.

Third, we did different balancing tests (see Section E.7 in Appendix E) and so produced different prediction plots. Yet, in the final evaluation we only use the balanced 0.05 plots, which we also indicate in Zenodo (https://doi.org/10.5281/zenodo.10629480), in the description of Organization of the data. All other prediction plots are also available for readers to see the different balancing tests that we have done (see inside Enoch-predictions.tar.gz file in the Zenodo data repository).

Fourth, the AI experts performed a blind reading of the balanced 0.05 prediction plots. They had no knowledge of the manuscript dates and only read the prediction plots, giving estimated minimum and maximum ranges (see Table LABEL:tab:expert-AI-135undated; for more details on how to read a plot, see Sections G.2 and G.3). These estimated minimum and maximum date ranges were then passed on to the palaeographers to assess the outcomes as “realistic” or “unrealistic”.

It is even possible to provide an algorithm to read the plots, but the design philosophy of our date prediction model is based on the assumption that it is better to stay close to the known systematics in dealing with OxCal curves with date probabilities than to resort to an ‘oracle’ approach where an algorithm proposes a hard date range. The user can inspect the output of our Enoch model in much the same way as an OxCal curve would be analysed.

Fifth, in their qualitative post-hoc evaluation, the palaeography experts among the article’s authors, M.P. and E.T., regarded a date prediction as “realistic” if a prediction corresponds (partly or wholly) with their palaeographic estimates, the basis for which was already explained in detail in Appendix A and Section D.1, and “unrealistic” when it does not. In other words, if there is an overlap between our palaeographic estimates and the machine-learning-based dating, even if the overlap is minimal, we regard the model’s date prediction as “realistic”, and “unrealistic” when there is no overlap, i.e., when it is older or younger (“too old” or “too young”). We provide our palaeographic date estimates for each of the 135 manuscripts (see Table LABEL:tab:expert-AI-135undated), with the general principle in mind that we work with a 50-year range and allow for +/- 25 years on either side. Sometimes, e.g., in the case of quite idiosyncratic handwriting, we allow for an even broader range of 100 years. It should also be noted that if the data are of poor quality, especially if only little material is left and therefore few characters to inspect, then palaeographic estimations are more difficult to make. In other words, the palaeographic dates are not hard date ranges, but expert estimates. Still, for the evaluation we used the 50-year range in a strict sense for reasons of clarity, so that if there was only a 5- or 10-year gap we deemed the prediction as “too old” or “too young”.
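The overlap criterion can be sketched as a small interval check. This is an illustration only, not the procedure the palaeographers followed by hand: the function name, the signed-year convention (BCE as negative numbers), and the choice to count a shared endpoint as an overlap are assumptions.

```python
def evaluate_prediction(predicted, palaeographic):
    """Classify a predicted date range against a palaeographic estimate.

    Both arguments are (earliest, latest) tuples in signed years,
    with BCE years negative (assumed convention).
    """
    pred_start, pred_end = predicted
    pal_start, pal_end = palaeographic
    if pred_start > pred_end or pal_start > pal_end:
        raise ValueError("ranges must be given as (earliest, latest)")

    if pred_end < pal_start:
        return "too old"    # prediction ends before the estimate begins
    if pred_start > pal_end:
        return "too young"  # prediction begins after the estimate ends
    return "realistic"      # any overlap, however small


# Hypothetical examples.
print(evaluate_prediction((-150, -100), (-125, -75)))  # overlapping ranges
print(evaluate_prediction((-200, -150), (-140, -90)))  # 10-year gap
```

Because the 50-year range is applied strictly, even a gap of 5 or 10 years between the two ranges yields “too old” or “too young” in this sketch.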

Summarized, our post-hoc palaeographic assessment is based on the following considerations:

1. In line with Cross and all other palaeographers, we make a distinction between Hasmonaean- and Herodian-style writing;
2. Our palaeographic date estimates of these styles vis-à-vis one another are informed by the traditional view of the Hasmonaean script as being older than the Herodian script. Our ¹⁴C results confirm for most manuscripts the basic distinction between Hasmonaean-type manuscripts that are older, and Herodian-style manuscripts that are younger. Yet, for Herodian-type script, our ¹⁴C results indicate that Herodian script was present earlier than previously thought. Our evaluation of the implications of the ¹⁴C data for Hasmonaean-type script provides evidence for dates in the second century BCE and also allows for the late third century BCE, and for Herodian-type script to be already in existence earlier side by side with Hasmonaean-type script in the second century BCE (see Section D.1.3). Thus, we took into account the general tendency in the ¹⁴C results that date both individual manuscripts and the emergence of the ‘Hasmonaean’ and ‘Herodian’ scripts about 50–75 years earlier than according to traditional palaeography;
3. Linear typological developments within both Hasmonaean- and Herodian-type script have been stated by scholars, rather than substantiated with external date-bearing evidence. This makes traditional assumptions about “within script” linear typological development problematic, in our view even more so of ‘Herodian’ than of ‘Hasmonaean’. Especially for script generally seen as Late Herodian, we would not exclude a date around the turn of the era or somewhat earlier. We reckon with the possibility of a longevity of script types longer than traditionally assumed. Cross assumed a rapid development of the script from the Hasmonaean period onward. He suggested chronological ranges of 50 years, and sometimes even shorter ranges of 25–50 years for typological developments, but these assumptions remain unsubstantiated.

It should be noted that other researchers can follow our evaluation by taking the range estimates (see Table LABEL:tab:expert-AI-135undated) and/or looking at the prediction plots (from the Zenodo repository (https://doi.org/10.5281/zenodo.8168210)), then considering the specific images of the manuscripts in question in the IAA’s Leon Levy Dead Sea Scrolls Digital Library collection dssllweb and/or our binarized images (from the previously mentioned Zenodo repository), and taking into account our considerations (see Appendix A and Section D.1). Or, instead of following our reasoning for a “realistic” or “unrealistic” assessment, they can make their own palaeographic post-hoc assessment, and justify their reasoning.

Also, please note that there are different probability values for each 10-year bin’s prediction within these minimum and maximum ranges. So, AI experts’ minimum and maximum values limit a probable range, but the range is not the final estimated date. One needs to read the probability plots to better estimate within the minimum-maximum range. This means that the range can sometimes be wide, but by reading the probability values along with the uncertainty estimates (or error bars), a reader can even narrow down to a more precise date range if they wish to do so.

Table 10 shows the distribution of year ranges in the blind range estimation by the AI experts.

Table 10: Spread estimation (blind test) by the AI experts

Range	Count	Percentage
280 years	2	1.48%
240 years	1	0.74%
210 years	4	2.96%
190 years	2	1.48%
170 years	6	4.44%
160 years	5	3.70%
150 years	5	3.70%
140 years	4	2.96%
130 years	8	5.93%
120 years	5	3.70%
110 years	8	5.93%
100 years	7	5.19%
90 years	18	13.33%
80 years	9	6.67%
70 years	4	2.96%
60 years	11	8.15%
50 years	22	16.30%
40 years	8	5.93%
30 years	5	3.70%
20 years	1	0.74%
Total:	135	100.00%

Some year ranges are so wide that the date prediction loses its effect of offering a limited number of quantified probability options within the time period under consideration. Fortunately, the instance of wide prediction ranges is limited within the 135 test samples. The definition for “wide range” is informed by the accepted 2 $\sigma$ calibrated ranges which are the training data for the Enoch model and are on average 135 years, including the so-called minor peaks, or 110 years excluding the so-called minor peaks (see Figures 17 and 18, and Table LABEL:tab:2-sigma-accepted). Twenty-nine of the 135 test samples (21%) have a date range of more than 130 years, whereas 42 of the 135 test samples (31%) have a date range of more than 110 years.

In most cases, the date prediction range is well below 135 or 110 years, often only ca. 50 years (16%), which has the highest count of all the ranges (see Table 10).

The current average range is 69.35 years when wide ranges above 110 years are excluded, and 76.32 years when wide ranges above 135 years are excluded. If one were to indiscriminately include all ranges, then the current average would be 98.76 years. The median value is 90 years. It should be noted that these averages, as well as the median, can change if, in the future, more manuscripts are tested. If the image quality is further improved, these numbers can also be affected and improved (see below).
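Several of the summary figures quoted for Table 10 can be recomputed from the binned counts. The sketch below covers the two restricted averages, the median, and the number of wide ranges; the dictionary simply transcribes Table 10, and the helper names are hypothetical.

```python
from statistics import mean, median

# Range in years -> number of test samples, transcribed from Table 10.
RANGE_COUNTS = {
    280: 2, 240: 1, 210: 4, 190: 2, 170: 6, 160: 5, 150: 5, 140: 4,
    130: 8, 120: 5, 110: 8, 100: 7, 90: 18, 80: 9, 70: 4, 60: 11,
    50: 22, 40: 8, 30: 5, 20: 1,
}


def expand(counts):
    """Turn the binned table into a sorted list with one entry per sample."""
    return sorted(r for r, n in counts.items() for _ in range(n))


def mean_up_to(counts, limit):
    """Average range over all samples whose range does not exceed limit."""
    return mean(r for r in expand(counts) if r <= limit)


def count_above(counts, limit):
    """Number of samples whose range is wider than limit."""
    return sum(n for r, n in counts.items() if r > limit)


print(round(mean_up_to(RANGE_COUNTS, 110), 2))  # average excluding ranges above 110
print(round(mean_up_to(RANGE_COUNTS, 135), 2))  # average excluding ranges above 135
print(median(expand(RANGE_COUNTS)))             # median range
print(count_above(RANGE_COUNTS, 130), count_above(RANGE_COUNTS, 110))
```

Since the table reports ranges in 10-year steps, figures derived from it are only as precise as that binning.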

Most date ranges are indeed below or up to 90 years (78 out of 135 test samples). It should be noted that the traditional palaeographic model was claimed to be able to fix a characteristic bookhand or the copying of a manuscript within 50 years or even 25–50 years, but that this claim was not substantiated with external date-bearing evidence (see Section D.1). Now, our Enoch model can produce prediction ranges of 50 years that are empirically based on physical evidence derived from ¹⁴C and geometric evidence from shape-based analysis. Enoch outperforms the ¹⁴C results: in the time period 300–50 BCE, Enoch’s predictions are even narrower than the ¹⁴C date ranges and provide a more fine-grained distribution (as mentioned in Section 4 in the main article).

As can be seen in Table 2 in the main article, 107 (79%) of the 135 undated manuscripts were judged to have obtained a realistic date prediction. Of course, the wider the range of years in the prediction plots, the more manuscripts show an overlap between our palaeographic estimates and the machine-learning-based dating. If we disregard the 42 date predictions with a spread wider than 110 years, then the percentage of realistic predictions drops to 50% (68 out of 135) or to 73% (68 out of 93). Thus, even with a stricter selection rule that allows only the narrow-range estimates, a decent percentage of palaeographically realistic evaluations can still be obtained from the harvest of undated material. Moreover, if we also take the image quality of the samples into account and choose not to use data of very poor quality, then the performance of the Enoch model becomes even more impressive. Twenty-five images are of poor quality (see Table 13), leaving 110 images and samples in the test, of which 91% have a realistic prediction. Again, if we ignore the 36 of these 110 date predictions with a spread wider than 110 years, then the percentage of realistic predictions amounts to 61% out of 110, or 89% out of 74.
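The percentages in this paragraph follow from simple ratios of the counts given in the text. A small sketch to recompute them, where the helper name `share` is hypothetical and only counts stated explicitly above are used:

```python
def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)

total = 135                  # undated test manuscripts
realistic_all = 107          # judged realistic among all 135
wide = 42                    # predictions with a spread wider than 110 years
realistic_narrow = 68        # realistic predictions among the narrow-range ones

narrow = total - wide        # 93 narrow-range predictions
print(share(realistic_all, total))      # share of realistic predictions overall
print(share(realistic_narrow, total))   # narrow and realistic, relative to all 135
print(share(realistic_narrow, narrow))  # narrow and realistic, relative to the 93
```

The two denominators answer different questions: dividing by 135 treats every wide-range prediction as a miss, whereas dividing by 93 evaluates only the predictions that pass the stricter selection rule.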

In the post-hoc evaluation, the palaeographers refrained from a decision in 4 cases (“see comment 1–4” in Table 11). The comments are as follows:

1. 4Q73: we consider this test sample a borderline case, as we would expect an older date, ca. 100 BCE or ca. 75 BCE, in view of our considerations, especially the ¹⁴C results for Hasmonaean manuscripts (Section G.4). The traditional palaeographic date estimate, middle of the first century BCE DJD15 , comes close to Enoch’s date prediction;

2. 4Q379: we consider the semicursive script in this manuscript difficult to date. Some semicursive manuscripts are easier to date, but for this one there is too little to go on, also according to the traditional palaeographic model. Therefore, we refrain from a decision; 4Q379 could date to around 100 BCE, in which case the prediction is realistic, but it could also be later. Cf. also DJD22 : the general indication “Hasmonaean semicursive” (263) indicates the difficulty in dating;

3. 4Q398: this is again a manuscript in semicursive script, and difficult to date. Other palaeography experts gave the following dates: Puech, second quarter of the first century BCE Puech2015b ; Yardeni, 50–1 BCE DJD10 . The prediction plot would be compatible with the latter date;

4. 4Q522: typologically, we would characterize the script as late Hasmonaean, but the date of the prediction model seems slightly too old to us. We would expect a slightly younger date, ca. 100–75 BCE, in view of our considerations (Section G.4). The traditional palaeographic date estimate by Puech is late Hasmonaean, second third of the first century BCE Puech1998B .

Two observations on the basis of these comments:

1. Outside nice formal bookhands, ordering Dead Sea Scrolls manuscripts according to typology can be difficult for palaeographers, especially for the semicursive script. In addition to the physical and image quality of the data (see Section G.1), script diversity can also pose a challenge for the Enoch model. More specifically, Enoch can handle formal and semiformal scripts well in predicting their age range, but manuscripts written in semicursive script are more difficult to date at the current stage. This can be explained by the fact that Enoch has not yet been trained sufficiently on this script (only two ¹⁴C samples, 4Q114 and 4Q255/4Q433a, are in semicursive script);

2. The range 100–50 BCE is underrepresented in Enoch’s date predictions. This can be explained by the distribution of ¹⁴C samples across the timeline, which provides little securely fixed evidence for this part of the timeline: 4Q201, 4Q255/4Q433a, 4Q27, and 4Q2 cover (part of) the range 100–50 BCE, but all of them extend beyond the range as well. Roughly speaking, Enoch predicts Hasmonaean-type manuscripts before 100 BCE and Herodian-type manuscripts after 50 BCE. Still, it should be noted that the range 100–50 BCE is not completely devoid of Enoch’s prediction plots, as the plots for 4Q185, 4Q554, and 11Q13 show, albeit with a wide range of 150 years for 4Q554.

From the machine-learning perspective, these problems can be sorted out as more samples from critical time periods are added to the training data.

G.5 6 July 2021 test

Earlier in the project, a test was conducted on 6 July 2021. The test consisted of giving the AI experts manuscripts whose ¹⁴C results they had not seen, to check whether Enoch’s date predictions would match the ¹⁴C results. However, at the start of the test, the AI experts did not know that the samples had been chosen because ¹⁴C results would be available for them afterward.

The ¹⁴C results were taken from the 1990s ¹⁴C dating of the Dead Sea Scrolls Bonani1992 ; Jull1995 . The assumption was that the manuscripts chosen were not contaminated with castor oil as these manuscripts were not handled by the original team of editors in the 1950s Doudna1998 ; carmi200214c ; rasmussen2009effects . This applies to 1QIsa^a, 1QpHab, 1QapGen, 1QS, 1QH^a, 11Q19, Mas1l.

Two more manuscripts were added for other reasons. 4Q53 was added because scholars assume that it was written by the same scribe as 1QS. 4Q319 was added because it is actually the same manuscript as 4Q259 Hempel2020 , which was subjected to ¹⁴C dating by our own project.

The test was filmed. The film captures the whole process that was conducted in one go. The film can be accessed here: https://doi.org/10.5281/zenodo.8167946

Table 11: AI experts’ (blind) range estimation (est_min and est_max) and palaeography experts’ evaluation (pal_eval) with year range (pal_min and pal_max).

Q-num	est_min	est_max	pal_eval	pal_min	pal_max
1QapGen	-50	-1	realistic	-50	-1
1QH^a	-140	-1	realistic	-50	-1
1QIsa^a	-200	-100	realistic	-175	-125
1QpHab	-40	10	realistic	-25	25
1QS	-190	-100	realistic	-150	-100
2Q3	-40	130	realistic	-50	-1
2Q14	-310	-100	too_old	-75	-25
2Q24	-40	10	realistic	-50	-1
3Q6	-10	120	realistic	-25	25
4Q13	-40	20	realistic	-50	-1
4Q27	-30	20	realistic	-75	-25
4Q28	30	120	too_young	-200	-150
4Q38	-30	10	realistic	-50	-1
4Q38a	-190	-60	too_old	-50	-1
4Q53	-40	10	too_young	-150	-100
4Q57	-80	120	realistic	-1	50
4Q58	30	120	realistic	-1	50
4Q73	-40	10	see_comment 1	-100	-50
4Q76	-190	-150	realistic	-175	-125
4Q83	-210	-150	realistic	-175	-125
4Q84	-200	-50	realistic	-50	-1
4Q85	-140	70	realistic	25	75
4Q86	30	120	too_young	-75	-25
4Q87	-40	90	realistic	-25	25
4Q88	-190	20	realistic	-100	-50
4Q89	-50	120	realistic	25	75
4Q90	-30	-10	realistic	-75	-25
4Q91	-40	10	realistic	1	50
4Q92	-190	-100	realistic	-100	-50
4Q93	-50	30	realistic	-75	-25
4Q94	-30	20	realistic	-50	-1
4Q95	-30	90	realistic	-50	-1
4Q96	-30	10	realistic	-50	-1
4Q97	-50	10	realistic	-50	-1
4Q98	-190	-150	too_old	-50	-1
4Q98a	-40	-10	realistic	-75	-25
4Q98b	-10	120	realistic	25	75
4Q98c	10	120	realistic	25	75
4Q98f	-30	60	too_young	-100	-50
4Q98g	-200	-150	realistic	-175	-125
4Q109	-300	-240	realistic	-250	-150
4Q160	-200	-110	realistic	-175	-125
4Q161	-40	80	realistic	-50	-1
4Q163	-190	-1	realistic	-125	-75
4Q166	-40	70	realistic	-50	-1
4Q167	-50	120	realistic	-50	-1
4Q171	-50	70	realistic	-50	-1
4Q175	-150	-1	realistic	-150	-100
4Q184	-40	80	realistic	-50	-1
4Q185	-190	-80	realistic	-100	-50
4Q196	-190	-110	too_old	-100	-50
4Q203	-170	-60	too_old	-50	-1
4Q212	-100	30	realistic	-75	-25
4Q215	-30	70	realistic	-50	-1
4Q215a	-30	20	realistic	-50	-1
4Q216	-190	-110	too_old	-75	-25
4Q227	-40	30	realistic	-50	-1
4Q252	-40	20	realistic	-50	-1
4Q256	-40	10	realistic	-75	-25
4Q258	-50	20	realistic	-50	-1
4Q266	-190	-100	realistic	-100	-50
4Q267	-170	20	realistic	-50	-1
4Q271	-40	20	realistic	-75	-25
4Q272	-50	10	realistic	-75	-25
4Q274	-170	-40	realistic	-75	-25
4Q276	-30	60	realistic	-75	-25
4Q277	-30	110	realistic	-75	-25
4Q284a	-160	80	realistic	-50	-1
4Q301	-40	-1	realistic	-75	-25
4Q303	-30	110	realistic	-50	-1
4Q319	-40	-1	realistic	-125	-25
4Q325	-40	40	realistic	-50	-1
4Q373	-300	-240	too_old	-100	-50
4Q375	-30	20	realistic	-50	-1
4Q379	-190	-100	see_comment 2	-125	-75
4Q390	-30	70	realistic	-75	-25
4Q391	-200	-110	realistic	-125	-75
4Q394	-200	-110	too_old	-100	-1
4Q397	-40	60	realistic	-50	-1
4Q398	-30	10	see_comment 3	-75	-25
4Q409	-30	20	realistic	-50	-1
4Q410	-40	80	realistic	-50	-1
4Q422	-190	-110	realistic	-150	-100
4Q426	-190	-50	realistic	-100	-50
4Q431	-30	120	realistic	-50	-1
4Q432	-160	120	realistic	-50	-1
4Q434	-30	20	realistic	-75	-25
4Q436	-30	20	realistic	-50	-1
4Q437	-40	20	realistic	-50	-1
4Q439	-40	70	realistic	-25	25
4Q442	-10	120	realistic	-50	-1
4Q448	-30	10	too_young	-100	-50
4Q457	-40	-10	too_young	-150	-100
4Q471a	-160	20	realistic	-50	-1
4Q473	-30	20	realistic	-50	-1
4Q474	-30	20	realistic	-50	-1
4Q475	-30	120	realistic	-50	-1
4Q476	-30	120	realistic	-50	-1
4Q492	30	120	too_young	-75	-25
4Q493	-40	70	realistic	-75	-25
4Q494	-30	20	realistic	-50	-1
4Q501	-30	20	realistic	-75	-25
4Q508	-190	-110	too_old	-50	-1
4Q511	-30	80	realistic	-50	-1
4Q522	-200	-110	see_comment 4	-100	-50
4Q525	-50	10	realistic	-50	-1
4Q530	-30	10	too_young	-125	-75
4Q531	-30	20	realistic	-50	-1
4Q540	-40	30	too_young	-150	-100
4Q542	-50	20	too_young	-125	-75
4Q544	-200	-110	realistic	-150	-100
4Q545	-200	-100	realistic	-125	-75
4Q547	-200	-120	realistic	-125	-75
4Q554	-200	-50	realistic	-75	-25
4Q557	-200	-120	realistic	-150	-100
4Q577	-50	10	too_young	-125	-75
5-6Hev1b	-40	120	realistic	50	100
5-6Hev45	-40	120	too_old	134	134
5Q5	-200	-150	too_old	-25	25
6Q18	-160	120	realistic	-50	-1
11Q5	10	120	realistic	25	75
11Q6	30	120	realistic	-1	50
11Q7	30	120	realistic	-1	50
11Q8	-30	20	too_old	25	75
11Q13	-80	20	realistic	-50	-1
11Q14	-40	120	realistic	-1	50
11Q18	-40	120	realistic	-50	-1
11Q19	-40	-1	realistic	-25	25
11Q20	-90	120	realistic	-25	25
Mas1e	-50	30	realistic	-1	50
Mas1f	-200	-100	too_old	25	75
MasJosh	-30	20	realistic	-50	-1
Mur88	-50	120	realistic	25	75
Nash	-200	-110	realistic	-175	-125
Sdeir1	-40	120	realistic	75	125
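The relation between the columns of Table 11 can be explored with a simple interval comparison. The sketch below is an assumption-laden illustration: the function names are hypothetical, intervals are treated as closed (so ranges that merely touch count as overlapping), and the `pal_eval` labels in the table reflect the palaeographers' judgement rather than the output of this or any mechanical rule.

```python
def overlaps(est, pal):
    """True if the closed intervals est=(min, max) and pal=(min, max) intersect."""
    return est[0] <= pal[1] and pal[0] <= est[1]

def overlap_years(est, pal):
    """Length in years of the shared part of the two intervals (0 if none)."""
    return max(0, min(est[1], pal[1]) - max(est[0], pal[0]))

# (est_min, est_max), (pal_min, pal_max), pal_eval copied from Table 11
rows = {
    "1QS":  ((-190, -100), (-150, -100), "realistic"),
    "4Q84": ((-200, -50),  (-50, -1),    "realistic"),
    "4Q28": ((30, 120),    (-200, -150), "too_young"),
    "11Q8": ((-30, 20),    (25, 75),     "too_old"),
}

for name, (est, pal, label) in rows.items():
    print(name, label, overlaps(est, pal), overlap_years(est, pal))
```

For these four rows the overlap test agrees with the expert labels, but 4Q84 shows why the boundary convention matters: the two ranges share only the endpoint 50 BCE.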

Table 12: List of images for each manuscript sample used in the post-hoc evaluation

Q-number	Image-name	Q-number	Image-name
1QapGen	1QapGen_4_crpcln	4Q301	4Q301_2_processed
1QH^a	1QHa_QIrug-1668_cln…Lotte	4Q303	4Q303_processed
1QIsa^a	1QIsaa_col02_cleaned	4Q319	4Q319_1_crpcln
1QpHab	1QpHab_Brill2307_cleaned_MD	4Q325	4Q325_processed
1QS	1Qs_QIrug-2153_cln…Lotte	4Q373	4Q373_processed
2Q3	2Q3_processed	4Q375	4Q375_processed
2Q14	2Q14_1_processed	4Q379	4Q379_1_processed
2Q24	2Q24_processed	4Q390	4Q390_processed
3Q6	3Q6_processed	4Q391	4Q391_4_processed
4Q13	4Q13_processed	4Q394	4Q394_cleaned
4Q27	4Q27_processed_MDnew	4Q397	4Q397_1_processed
4Q28	4Q28_256-1_cleaned	4Q398	4Q398_1_processed
4Q38	4Q38_processed	4Q409	4Q409_processed
4Q38a	4Q38a_processed	4Q410	4Q410_processed
4Q53	4Q53_405_part2_cleaned	4Q422	4Q422_2_processed
4Q57	4Q57_363_part1_cleaned	4Q426	4Q426_processed
4Q58	4Q58_Brill0458_cleaned	4Q431	4Q431_processed
4Q73	4Q73_1112-1_cleaned	4Q432	4Q432_processed
4Q76	4Q76_cleaned	4Q434	4Q434_processed
4Q83	4Q83_1148_part1_cleaned	4Q436	4Q436_processed
4Q84	4Q84_3_processed	4Q437	4Q437_processed
4Q85	4Q85_2_processed	4Q439	4Q439_processed
4Q86	4Q86_processed	4Q442	4Q442_processed
4Q87	4Q87_processed	4Q448	4Q448_processed
4Q88	4Q88_3_processed	4Q457	4Q457_processed
4Q89	4Q89_processed_MDnew	4Q471a	4Q471a_processed
4Q90	4Q90_processed	4Q473	4Q473_processed
4Q91	4Q91_processed	4Q474	4Q474_processed
4Q92	4Q92_processed	4Q475	4Q475_processed
4Q93	4Q93_processed	4Q476	4Q476_processed
4Q94	4Q94_processed	4Q492	4Q492_processed
4Q95	4Q95_processed	4Q493	4Q493_processed
4Q96	4Q96_processed	4Q494	4Q494_processed
4Q97	4Q97_processed	4Q501	4Q501_processed
4Q98	4Q98_processed	4Q508	4Q508_processed
4Q98a	4Q98a_processed	4Q511	4Q511_2_processed
4Q98b	4Q98b_processed	4Q522	4Q522_cleaned_2
4Q98c	4Q98c_processed	4Q525	4Q525_2_processed
4Q98f	4Q98f_processed	4Q530	4Q530_processed
4Q98g	4Q98g_processed	4Q531	4Q531_processed
4Q109	4Q109_cleaned_MDnew	4Q540	4Q540_processed
4Q160	4Q160_137plate_cleaned	4Q542	4Q542_cleaned
4Q161	4Q161_583_part2_cleaned	4Q544	4Q544_cleaned
4Q163	4Q163_584_599_cleaned	4Q545	4Q545_processed
4Q166	4Q166_4_processed	4Q547	4Q547_processed
4Q167	4Q167_processed	4Q554	4Q554_cleaned_MDcrpcln1
4Q171	4Q171_2_processed	4Q557	4Q557_processed
4Q175	4Q175_cleaned	4Q577	4Q577_processed
4Q184	4Q184_287_cleaned	5-6Hev1b	5-6Hev1b_891_cleaned
4Q185	4Q185_160_part2_cleaned	5-6Hev45	5-6Hev45_part2_cleaned
4Q196	4Q196_cleaned	5Q5	5Q5_processed
4Q203	4Q203_906_cleaned	6Q18	6Q18_processed
4Q212	4Q212_227_cleaned	11Q5	11Q5_2_processed
4Q215	4Q215_processed	11Q6	11Q6_2_processed
4Q215a	4Q215a_processed	11Q7	11Q7_2_processed
4Q216	4Q216_cleaned_1	11Q8	11Q8_processed
4Q227	4Q227_processed	11Q13	11Q13_579_part2_cleaned
4Q252	4Q252_processed	11Q14	11Q14_cleaned_1
4Q256	4Q256_907_cleaned	11Q18	11Q18_processed
4Q258	4Q258_140_part1_cleaned	11Q19	11Q19_Brill2293_cleaned
4Q266	4Q266_cleaned_MDcrp1	11Q20	11Q20_5-new
4Q267	4Q267_2_processed	Mas1e	Mas1e_cleaned
4Q271	4Q271_processed	Mas1f	Mas1f_processed
4Q272	4Q272_processed	MasJosh	MasJosh_cleaned
4Q274	4Q274_processed	Mur88	Mur88_2_crp-cln-prcsd_cleaned
4Q276	4Q276_processed	Nash	Nash-MS-OR-…-line9to15
4Q277	4Q277_processed	Sdeir1	Sedir1_984_cleaned
4Q284a	4Q284a_processed

Table 13: List of twenty-five images of poor quality

Image-name	Image-name	Image-name
4Q540_processed	4Q53_405_part2_cleaned	4Q577_processed
4Q196_cleaned	4Q457_processed	4Q508_processed
4Q86_processed	4Q98_processed	4Q98f_processed
4Q492_processed	4Q448_processed	Mas1f_processed
4Q88_3_processed	4Q530_processed	4Q98c_processed
2Q14_1_processed	4Q98g_processed	4Q98b_processed
4Q284a_processed	5Q5_processed	4Q398_1_processed
4Q432_processed	4Q373_processed	4Q379_1_processed
6Q18_processed

Appendix H Comparative plots for different information sources

Appendix I List of images for different tests

Table 15: Complete list of 64 training images (including 4Q52; for the date prediction model) from the radiocarbon-dated manuscripts.

Q-numbers	Name	Plate	Fragment	Q-numbers	Name	Plate	Fragment
4Q2	4Q2_1	215	1,4	4Q255	4Q255_4Q433a_1	177	3R
	4Q2_2	215	3		4Q255_4Q433a_2	177	4R
4Q3	4Q3_1	393	5	4Q259	4Q259_1	683	6
4Q23	4Q23_1	271	1,4		4Q259_2	683	7
	4Q23_2	272	5, 6, 18, 7		4Q259_3	695	1,3
	4Q23_3	272	19			696	6
		401	3, 4, 5		4Q259_4	810	3R, 5R, 7R, 8R
4Q27	4Q27_1	1080	1, 2, 6	4Q267	4Q267_1	106	2,6,9,11
	4Q27_2	1081A	2		4Q267_2	107	2,9,12
	4Q27_3	1082	1	4Q375	4Q375_1	122A	1
	4Q27_4	1082	4		4Q375_2	122A	1,2
	4Q27_5	1084B	1,7,9	4Q416	4Q416_1	180	1,2
	4Q27_6	1086B	2, 8		4Q416_2	181	1
4Q30	4Q30_1	237	7		4Q416_3	181	1
	4Q30_2	238	1	4Q504	4Q504_1	421	3, 4,5
4Q47	4Q47_1	1092	1		4Q504_2	982	1
	4Q47_2	1092	3, 5		4Q504_3	982	2
4Q52	4Q52_1	42599			4Q504_4	982	2
	4Q52_3	206	1,3	4Q521	4Q521_1	330-1	1
4Q70	4Q70_1	1109	7, 11		4Q521_2	330-1	1
	4Q70_2	1110	2	4Q541	4Q541_1	147	1,19
	4Q70_3	1110	3	5_6Hev1b	5_6Hev1b_1	890	2
		1111	1	11Q5	11Q5_1	974	1
	4Q70_4	1111	3		11Q5_2	974	1
4Q114	4Q114_1	224	1		11Q5_3	975	1
4Q161	4Q161_1	583	2,3		11Q5_4	975	1
	4Q161_2	585	2,5		11Q5_5	976	1
4Q176	4Q176_1	285	1		11Q5_6	976	3
	4Q176_2	285	2		11Q5_7	977	2
4Q201	4Q201_1	821	2		11Q5_8	978	1
		904	1		11Q5_9	979	1
	4Q201_2	821	1	Mas1k	Mas1k_1	X232	1
4Q206	4Q206_1	358	1,6		Mas1k_2	X232	1
	4Q206_2	359	1,3	Xhev_Se2	Xhev_Se2_1	534	2

Table 16: List of 135 manuscripts used for making date predictions. Please note that one manuscript may contain several images in the test dataset.

Q-numbers	Q-numbers	Q-numbers	Q-numbers
1QapGen	4Q98	4Q301	4Q508
1QH^a	4Q98a	4Q303	4Q511
1QIsa^a	4Q98b	4Q319	4Q522
1QpHab	4Q98c	4Q325	4Q525
1QS	4Q98f	4Q373	4Q530
2Q3	4Q98g	4Q375	4Q531
2Q14	4Q109	4Q379	4Q540
2Q24	4Q160	4Q390	4Q542
3Q6	4Q161	4Q391	4Q544
4Q13	4Q163	4Q394	4Q545
4Q27	4Q166	4Q397	4Q547
4Q28	4Q167	4Q398	4Q554
4Q38	4Q171	4Q409	4Q557
4Q38a	4Q175	4Q410	4Q577
4Q53	4Q184	4Q422	5/6Hev1b
4Q57	4Q185	4Q426	5/6Hev45
4Q58	4Q196	4Q431	5Q5
4Q73	4Q203	4Q432	6Q18
4Q76	4Q212	4Q434	11Q5
4Q83	4Q215	4Q436	11Q6
4Q84	4Q215a	4Q437	11Q7
4Q85	4Q216	4Q439	11Q8
4Q86	4Q227	4Q442	11Q13
4Q87	4Q252	4Q448	11Q14
4Q88	4Q256	4Q457	11Q18
4Q89	4Q258	4Q471a	11Q19
4Q90	4Q266	4Q473	11Q20
4Q91	4Q267	4Q474	Mas1e
4Q92	4Q271	4Q475	Mas1f
4Q93	4Q272	4Q476	Mas1l
4Q94	4Q274	4Q492	Mur88
4Q95	4Q276	4Q493	Nash
4Q96	4Q277	4Q494	Sdeir1
4Q97	4Q284a	4Q501

Table 17: List of all 13 images that are split from training manuscripts and added to test.

Q-number	Number of images
4Q27	2
4Q161	2
4Q267	2
4Q375	1
5-6Hev1b	1
11Q5	5

Table 18: Complete list of 23 training images from a selection of previously ¹⁴C-tested manuscripts Bonani1992 ; Jull1995 .

Manuscript	Radiocarbon (BP)	Image IDs
Mas1l	2086,28	MasJosh.png
1QIsa^a	2141,32	1QIsaa_col01_cleaned.png
		1QIsaa_col02_cleaned.png
		1QIsaa_col03_cleaned.png
		1QIsaa_col34_cleaned.png
		1QIsaa_col35_cleaned.png
1QpHab	2054,22	QIrug-Qumran_extr09_2305_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2306_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2307_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2308_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2309_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2310_1QpHab_crpcln.png
		QIrug-Qumran_extr09_2311_1QpHab_crpcln.png
11Q19	2030,40	QIrug-Qumran_extr09_2293_11Q19_crpcln.png
		QIrug-Qumran_extr09_2294_11Q19_crpcln.png
		QIrug-Qumran_extr09_2295_11Q19_crpcln.png
		QIrug-Qumran_extr09_2296_11Q19_crpcln.png
		QIrug-Qumran_extr09_2297_11Q19_crpcln.png
		QIrug-Qumran_extr09_2298_11Q19_crpcln.png
		QIrug-Qumran_extr09_2299_11Q19_crpcln.png
		QIrug-Qumran_extr09_2300_11Q19_crpcln.png
1QS	2041,68	QIrug-Qumran_extr09_2151_1Qs_1_crpcln.png
		QIrug-Qumran_extr09_2151_1Qs_2_crpcln.png

Table 19: Complete list of 30 images for date-bearing manuscripts from the fifth–fourth centuries BCE and the second century CE.

Manuscript	Date	Manuscript	Date
A6_11R	-411	IA06	-353
A6_12R	-411	IA17	-324
A6_13R	-411	IA21	-330
A6_14	-411	MareshaOstracon	-176
A6_15	-411	Mur24_1	133
A6_16	-411	Mur24_2	133
A6_3	-411	NS_A1r	-353
A6_4	-411	NS_A2r	-351
A6_5	-411	NS_A4r	-348
A6_7	-411	NS_A5r	-348
A6_8	-411	NS_A6r	-349
B3_1	-456	NS_C1r	-330
IA01	-348	NS_C4r	-324
IA03	-348	WDSP1_1	-335
IA04	-351	WDSP2	-352

Appendix J Radiocarbon sample information

Table 20: Information of physical samples for radiocarbon dating

IAA plate	Sample	Work	Plate-fragment	Info from IAA on DJD references, places in fragments where samples were taken, previous treatments from the 1950s onward, and sample weights
206	4Q52	4QSam^b	P206-Fr003	DJD 17: pl XXIV, fr 7; Bottom left; Maybe glues? Japanese Tissue Paper + Methylcellulose glue (2001)
421	4Q504	4QDibHam^a	P421-Fr004	DJD 7: pl LII, fr 7; Bottom; Maybe castor oil?
285	4Q176	4QTanh	P285-Fr002	DJD 5: pl XXII, fr 10; Upper left; Scotch tape; Rice paper + Perspex glue + trichlorethylene
224	4Q114	4QDan^c	P224-Fr001	DJD 16: pl XXXV, fr 3; Bottom right; Japanese Tissue Paper + Methylcellulose glue (2009)
385	4Q216	4QJub^a	P385-Fr011	DJD 13: pl I, fr 12ii; Left sheet: upper margin; Scotch tape; Rice paper + Perspex glue + trichlorethylene; Hinge of Japanese Tissue Paper + Methylcellulose glue (1992); Magen Broshi sampled the right sheet and the thread in 2003, therefore it was decided not to sample it again
801	4Q185	4QSapiential Work	P801-Fr003	Not in DJD 5. Strugnell, RevQ 7 (1970) p.257 pl I, fr h. Bottom right; Japanese Tissue Paper + Methylcellulose glue (2012); Fragment 1 is sewn and encapsulated for exhibition, therefore it was decided to sample fr 3 instead
577	11Q20	11QT^b	P577-Fr014	DJD 23: pl XLIII, fr 10b; Bottom left; Plate 608 is sewn and encapsulated for exhibition, therefore it was decided to sample plate 577 instead
64	Mur88	MurXII	P64-Fr001	DJD 2: pl LX, col X; Left sheet: bottom right; Rice paper + Perspex glue + trichlorethylene
891	5/6Hev1b	5/6HevPsalms	P891-Fr003	DJD 38: pl XXVII, fr 10; Bottom right
585	4Q161	4QpIsa^a	P585-Fr001	DJD 5: pl IV, fr 2; Middle right; Scotch tape? Rice paper + Perspex glue + trichlorethylene
206	4Q52	4QSam^b	P206-Fr003 (b)	DJD 17: pl XXIV, fr 7; Bottom left; Batch 1: additional material (4 mg)
285	4Q176	4QTanh	P285-Fr002 (b)	DJD 5: pl XXII, fr 10; Upper left; Batch 1: additional material (3 mg)
224	4Q114	4QDan^c	P224-Fr001 (b)	DJD 16: pl XXXV, fr 3; Bottom right; Batch 1: additional material (c. 1 mg)
385	4Q216	4QJub^a	P385-Fr011 (b)	DJD 13: pl I, fr 12ii; Left sheet: upper margin; Batch 1: additional material (4 mg)
577	11Q20	11QT^b	P577-Fr014 (b)	DJD 23: pl XLIII, fr 10b; Bottom left; Batch 1: additional material (4 mg)
64	Mur88	MurXII	P64-Fr001 (b)	DJD 2: pl LX, col X; NB IAA did not list this one in their Excel list so no additional information
891	5/6Hev1b	5/6HevPsalms	P891-Fr003 (b)	DJD 38: pl XXVII, fr 10; Bottom right; Batch 1: additional material (6 mg)
585	4Q161	4QpIsa^a	P585-Fr001 (b)	DJD 5: pl IX, fr 2; Middle right; Batch 1: additional material (4 mg)
1111	4Q70	4QJer^a	P1111-Fr010	DJD 15: pl XXIX, fr 37; Left margin, middle; IAA measurement: 7 mg
1093	4Q47	4QJosh^a	P1093-Fr005	DJD 14: pl XXXIV, fr 20; Up right diagonal margin; IAA measurement: 7 mg in two pieces
271	4Q23	4QLevNum^a	P271-Fr002	DJD 12: pl XXIII, fr 1; Bottom margin, middle; IAA measurement: 8.50 mg (one piece, broke down into 3 after weighing)
177	4Q255/4Q433a	4QpapS^a/4QpapHodayot-like Text B	P177 recto-Fr001	DJD 29: pl XV, fr. 1; Bottom margin, middle; IAA measurement: 6 mg in three pieces
977	11Q5	11QPs^a	P977-Fr004	DJD 4: pl III, fr A,B,C I; Middle-left, the sample was taken from the delaminated area adjacent to the Tetragrammaton and \<kwl>; IAA measurement: 7 mg in two pieces
393	4Q3	4QGen^c	P393-Fr005	DJD 12: pl IX; Bottom margin, right side; IAA measurement: 8-9 mg in two pieces
1081A	4Q27	4QNum^b	P1081A-Fr002	DJD 12: pl XXXIX, fr 12; Lateral margin, bottom right; IAA measurement: 10 mg in one piece
x232	Mas1k	MasShirShabb	Px232-Fr001	Masada 6: ill 15; Bottom margin, right side; IAA measurement: 8 mg in two pieces
386	4Q206	4QEn^e ar	P386-Fr001	Milik, Books of Enoch: pl XX, fr b; Bottom, center-left, below last \<'>; IAA measurements: 7 mg in two pieces
237	4Q30	4QDeut^c	P237-Fr007	DJD 14: pl V, fr 32; Bottom, center; IAA measurements: 8 mg in two pieces
904	4Q201/4Q338	4QEn^a ar/4QGenealogical List	P904-Fr009	DJD 36: pl I, fr 2; Bottom, right; IAA measurements: 7-8 mg in two pieces
810	4Q259	4QS^e	P810-Fr011	DJD 26: pl XV, fr 2b; Bottom; IAA measurements: 9 mg in two pieces
180	4Q416	4QInstruction^b	P180-Fr004	DJD 34: pl VI, fr 4; Bottom, left corner; IAA measurements: 8-9 mg in two pieces
215	4Q2	Gen^b	P215-Fr004	DJD 12: pl VI; Right blank margin, bottom left; IAA measurements: 9 mg in one piece
122A	4Q375	4QapocrMoses^a	P122A-Fr001	DJD 19: pl XIV, fr 1; Bottom left, middle of 2nd column; IAA measurements: 8-9 mg in one piece
534	XHev/Se2	XHev/Se Num^b	P534-Fr002	DJD 38: pl XXIX, fr 2; Bottom right corner; IAA measurements: 9 mg in two pieces
147	4Q541	4QapocrLevi^b	P147-Fr019	DJD 31: pl XIV, fr 24; Bottom left corner; IAA measurements: 8-9 mg in one piece
330	4Q521	4QMessianic Apocalypse	P330-Fr004	DJD 25: pl III, fr 10; Bottom left corner; IAA measurements: 8 mg in two pieces
107	4Q267	4QDamascus^b	P107-Fr010	DJD 18: pl XX, fr 9; Top left corner; IAA measurements: 7 mg in three pieces and some dust
879	Mur19	Mur pap WrDiv	P879-Fr001	DJD 2: pl XXX, fr 19; Top left corner; IAA measurements: 8 mg in one piece and some dust

Appendix K Data-sheet radiocarbon runs

The samples were dated by two different AMS machines characterised by codes GrA and GrM. For the GrA machine the $\delta$ ¹³C values of the IRMS are shown in the table; for GrM these are AMS values.

Table 21: Data-sheet ¹⁴C runs

fragment

KLR

samplenr

GrA

GrM

¹⁴a %

\sigma

\delta

¹³C ‰

<¹⁴a>

\sigma

age BP

\sigma

remarks

used

calibrated 1

\sigma

calibrated 2

\sigma

P206-Fr003

11089

65369

69793

75.69

0.39

-21.22

39.5

67017

10677

74.38

0.42

-21.20

67017

10678

75.05

0.43

-21.40

75.69

0.39

2237

1 GrA only

74.71

0.30

2342

2 GrM averaged

75.07

0.24

2303

3 averaged

401–369 BCE

407–356,

281–232 BCE

P421-Fr004

11090

65370

68446

76.20

0.32

-17.85

48.8

65370

68447

76.43

0.33

-18.30

46.3

65370

68446

76.68

0.31

65370

68447

76.19

0.31

76.38

0.16

2164

4 averaged

342–321,

201–175 BCE

352–287,

228–219,

211-151 BCE

P285-Fr002

11091

65371

69794

77.43

0.39

-22.19

44.2

65371

69794

77.06

0.36

65371

69794

76.20

0.43

67018

10679

75.68

0.42

-21.70

67018

10680

75.74

0.41

-22.10

76.95

0.23

2104

3 GrA averaged

75.71

0.29

2235

2 GrM averaged

76.49

0.18

2153

5 averaged

343–321,

202–166 BCE

351–304,

209–102,

67–60 BCE

P224-Fr001

11092

65372

69795

76.04

0.39

-20.50

42.7

65372

69795

76.39

0.36

65372

69795

76.14

0.43

76.21

0.23

2182

3 GrA averaged

P224-Fr001

69074

13252

76.49

0.44

-19.80

2nd run, cleaned;

69074

13253

76.73

0.39

-19.70

after soxhlet;

69074

13254

76.20

0.36

-20.30

no glue;

69074

13255

76.40

0.35

-21.00

no black spot.

76.44

0.19

2158

4 GrM averaged

76.34

0.15

2168

all 7 averaged

343–320,

202–176 BCE

352–287,

228–219,

211–160 BCE

P385-Fr011

11093

65373

69799

74.71

0.48

-21.61

46.0

74.71

0.48

2342

questionable ??

67020

10675

68.75

0.40

-21.70

67020

10676

69.24

0.38

-22.10

69.01

0.28

2979

2 GrM averaged

P801-Fr003

11094

65374

68448

77.17

0.33

-20.03

43.6

65374

68449

77.29

0.33

-20.27

46.2

65374

68448

76.93

0.31

65374

68449

77.43

0.31

77.20

0.16

2078

4 averaged

107–46 BCE

159–42,

7–5 BCE

P577-Fr014

11095

65357

69800

77.60

0.40

-20.93

44.1

65357

69800

77.71

0.36

65357

69800

77.79

0.44

67021

10681

76.51

0.43

-22.00

67021

10682

75.90

0.42

-21.00

77.15

0.18

2084

5 averaged

77.70

0.23

2027

3 GrA averaged

76.20

0.30

2183

2 GrM averaged

67021

18827

76.42

0.33

-20.70

67021

18828

75.61

0.35

-20.60

76.02

0.24

2202

2 new GrM aver

76.08

0.19

2195

4 GrM averaged

P64-Fr001

11096

65376

69806

78.22

0.39

-21.80

42.2

65376

69806

78.73

0.36

65376

69806

78.31

0.44

67022

10663

77.66

0.41

-23.00

77.66

0.41

2030

67022

10664

79.25

0.42

-21.60

79.25

0.42

1868

78.44

0.18

1950

5 averaged

78.44

0.23

1950

3 GrA averaged

78.43

0.29

1951

2 GrM averaged

67022

18829

76.90

0.33

-21.60

67022

18830

77.93

0.33

-20.90

77.44

0.24

2053

2 new GrM aver

77.83

0.18

2013

4 GrM averaged

P891-Fr003

11097

65377

69807

70.89

0.37

-21.23

41.8

questionable;

65377

69807

66.83

0.33

inhomogeneity?

65377

69807

72.95

0.43

reject GrA

67023

10659

78.07

0.38

-21.40

67023

10660

79.06

0.40

-21.20

78.54

0.28

1940

2 GrM averaged

30–42,

59–124 CE

10–204 CE

P585-Fr001

11098

65378

69810

77.96

0.38

-21.02

40.8

65378

69810

77.61

0.36

65378

69810

78.12

0.44

67024

10661

77.92

0.39

-22.80

67024

10662

76.95

0.38

-21.40

77.86

0.22

2010

3 GrA averaged

77.42

0.27

2055

2 GrM averaged

77.69

0.17

2028

5 averaged

45 BCE–8 CE

89–80 BCE,

54 BCE–27 CE,

48–57 CE

P1111-Fr010

11567

67025

11151

75.92

0.35

-19.20

67025

11152

75.37

0.30

-17.90

67025

11170

75.99

0.34

-18.80

67025

11171

76.00

0.33

-18.50

75.79

0.16

2226

4 averaged

362–351,

295–272,

266–209 BCE

375–346,

317–203 BCE

P1093-Fr005

11568

67026

11153

76.27

0.32

-22.00

67026

11154

76.41

0.32

-23.60

67026

11172

76.72

0.33

-21.60

67026

11181

77.00

0.36

-21.70

statistic failure

76.46

0.19

2155

3 averaged

343–320,

202–167 BCE

351–291,

209–104 BCE

P271-Fr002

11569

67027

11155

76.46

0.35

-24.60

67027

11156

76.52

0.32

-24.10

67027

11182

77.72

0.38

-22.80

statistic failure

67027

11183

77.79

0.38

-23.50

statistic failure

76.49

0.24

2152

2 averaged

346–316,

204–151,

129–123 BCE

351–289,

227–220,

210–97,

71–58 BCE

P177-Fr001

11570

67028

11166

77.10

0.33

-10.80

67028

11167

76.42

0.32

-10.10

67028

11184

77.70

0.38

-10.60

67028

11185

76.93

0.35

-10.60

76.99

0.17

2100

4 averaged

152–94,

74–56 BCE

167–51 BCE

P977-Fr004

11571

67029

11168

77.75

0.31

-21.80

67029

11169

78.25

0.33

-21.60

67029

11186

78.70

0.38

-21.00

67029

11187

78.91

0.49

-20.90

78.27

0.18

1967

4 averaged

23–78,

101–107 CE

31–16 BCE,

7–120 CE

P393-Fr005

11924

69725

14380

77.10

0.40

-19.00

69725

14381

76.79

0.40

-18.90

69725

14228

76.73

0.40

-19.10

69725

14229

76.45

0.40

-18.60

76.77

0.20

2123

4 averaged

174–102,

67–60 BCE

339–326,

199–89,

81–53 BCE

P1081a-Fr002

11925

69228

13385

76.69

0.36

-20.90

39.9

69228

13386

76.98

0.35

-21.20

39.7

69228

14170

77.90

0.27

-21.90

statistic failure

69228

14171

77.67

0.29

-22.20

statistic failure

76.84

0.25

2115

2 averaged

171–97,

72–57 BCE

336–330,

198–50 BCE

Px232-Fr001

11926

69229

13387

77.66

0.37

-20.70

40.3

69229

13388

77.70

0.36

-20.70

41.3

69229

14175

77.98

0.31

-22.40

69229

14223

78.20

0.39

-20.30

77.89

0.18

2007

4 averaged

41–9 BCE,

1 BCE–25 CE

46 BCE–62 CE

P386-Fr001

11927

69726

14382

76.52

0.40

-20.80

69726

14383

76.54

0.41

-21.10

69726

14230

76.12

0.40

-20.50

69726

14241

76.20

0.37

-21.30

76.34

0.20

2169

4 averaged

348–312,

206–171 BCE

356–281,

232–150,

131–121 BCE

P237-Fr007

11928

69727

14565

75.91

0.44

-20.70

69727

14566

76.47

0.44

-20.90

69727

14395

76.82

0.35

-19.90

69727

14242

75.61

0.38

-20.90

69727

14243

76.11

0.37

-20.00

69727

14384

77.38

0.39

-20.00

statistic failure

76.21

0.17

2182

5 averaged

351–295,

209–176 BCE

356–279,

256–248,

233–169 BCE

P904-Fr009

11929

69230

13389

76.31

0.39

-21.10

69230

13390

77.33

0.37

-21.10

69230

14173

77.37

0.30

-22.10

69230

14174

77.53

0.31

-22.20

69230

14172

77.82

0.29

-22.80

statistic failure

69230

77.21

0.17

2077

4 averaged

107–46 BCE

162–41,

9–2 BCE

P810-Fr011

11930

69728

14396

76.85

0.36

-18.70

69728

14397

76.64

0.36

-18.80

69728

14244

75.99

0.37

-19.20

69728

14245

76.63

0.38

-19.10

76.53

0.18

2148

4 averaged

343–322,

201–155 BCE

349–311,

206–100,

68–59 BCE

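\begin{tabular}{llllrrrrrlll}
P180-Fr004 & 11931 & 69729 & 14398 & 76.60 & 0.38 & -20.30 & & & & & \\
 & & 69729 & 14399 & 76.99 & 0.37 & -21.10 & & & & & \\
 & & 69729 & 14246 & 76.55 & 0.39 & -21.60 & & & & & \\
 & & 69729 & 14359 & 76.63 & 0.54 & -20.30 & & & & & \\
 & & & & 76.71 & 0.20 & & & 2130 & 4 averaged & 196–184, 179–104 BCE & 343–321, 202–91, 79–54 BCE \\
P215-Fr004 & 11932 & 69730 & 14400 & 77.22 & 0.38 & -20.40 & & & & & \\
 & & 69730 & 14401 & 77.69 & 0.36 & -19.60 & & & & & \\
 & & 69730 & 14360 & 76.67 & 0.46 & -19.90 & & & & & \\
 & & 69730 & 14361 & 77.74 & 0.42 & -19.80 & & & & & \\
 & & & & 77.38 & 0.20 & & & 2059 & 4 averaged & 97–71, 58–39, 11 BCE–2 CE & 152–131 BCE, 121 BCE–10 CE \\
P122A-Fr001 & 11933 & 69731 & 14567 & 76.55 & 0.45 & -20.80 & & & & & \\
 & & 69731 & 14568 & 76.20 & 0.44 & -20.30 & & & & & \\
 & & 69731 & 14362 & 77.23 & 0.42 & -19.00 & & & & & \\
 & & 69731 & 14363 & 76.93 & 0.43 & -19.60 & & & & & \\
 & & & & 76.74 & 0.22 & & & 2126 & 4 averaged & 193–189, 176–101, 67–60 BCE & 342–323, 201–88, 82–53 BCE \\
P534-Fr002 & 11934 & 69231 & 13391 & 77.75 & 0.40 & -19.80 & & & & & \\
 & & 69231 & 13392 & 78.14 & 0.36 & -19.80 & & & & & \\
 & & 69231 & 14224 & 78.10 & 0.39 & -19.40 & & & & & \\
 & & 69231 & 14225 & 77.88 & 0.40 & -18.70 & & & & & \\
 & & & & 77.98 & 0.19 & & & 1998 & 4 averaged & 38–13 BCE, 3–29, 43–59 CE & 45 BCE–75 CE \\
P147-Fr019 & 11935 & 69732 & 14569 & 76.20 & 0.43 & -18.80 & & & & & \\
 & & 69732 & 14570 & 76.55 & 0.43 & -18.80 & & & & & \\
 & & 69732 & 14364 & 76.72 & 0.42 & -18.20 & & & & & \\
 & & 69732 & 14365 & 76.64 & 0.42 & -17.60 & & & & & \\
 & & & & 76.53 & 0.21 & & & 2148 & 4 averaged & 343–320, 202–152 BCE & 351–305, 209–95, 73–57 BCE \\
P330-Fr004 & 11936 & 69733 & 14571 & 76.26 & 0.44 & -21.20 & & & & & \\
 & & 69733 & 14572 & 76.39 & 0.45 & -20.90 & & & & & \\
 & & 69733 & 14377 & 76.00 & 0.38 & -20.90 & & & & & \\
 & & 69733 & 14366 & 77.16 & 0.43 & -20.80 & & & & & \\
 & & & & 76.43 & 0.21 & & & 2159 & 4 averaged & 346–316, 204–165 BCE & 353–286, 229–217, 211–104 BCE \\
P107-Fr010 & 11937 & 69232 & 13393 & 76.99 & 0.36 & -20.40 & 47.6 & & & & \\
 & & 69232 & 13394 & 76.62 & 0.39 & -20.30 & 38.3 & & & & \\
 & & 69232 & 14226 & 76.14 & 0.40 & -19.60 & & & & & \\
 & & 69232 & 14227 & 76.16 & 0.40 & -20.30 & & & & & \\
 & & & & 76.51 & 0.19 & & & 2151 & 4 averaged & 344–320, 202–157 BCE & 351–294, 209–98, 70–58 BCE \\
P879-Fr001 & 11938 & 69734 & 14573 & 77.80 & 0.44 & & & & & & \\
 & & 69734 & 14574 & 78.15 & 0.44 & & & & & & \\
 & & 69734 & 14378 & 78.11 & 0.40 & & & & & & \\
 & & 69734 & 14379 & 78.23 & 0.39 & & & & & & \\
 & & & & 78.08 & 0.21 & & & 1987 & 4 averaged & 32–17 BCE, 7–64 CE & 41–9 BCE, 1 BCE–81 CE, 98–110 CE \\
 & & 67806 & 11762 & 0.08000 & 0.030 & -29.60 & & & background age & & \\
 & & 67460 & 11306 & 78.24 & 0.10 & -24.92 & & 1971 & Niederlande, Roman age & & \\
 & & 68084 & 11761 & 77.96 & 0.12 & -27.31 & & 2000 & & & \\
P224 & & 69073 & 13256 & 74.25 & 0.54 & -20.60 & & 2390 & & & \\
P1081a & & 69093 & 13262 & 110.5 & 0.56 & -25.71 & & & modern age & & \\
\end{tabular}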
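The averaged rows of the data sheet can be checked by hand. The following is a minimal sketch, not the laboratory's procedure, and it rests on assumptions that this appendix does not state: that the paired values in each run are a ¹⁴C activity in percent with its 1$\sigma$ uncertainty, and that a "4 averaged" row is an inverse-variance weighted mean of the four runs. The age conversion uses the conventional relation $t = -8033 \ln(a)$ with the Libby mean life. The function names are hypothetical.

```python
import math

# Four runs listed for sample P180-Fr004: (activity in percent, 1-sigma error).
# Reading these columns as activity and error is an assumption.
runs = [(76.60, 0.38), (76.99, 0.37), (76.55, 0.39), (76.63, 0.54)]


def weighted_mean(values):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    weights = [1.0 / err ** 2 for _, err in values]
    total = sum(weights)
    mean = sum(w * v for w, (v, _) in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def radiocarbon_age(activity_percent):
    """Conventional radiocarbon age in years BP (Libby mean life 8033 yr)."""
    return -8033.0 * math.log(activity_percent / 100.0)


mean, err = weighted_mean(runs)
age = radiocarbon_age(mean)
print(f"{mean:.2f} +/- {err:.2f} percent, about {age:.0f} BP")
```

Under these assumptions the result is close to the tabulated average of 76.71 ± 0.20 and the tabulated age of 2130 for this sample, which suggests (but does not prove) that the averaged rows were formed this way.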

Appendix L Worksheet of comparative data for 2 $\sigma$ calibrated ranges and traditional palaeographic estimates

Whole or partial overlap between 2 $\sigma$ calibrated ranges and palaeographic estimates in 17 of the 26 accepted samples: 4Q23, 4Q47, 4Q52, 4Q70, 4Q161, 4Q176, 4Q201/4Q338, 4Q255/4Q433a, 4Q259, 4Q504, 4Q521, 4Q541, 11Q5, Mas1k, Mur19, 5/6Hev1b, XHev/Se2 (see Appendix D.1.1).

1. 4Q23 (4QLevNum$^a$)
$\bullet$ 355–285 BCE (29.8%), 230–220 BCE (0.8%), 210–95 BCE (62.8%), 75–55 BCE (2.1%)
$\bullet$ DJD 12:154 (Ulrich): early Hasmonaean formal script, dating from approximately the middle or latter half of the second century BCE (150–100 BCE).

2. 4Q47 (4QJosh$^a$)
$\bullet$ 355–290 BCE (33.8%), 210–100 BCE (61.6%)
$\bullet$ DJD 14:143 (Ulrich): referring to Cross, Hasmonaean formal bookhand, second half of the second century or first half of the first century BCE (150–50 BCE).
$\bullet$ Puech, Revue Biblique 122/4 (2015), 482: Hasmonaean, at best in the first half of the first century BCE (100–50 BCE).

3. 4Q52 (4QSam$^b$)
$\bullet$ 410–355 BCE (78.9%), 285–230 BCE (16.6%)
$\bullet$ DJD 17:220 (Cross, Parry, and Saley): ca. 250 BCE.

4. 4Q70 (4QJer$^a$)
$\bullet$ 375–345 BCE (16.3%), 320–200 BCE (79.2%)
$\bullet$ DJD 15:150 (Tov): quoting Yardeni 1990 and Cross 1985 (Cross shifting between earlier and later dates to settle on an earlier date), the late third or early second century BCE (225–175 BCE).

5. 4Q161 (4QpIsa$^a$)
$\bullet$ 90–80 BCE (1.7%), 55 BCE–30 CE (92.1%), 45–60 CE (1.7%)
$\bullet$ Strugnell 1970 groups this manuscript with others such as 4Q166 and 4Q171 and gives a general indication of the script as developed rustic semiformal Herodian (see also DJD 19:112). Yardeni 2007 also lists this manuscript among those copied by the prolific scribe she identified and dates it to the late first century BCE to the beginning of the first century CE (30 BCE–20 CE).

6. 4Q176 (4QTanh)
$\bullet$ 355–300 BCE (30.5%), 210–100 BCE (64.2%), 70–60 BCE (0.7%)
$\bullet$ Strugnell 1970:229 and Tigchelaar RevQ 2019: “middle Hasmonaean” (ca. 125–75 BCE).

7. 4Q201/4Q338 (4QEn$^a$ ar/4QGenealogical List)
$\bullet$ 165–40 BCE (93.6%), 10–1 BCE (1.9%)
$\bullet$ Milik 1976:140: first half of the second century BCE. Mixed evidence: archaic and connections with semicursive scripts of the third and second centuries BCE, perhaps dependent upon the Aramaic scripts and scribal customs of northern Syria or Mesopotamia.
$\bullet$ Puech 2017:99: ca. 200 BCE.
$\bullet$ Langlois, Le premier manuscrit du Livre d’Hénoch, 62–68: ca. 150 BCE.
$\bullet$ 200–150 BCE

8. 4Q255/4Q433a (4QpapS$^a$/4QpapHodayot-like Text B)
$\bullet$ 170–50 BCE (95.4%)
$\bullet$ DJD 26:8, 20, 24, 29 (Alexander/Vermes, following Cross): 125–100 BCE.

9. 4Q259 (4QS$^e$)
$\bullet$ 350–310 BCE (24.3%), 210–100 BCE (69.7%), 70–55 BCE (1.4%)
$\bullet$ DJD 26:8, 20, 24, 133 (Alexander and Vermes, also referring to Cross): 50–25 BCE. Late Hasmonaean/Early Herodian semicursive, with mixed semicursive and semiformal features. But 4Q259 is difficult to date palaeographically. Suggestions range from 50–25 BCE (Cross) through the second half of the second century BCE, 150–100 BCE (Milik), to the first half of the first century BCE, preferably shortly after 100 BCE, 100–75 BCE (Puech).

10. 4Q504 (4QDibHam$^a$)
$\bullet$ 355–285 BCE (45.4%), 230–150 BCE (50.1%)
$\bullet$ DJD 7:137 (Baillet): the script is a Hasmonaean calligraphy that may date from around 150 BCE. Cross: 175–150 BCE.

11. 4Q521 (4QMessianic Apocalypse)
$\bullet$ 355–285 BCE (38.0%), 230–100 BCE (57.5%)
$\bullet$ DJD 25:3–5 (Puech): formal Hasmonaean script, following Cross; first quarter of the first century BCE (100–80 BCE).

12. 4Q541 (4QapocrLevi$^b$ ar)
$\bullet$ 355–300 BCE (24.6%), 210–95 BCE (68.2%), 75–55 BCE (2.7%)
$\bullet$ DJD 31:227 (Puech): Hasmonaean, to the end of the second century BCE, ca. 100 BCE; the writing is of the type of 1QS, 1QIsa$^a$, 4Q175, but posterior to 4Q504 (125–100 BCE).

13. 11Q5 (11QPs$^a$)
$\bullet$ 35–15 BCE (3.3%), 5–120 CE (92.2%)
$\bullet$ DJD 4:6–9 (Sanders): first half of the first century CE (1–50 CE).

14. Mas1k (ShirShabb)
$\bullet$ 50 BCE–65 CE (95.4%)
$\bullet$ Masada 6:120 (Newsom and Yadin; Newsom HSS 27:168): developed Herodian formal hand, late Herodian formal hand, ca. 50 CE. Also: DJD 11:239.

15. Mur19 (pap WrDiv)
$\bullet$ 45 BCE–85 CE (91.5%), 95–110 CE (3.9%)
$\bullet$ Cursive script; the internal date of 71/72 CE validates the radiocarbon date. The text refers to “year 6 of Masada”. See Section B.4 in Appendix B.

16. 5/6Hev1b (Ps)
$\bullet$ 10–205 CE (95.4%)
$\bullet$ DJD 38:143: late Herodian, understood as 50–68 CE. Cross: 75–100 CE.

17. XHev/Se2 (XHev/Se Num$^a$)
$\bullet$ 45 BCE–75 CE (95.4%)
$\bullet$ DJD 38:174 (Flint): late Herodian, 50–68 CE.

2 $\sigma$ calibrated ranges older than the palaeographic estimates in 9 of the 26 accepted samples: 4Q2, 4Q3, 4Q27, 4Q30, 4Q114, 4Q206, 4Q267, 4Q375, 4Q416 (see Appendix D.1.2).

1. 4Q2 (Gen$^b$)
$\bullet$ 155–130 BCE (5.2%), 125 BCE–10 CE (90.3%)
$\bullet$ DJD 12:31 (Davila): late Herodian or even post-Herodian formal hand, ca. 50–68+ CE.

2. 4Q3 (4QGen$^c$)
$\bullet$ 340–325 BCE (3.5%), 200–50 BCE (92.0%)
$\bullet$ DJD 12:39 (Davila): Herodian formal hand, dating from the middle to the end of that period, ca. 20–68 CE.

3. 4Q27 (4QNum$^b$)
$\bullet$ 340–330 BCE (1.3%), 200–50 BCE (94.2%)
$\bullet$ DJD 12:211 (Jastram): following Cross, early Herodian semiformal, 30 BCE–20 CE, earlier in that range.

4. 4Q30 (4QDeut$^c$)
$\bullet$ 360–275 BCE (57.4%), 260–245 BCE (1.4%), 235–165 BCE (36.7%)
$\bullet$ DJD 14:15 (White Crawford): following Cross, typical Hasmonaean book hand, 150–100 BCE. But Cross 2003 gives a narrower date of 125–100 BCE.

5. 4Q114 (4QDan$^c$)
$\bullet$ 355–285 BCE (49.5%), 230–160 BCE (45.9%)
$\bullet$ DJD 16:270 (Ulrich, following Cross): late second century BCE, no more than about a half century younger than the autograph, 125–100 BCE.

6. 4Q206 (4QEn$^e$ ar)
$\bullet$ 360–280 BCE (48.6%), 235–145 BCE (45.8%), 135–120 BCE (1.1%)
$\bullet$ Milik 1976:225: Hasmonaean, probably first half of the first century BCE, also referring to Cross 1961: p. 138, fig. 2, lines 2 (4Q30) and 3 (4Q51) and p. 149, fig. 4, lines 2 (4Q114) and 4 (4Q398), 100–50 BCE.

7. 4Q267 (4QDamascus$^b$)
$\bullet$ 355–290 BCE (28.6%), 210–95 BCE (65.3%), 70–55 BCE (1.6%)
$\bullet$ DJD 18:1, 96 (Yardeni): formal early Herodian, Cross’s round semiformal; connects this manuscript with 4Q397 as possibly the same scribe, 30 BCE–20 CE.

8. 4Q375 (4QapocrMoses$^a$)
$\bullet$ 345–320 BCE (6.0%), 205–50 BCE (89.5%)
$\bullet$ DJD 19:112 (Strugnell): early Herodian, rustic semiformal, 30 BCE–20 CE. Compare with 4Q27 and 4Q161, for both radiocarbon and palaeography.

9. 4Q416 (4QInstruction$^b$)
$\bullet$ 345–320 BCE (8.0%), 205–90 BCE (78.1%), 80–50 BCE (9.4%)
$\bullet$ DJD 34:74–76 (Strugnell and Harrington): Herodian, between 4Q51 and 1QM, hence “in a date transitional between the late Hasmonaean and the earliest Herodian hands”; Strugnell judged the hand of 4Q416 to be earlier than the hands of 4Q415, 4Q417, and 4Q418 by some twenty-five years (76), 50–25 BCE.
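The sorting of samples into "whole or partial overlap" and "older" groups amounts to an interval-intersection test. The sketch below illustrates that test only; it is not the authors' worksheet procedure. It assumes years are encoded as signed integers (BCE negative, CE positive), and the function and variable names are hypothetical. The ranges are copied from the 4Q23 and 4Q2 entries above.

```python
def overlaps(calibrated, estimate):
    """Return True if any calibrated sub-range intersects the estimate.

    Ranges are (start, end) pairs in signed years, BCE negative, CE positive.
    """
    est_start, est_end = estimate
    return any(start <= est_end and est_start <= end
               for start, end in calibrated)


# 4Q23: 2-sigma sub-ranges and the palaeographic estimate 150-100 BCE.
q23_cal = [(-355, -285), (-230, -220), (-210, -95), (-75, -55)]
q23_pal = (-150, -100)

# 4Q2: 2-sigma sub-ranges and the palaeographic estimate ca. 50-68 CE.
q2_cal = [(-155, -130), (-125, 10)]
q2_pal = (50, 68)

print(overlaps(q23_cal, q23_pal))  # True: 210-95 BCE contains 150-100 BCE
print(overlaps(q2_cal, q2_pal))    # False: the calibrated range ends by 10 CE
```

The test ignores the probability attached to each sub-range, so an overlap with a low-probability sub-range counts the same as one with the dominant sub-range.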

