
There is a newer version of the record available.

Published May 31, 2024 | Version v5
Software | Open access

Code and data (feature files, OxCal data) for dating ancient manuscripts using radiocarbon and AI-based writing style analysis

Affiliation: University of Groningen

Description

The dataset is associated with the following article:
Title: Dating ancient manuscripts using radiocarbon and AI-based writing style analysis
Authors: Mladen Popović, Maruf A. Dhali, Lambert Schomaker, Johannes van der Plicht, Kaare Lund Rasmussen, Jacopo La Nasa, Ilaria Degano, Maria Perla Colombini, and Eibert Tigchelaar
(Under review)

This dataset was collected for the ERC project:
The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls
PI: Mladen Popović
Grant agreement ID: 640497
Project website: https://cordis.europa.eu/project/id/640497

 

Copyright (c) University of Groningen, 2024. All rights reserved.
Disclaimer and copyright notice for all data contained in the *.tar.gz file:

1) Permission is hereby granted to use the code for research purposes. Distributing this code for commercial purposes is not allowed.

2) The provider gives no express or implied warranty of any kind, and any implied warranties of merchantability and fitness for purpose are disclaimed.

3) The provider shall not be liable for any direct, indirect, special, incidental, or consequential damages arising out of any use of this code.

4) The user should refer to (cite) the first public article mentioned above in any work that uses this code.

5) The recipient should refrain from distributing the code to third parties outside his/her local research group. Please refer interested researchers to this site to obtain their own copy.

 

Updates:
(Update on 31 May 2024: The code for the OxCal data, the accepted ranges, and the Heaviside distribution is now arranged in separate directories. All code and README files have been updated with additional comments, and the README now describes the file organisation. The help option (python3 main.py --h) of the Enoch model now prints all options clearly, including the use (or exclusion) of minor peaks.
Please use the files from the latest version and disregard the previous four versions: 10.5281/zenodo.10998860, 10.5281/zenodo.10629569, 10.5281/zenodo.8195917, and 10.5281/zenodo.8168930.)

 

Overview:
This is the code for training and testing Enoch, the Bayesian regression-based date prediction model for the Dead Sea Scrolls data. The model incorporates radiocarbon data and palaeographic input to produce a probabilistic date prediction for each test manuscript. 
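To make "probabilistic date prediction" concrete, here is a minimal sketch of textbook conjugate Bayesian linear regression in NumPy: a Gaussian prior on the weights (precision `alpha`) and Gaussian observation noise (precision `beta`), both fixed by hand. It is not Enoch's code. The function names, the toy two-dimensional "style features", the dates, and the convention of writing BCE years as negative numbers are all hypothetical. The point it illustrates is that each test sample receives a predictive mean date together with a standard deviation, not just a single point estimate.

```python
import numpy as np


def fit_bayesian_linear_regression(X, y, alpha=1.0, beta=1.0):
    """Posterior mean and covariance of the weights (bias column included).

    Assumes prior w ~ N(0, alpha^-1 I) and noise precision beta; both are
    hypothetical, hand-picked hyperparameters in this sketch.
    """
    X = np.column_stack([np.ones(len(X)), X])  # prepend a bias column
    precision = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    cov = np.linalg.inv(precision)
    mean = beta * cov @ X.T @ y
    return mean, cov


def predict(X_new, mean, cov, beta=1.0):
    """Predictive mean and standard deviation for each new sample."""
    X_new = np.column_stack([np.ones(len(X_new)), X_new])
    mu = X_new @ mean
    # predictive variance = noise variance + uncertainty about the weights
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", X_new, cov, X_new)
    return mu, np.sqrt(var)


# Hypothetical toy data: 2-D features, dates in years (BCE as negative numbers)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = -150 + 40 * X_train[:, 0] - 20 * X_train[:, 1] + rng.normal(scale=10, size=50)

beta = 1 / 100  # noise standard deviation of 10 years
mean, cov = fit_bayesian_linear_regression(X_train, y_train, alpha=1e-4, beta=beta)
mu, sd = predict(np.array([[0.0, 0.0], [1.0, -1.0]]), mean, cov, beta=beta)
print(np.round(mu), np.round(sd, 1))
```

A library alternative that also estimates the two precisions from the data is scikit-learn's `BayesianRidge`, whose `predict(X, return_std=True)` returns the same kind of mean/standard-deviation pair; whether Enoch uses it is not stated on this page.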

Organization of the files:
The *.tar.gz file contains three directories: "OxCal", "OxCal-Heaviside-accept", and "Dating-model". 

- OxCal: This directory contains the following files:
  - "oxcal.txt" contains the code for obtaining the radiocarbon data from the OxCal tool (https://c14.arch.ox.ac.uk/oxcal.html). The raw data from OxCal can be found in ./C14-Oxcal/original-30/.

- OxCal-Heaviside-accept: This directory contains the following files:
  - "c14-accepted-range.csv" contains the accepted 2-sigma ranges (min and max), without minor peaks, for each radiocarbon sample used in the Enoch model.
  - "c14-accepted-range-w-minor-peaks.csv" contains the accepted 2-sigma ranges (min and max), including minor peaks, for each radiocarbon sample used in the Enoch model.
  - "do_accept_range_heaviside.py" reads the min and max values from the c14-accepted*.csv files, applies the Heaviside distribution to these accepted ranges, and saves the resulting accepted probability ranges as new *.csv files in ./C14-Oxcal-data/accepted-range/ or ./C14-Oxcal-data/accepted-range-w-minor-peaks/.

- Dating-model: contains the "main.py" file that runs Enoch. All the dependencies and utility files are included as well. The README-FINAL.md explains how to run the Python code. The feature files and training labels are included in the data directory.
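The Heaviside step performed by do_accept_range_heaviside.py can be pictured under one reading of "Heaviside distribution": a box-shaped probability that is uniform over a sample's accepted [min, max] range and zero outside it, built as the product of two Heaviside step functions. This is a minimal sketch under that assumption, not a copy of the script; the function name, the one-year grid, the example range, and the convention of writing BCE years as negative numbers are all hypothetical.

```python
import numpy as np


def heaviside_box(years, lo, hi):
    """Uniform probability over years in [lo, hi], zero elsewhere.

    H(year - lo) * H(hi - year) is 1 inside the closed interval and 0
    outside; np.heaviside's second argument sets H(0) = 1, so both
    endpoints are included.
    """
    box = np.heaviside(years - lo, 1.0) * np.heaviside(hi - years, 1.0)
    total = box.sum()
    if total == 0:
        raise ValueError("accepted range does not overlap the year grid")
    return box / total  # normalise so the probabilities sum to 1


years = np.arange(-400, 201)               # hypothetical one-year grid
p = heaviside_box(years, lo=-160, hi=-50)  # hypothetical accepted range
print(p.sum(), np.count_nonzero(p))
```

Under this reading, the "with minor peaks" and "without minor peaks" variants would differ only in the min and max values fed in per sample.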

Please refer to the original article for more details.
The image data and labels are also available at: https://doi.org/10.5281/zenodo.10998958

System requirements:
Hardware requirements:

Enoch requires a standard computer with enough RAM (preferably more than 8 GB) to support in-memory operations.

Software requirements:

OS requirements: The package has been tested on macOS (Sonoma 14.4.1) and Linux (Ubuntu 20.04.6 LTS).
Python dependencies (listed in requirements_updated.txt inside the *.tar.gz file):
joblib==1.0.1
matplotlib==3.4.2
numpy==1.20.3
pandas==1.2.4
scikit_learn==0.24.2
scipy==1.6.3
tqdm==4.61.0

Installation:
pip install -r requirements_updated.txt
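After installation, one way to confirm that the environment matches the pinned versions is to compare each `name==version` line with what is installed. The helper below is a hypothetical sketch, not part of the *.tar.gz archive; it relies on the standard-library `importlib.metadata` (Python 3.8+), and its version lookup is injectable so it can be exercised without the packages present. A name that the lookup cannot resolve is reported as not installed.

```python
from importlib import metadata


def check_pins(requirement_lines, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every pin that does not match.

    `installed` is None when the lookup finds no installed distribution.
    """
    mismatches = {}
    for raw in requirement_lines:
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Demo with a fake lookup standing in for the real environment
# (hypothetical installed versions; tqdm is "missing" and maps to None)
fake_installed = {"numpy": "1.20.3", "pandas": "1.3.0"}.get
demo = check_pins(["numpy==1.20.3", "pandas==1.2.4", "tqdm==4.61.0"], get_version=fake_installed)
print(demo)  # {'pandas': ('1.2.4', '1.3.0'), 'tqdm': ('4.61.0', None)}
```

Against the real environment, pass the lines of the requirements file itself, for example by opening requirements_updated.txt in a `with` block and calling `check_pins` on the file object, since iterating a file yields its lines.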

Additional tools:
Binarization:
The images are already binarized and preprocessed. To binarize additional data of your own or for personal use, the BiNet tool is available for scientific use upon request (m.a.dhali(at)rug.nl).
- Dhali, M. A., de Wit, J. W., & Schomaker, L. (2019). BiNet: Degraded manuscript binarization in diverse document textures and layouts using deep encoder-decoder networks. arXiv preprint arXiv:1911.07930.
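BiNet is a deep encoder-decoder network available on request. Purely as a classical baseline for experimenting with your own images in the meantime, Otsu's global threshold (a different and much simpler technique than BiNet) fits in a few lines of NumPy. The sketch assumes an 8-bit grayscale array with dark ink on a light background; the function name and the toy image are hypothetical.

```python
import numpy as np


def otsu_binarize(gray):
    """Binarize an 8-bit grayscale image with Otsu's global threshold.

    Returns a boolean array that is True for ink (pixels at or below the
    threshold), assuming dark ink on a light background.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the dark class per threshold
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean of the dark class
    mu_total = mu[-1]
    # between-class variance; thresholds with an empty class give 0/0 -> nan
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    t = int(np.nanargmax(sigma_b))
    return gray <= t


# Hypothetical toy image: light background with one dark horizontal band
img = np.full((20, 20), 200, dtype=np.uint8)
img[5:10, :] = 50
mask = otsu_binarize(img)
print(mask.sum())  # 100
```

A global threshold like this is known to struggle with the uneven backgrounds and stains of degraded manuscripts, which is the motivation for learned methods such as BiNet.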

Image Morphing:
In the original article, data augmentation was performed using image morphing. The tool is available on GitHub:
https://github.com/GrHound/imagemorph.c
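imagemorph.c is a C program, and its exact algorithm and parameters are documented in that repository. As a rough Python illustration of the general idea of morphing-based augmentation (a smooth random displacement field applied to the image with bilinear interpolation), here is a NumPy sketch. It is a generic elastic-distortion approach, not a port of imagemorph.c; the function names, the parameters `amplitude` (maximum shift in pixels) and `sigma` (smoothness), and their values are assumptions.

```python
import numpy as np


def smooth_field(shape, sigma, rng):
    """Random field in [-1, 1] smoothed with a separable Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    if len(kernel) > min(shape):
        raise ValueError("sigma too large for this image size")
    field = rng.uniform(-1.0, 1.0, size=shape)
    # convolve every row, then every column ("same" keeps the size)
    field = np.apply_along_axis(np.convolve, 1, field, kernel, mode="same")
    field = np.apply_along_axis(np.convolve, 0, field, kernel, mode="same")
    return field / (np.abs(field).max() + 1e-12)  # rescale to [-1, 1]


def morph(image, amplitude=2.0, sigma=3.0, seed=None):
    """Warp a 2-D image by a smooth random displacement (bilinear sampling)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    dy = amplitude * smooth_field((h, w), sigma, rng)
    dx = amplitude * smooth_field((h, w), sigma, rng)
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    y = np.clip(rows + dy, 0, h - 1)  # source coordinates, kept inside the image
    x = np.clip(cols + dx, 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    img = image.astype(float)
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy


# Hypothetical toy "page" with a single vertical stroke
page = np.zeros((32, 32))
page[10:22, 14:18] = 1.0
warped = morph(page, amplitude=2.0, sigma=3.0, seed=0)
print(warped.shape, float(warped.min()), float(warped.max()))
```

Several augmented copies of one image would be obtained by calling `morph` repeatedly with different seeds.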

Features for writer identification:
Lambert Schomaker
http://www.ai.rug.nl/~lambert/allographic-fraglet-codebooks/allographic-fraglet-codebooks.html
http://www.ai.rug.nl/~lambert/hinge/hinge-transform.html
Schomaker, L. & Bulacu, M. (2004). Automatic writer identification using connected-component contours and edge-based features of the upper-case Western script. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 787-798.
Bulacu, M. & Schomaker, L. R. B. (2007). Text-independent writer identification and verification using textural and allographic features. IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue - Biometrics: Progress and Directions, 29(4), 701-717.

If you have any questions, please get in touch with us:
Maruf A. Dhali <m.a.dhali(at)rug.nl>
Lambert Schomaker <l.r.b.schomaker(at)rug.nl>

Files (15.6 MB)

md5:3d8f730496a76a76e63d9bab6e7c17d0

Additional details

Funding

HandsandBible – The Hands that Wrote the Bible: Digital Palaeography and Scribal Culture of the Dead Sea Scrolls (grant agreement ID 640497)
European Commission