From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts

James B. Harr III; Kelly A. Meiklejohn; Madelin E. Blong; Tessa Gadomski; William E. Gundling Jr

arxiv: 2606.06889 · v1 · pith:EDKOE6YA · submitted 2026-06-05 · 🧬 q-bio.GN

Pith reviewed 2026-06-27 20:27 UTC · model grok-4.3

classification 🧬 q-bio.GN

keywords biocodicology · palimpsest detection · mitochondrial genomes · neural networks · machine learning · parchment DNA · medieval manuscripts · DNA sequencing

The pith

Mitochondrial genome sequencing data supports neural network classification of palimpsested versus single-use parchment folios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether DNA preserved in medieval parchment can be used to detect pages that were chemically washed and reused for new writing. Researchers non-destructively sampled and sequenced mitochondrial genomes from both single-use and palimpsested folios in a 14th-century manuscript. They found that both retained sufficient DNA, with no significant differences in coverage or depth. They applied logistic regression and neural network classifiers to features derived from this sequencing data to separate the two classes. The models delivered high precision but lower recall for the minority palimpsest class, which the authors attribute to the small number of palimpsest samples. This demonstrates a computational approach to biocodicology that could help identify reused parchment without further damage to the artifact.

Core claim

The authors report that mtGenome sequencing data from a 14th-century manuscript support the use of machine learning classifiers, including neural networks, to distinguish palimpsested folios from single-use ones. This holds even though genome coverage and depth show no significant differences between the two. Precision is high, but recall for palimpsests is reduced by dataset imbalance, and more samples are needed.
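Reference [5] below gives p = 0.0799 for coverage and p = 0.8397 for mean read depth. It does not name the test that produced them. As an illustration only, a two-sided permutation test on the difference in group means is one assumption-light way to make such a comparison with few samples. The function name and the values used to exercise it are hypothetical, not the authors' code or data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper. Returns an estimated p-value: the fraction of
    random relabelings whose absolute mean difference is at least as
    large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign "single-use" / "palimpsest" labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)
```

A p-value well above 0.05, as for mean depth here, means only that the data give no evidence of a difference. It does not show the groups are equivalent. That is why any classification result needs features beyond these two metrics.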

What carries the argument

Neural network and logistic regression classifiers trained on features from mitochondrial genome sequencing of parchment DNA to classify folios as palimpsested or single-use.
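The summary does not say which features the models received. As an illustration only, assume each folio is reduced to a small numeric feature vector, for example hypothetical per-sample summary statistics from the sequencing run. The logistic-regression half of such a pipeline can then be sketched in plain Python.

The class weighting here gives each class equal total weight. It uses the same n / (2 · n_class) rule as scikit-learn's `class_weight="balanced"`. It is a swapped-in way of handling imbalance, not something the source attributes to the authors. All function names are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000, balanced=True):
    """Batch gradient descent on (optionally class-weighted) log-loss.

    X: list of feature vectors; y: labels, 1 = palimpsest, 0 = single-use
    (label convention assumed for illustration).
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    if balanced:
        # Each class contributes the same total weight to the loss.
        w_pos, w_neg = n / (2 * n_pos), n / (2 * n_neg)
    else:
        w_pos = w_neg = 1.0
    weights = [0.0] * d
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias)
            err = (w_pos if yi == 1 else w_neg) * (p - yi)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(X, weights, bias, threshold=0.5):
    """Label each vector 1 when its predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) >= threshold else 0
        for xi in X
    ]
```

Raising or lowering `threshold` trades precision against recall for the palimpsest class. This is one lever besides collecting more samples.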

If this is right

Palimpsest preparation does not significantly compromise the integrity of mtGenomes for sequencing analysis.
Machine learning on ancient DNA data can assist non-destructive identification of reused parchment.
Integration of molecular biology and neural networks provides new computational tools for manuscript studies.
Dataset imbalance limits recall for the minority palimpsest class and requires more samples to address.
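Reference [2] below notes that the authors experimented with SMOTE oversampling (and PCA) in Python. The core SMOTE step interpolates between a minority sample and one of its nearest minority neighbours. It can be sketched in plain Python as follows. This is a from-scratch illustration, not the library implementation the authors may have used, and `smote` is a hypothetical name.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class points in the style of SMOTE.

    For each synthetic point: pick a minority sample, pick one of its k
    nearest minority neighbours, and interpolate a random fraction of
    the way between them.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic
```

Every synthetic point lies on a segment between two real palimpsest samples. Oversampling therefore rebalances the training set without adding independent biological evidence. This matches the source's caveat that, without more independent palimpsest samples, the augmented results remain largely inconclusive.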

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Larger datasets of palimpsest mtGenomes could make the classifiers practical for routine use in cataloging collections.
The same DNA features might reveal other forms of alteration or reuse in historical documents beyond this single manuscript.
Non-destructive DNA sampling combined with classifiers could scale to study provenance patterns across many medieval texts.

Load-bearing premise

That mtGenome sequencing data contain discriminative features that reliably separate palimpsested from single-use folios. This must hold even though coverage and depth do not differ significantly and few palimpsest samples are available.

What would settle it

Sequencing additional confirmed palimpsest samples, retraining the classifiers, and measuring whether recall improves substantially while precision remains high.

Original abstract

Biocodicology, the study of biological information preserved in manuscripts, offers new opportunities to examine parchment as both a textual and biological artefact. This study applies non-destructive sampling to isolate and sequence mitochondrial genomes (mtGenomes) from a 14th-century manuscript, Ms. Codex 1629, which contains both single-use and palimpsested folios. We sought to evaluate whether palimpsest preparation, including chemical washing, compromised DNA integrity and whether computational methods could aid in identifying reused parchment. DNA sequencing revealed that both single-use and palimpsested parchments retained sufficient mtGenomes for analysis, with no significant differences in genome coverage or depth. To assess the potential of computational biology in manuscript studies, we implemented machine learning classifiers, including logistic regression and neural networks, to distinguish palimpsests from single-use folios. Models achieved high precision but exhibited reduced recall for the minority palimpsest class, reflecting dataset imbalance. While additional ancient mtGenome samples from palimpsests are required and further testing is needed, this study demonstrates how integrating molecular biology and neural networks highlights new approaches for palimpsest detection and underscores the evolving role of data science in biocodicology.
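Reference [4] below names the two quality metrics, both taken from Scheible et al. (2024): percentage of the mtGenome covered and mean read depth. Reference [5] gives a 90% coverage threshold.

A minimal sketch follows, under stated assumptions. It assumes a per-position depth profile over the 16,616 bp reference, for example the depth column from `samtools depth -a`, which also reports zero-depth positions. It also assumes a position counts as covered at a depth of at least 1. Scheible et al. may define coverage or average depth differently. The function names are hypothetical.

```python
COVERAGE_THRESHOLD = 90.0  # percent, as quoted from Scheible et al. (2024)


def mtgenome_metrics(depths, min_depth=1):
    """Summarise per-position read depth over a reference mtGenome.

    depths: one integer per reference position (len == genome length).
    Returns (percent of positions with depth >= min_depth,
             mean depth across all positions, zeros included).
    """
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    percent_covered = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return percent_covered, mean_depth


def passes_coverage(depths, threshold=COVERAGE_THRESHOLD):
    """True when the percent of the mtGenome covered meets the threshold."""
    percent_covered, _ = mtgenome_metrics(depths)
    return percent_covered >= threshold
```

Consider a 16,616-position profile with 1,000 uncovered positions and depth 40 elsewhere. It gives about 94.0% covered and a mean depth of about 37.6, which passes the 90% threshold.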

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper measures mtGenome preservation across one manuscript's folios and applies standard classifiers, but the discrimination results are difficult to interpret given the reported lack of differences in coverage and depth.

The core finding is that mtDNA can be recovered from both single-use and palimpsested parchment in Ms. Codex 1629 without obvious loss in coverage or depth, and that off-the-shelf logistic regression and neural networks can be trained on the resulting data. That empirical observation on DNA survival after washing is the clearest new piece.

The work applies non-destructive sampling and sequencing to a real historical object, then tests whether the sequence data can flag reused folios. The authors correctly note the class imbalance and the drop in recall for the palimpsest class, and they flag the need for more samples.

The main weakness is that the abstract states there are no significant differences in the most straightforward metrics yet still reports high precision from the models. No other features are described, no performance numbers or error bars appear, and the full methods are not supplied. Without knowing what the classifiers actually used or seeing evidence that the input representation carries a biological signal rather than noise or imbalance artifacts, the classification claim stays hard to assess. The small palimpsest sample size compounds this.

This is a pilot for people working at the intersection of biocodicology and computational methods. It is not yet strong enough on its own for a methods journal, but the direction is worth exploring. A serious referee could push for feature details, proper cross-validation, and larger sample reporting, so the paper deserves review rather than an immediate desk reject.
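Proper cross-validation has a concrete shape when classes are this imbalanced. Folds should be stratified by class. Any oversampling, such as the SMOTE step mentioned in reference [2], should be fitted on each fold's training indices only, so no synthetic palimpsest point is derived from a held-out sample. A minimal stdlib sketch follows. The function names are hypothetical, and round-robin dealing is just one simple way to stratify.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve class proportions.

    Returns a list of k lists of indices (the test set for each fold).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each fold.

    Oversample (e.g. SMOTE) using train_indices only, then score the
    untouched test_indices.
    """
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Take three palimpsests and five folds as an example. Two folds would contain no palimpsest at all, so per-fold recall for that class is undefined. That is one concrete way the small palimpsest count limits what cross-validation can show.

Several segments may be derived from one folio, as the 538-segment figure in reference [7] suggests may be the case. If so, splitting by folio rather than by segment would also be needed.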

Referee Report

2 major / 1 minor

Summary. The paper claims that non-destructive mtGenome sequencing of a 14th-century manuscript (Ms. Codex 1629) recovers sufficient DNA from both single-use and palimpsested folios, with no significant differences in coverage or depth, and that logistic regression and neural network classifiers applied to this sequencing data can distinguish palimpsests with high precision (though reduced recall due to class imbalance).

Significance. If the classifiers can be shown to exploit genuine biological signal rather than artifacts, the work would be a credible first step toward an interdisciplinary biocodicology-ML pipeline for non-invasive palimpsest detection. At present, assessment of the central empirical claim is not possible. No discriminative features are reported beyond the coverage/depth metrics, which are explicitly non-significant, and sample sizes and performance numbers are missing.

major comments (2)

[Abstract] The statement that 'DNA sequencing revealed ... no significant differences in genome coverage or depth' directly conflicts with the subsequent claim that the same mtGenome data supports neural-network classification of palimpsests. No alternative features (variant profiles, mapping statistics, base-quality distributions, etc.) are described that could explain class separation.
[Abstract] Claims of 'high precision' and 'reduced recall for the minority palimpsest class' are presented without any numerical values, sample sizes (N single-use vs. N palimpsest), confusion matrices, or statistical tests. This renders the performance assertions unverifiable and leaves the imbalance caveat unquantified.
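The objection above is that precision and recall appear without counts. It may help to show how both follow from a 2×2 confusion matrix. It also helps to show why balanced accuracy, the mean of per-class recall and the figure quoted in reference [7], is less flattering to an imbalanced classifier than plain accuracy. A minimal sketch follows, treating label 1 as palimpsest by assumption. The function name and the counts in the test are hypothetical, not taken from the paper.

```python
def binary_report(y_true, y_pred, positive=1):
    """Precision, recall, F1 for the positive class, plus balanced accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def safe_div(a, b):
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)       # recall of the positive (palimpsest) class
    specificity = safe_div(tn, tn + fp)  # recall of the negative (single-use) class
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

Consider a hypothetical set of 40 single-use and 6 palimpsest folios where a model flags only 2 palimpsests, both correctly. Precision is 1.0 but recall is about 0.33, and F1 is 0.5. Plain accuracy still looks strong at about 0.913 (42 of 46), while balanced accuracy is about 0.667.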

minor comments (1)

The manuscript should include a dedicated Methods or Results subsection specifying the exact input representation fed to the logistic regression and neural-network models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below and have revised the abstract to improve clarity and verifiability.

Referee: [Abstract] The statement that 'DNA sequencing revealed ... no significant differences in genome coverage or depth' directly conflicts with the subsequent claim that the same mtGenome data supports neural-network classification of palimpsests. No alternative features (variant profiles, mapping statistics, base-quality distributions, etc.) are described that could explain class separation.

Authors: We acknowledge the potential for misinterpretation in the abstract. The classifiers operate on multiple features extracted from the mtGenome sequencing reads (variant profiles, mapping statistics, and base-quality distributions), which are described in the Methods and Results sections of the full manuscript; coverage and depth were reported separately to establish that palimpsest preparation does not destroy recoverable DNA. To eliminate ambiguity we have revised the abstract to explicitly note the additional discriminative features used by the models. revision: yes
Referee: [Abstract] Claims of 'high precision' and 'reduced recall for the minority palimpsest class' are presented without any numerical values, sample sizes (N single-use vs. N palimpsest), confusion matrices, or statistical tests. This renders the performance assertions unverifiable and leaves the imbalance caveat unquantified.

Authors: We agree that the abstract should contain the quantitative details already present in the main text. We have revised the abstract to report the sample sizes, precision and recall values, and to reference the confusion matrix and statistical tests shown in the Results section and supplementary materials. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical sequencing + standard ML classification

full rationale

The paper reports mtGenome sequencing results from a single manuscript and applies off-the-shelf classifiers (logistic regression, neural networks) to the resulting coverage/depth and related metrics. No equations, derivations, first-principles claims, or parameter-fitting steps that are later relabeled as predictions appear anywhere in the text. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results occurs. The reported precision/recall figures are direct empirical outputs on the collected samples; the authors themselves note the small palimpsest sample size and resulting imbalance, confirming the work remains an open experimental report rather than a closed definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities are described in the provided text.

pith-pipeline@v0.9.1-grok · 5774 in / 1248 out tokens · 25927 ms · 2026-06-27T20:27:25.857516+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 8 canonical work pages

[1]

From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts. James B. Harr III (1), Madelin E. Blong (2), Tessa Gadomski (3), Kelly A. Meiklejohn (4), William E. Gundling Jr. (5). (1) Assistant Teaching Professor, College of William and Mary, Williamsburg, Virginia USA; (2) Molecular Laboratory Technician, Naveris Inc., Durham,...

2024
[2]

Neural network applications were performed in Python using Google Colab, which enabled flexible experimentation with classification, oversampling techniques (SMOTE), and dimensionality reduction (PCA). While models showed strong precision, palimpsests remained harder to detect due to class imbalance (largely due to the limited availability of palimpsest s...

2024
[3]

was prepared with a maximum of 20 µl of any library added to the pool, and (E) pooled libraries were purified using a 1.8X bead ratio of KAPA Pure Beads (Roche) and concentrated to 25 µl. Mitochondrial genome (mtGenome) enrichment and sequencing: mtGenome enrichment and sequencing were performed following the protocol from Scheible et al. (2024) with the ...

2024
[4]

We used two of the metrics outlined in Scheible et al. (2024) for assessing the quality of the resulting mtGenome sequences: (1) percentage of the mtGenome covered, and (2) mean read depth across the mtGenome. Table

2024
[5]

... There were no significant differences between the palimpsested parchments and the single-use parchments in either the coverage of the mtGenome (size is 16,616 bp) or mean read depth (p = 0.0799 and p = 0.8397, respectively) (Table 1). The threshold specified by Scheible et al. (2024) for genome coverage was 90%. When applying ...

2024
[6]

Violin Plots Comparing Parchment Type to Sequencing Metrics. Violin plots were generated to compare the parchment type and the percent of mtGenome covered (A) and the mean read depth (B). A violin plot is used to visualize data distribution with width indicating data density. The horizontal dashed lines on the plots indicate the thresholds for each metric...

2024
[7]

A caveat, however, is that without more independent palimpsest samples, these results, while encouraging, remain largely inconclusive. Table 3: Reclassification results following augmentation (538 single-use vs. 538 palimpsest sample segments). Results were excellent with a balanced accuracy of 0.989. Columns: Precision, Recall, F1-Score. Palimpsest (n=538): 0.978, 1.0...

2024
[8]

Cassidy, L. M., Teasdale, M. D., Carolan, S., Enright, R., Werner, R., Bradley, D. G., Finlay, E. K., & Mattiangeli, V. (2017). Capturing goats: Documenting two hundred years of mitochondrial DNA diversity among goat populations from Britain and Ireland. Biology Letters, 13(3), 20160876. https://doi.org/10.1098/rsbl.2016.0876

work page doi:10.1098/rsbl.2016.0876 2017
[9]

Kislak Center for Special Collections, University of Pennsylvania. https://find.library.upenn.edu/catalog/9958752123503681?hld_id=22418714950003681

work page arXiv
[10]

Daly, K. G., et al. (2018). Ancient goat genomes reveal mosaic domestication in the Fertile Crescent. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1746), 20160429. https://doi.org/10.1098/rstb.2016.0429

work page doi:10.1098/rstb.2016.0429 2018
[11]

Fiddyment, S., et al. (2015). Animal origin of 13th-century uterine vellum revealed using non-invasive peptide fingerprinting. PNAS, 112(49), 15066–15071. https://doi.org/10.1073/pnas.1512264112

work page doi:10.1073/pnas.1512264112 2015
[12]

Meadows JR, Cemal I, Karaca O, Gootwine E, Kijas JW. Five ovine mitochondrial lineages identified from sheep breeds of the near East. Genetics. 2007 Mar;175(3):1371-9. doi: 10.1534/genetics.106.068353. Epub 2006 Dec

work page doi:10.1534/genetics.106.068353 2007
[13]

Naderi S, Rezaei HR, Taberlet P, Zundel S, Rafat SA, Naghash HR, el-Barody MA, Ertugrul O, Pompanon F; Econogene Consortium. Large-scale mitochondrial DNA analysis of the domestic goat reveals six haplogroups with high diversity. PLoS One. 2007 Oct 10;2(10):e1012. doi: 10.1371/journal.pone.0001012. PMID: 17925860; PMCID: PMC1995761

work page doi:10.1371/journal.pone.0001012 2007
[14]

Renaud, G., Slon, V., Duggan, A. T., & Kelso, J. (2015). Schmutzi: Estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biology, 16,

2015
[15]

https://doi.org/10.1186/s13059-015-0776-0

work page doi:10.1186/s13059-015-0776-0
[16]

The development of non-destructive sampling methods of parchment skins for genetic species identification. PLoS One, 19(3): e0299524. https://doi.org/10.1371/journal.pone.0299524

work page doi:10.1371/journal.pone.0299524
