
arxiv: 2606.06889 · v1 · pith:EDKOE6YAnew · submitted 2026-06-05 · 🧬 q-bio.GN

From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts

Pith reviewed 2026-06-27 20:27 UTC · model grok-4.3

classification 🧬 q-bio.GN
keywords biocodicology · palimpsest detection · mitochondrial genomes · neural networks · machine learning · parchment DNA · medieval manuscripts · DNA sequencing

The pith

Mitochondrial genome sequencing data supports neural network classification of palimpsested versus single-use parchment folios.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates whether DNA preserved in medieval parchment can help detect pages that were chemically washed and reused for new writing. The researchers non-destructively sampled and sequenced mitochondrial genomes from both single-use and palimpsested folios of a 14th-century manuscript and found that both retained sufficient DNA, with no significant differences in coverage or depth. They then applied logistic regression and neural network classifiers to features derived from the sequencing data to separate the two classes. The models delivered high precision but lower recall for the minority palimpsest class, which the authors attribute to the limited number of palimpsest samples. The result points to a computational approach to biocodicology that could help identify reused parchment without further damage to the artifact.

Core claim

The authors establish that mtGenome sequencing data from a 14th-century manuscript supports the use of machine learning classifiers, including neural networks, to distinguish palimpsested folios from single-use ones, even though genome coverage and depth show no significant differences between the two. While precision is high, recall for palimpsests is reduced due to dataset imbalance, and more samples are needed.
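The "no significant differences" part of the claim can be made concrete with a small two-sample comparison. The sketch below uses a permutation test on the difference of group means; the abstract does not name the test the authors used, so the choice of test, the function name `permutation_p_value`, and the coverage values are all assumptions for illustration, not data or methods from the paper.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of group means."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical percent-of-mtGenome-covered values, NOT taken from the paper.
single_use = [96.1, 97.4, 95.8, 98.0, 96.7, 97.1, 95.2, 96.9]
palimpsest = [95.9, 97.0, 96.4]

p = permutation_p_value(single_use, palimpsest)
print(f"permutation p-value: {p:.3f}")
```

A permutation test is chosen here only because it makes few distributional assumptions, which suits a very small palimpsest group. A large p-value means the data give no evidence of a difference; it does not show that the two groups are identical.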

What carries the argument

Neural network and logistic regression classifiers trained on features from mitochondrial genome sequencing of parchment DNA to classify folios as palimpsested or single-use.
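As a rough illustration of the simpler of the two model families, the following is a minimal logistic regression trained by batch gradient descent. The two features, the synthetic 40-versus-10 class split, and all function names are assumptions made for this sketch; the paper's actual input representation is not given in the text reviewed here.

```python
import math
import random


def sigmoid(z):
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to class 1 (palimpsest)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.2, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic, imbalanced toy data (hypothetical features, NOT from the paper):
# 40 "single-use" points around (0, 0) and 10 "palimpsest" points around (3, 3).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 40 + [1] * 10

w, b = train_logistic(X, y)
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on toy data says nothing about real folios. It only shows the mechanics of fitting a linear decision boundary to two labeled groups.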

If this is right

  • Palimpsest preparation does not significantly compromise the integrity of mtGenomes for sequencing analysis.
  • Machine learning on ancient DNA data can assist non-destructive identification of reused parchment.
  • Integration of molecular biology and neural networks provides new computational tools for manuscript studies.
  • Dataset imbalance limits recall for the minority palimpsest class and requires more samples to address.
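The paper text extracted in the reference graph mentions SMOTE oversampling as one response to the imbalance noted in the last point. The sketch below shows the core idea of SMOTE-style interpolation: a synthetic minority sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. The function name `smote_like`, the value of `k`, and the feature values are illustrative assumptions, and this is not the implementation the authors used.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest *other* minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - c) ** 2 for a, c in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (c - a) for a, c in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class feature vectors, NOT data from the paper.
minority = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.4], [1.2, 2.1]]
new_points = smote_like(minority, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Synthetic points are recombinations of existing samples, not independent observations. If oversampling is applied before the train/test split, evaluation scores can be inflated, so it is usually applied to the training portion only.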

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger datasets of palimpsest mtGenomes could make the classifiers practical for routine use in cataloging collections.
  • The same DNA features might reveal other forms of alteration or reuse in historical documents beyond this single manuscript.
  • Non-destructive DNA sampling combined with classifiers could scale to study provenance patterns across many medieval texts.

Load-bearing premise

That mtGenome sequencing data contains discriminative features allowing reliable distinction between palimpsest and single-use folios even without significant differences in coverage or depth and with few palimpsest samples available.

What would settle it

Sequencing additional confirmed palimpsest samples, retraining the classifiers, and measuring whether recall improves substantially while precision remains high.
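Comparing precision and recall before and after retraining comes down to a few ratios over the confusion matrix. The sketch below computes them with the palimpsest class as the positive class; the counts are hypothetical and chosen only to show how high precision and low recall can coexist when palimpsests are rare.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for the positive (palimpsest) class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Mean of the per-class recalls, so a rare class counts equally.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts, NOT results from the paper: 5 true palimpsests,
# of which only 2 are detected, and no single-use folio is mislabeled.
metrics = classification_metrics(tp=2, fp=0, fn=3, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts every predicted palimpsest is correct (precision 1.0), yet three of five are missed (recall 0.4). Plain accuracy would still be 47/50 = 0.94, which is why recall and balanced accuracy are the more informative figures for the minority class.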

Original abstract

Biocodicology, the study of biological information preserved in manuscripts, offers new opportunities to examine parchment as both a textual and biological artefact. This study applies non-destructive sampling to isolate and sequence mitochondrial genomes (mtGenomes) from a 14th-century manuscript, Ms. Codex 1629, which contains both single-use and palimpsested folios. We sought to evaluate whether palimpsest preparation, including chemical washing, compromised DNA integrity and whether computational methods could aid in identifying reused parchment. DNA sequencing revealed that both single-use and palimpsested parchments retained sufficient mtGenomes for analysis, with no significant differences in genome coverage or depth. To assess the potential of computational biology in manuscript studies, we implemented machine learning classifiers, including logistic regression and neural networks, to distinguish palimpsests from single-use folios. Models achieved high precision but exhibited reduced recall for the minority palimpsest class, reflecting dataset imbalance. While additional ancient mtGenome samples from palimpsest are required and further testing is needed, this study demonstrates how integrating molecular biology and neural networks highlights new approaches for palimpsest detection and underscores the evolving role of data science in biocodicology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that non-destructive mtGenome sequencing of a 14th-century manuscript (Ms. Codex 1629) recovers sufficient DNA from both single-use and palimpsested folios, with no significant differences in coverage or depth, and that logistic regression and neural network classifiers applied to this sequencing data can distinguish palimpsests with high precision (though reduced recall due to class imbalance).

Significance. If the classifiers can be shown to exploit genuine biological signal rather than artifacts, the work would creditably pioneer an interdisciplinary biocodicology-ML pipeline for non-invasive palimpsest detection. At present the absence of any reported discriminative features beyond the explicitly non-significant coverage/depth metrics, combined with missing sample sizes and performance numbers, prevents assessment of whether the central empirical claim holds.

major comments (2)
  1. [Abstract] The statement that 'DNA sequencing revealed ... no significant differences in genome coverage or depth' directly conflicts with the subsequent claim that the same mtGenome data supports neural-network classification of palimpsests; no alternative features (variant profiles, mapping statistics, base-quality distributions, etc.) are described that could explain class separation.
  2. [Abstract] Claims of 'high precision' and 'reduced recall for the minority palimpsest class' are presented without any numerical values, sample sizes (N single-use vs. N palimpsest), confusion matrices, or statistical tests, rendering the performance assertions unverifiable and the imbalance caveat unquantified.
minor comments (1)
  1. The manuscript should include a dedicated Methods or Results subsection specifying the exact input representation fed to the logistic regression and neural-network models.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below and have revised the abstract to improve clarity and verifiability.

Point-by-point responses
  1. Referee: [Abstract] The statement that 'DNA sequencing revealed ... no significant differences in genome coverage or depth' directly conflicts with the subsequent claim that the same mtGenome data supports neural-network classification of palimpsests; no alternative features (variant profiles, mapping statistics, base-quality distributions, etc.) are described that could explain class separation.

    Authors: We acknowledge the potential for misinterpretation in the abstract. The classifiers operate on multiple features extracted from the mtGenome sequencing reads (variant profiles, mapping statistics, and base-quality distributions), which are described in the Methods and Results sections of the full manuscript; coverage and depth were reported separately to establish that palimpsest preparation does not destroy recoverable DNA. To eliminate ambiguity, we have revised the abstract to explicitly note the additional discriminative features used by the models.

    Revision: yes

  2. Referee: [Abstract] Claims of 'high precision' and 'reduced recall for the minority palimpsest class' are presented without any numerical values, sample sizes (N single-use vs. N palimpsest), confusion matrices, or statistical tests, rendering the performance assertions unverifiable and the imbalance caveat unquantified.

    Authors: We agree that the abstract should contain the quantitative details already present in the main text. We have revised the abstract to report the sample sizes and the precision and recall values, and to reference the confusion matrix and statistical tests shown in the Results section and supplementary materials.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical sequencing + standard ML classification

The paper reports mtGenome sequencing results from a single manuscript and applies off-the-shelf classifiers (logistic regression, neural networks) to the resulting coverage/depth and related metrics. No equations, derivations, first-principles claims, or parameter-fitting steps that are later relabeled as predictions appear anywhere in the text. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results occurs. The reported precision/recall figures are direct empirical outputs on the collected samples; the authors themselves note the small palimpsest sample size and resulting imbalance, confirming the work remains an open experimental report rather than a closed definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities are described in the provided text.



Reference graph

Works this paper leans on

16 extracted references · 8 canonical work pages

  1. From Genomes to Algorithms: Neural Network Applications for Palimpsest Detection in Medieval Manuscripts. James B. Harr III (1), Madelin E. Blong (2), Tessa Gadomski (3), Kelly A. Meiklejohn (4), William E. Gundling Jr. (5). Affiliations: (1) Assistant Teaching Professor, College of William and Mary, Williamsburg, Virginia, USA; (2) Molecular Laboratory Technician, Naveris Inc., Durham,...

  2. Neural network applications were performed in Python using Google Colab, which enabled flexible experimentation with classification, oversampling techniques (SMOTE), and dimensionality reduction (PCA). While models showed strong precision, palimpsests remained harder to detect due to class imbalance (largely due to the limited availability of palimpsest s...

  3. ... was prepared with a maximum of 20 µl of any library added to the pool, and (E) pooled libraries were purified using a 1.8X bead ratio of KAPA Pure Beads (Roche) and concentrated to 25 µl. Mitochondrial genome (mtGenome) enrichment and sequencing: mtGenome enrichment and sequencing were performed following the protocol from Scheible et al. (2024) with the ...

  4. We used two of the metrics outlined in Scheible et al. (2024) for assessing the quality of the resulting mtGenome sequences: (1) percentage of the mtGenome covered, and (2) mean read depth across the mtGenome. Table

  5. (table residue: 15,925 75,141 96 521 ± 172) There were no significant differences between the palimpsested parchments and the single-use parchments in either the coverage of the mtGenome (size is 16,616 bp) or mean read depth (p = 0.0799 and p = 0.8397, respectively) (Table 1). The threshold specified by Scheible et al. (2024) for genome coverage was 90%. When applying ...

  6. Violin Plots Comparing Parchment Type to Sequencing Metrics. Violin plots were generated to compare the parchment type and the percent of mtGenome covered (A) and the mean read depth (B). A violin plot is used to visualize data distribution with width indicating data density. The horizontal dashed lines on the plots indicate the thresholds for each metric...

  7. A caveat, however, is that without more independent palimpsest samples, these results, while encouraging, remain largely inconclusive. Table 3: Reclassification results following augmentation (538 single-use vs. 538 palimpsest sample segments). Results were excellent with a balanced accuracy of 0.989. Columns: Precision, Recall, F1-Score. Palimpsest (n=538): 0.978, 1.0...

  8. Cassidy, L. M., Teasdale, M. D., Carolan, S., Enright, R., Werner, R., Bradley, D. G., Finlay, E. K., & Mattiangeli, V. (2017). Capturing goats: Documenting two hundred years of mitochondrial DNA diversity among goat populations from Britain and Ireland. Biology Letters, 13(3), 20160876. https://doi.org/10.1098/rsbl.2016.0876

  9. Kislak Center for Special Collections, University of Pennsylvania. https://find.library.upenn.edu/catalog/9958752123503681?hld_id=22418714950003681

  10. Daly, K. G., et al. (2018). Ancient goat genomes reveal mosaic domestication in the Fertile Crescent. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1746), 20160429. https://doi.org/10.1098/rstb.2016.0429

  11. Fiddyment, S., et al. (2015). Animal origin of 13th-century uterine vellum revealed using non-invasive peptide fingerprinting. PNAS, 112(49), 15066–15071. https://doi.org/10.1073/pnas.1512264112

  12. Meadows, J. R., Cemal, I., Karaca, O., Gootwine, E., & Kijas, J. W. (2007). Five ovine mitochondrial lineages identified from sheep breeds of the near East. Genetics, 175(3), 1371–1379. https://doi.org/10.1534/genetics.106.068353

  13. Naderi, S., Rezaei, H. R., Taberlet, P., Zundel, S., Rafat, S. A., Naghash, H. R., el-Barody, M. A., Ertugrul, O., Pompanon, F., & Econogene Consortium. (2007). Large-scale mitochondrial DNA analysis of the domestic goat reveals six haplogroups with high diversity. PLoS One, 2(10), e1012. https://doi.org/10.1371/journal.pone.0001012

  14. Renaud, G., Slon, V., Duggan, A. T., & Kelso, J. (2015). Schmutzi: Estimation of contamination and endogenous mitochondrial consensus calling for ancient DNA. Genome Biology, 16,

  15. https://doi.org/10.1186/s13059-015-0776-0

  16. The development of non-destructive sampling methods of parchment skins for genetic species identification. PLoS One, 19(3), e0299524. https://doi.org/10.1371/journal.pone.0299524