arxiv: 2604.23683 · v2 · pith:QQOZKIIKnew · submitted 2026-04-26 · 💻 cs.CV

Learning to Decipher from Pixels: A Case Study of Copiale

Pith reviewed 2026-07-01 09:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords: decipherment, historical ciphers, Copiale cipher, image-to-text, neural networks, substitution ciphers, end-to-end learning, handwriting recognition

The pith

An end-to-end neural model can map handwritten cipher images directly to plaintext without any intermediate symbol transcription.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that pretraining a model on generic handwriting datasets and then fine-tuning it on a new line-level dataset of Copiale cipher images paired with German plaintext enables direct image-to-text decipherment. This bypasses the traditional step of manually transcribing individual cipher symbols before applying cryptanalysis. The work focuses on substitution ciphers and demonstrates improved accuracy through the two-stage training process. A sympathetic reader would care because the method removes a labor-intensive and error-prone stage in analyzing historical encrypted manuscripts.

Core claim

The central claim is that transcription-free image-to-plaintext decipherment is both feasible and effective for historical substitution ciphers. Using the Copiale cipher as a case study, the authors introduce the first text-line-level dataset of cipher images aligned with German plaintext and show that pretraining on generic handwriting data followed by cipher-specific fine-tuning substantially raises decipherment accuracy compared to baselines.

What carries the argument

An end-to-end neural network that learns to map raw cipher image lines directly to plaintext sequences, trained first on broad handwriting data and then fine-tuned on aligned Copiale image-plaintext pairs.

If this is right

  • Decipherment workflows for historical manuscripts can skip the transcription stage entirely.
  • Accuracy on substitution ciphers improves when generic handwriting pretraining precedes cipher-specific fine-tuning.
  • The same end-to-end image-to-plaintext pipeline offers a scalable alternative to transcription-first methods.
  • New line-level datasets pairing cipher images with known plaintext enable supervised training for other ciphers.
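
The last point can be made concrete with a minimal sketch of what one line-level record and a held-out split might look like. Everything here is an assumption for illustration: the `CipherLine` type, the file names, the placeholder plaintext, and the hash-based split are hypothetical and are not taken from the paper or its released dataset.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class CipherLine:
    """One supervised example: a cropped text-line image and its plaintext."""
    image_path: str  # hypothetical path to a line image
    plaintext: str   # plaintext for that line (German in the Copiale case)


def split_of(example: CipherLine, held_out_fraction: float = 0.1) -> str:
    """Assign an example to 'train' or 'held_out' by hashing its image path.

    Hashing keeps the assignment stable across runs and machines.
    """
    digest = hashlib.sha256(example.image_path.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "held_out" if bucket < held_out_fraction else "train"


def partition(examples, held_out_fraction: float = 0.1):
    """Group examples into the two splits."""
    splits = {"train": [], "held_out": []}
    for example in examples:
        splits[split_of(example, held_out_fraction)].append(example)
    return splits


# Placeholder records; real data would pair scanned lines with their plaintext.
examples = [CipherLine(f"line_{i:04d}.png", "placeholder text") for i in range(200)]
splits = partition(examples)
print(len(splits["train"]), len(splits["held_out"]))
```

A deterministic split of this kind makes it easier to report held-out results that others can reproduce from the same files.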

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach might extend to other substitution ciphers if similar aligned line-level datasets can be created.
  • Combining the direct image model with existing cryptanalytic techniques could raise overall recovery rates on unknown keys.
  • Testing the model on full-page images rather than isolated lines would reveal whether layout context further aids decipherment.

Load-bearing premise

The model can learn the symbol-to-letter mappings purely from pixel patterns in the image-plaintext pairs without needing explicit symbol segmentation or transcription.
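
To illustrate what a symbol-to-letter mapping means here, the following toy sketch treats decipherment as a lookup from cipher symbols to plaintext letters. The symbol names (`s01`, `s02`, ...) and the key are invented for illustration and are not the Copiale key; letting several symbols share one letter is likewise an illustrative assumption. An end-to-end model never sees this table or the symbol sequence, only line images and the resulting plaintext.

```python
import random

# Toy key (hypothetical): each cipher symbol maps to exactly one plaintext letter.
# Several symbols may share a letter, so the mapping is many-to-one.
SYMBOL_TO_LETTER = {
    "s01": "e", "s02": "e", "s03": "n", "s04": "i",
    "s05": "r", "s06": "d", "s07": " ",
}


def invert(key):
    """Build letter -> list of symbols from symbol -> letter."""
    letter_to_symbols = {}
    for symbol, letter in key.items():
        letter_to_symbols.setdefault(letter, []).append(symbol)
    return letter_to_symbols


def encipher(plaintext, key, rng):
    """Replace each letter with one of its symbols, chosen at random."""
    choices = invert(key)
    return [rng.choice(choices[ch]) for ch in plaintext]


def decipher(symbols, key):
    """Map each symbol back to its letter; this is the function to be learned."""
    return "".join(key[s] for s in symbols)


rng = random.Random(0)
plain = "der diener"
symbols = encipher(plain, SYMBOL_TO_LETTER, rng)
recovered = decipher(symbols, SYMBOL_TO_LETTER)
print(symbols)
print(recovered)
```

In a transcription-first workflow the `symbols` list is produced by hand or by a recognizer; the premise above is that a model can go from pixels to `recovered` without ever materializing that list.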

What would settle it

The central claim would be falsified if, on held-out Copiale cipher lines, the model's generated plaintext matched the ground truth no better than a random baseline or a simple frequency-matching method.
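
One way to run such a comparison is with a character error rate (CER) based on edit distance. The sketch below is a minimal version under assumptions: the abstract does not name the paper's metric, so CER is a stand-in chosen here, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def corpus_cer(predictions, references) -> float:
    """Total edits divided by total reference characters over all lines."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    return total_edits / total_chars


def beats_baseline(model_out, baseline_out, references) -> bool:
    """True if the model's CER is strictly lower than the baseline's."""
    return corpus_cer(model_out, references) < corpus_cer(baseline_out, references)
```

Calling `beats_baseline` with the model's held-out predictions and the output of a random or frequency-matching baseline on the same lines gives a direct version of the test described above; a result of `False` would count against the claim.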

Figures

Figures reproduced from arXiv: 2604.23683 by Alicia Fornés, Beáta Megyesi, Giuseppe De Gregorio, Lei Kang, Raphaela Heil.

Figure 1: Overview of the training pipeline for our proposed Transcription-Free Decipherment paradigm.
Figure 2: (a) Attention visualization illustrating alignment between handwritten cipher regions and de…

Original abstract

Historical encrypted manuscripts require both paleographic interpretation of cipher symbols and cryptanalytic recovery of plaintext. Most existing computational workflows rely on a transcription-first paradigm, in which handwritten symbols are transcribed prior to decipherment. This intermediate step is labor-intensive, error-prone, and not always aligned with the goal of direct plaintext recovery. We propose an end-to-end, transcription-free approach that directly maps handwritten cipher images to plaintext. Using the Copiale cipher as a case study, we introduce the first text-line-level dataset pairing cipher images with German plaintext. We show that pretraining on generic handwriting data followed by cipher-specific fine-tuning substantially improves decipherment accuracy. Our results demonstrate that transcription-free image-to-plaintext decipherment is both feasible and effective for historical substitution ciphers, offering a simplified and scalable alternative to traditional pipelines. https://github.com/leitro/Decipher-from-Pixels-Copiale

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes an end-to-end, transcription-free pipeline that maps handwritten cipher images directly to plaintext for historical substitution ciphers. Using the Copiale cipher as a case study, the authors release the first line-level dataset of cipher images aligned with German plaintext and report that pretraining on generic handwriting data followed by cipher-specific fine-tuning yields substantial gains in decipherment accuracy.

Significance. If the experimental results hold, the work would demonstrate a viable simplification of existing cryptanalytic workflows by removing the transcription stage, which is labor-intensive and error-prone. The release of the aligned line-level dataset constitutes a concrete, reusable contribution to the digital humanities and historical cryptanalysis communities.

major comments (2)
  1. [Abstract] The central claim that pretraining followed by fine-tuning 'substantially improves decipherment accuracy' is stated without any quantitative metrics, baseline comparisons, error rates, or dataset statistics. This absence prevents verification of whether the reported improvement is load-bearing for the feasibility conclusion.
  2. [Abstract] The feasibility demonstration rests on the assumption that the new line-level dataset supplies reliable image-to-plaintext alignments for supervised training, yet no description of alignment creation, validation, or error rate is supplied, leaving the quality of the supervised signal unassessable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the manuscript to incorporate additional quantitative details and dataset information for improved clarity and verifiability.

  1. Referee: [Abstract] The central claim that pretraining followed by fine-tuning 'substantially improves decipherment accuracy' is stated without any quantitative metrics, baseline comparisons, error rates, or dataset statistics. This absence prevents verification of whether the reported improvement is load-bearing for the feasibility conclusion.

    Authors: We agree that the abstract would be strengthened by including quantitative support for the claim. The body of the manuscript reports the relevant experimental results, including accuracy metrics, baseline comparisons, and dataset statistics. In the revised version we will update the abstract to include key figures such as the observed accuracy gains from pretraining plus fine-tuning, along with baseline error rates. revision: yes

  2. Referee: [Abstract] The feasibility demonstration rests on the assumption that the new line-level dataset supplies reliable image-to-plaintext alignments for supervised training, yet no description of alignment creation, validation, or error rate is supplied, leaving the quality of the supervised signal unassessable.

    Authors: The manuscript provides a description of the line-level dataset and alignment process in the dedicated Dataset section. To make this information more immediately accessible, we will revise the abstract to include a concise statement on how the alignments were created and validated. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical results independent of inputs


The paper reports an empirical ML pipeline (pretraining on generic handwriting data then fine-tuning on a new line-level Copiale image-to-plaintext dataset) whose central claim is validated by experimental accuracy metrics rather than any derivation chain. No equations, fitted parameters renamed as predictions, or self-citation load-bearing steps appear in the provided abstract or description; the approach relies on standard supervised learning assumptions that are externally testable via the released dataset and code.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that pixel patterns alone suffice to recover the substitution mapping once the model is pretrained on handwriting.

axioms (1)
  • domain assumption: A neural network can learn a direct mapping from cipher symbol images to plaintext letters without an intermediate symbolic transcription or segmentation step.
    This premise is required for the end-to-end claim but is not justified or tested in the provided abstract.


