pith. machine review for the scientific record.

arxiv: 2605.02089 · v1 · submitted 2026-05-03 · 💻 cs.CV

Recognition: 2 theorem links

Cross-Language Learning within Arabic Script for Low-Resource HTR

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 19:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords handwritten text recognition · arabic script · cross-language learning · low-resource · character error rate · joint training · persian · urdu

The pith

Joint training across Arabic, Urdu, and Persian scripts improves low-resource HTR by concentrating gains on shared characters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests cross-script joint training as a way to handle scarce labeled data for handwritten text recognition in Arabic-script languages. A CRNN model trained on the combined datasets from Arabic, Urdu, and Persian outperforms single-script baselines under low-resource conditions of 100 to 1000 lines. On Persian data the joint model reaches a CER of 9.99, beating prior results even without using all available training lines. On Urdu data the CER falls from 17.20 to 14.45. Character-level analysis shows the improvements arise mainly from characters common to the scripts, while language-specific characters see limited or negative transfer.
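The CER figures that anchor these claims are edit-distance-based: the number of character insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch (standard Levenshtein dynamic programming, not the paper's code):

```python
def edit_distance(ref, hyp):
    # Levenshtein distance via a single-row dynamic program.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(ref, hyp):
    # Character Error Rate: edit operations per reference character, in percent.
    return 100.0 * edit_distance(ref, hyp) / max(len(ref), 1)

print(round(cer("recognition", "recognitoin"), 2))  # → 18.18
```

Because CER normalizes by reference length, the reported numbers (e.g. 9.99 on Persian, 14.45 on Urdu) are comparable across datasets of different sizes.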

Core claim

A CRNN recognizer trained on the union of Arabic-script line datasets under low-resource regimes achieves lower character error rates than single-script training. On the PHTD Persian dataset this yields a CER of 9.99 despite incomplete data. On the UNHD Urdu dataset the CER drops from 17.20 to 14.45. The transfer occurs primarily because of script-level character overlap, with gains structurally concentrated on shared characters and little benefit for language-specific ones.

What carries the argument

Joint training of a CRNN sequence model on the union of low-resource Arabic, Urdu, and Persian line datasets, followed by statistical character-level breakdown of error reductions on shared versus script-unique characters.
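The character-level breakdown can be sketched as a comparison of per-character error-rate deltas across the shared/specific partition of the character inventory. This is an illustrative reconstruction, not the paper's code; the character sets and error rates below are toy values:

```python
def transfer_by_group(single_cer, joint_cer, shared_chars):
    """Mean per-character error-rate reduction (single-script minus joint),
    split into characters shared across scripts vs. language-specific ones.
    Inputs map character -> error rate; all values here are illustrative."""
    def mean_delta(chars):
        deltas = [single_cer[c] - joint_cer[c] for c in chars]
        return sum(deltas) / len(deltas) if deltas else 0.0
    common = set(single_cer) & set(joint_cer)
    return {
        "shared": mean_delta(common & shared_chars),
        "language_specific": mean_delta(common - shared_chars),
    }

# Toy example: shared characters improve, a language-specific one regresses.
single = {"a": 0.20, "b": 0.30, "x": 0.10}
joint = {"a": 0.10, "b": 0.20, "x": 0.15}
result = transfer_by_group(single, joint, shared_chars={"a", "b"})
```

A positive `shared` delta alongside a near-zero or negative `language_specific` delta is the signature the paper reports: gains concentrated on the overlap, limited or negative transfer elsewhere.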

Load-bearing premise

That script-level character overlap produces reliable positive transfer in low-resource regimes without substantial negative interference from language-specific characters or dataset differences.

What would settle it

A controlled run of the same joint-training setup that shows no concentration of error reduction on shared characters or that produces higher overall CER than the single-script baseline.

Figures

Figures reproduced from arXiv: 2605.02089 by Elisa Barney, Marcus Liwicki, Sana Al-azzawi.

Figure 1. Overlap between the character inventories of Arabic, Persian, and Urdu.
Figure 2. CRNN architecture used in all experiments.
Figure 3. Comparison of training paradigms. (a) Single-script (…)
Figure 4. Examples of handwritten text lines from the four datasets used in this …
Figure 5. Qualitative comparison between single-script and multi-script training …
Figure 6. Character-level transfer analysis for shared characters. Each dot represents …
Original abstract

Handwritten Text Recognition (HTR) under limited labeled data remains a challenging problem, particularly for Arabic-script languages. Although modern sequence-based recognizers perform well in high-resource settings, their accuracy degrades sharply as training data becomes scarce. Arabic-script languages share a common writing system with substantial character overlap, motivating cross-script training as a strategy to mitigate data scarcity. We performed experiments on Arabic, Urdu, and Persian scripts and achieved improvements over single-script baselines (new SotA especially for low-resource settings). A key finding of our experiments is that cross-script transfer is largely driven by script-level overlap rather than uniform accuracy improvements. Through a statistical character-level analysis we show that gains are structurally concentrated on characters shared across scripts, while language-specific characters exhibit limited or negative transfer. These findings provide insight into transfer dynamics in low-resource script families. Detailed results include: We conduct a controlled line-level study of cross-script joint training for Arabic-script HTR under low-resource regimes (number of samples K ∈ {100, 500, 1000} labeled lines) on Arabic (KHATT), Urdu (NUST-UHWR), and Persian (PHTD). A CRNN model is trained on the union of multiple related Arabic-script datasets and evaluated on individual target languages. On Persian (PHTD), joint training achieves a Character Error Rate (CER) of 9.99, surpassing previously reported results despite not using the full available training data. On an Urdu dataset (UNHD), joint training reduces CER from 17.20 to 14.45. Code and data splits are released to ensure reproducibility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript presents an empirical study on cross-script joint training for handwritten text recognition (HTR) in low-resource Arabic-script languages (Arabic KHATT, Urdu NUST-UHWR/UNHD, Persian PHTD). A CRNN is trained on unions of datasets using K=100/500/1000 labeled lines and evaluated on target languages. It reports CER improvements including 9.99 on Persian (new SOTA with partial data) and reduction from 17.20 to 14.45 on Urdu, with a post-hoc character-level statistical analysis claiming gains are concentrated on shared script characters while language-specific ones show limited/negative transfer. Code and data splits are released.

Significance. If the results and attribution hold, the work demonstrates practical benefits of script-family transfer for low-resource HTR and provides insight into why joint training helps (overlap-driven rather than uniform). The concrete CER numbers across controlled K regimes and the release of code/splits are strengths that support reproducibility and extension. The character-level breakdown, if robust, would be a useful addition to the literature on cross-lingual transfer in related scripts.

major comments (1)
  1. [character-level analysis (results section)] The central claim that cross-script transfer is 'largely driven by script-level overlap' rests on the post-hoc statistical character-level analysis showing gains concentrated on shared characters. However, this breakdown does not control for character frequency imbalance across the K-line subsets and lacks a control that adds equally diverse but non-overlapping script data. Without these isolations, the structural attribution remains under-supported even if the raw joint-training CER reductions hold.
minor comments (2)
  1. The abstract and reported results provide no error bars, statistical significance tests, or full details on baseline implementations and exact joint-training set compositions.
  2. Clarify how the K-line subsets were sampled and whether language-specific characters were balanced in the joint vs. single-script conditions.
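The sampling question in minor comment 2 matters because low-resource results are sensitive to which K lines are drawn. One reproducible protocol (an assumption for illustration; the paper releases its actual splits) fixes a seed per (dataset, K) pair:

```python
import random

def sample_k_lines(lines, k, seed=0):
    # Draw a reproducible K-line low-resource subset: shuffle a copy of the
    # line inventory with a fixed seed and keep the first K entries.
    rng = random.Random(seed)
    pool = list(lines)
    rng.shuffle(pool)
    return pool[:k]

# Using one fixed seed per (dataset, K) budget keeps the single-script and
# joint-training conditions comparable across runs.
subset = sample_k_lines([f"line_{i}" for i in range(2000)], k=100, seed=42)
```

Reporting the seeds (or releasing the splits, as the authors do) lets readers verify that joint and single-script conditions saw identical target-language lines.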

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and have revised the paper to acknowledge limitations in the character-level analysis while preserving the reported CER results.

Point-by-point responses
  1. Referee: The central claim that cross-script transfer is 'largely driven by script-level overlap' rests on the post-hoc statistical character-level analysis showing gains concentrated on shared characters. However, this breakdown does not control for character frequency imbalance across the K-line subsets and lacks a control that adds equally diverse but non-overlapping script data. Without these isolations, the structural attribution remains under-supported even if the raw joint-training CER reductions hold.

    Authors: We agree that the post-hoc character-level analysis does not fully isolate script overlap from frequency effects and lacks a non-overlapping script control. The K-line subsets inherently reflect the natural frequency distributions of each dataset, and shared characters can appear more often due to multi-dataset union. Our analysis compares gains on shared versus language-specific characters within the same joint-training regime, which offers supportive but not conclusive evidence for overlap-driven transfer. A control adding equally diverse data from an unrelated script (e.g., Latin) would require new datasets outside our Arabic-script focus and is not feasible within the current experimental design. In the revised manuscript we have added a dedicated limitations paragraph in the discussion, tempered the claim from 'largely driven' to 'suggestive of overlap-driven transfer', and included a supplementary frequency table for characters in the K=100/500/1000 subsets. These changes partially address the concern without altering the core CER findings or data releases.

    revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical HTR transfer study

full rationale

The manuscript reports controlled experiments training a CRNN on unions of Arabic-script line datasets (KHATT, NUST-UHWR, PHTD, UNHD) at fixed K-line budgets and measures CER on held-out test sets. All reported numbers (e.g., Persian joint CER 9.99, Urdu reduction 17.20→14.45) are direct outputs of the training runs; no equations, fitted parameters, or first-principles derivations are present that could reduce those outputs to quantities defined by the model itself. The post-hoc character-level breakdown is a statistical summary of the same empirical error counts and does not constitute a self-definitional or load-bearing self-citation step. The work therefore contains no circular reduction and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard supervised learning assumptions for sequence models and the suitability of CRNN for HTR; no new free parameters, axioms, or invented entities are introduced beyond the experimental protocol.

axioms (1)
  • domain assumption: CRNN architecture is appropriate for line-level HTR across Arabic-script languages.
    Standard assumption in the HTR literature, invoked by the choice of model.

pith-pipeline@v0.9.0 · 5596 in / 1141 out tokens · 92004 ms · 2026-05-08T19:24:44.203962+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Understanding Cross-Language Transfer Improvements in Low-Resource HTR: The Role of Sequence Modeling

    cs.CV · 2026-05 · unverdicted · novelty 5.0

    Sequence-level modeling, not shared visual features, explains cross-language transfer improvements in low-resource Arabic-script HTR.

Reference graph

Works this paper leans on

20 extracted references · 2 canonical work pages · cited by 1 Pith paper

  1. [1] C. Luo, Y. Zhu, L. Jin, Y. Wang, Learn to augment: Joint data augmentation and network optimization for text recognition, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 13746–13755.

  2. [2] M. Salaheldin Kasem, M. Mahmoud, H.-S. Kang, Advancements and challenges in Arabic optical character recognition: A comprehensive survey, ACM Computing Surveys 58 (4) (2025) 1–37.

  3. [3] R. Ahmad, S. Naz, M. Z. Afzal, S. F. Rashid, M. Liwicki, A. Dengel, The impact of visual similarities of Arabic-like scripts regarding learning in an OCR system, in: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 7, IEEE, 2017, pp. 15–19.

  4. [4] F. Nemati, C. Westbury, G. Hollis, H. Haghbin, The Persian lexicon project: minimized orthographic neighbourhood effects in a dense language, Journal of Psycholinguistic Research 51 (5) (2022) 957–979.

  5. [5] A. Ul-Hasan, T. M. Breuel, Can we build language-independent OCR using LSTM networks?, in: Proceedings of the 4th International Workshop on Multilingual OCR, 2013, pp. 1–5.

  6. [6] J. Huang, K. J. Liang, R. Kovvuri, T. Hassner, Task grouping for multilingual text recognition, in: European Conference on Computer Vision, Springer, 2022, pp. 297–313.

  7. [7] E. F. Bilgin Tasdemir, Printed Ottoman text recognition using synthetic data and data augmentation, International Journal on Document Analysis and Recognition (IJDAR) 26 (3) (2023) 273–287.

  8. [8] P. Broadwell, U. Patel, M. Tekgürler, Multilingual handwritten text recognition (HTR) models for large-scale processing of archival documents in low-resourced Arabic-script languages, Available at SSRN 5190984 (2025).

  9. [9] N. Riaz, H. Arbab, A. Maqsood, K. Nasir, A. Ul-Hasan, F. Shafait, Conv-transformer architecture for unconstrained off-line Urdu handwriting recognition, International Journal on Document Analysis and Recognition (IJDAR) 25 (4) (2022) 373–384.

  10. [10] A. Hamza, S. Ren, U. Saeed, ET-Network: A novel efficient transformer deep learning model for automated Urdu handwritten text recognition, PLOS One 19 (5) (2024) e0302590.

  11. [11] N. ul Sehr Zia, M. F. Naeem, S. M. K. Raza, M. M. Khan, A. Ul-Hasan, F. Shafait, A convolutional recursive deep architecture for unconstrained Urdu handwriting recognition, Neural Computing and Applications 34 (2) (2022) 1635–1648.

  12. [12] G. Retsinas, G. Sfikas, B. Gatos, C. Nikou, Best practices for a handwritten text recognition system, in: International Workshop on Document Analysis Systems, Springer, 2022, pp. 247–259.

  13. [13] S. A. Mahmoud, I. Ahmad, W. G. Al-Khatib, M. Alshayeb, M. T. Parvez, V. Märgner, G. A. Fink, KHATT: An open Arabic offline handwritten text database, Pattern Recognition 47 (3) (2014) 1096–1112.

  14. [14] A. Alaei, U. Pal, P. Nagabhushan, Dataset and ground truth for handwritten text in four different scripts, International Journal of Pattern Recognition and Artificial Intelligence 26 (04) (2012) 1253001.

  15. [15] S. Al-azzawi, E. Barney, M. Liwicki, CER-HV: A human-in-the-loop framework for cleaning datasets applied to Arabic-script HTR, arXiv preprint arXiv:2601.16713 (2026).

  16. [16] S. B. Ahmed, S. Naz, S. Swati, M. I. Razzak, Handwritten Urdu character recognition using one-dimensional BLSTM classifier, Neural Computing and Applications 31 (4) (2019) 1143–1151.

  17. [17] A. Chan, A. Mijar, M. Saeed, C.-W. Wong, A. Khater, HATFormer: historic handwritten Arabic text recognition with transformers, arXiv preprint arXiv:2410.02179 (2024).

  18. [18] M. Saeed, A. Chan, A. Mijar, G. Habchi, C. Younes, C.-W. Wong, A. Khater, Muharaf: Manuscripts of handwritten Arabic dataset for cursive text recognition, Advances in Neural Information Processing Systems 37 (2024) 58525–58538.

  19. [19] H. Bouchal, A. Belaid, F. Meziane, Towards accurate recognition of historical Arabic manuscripts: A novel dataset and a generalizable pipeline, ACM Transactions on Asian and Low-Resource Language Information Processing (2025).

  20. [20] T. Anjum, N. Khan, Caltext: Contextual attention localization for offline handwritten text, Neural Processing Letters 55 (6) (2023) 7227–7257.