DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization

Chia-Min Lin; Hsin-Jui Pan; Jen-Shiun Chiang; Sheng-Wei Chan; Yung-Che Wang

arxiv: 2606.08781 · v1 · pith:JG4G7JSGnew · submitted 2026-06-07 · 💻 cs.CV

Pith reviewed 2026-06-27 18:45 UTC · model grok-4.3

classification 💻 cs.CV

keywords: document image binarization, Mamba, state space models, Anti-Dilution Gate, DIBCO benchmarks, stroke preservation, feature propagation, degraded documents

The pith

Mamba-based binarization dilutes faint text strokes during state propagation, but an Anti-Dilution Gate restores stroke-sensitive responses to fix it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores Mamba state space models for document image binarization, which must separate text from degraded backgrounds while keeping thin, broken, and low-contrast strokes intact. Direct state-space propagation in these models can attenuate weak foreground cues, such as faint ink traces and boundary details, over long distances. The authors introduce an Anti-Dilution Gate that estimates the feature changes caused by propagation and selectively restores local stroke responses while limiting background enhancement. On DIBCO and H-DIBCO benchmarks with a leave-one-year-out protocol, the resulting DeepMine-Mamba framework reaches competitive FM and Fps scores, with ablations linking the gate to better stroke preservation. A reader would care because reliable binarization supports downstream tasks like OCR on historical or noisy scans.

Core claim

Direct state-space propagation in Mamba models for document binarization dilutes weak foreground cues, especially faint ink traces, fragmented characters, and boundary-sensitive stroke details. DeepMine-Mamba counters this with a novel Anti-Dilution Gate that estimates propagation-induced feature changes and selectively restores stroke-sensitive local responses while suppressing unnecessary background enhancement. Experiments on DIBCO/H-DIBCO benchmarks under strict leave-one-year-out evaluation show competitive overall performance with strong average FM and Fps across years, and ablations confirm the gate improves stroke preservation and reduces perceptually significant errors.
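The leave-one-year-out protocol named in the core claim can be sketched as a simple split generator. This is an illustration only: the year list is an assumption taken from the contest editions in the reference list, and the paper may use a different subset. The function name `leave_one_year_out` is hypothetical.

```python
def leave_one_year_out(years):
    """Yield (train_years, test_year) pairs, holding out one year at a time."""
    for held_out in years:
        train = [y for y in years if y != held_out]
        yield train, held_out

# Assumed year list (from the contest editions in the reference list);
# the paper's actual benchmark subset may differ.
years = [2009, 2010, 2011, 2012, 2013, 2014, 2016, 2017, 2018, 2019]
splits = list(leave_one_year_out(years))

for train, test in splits[:2]:
    print(f"test on {test}, train on {len(train)} other years")
```

Each benchmark year is tested by a model that never saw that year's images, which is why the protocol reduces the risk of tuning to a single contest edition.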

What carries the argument

The Anti-Dilution Gate, which estimates propagation-induced feature changes to restore stroke-sensitive local responses and suppress background enhancement.
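The abstract gives no equations for the gate, so the following is only a toy sketch of the general idea described above, not the authors' module. It assumes, hypothetically, that dilution is measured as the drop in response from before to after propagation, and that a sigmoid of a per-position stroke score decides how much of that drop is restored. The names `anti_dilution_gate`, `stroke_score`, and `sharpness` are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def anti_dilution_gate(before, after, stroke_score, sharpness=4.0):
    """Toy gate (hypothetical; not the paper's formulation).

    before:       feature responses before state-space propagation
    after:        feature responses after propagation
    stroke_score: per-position stroke evidence in [0, 1] (assumed input,
                  for example from an edge detector)
    """
    restored = []
    for b, a, s in zip(before, after, stroke_score):
        loss = max(b - a, 0.0)                 # response removed by propagation
        gate = sigmoid(sharpness * (s - 0.5))  # higher on strokes, lower on background
        restored.append(a + gate * loss)       # add back only the gated share of the loss
    return restored

# Two positions with the same dilution; only the first looks like a stroke.
before = [0.8, 0.8]
after = [0.2, 0.2]
stroke_score = [1.0, 0.0]
out = anti_dilution_gate(before, after, stroke_score)
print(out)
```

In this toy setup the stroke-like position recovers most of its lost response while the background position stays close to its propagated value, which mirrors the "restore strokes, suppress background enhancement" behavior the paper attributes to its gate.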

If this is right

The gate improves preservation of thin, broken, and low-contrast strokes on standard benchmarks.
DeepMine-Mamba reaches competitive average FM and Fps scores across multiple DIBCO/H-DIBCO years.
Mamba-based pipelines become viable for binarization once equipped with targeted correction for local detail loss.
Ablation evidence ties the gate directly to reduced perceptually significant binarization errors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gate design could be tested on other state-space vision tasks that require retention of sparse local signals.
If dilution arises from long-range state updates, analogous modules might benefit non-Mamba sequence models in image restoration.
Extending evaluation to additional degradation types beyond DIBCO years would clarify whether the mechanism generalizes.

Load-bearing premise

The observed gains on DIBCO benchmarks are caused by the Anti-Dilution Gate mitigating dilution rather than by other modeling choices or dataset-specific tuning.

What would settle it

An ablation that removes only the Anti-Dilution Gate from the full DeepMine-Mamba architecture, then measures the change in stroke-specific metrics such as Fps and thin-stroke FM on the same DIBCO leave-one-year-out splits.
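For reference, FM in such an ablation is the standard F-measure over foreground pixels, the harmonic mean of precision and recall. The sketch below computes it on toy masks that are invented for illustration and are not results from the paper; Fps (the pseudo F-measure) uses weighted precision and recall and is not covered here.

```python
def f_measure(pred, truth):
    """F-measure (FM) over foreground pixels; 1 = text, 0 = background."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy 3x3 masks: a vertical stroke, predicted intact and with a gap.
truth = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
intact = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
broken = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]

print(f_measure(intact, truth), f_measure(broken, truth))
```

A single missing pixel in a thin stroke costs recall and therefore FM, which is why stroke-level metrics are a reasonable place to look for the gate's effect.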

Figures

Figures reproduced from arXiv: 2606.08781 by Chia-Min Lin, Hsin-Jui Pan, Jen-Shiun Chiang, Sheng-Wei Chan, Yung-Che Wang.

**Figure 1.** Overall architecture of DeepMine-Mamba. The proposed framework combines ConvNeXt feature extraction, Sobel edge guidance, Mamba-based state modeling, and anti-dilution refinement for document image binarization.

**Figure 2.** Qualitative visualization of DeepMine-Mamba on …

**Figure 3.** Bright and low-contrast background case. From …

**Figure 4.** Complex degraded background case. From left to …

Abstract

Document image binarization aims to separate foreground text from degraded backgrounds while preserving thin, broken, and low-contrast strokes. Although deep learning methods have improved binarization performance, most existing approaches rely on convolutional, transformer-based, or generative architectures, while Mamba-based state space models remain largely unexplored for this task. In this work, we investigate Mamba-based feature propagation and observe that direct state-space propagation may dilute weak foreground cues during long-range modeling, especially faint ink traces, fragmented characters, and boundary-sensitive stroke details. To address this problem, we propose DeepMine-Mamba, a Mamba-based binarization framework equipped with a novel Anti-Dilution Gate that estimates propagation-induced feature changes and selectively restores stroke-sensitive local responses while suppressing unnecessary background enhancement. Experiments on DIBCO/H-DIBCO benchmarks under a strict leave-one-year-out protocol show that DeepMine-Mamba achieves competitive overall performance, with strong average FM and Fps across benchmark years. Ablation results further demonstrate that the Anti-Dilution Gate improves stroke preservation and reduces perceptually significant binarization errors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Mamba applied to binarization with a new gate that claims to fix dilution, but the evidence for that specific mechanism is indirect at best.

The paper's main move is taking Mamba state-space models into document image binarization and adding an Anti-Dilution Gate to counter feature dilution during propagation. That combination is new for this task. It reports competitive average FM and Fps on DIBCO/H-DIBCO under a leave-one-year-out protocol, and the ablations link the gate to better stroke preservation.

The work is straightforward and the benchmark protocol is reasonable. The gate is presented as estimating propagation changes and selectively restoring local stroke responses while damping background, which sounds plausible on paper.

The soft spot is the missing link between the gate and the claimed dilution effect. The abstract and stress-test note give no equations, no before/after state norms, and no ablation that pits the gate against a generic local-enhancement module. Downstream metric gains are shown, but they do not isolate whether the improvement comes from dilution mitigation or from other modeling choices. Without that, the central story stays untested.

This is for people already working on binarization or Mamba variants in vision. A reader in either area could extract the application details and the reported numbers. It is coherent enough on its own terms to deserve referee time rather than desk rejection, though the methods section will need to supply the missing mechanistic checks.

Referee Report

2 major / 1 minor

Summary. The paper proposes DeepMine-Mamba, a Mamba-based state-space model for document image binarization. It observes that direct SSM propagation can dilute weak foreground cues (thin strokes, low-contrast ink) and introduces an Anti-Dilution Gate that estimates propagation-induced feature changes to selectively restore stroke-sensitive local responses while suppressing background enhancement. On DIBCO/H-DIBCO benchmarks under a leave-one-year-out protocol the model reports competitive average FM and Fps scores; ablations are said to confirm that the gate improves stroke preservation and reduces perceptually significant errors.

Significance. If the Anti-Dilution Gate can be shown to specifically counteract dilution rather than provide generic local enhancement, the work would supply a targeted architectural motif for preserving fine detail under long-range SSM propagation, a setting relevant to many degraded-image tasks. The leave-one-year-out protocol is a positive design choice that reduces temporal overfitting risk on the DIBCO series.

major comments (2)

[Ablation study] Ablation study (mentioned in abstract): the reported FM/Fps gains from adding the Anti-Dilution Gate are not accompanied by a control that replaces the gate with an equivalent non-dilution-aware module (e.g., a standard local convolution or generic gating block); without this isolation the causal link between the gate and mitigation of propagation-induced dilution remains untested.
[Method] Method description (abstract and § on Anti-Dilution Gate): no equations, feature-map visualizations, or quantitative diagnostics (state-difference norm, foreground-response histograms before/after propagation) are supplied to demonstrate that the gate detects and counters dilution of weak cues rather than performing generic feature modulation.
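One of the diagnostics requested in the second comment, a state-difference norm, could be computed roughly as follows. This is a minimal sketch under assumptions: features are flattened to 1-D lists, a ground-truth foreground mask is available, and the function name `masked_l2_change` is hypothetical. The numbers are toy values, not measurements from the paper.

```python
import math

def masked_l2_change(before, after, mask):
    """L2 norm of (after - before) over positions where mask is 1."""
    return math.sqrt(sum((a - b) ** 2
                         for b, a, m in zip(before, after, mask) if m == 1))

# Toy features before and after propagation (invented values).
before = [0.9, 0.1, 0.8, 0.1]
after = [0.4, 0.1, 0.3, 0.2]
fg = [1, 0, 1, 0]              # assumed foreground (stroke) positions
bg = [1 - m for m in fg]       # complement: background positions

fg_change = masked_l2_change(before, after, fg)
bg_change = masked_l2_change(before, after, bg)
print(fg_change, bg_change)
```

If propagation dilutes weak strokes, the foreground change should be large relative to the background change before the gate is applied and smaller after it, which is the kind of before/after comparison the referee asks for.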

minor comments (1)

[Abstract] The abstract states 'competitive overall performance' without quoting the numerical margins relative to the strongest published baselines; adding these deltas would clarify the practical advance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the empirical isolation of the Anti-Dilution Gate's effect.

Referee: [Ablation study] Ablation study (mentioned in abstract): the reported FM/Fps gains from adding the Anti-Dilution Gate are not accompanied by a control that replaces the gate with an equivalent non-dilution-aware module (e.g., a standard local convolution or generic gating block); without this isolation the causal link between the gate and mitigation of propagation-induced dilution remains untested.

Authors: We agree that the current ablation lacks a direct control isolating dilution-specific behavior from generic local enhancement. In the revised manuscript we will add an ablation replacing the Anti-Dilution Gate with both a standard local convolution block and a generic gating block, reporting the resulting FM and Fps scores under the same leave-one-year-out protocol. revision: yes

Referee: [Method] Method description (abstract and § on Anti-Dilution Gate): no equations, feature-map visualizations, or quantitative diagnostics (state-difference norm, foreground-response histograms before/after propagation) are supplied to demonstrate that the gate detects and counters dilution of weak cues rather than performing generic feature modulation.

Authors: We acknowledge that the manuscript currently provides insufficient mechanistic evidence. The revised version will include the full equations of the Anti-Dilution Gate, feature-map visualizations highlighting weak foreground responses, and quantitative diagnostics (state-difference norms and foreground-response histograms before/after propagation) to demonstrate targeted restoration of diluted cues. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical proposal with independent benchmark validation

The paper proposes the Anti-Dilution Gate as an architectural addition to Mamba-based models for document binarization, motivated by an observed dilution phenomenon during state propagation. Performance is evaluated via standard DIBCO benchmarks under leave-one-year-out protocol, with ablations reporting downstream FM/Fps metrics. No equations, fitted parameters, or self-citations are described that would reduce the gate's claimed effect to a tautology, a renamed input, or a self-referential prediction. The derivation chain consists of standard model design followed by external empirical testing and remains self-contained against the provided benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper rests on the domain assumption that Mamba propagation dilutes weak foreground signals and that a learned gate can selectively restore them; the Anti-Dilution Gate itself is an invented component whose independent evidence is limited to the reported ablations.

axioms (1)

domain assumption: Mamba state-space models are suitable for image feature propagation in binarization tasks
The work assumes this suitability and proceeds to diagnose and patch a dilution problem within that architecture.

invented entities (1)

Anti-Dilution Gate (no independent evidence)
purpose: To estimate feature changes during Mamba propagation and selectively restore stroke-sensitive responses
New module introduced to address the hypothesized dilution issue; no external falsifiable prediction is supplied.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Reload-Mamba: Hierarchical Anti-Dilution State-Space Modeling for Multi-Class Semantic Segmentation
cs.CV 2026-06 unverdicted novelty 5.0

Reload-Mamba augments a ConvNeXt-Tiny + four-directional Mamba encoder-decoder with boundary-supervised detail prior, entropy-aware Reload Gate, and three-level hierarchical reload, reporting 47.9% mIoU on ADE20K and ...

Reference graph

Works this paper leans on

33 extracted references · 11 canonical work pages · cited by 1 Pith paper

[1] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man, and Cybernetics 9 (1979) 62–66
[2] J. Sauvola, M. Pietikäinen, Adaptive document image binarization, Pattern Recognition 33 (2000) 225–236
[3] B. Gatos, I. Pratikakis, S. J. Perantonis, Adaptive degraded document image binarization, Pattern Recognition 39 (2006) 317–327
[4] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer, 2015, pp. 234–241. doi:10.1007/978-3-319-24574-4_28
[5] M. A. Souibgui, S. Biswas, S. K. Jemni, Y. Kessentini, A. Fornés, J. Lladós, U. Pal, DocEnTr: An end-to-end document image enhancement transformer, in: Proceedings of the 26th International Conference on Pattern Recognition, 2022, pp. 1699–1705. doi:10.1109/ICPR56361.2022.9956101
[6] G. Cicchetti, D. Comminiello, NAF-DPM: A nonlinear activation-free diffusion probabilistic model for document enhancement, arXiv preprint arXiv:2404.05669 (2024)
[7] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, arXiv preprint arXiv:2312.00752 (2023)
[8] Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, S. Xie, A convnet for the 2020s, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11976–11986
[9] I. Sobel, G. Feldman, An isotropic 3x3 image gradient operator, 1968. Presented at the Stanford Artificial Intelligence Project
[10] S. S. M. Salehi, D. Erdogmus, A. Gholipour, Tversky loss function for image segmentation using 3d fully convolutional deep networks, in: International Workshop on Machine Learning in Medical Imaging, Springer, 2017, pp. 379–387. doi:10.1007/978-3-319-67389-9_44
[11] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (2015) 211–252
[12] I. Loshchilov, F. Hutter, Decoupled weight decay regularization, in: International Conference on Learning Representations, 2019
[13] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, H. Wu, Mixed precision training, in: International Conference on Learning Representations, 2018
[14] B. Gatos, K. Ntirogiannis, I. Pratikakis, ICDAR 2009 document image binarization contest (DIBCO 2009), in: Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, pp. 1375–1382. doi:10.1109/ICDAR.2009.246
[15] I. Pratikakis, B. Gatos, K. Ntirogiannis, H-DIBCO 2010: Handwritten document image binarization competition, in: Proceedings of the 12th International Conference on Frontiers in Handwriting Recognition, 2010, pp. 727–732. doi:10.1109/ICFHR.2010.118
[16] I. Pratikakis, B. Gatos, K. Ntirogiannis, ICDAR 2011 document image binarization contest, in: Proceedings of the International Conference on Document Analysis and Recognition, 2011, pp. 1506–1510. doi:10.1109/ICDAR.2011.299
[17] I. Pratikakis, B. Gatos, K. Ntirogiannis, ICFHR 2012 competition on handwritten document image binarization, in: Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2012, pp. 817–822. doi:10.1109/ICFHR.2012.216
[18] I. Pratikakis, B. Gatos, K. Ntirogiannis, ICDAR 2013 document image binarization contest, in: Proceedings of the International Conference on Document Analysis and Recognition, 2013, pp. 1471–1476. doi:10.1109/ICDAR.2013.219
[19] K. Ntirogiannis, B. Gatos, I. Pratikakis, ICFHR 2014 competition on handwritten document image binarization, in: Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2014, pp. 809–813. doi:10.1109/ICFHR.2014.141
[20] I. Pratikakis, K. Zagoris, G. Barlas, B. Gatos, ICFHR 2016 handwritten document image binarization contest, in: Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2016, pp. 619–623. doi:10.1109/ICFHR.2016.0118
[21] I. Pratikakis, K. Zagoris, G. Barlas, B. Gatos, ICDAR 2017 competition on document image binarization, in: Proceedings of the International Conference on Document Analysis and Recognition Workshops, 2017, pp. 1395–1403. doi:10.1109/ICDAR.2017.228
[22] I. Pratikakis, K. Zagoris, P. Kaddas, B. Gatos, ICFHR 2018 competition on handwritten document image binarization, in: Proceedings of the International Conference on Frontiers in Handwriting Recognition, 2018, pp. 489–493. doi:10.1109/ICFHR-2018.2018.00091
[23] I. Pratikakis, K. Zagoris, X. Karagiannis, L. Tsochatzidis, T. Mondal, I. Marthot-Santaniello, ICDAR 2019 competition on document image binarization, in: Proceedings of the International Conference on Document Analysis and Recognition, 2019, pp. 1547–1556. doi:10.1109/ICDAR.2019.00249
[24] H. Lu, A. C. Kot, Y. Q. Shi, Distance-reciprocal distortion measure for binary document images, IEEE Signal Processing Letters 11 (2004) 228–231
[25] B. Su, S. Lu, C. L. Tan, Robust document image binarization technique for degraded document images, IEEE Transactions on Image Processing 22 (2013) 1408–1417
[26] S. He, L. Schomaker, Document enhancement and binarization using iterative deep learning, Pattern Recognition 91 (2019) 379–390
[27] R. De, A. Chakraborty, R. Sarkar, Document image binarization using dual discriminator generative adversarial networks, IEEE Signal Processing Letters 27 (2020) 1090–1094
[28] J. Zhao, C. Shi, F. Jia, Y. Wang, B. Xiao, Document image binarization with cascaded generators of conditional generative adversarial networks, Pattern Recognition 96 (2019) 106968
[29] R. Biswas, S. K. Roy, N. Wang, U. Pal, G.-B. Huang, Docbinformer: A two-level transformer network for effective document image binarization, arXiv preprint arXiv:2312.03568 (2023)
[30] M. Yang, et al., A novel degraded document binarization model through vision transformer, Information Fusion 93 (2023) 159–173
[31] R. Biswas, S. Sarkhel, S. K. Roy, U. Pal, TransDocUNet: A transformer-based UNet architecture for degraded document image binarization, in: Proceedings of the 14th Indian Conference on Computer Vision, Graphics and Image Processing, 2023. doi:10.1145/3627631.3627639
[32] Z. Yang, Z. Zhang, N. Wang, T. Chen, X. Liu, Docdiff: Document enhancement via residual diffusion models, arXiv preprint arXiv:2305.03892 (2023)
[33] R.-Y. Ju, K. Wong, Y. Jin, J.-S. Chiang, Mfe-gan: Efficient gan-based framework for document image enhancement and binarization with multi-scale feature extraction, arXiv preprint arXiv:2512.14114 (2025)
