DocRevive: A Unified Pipeline for Document Text Restoration
Pith reviewed 2026-05-10 16:54 UTC · model grok-4.3
The pith
A unified pipeline restores damaged document text by combining OCR, occlusion detection, inpainting and diffusion models while preserving visual style.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a unified pipeline combining state-of-the-art OCR, advanced image analysis, masked language modeling, and diffusion-based models can restore and reconstruct text in damaged documents while preserving visual integrity. The claim is evaluated on a new synthetic benchmark of 30,078 degraded document images and scored with the Unified Context Similarity Metric, which combines edit, semantic, and length similarities with a contextual predictability term.
What carries the argument
The DocRevive unified pipeline that sequences OCR text detection and recognition, occlusion detection for identifying degradations, inpainting for semantic reconstruction, and diffusion-based reintegration to match original font, size and alignment.
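The four-stage cascade can be sketched as plain function composition. Everything below is an illustrative stub, not the authors' implementation: the region contents, the `_` damage marker, and the fill-in rule are invented for the example, and each stage stands in for a real model (OCR, occlusion detector, masked language model, diffusion renderer).

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A detected text region and its recognized (possibly damaged) content."""
    bbox: tuple          # (x, y, w, h) in page coordinates
    text: str            # OCR output; "_" marks unreadable glyphs in this toy
    occluded: bool = False

def detect_and_recognize(page):
    """Stage 1: OCR text detection + recognition (placeholder stub)."""
    return [Region(bbox=(0, 0, 120, 20), text="dam__ed text")]

def detect_occlusions(regions):
    """Stage 2: flag regions whose glyphs are degraded (toy heuristic)."""
    for r in regions:
        r.occluded = "_" in r.text
    return regions

def inpaint_text(regions):
    """Stage 3: masked-language-model fill-in for occluded spans (stub)."""
    for r in regions:
        if r.occluded:
            r.text = r.text.replace("__", "ag")  # an MLM would predict this
    return regions

def reintegrate(page, regions):
    """Stage 4: diffusion-based re-rendering into the page (stub)."""
    return {"page": page, "restored": [r.text for r in regions]}

def docrevive_pipeline(page):
    """Chain the four stages in the order the paper describes."""
    return reintegrate(page, inpaint_text(detect_occlusions(detect_and_recognize(page))))

result = docrevive_pipeline("scan.png")
print(result["restored"])  # ['damaged text']
```

The point of the sketch is the interface, not the models: each stage consumes and returns region objects, so any component can be swapped or ablated independently.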
If this is right
- Restored documents improve accuracy on subsequent document understanding tasks.
- The synthetic dataset of 30,078 images sets a benchmark for testing other restoration methods.
- The Unified Context Similarity Metric supplies a combined score for edit similarity, semantic fit, length and contextual predictability.
- Archival research and digital preservation gain an automated route to recover text from damaged sources.
Where Pith is reading between the lines
- The same cascade of detection and diffusion steps could be tested on photographs of faded signs or labels to see if it generalizes beyond scanned pages.
- Releasing the dataset invites direct comparisons with other inpainting or language-model approaches on identical inputs.
- Measuring performance drop when the system moves from synthetic to genuine aged documents would test how realistic the training degradations are.
Load-bearing premise
The synthetic dataset of degraded document images accurately simulates diverse real-world degradation scenarios, and the combined models produce semantically coherent and visually matching reconstructions.
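One way to probe this premise is to look at what a synthetic degrader actually does. The toy generator below occludes a character span while recording the ground-truth mask; the six class names come from the paper's supplementary material, but the generator itself is a much simpler stand-in for the OPRB one, and its details (span length, block character) are invented here.

```python
import random

# Degradation classes listed in the paper's supplementary material.
DEGRADATION_CLASSES = ["Black Ink", "Burnt", "Whitener", "Dust", "Scribble", "Stamp"]

def degrade(page_text: str, rng: random.Random, span: int = 4):
    """Occlude a random character span; return (damaged, mask, class label).

    The mask records the hidden interval, i.e. the ground truth a
    restoration pipeline must recover and a metric would score against.
    """
    start = rng.randrange(0, max(1, len(page_text) - span))
    damaged = page_text[:start] + "\u2588" * span + page_text[start + span:]
    return damaged, (start, start + span), rng.choice(DEGRADATION_CLASSES)

rng = random.Random(0)
damaged, mask, label = degrade("archival research and digital preservation", rng)
print(label, damaged)
```

The premise in question is exactly the gap between a generator like this and real damage: a real stain has soft edges, bleed-through, and correlated noise that a crisp synthetic patch does not.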
What would settle it
Running the pipeline on real damaged documents that have known original text and observing large drops in the Unified Context Similarity Metric score or clear mismatches in meaning and appearance would show the approach does not deliver the claimed restorations.
Original abstract
In Document Understanding, the challenge of reconstructing damaged, occluded, or incomplete text remains a critical yet unexplored problem. Subsequent document understanding tasks can benefit from a document reconstruction process. In response, this paper presents a novel unified pipeline combining state-of-the-art Optical Character Recognition (OCR), advanced image analysis, masked language modeling, and diffusion-based models to restore and reconstruct text while preserving visual integrity. We create a synthetic dataset of 30,078 degraded document images that simulates diverse document degradation scenarios, setting a benchmark for restoration tasks. Our pipeline detects and recognizes text, identifies degradation with an occlusion detector, and uses an inpainting model for semantically coherent reconstruction. A diffusion-based module seamlessly reintegrates text, matching font, size, and alignment. To evaluate restoration quality, we propose a Unified Context Similarity Metric (UCSM), incorporating edit, semantic, and length similarities with a contextual predictability measure that penalizes deviations when the correct text is contextually obvious. Our work advances document restoration, benefiting archival research and digital preservation while setting a new standard for text reconstruction. The OPRB dataset and code are available at https://huggingface.co/datasets/kpurkayastha/OPRB (Hugging Face) and https://github.com/kunalpurkayastha/DocRevive (GitHub), respectively.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes DocRevive, a unified pipeline for restoring text in damaged documents by combining OCR for text detection and recognition, an occlusion detector for identifying degradations, masked language modeling for semantic coherence, and diffusion-based models for visual reintegration of text while matching font and alignment. It introduces the synthetic OPRB dataset consisting of 30,078 degraded document images to benchmark restoration tasks and defines the Unified Context Similarity Metric (UCSM) that integrates edit similarity, semantic similarity, length similarity, and a contextual predictability measure.
Significance. If the pipeline's effectiveness is demonstrated through rigorous evaluation, this work could have significant impact on document understanding and digital preservation by offering a practical, open-source solution for text restoration in archival materials. The public release of the OPRB dataset and code on Hugging Face and GitHub strengthens the contribution by enabling reproducibility and further research.
Major comments (3)
- Abstract and Experiments section: The manuscript describes the pipeline components and the OPRB dataset but reports no quantitative results, ablation studies, baseline comparisons, or error analysis, leaving the central claim that the system produces semantically coherent and visually faithful reconstructions without empirical support.
- Dataset section: All described training, benchmarking, and UCSM evaluation occurs exclusively on the synthetic OPRB set of 30,078 images; the manuscript provides no hold-out real-world test sets, cross-validation on authentic archival documents, or human preference studies, which is load-bearing for the claim that the pipeline generalizes beyond simulated degradations.
- UCSM definition: The metric combines edit, semantic, and length similarities with a contextual predictability term, yet the component weights are listed as free parameters with no sensitivity analysis or justification provided, undermining the assertion that UCSM reliably penalizes contextually obvious deviations.
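To make the weighting concern concrete, here is a minimal sketch of how a UCSM-style composite could be assembled. The component formulas and the default weights are illustrative placeholders, not the paper's definition; in the real metric the `semantic` and `predictability` inputs would come from embedding and masked-language models rather than being passed in by hand.

```python
import difflib

def edit_similarity(pred: str, ref: str) -> float:
    """Character-level similarity in [0, 1] via longest matching blocks."""
    return difflib.SequenceMatcher(None, pred, ref).ratio()

def length_similarity(pred: str, ref: str) -> float:
    """1.0 when lengths agree, shrinking with the mismatch ratio."""
    if not pred and not ref:
        return 1.0
    return min(len(pred), len(ref)) / max(len(pred), len(ref))

def ucsm_like(pred, ref, semantic, predictability, w=(0.3, 0.3, 0.2, 0.2)):
    """Weighted blend of edit, semantic, length, and contextual terms.

    `predictability` in [0, 1] says how obvious the correct text is from
    context; the contextual term then charges more for any deviation.
    """
    e = edit_similarity(pred, ref)
    base = w[0] * e + w[1] * semantic + w[2] * length_similarity(pred, ref)
    context = 1.0 - predictability * (1.0 - e)  # obvious text, bigger penalty
    return base + w[3] * context

perfect = ucsm_like("damaged text", "damaged text", semantic=1.0, predictability=0.9)
sloppy = ucsm_like("damaged tezt", "damaged text", semantic=0.9, predictability=0.9)
print(round(perfect, 3), perfect > sloppy)  # 1.0 True
```

Because the final score is a weighted sum, any reweighting can reorder candidate restorations, which is exactly why the referee asks for a sensitivity analysis over `w`.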
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions planned for the next version of the manuscript.
Point-by-point responses
-
Referee: Abstract and Experiments section: The manuscript describes the pipeline components and the OPRB dataset but reports no quantitative results, ablation studies, baseline comparisons, or error analysis, leaving the central claim that the system produces semantically coherent and visually faithful reconstructions without empirical support.
Authors: We agree that the current manuscript lacks quantitative evaluations to substantiate the pipeline's performance. The initial submission prioritizes the description of the unified pipeline, the introduction of the OPRB dataset, and the definition of UCSM. In the revised manuscript we will add a dedicated Experiments section that reports quantitative results on the OPRB dataset, ablation studies isolating each pipeline component (OCR, occlusion detection, masked language modeling, and diffusion inpainting), comparisons against relevant baselines such as standard diffusion inpainting and rule-based restoration methods, and a detailed error analysis. These additions will provide direct empirical support for claims of semantic coherence and visual fidelity. revision: yes
-
Referee: Dataset section: All described training, benchmarking, and UCSM evaluation occurs exclusively on the synthetic OPRB set of 30,078 images; the manuscript provides no hold-out real-world test sets, cross-validation on authentic archival documents, or human preference studies, which is load-bearing for the claim that the pipeline generalizes beyond simulated degradations.
Authors: The synthetic OPRB dataset was constructed to enable controlled, reproducible simulation of diverse degradation types with perfect ground-truth text, which is difficult to obtain at scale from real archives. We acknowledge that this design alone does not fully demonstrate generalization. In the revision we will add a hold-out test set of authentic archival documents, perform k-fold cross-validation on the synthetic data, and include human preference studies in which participants rate restored versus original text for readability and visual plausibility. These additions will directly address the generalization concern. revision: yes
-
Referee: UCSM definition: The metric combines edit, semantic, and length similarities with a contextual predictability term, yet the component weights are listed as free parameters with no sensitivity analysis or justification provided, undermining the assertion that UCSM reliably penalizes contextually obvious deviations.
Authors: The weights were chosen to give balanced emphasis to lexical, semantic, and length fidelity while amplifying the contextual predictability term for cases where deviations are highly predictable from surrounding text. We will expand the UCSM section with an explicit justification of the weight selection, supported by preliminary experiments, and will add a sensitivity analysis that varies each weight over a reasonable range and reports the resulting changes in ranking and correlation with human judgments. This will demonstrate the metric's robustness. revision: yes
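The promised sensitivity analysis could take roughly this shape: sweep one weight over a grid, keep the others fixed, and check whether the induced ranking of candidate restorations is stable. The component scores and weight grid below are made-up illustrations, not the paper's numbers.

```python
def composite(scores, weights):
    """Normalized weighted sum of per-component similarity scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical component scores (edit, semantic, length, context)
# for two candidate restorations of the same damaged region.
cand_a = (0.95, 0.90, 1.00, 0.85)
cand_b = (0.80, 0.95, 0.90, 0.80)

# Sweep the edit-similarity weight; the other three weights stay fixed.
stable = all(
    composite(cand_a, (w_edit, 0.3, 0.2, 0.2))
    > composite(cand_b, (w_edit, 0.3, 0.2, 0.2))
    for w_edit in (0.1, 0.2, 0.3, 0.4, 0.5)
)
print(stable)  # True: the ranking survives this particular sweep
```

A full analysis would sweep every weight, report rank correlations against human judgments at each grid point, and flag the regions of weight space where the ranking flips.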
Circularity Check
No circularity: applied pipeline with external models and synthetic data
Full rationale
The manuscript describes an engineering pipeline that chains pre-trained OCR, an occlusion detector, masked language modeling, and diffusion-based inpainting to restore degraded documents. It introduces a new synthetic dataset (OPRB, 30,078 images) and a composite UCSM evaluation metric built from edit distance, semantic similarity, length, and contextual predictability. No equations, derivations, or self-citations appear in the text that would reduce any claimed result to a fitted parameter or prior self-work by construction. The work is self-contained as a system description whose performance claims rest on external pre-trained components and the authors' own synthetic benchmark rather than any internal loop.
Axiom & Free-Parameter Ledger
Free parameters (1)
- UCSM component weights
Axioms (1)
- Domain assumption: synthetic degradations (blur, occlusion, etc.) sufficiently represent real document damage distributions.
Reference graph
Works this paper leans on
- [1] Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, and Nando de Freitas. Restoring and attributing ancient texts using deep neural networks. Nature, 603(7900):280–283, 2022.
- [2] Ayan Banerjee, Josep Lladós, Umapada Pal, and Anjan Dutta. TaleDiffusion: Multi-character story generation with dialogue rendering. arXiv preprint arXiv:2509.04123, 2025.
- [3] Ayan Banerjee, Fernando Vilariño, and Josep Lladós. CraftGraffiti: Exploring human identity with custom graffiti art via facial-preserving diffusion models. arXiv preprint arXiv:2508.20640, 2025.
- [4] Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, and Anjan Dutta. CraftSVG: Multi-object text-to-SVG synthesis via layout guided diffusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2564–2574, 2026.
- [5] Darwin Bautista and Rowel Atienza. Scene text recognition with permuted autoregressive sequence models. In European Conference on Computer Vision, pages 178–196. Springer, 2022.
- [6] Jens Bjerring-Hansen, Ross Deans Kristensen-McLachlan, Philip Diderichsen, and Dorte Haltrup Hansen. Mending fractured texts: A heuristic procedure for correcting OCR data. In CEUR Workshop Proceedings, pages 177–186, 2022.
- [7] Mike Cannon, Mike Fugate, Don R. Hush, and Clint Scovel. Selecting a restoration technique to minimize OCR error. IEEE Transactions on Neural Networks, 14(3):478–490, 2003.
- [8] Abhishek Chaurasia and Eugenio Culurciello. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In 2017 IEEE Visual Communications and Image Processing (VCIP), pages 1–4. IEEE, 2017.
- [9] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. TextDiffuser: Diffusion models as text painters. In Advances in Neural Information Processing Systems (NeurIPS) 36, 2023.
- [10] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. In European Conference on Computer Vision, pages 17–33. Springer, 2022.
- [11] Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, and Tong Lu. FAST: Faster arbitrarily-shaped text detector with minimalist kernel representation. arXiv preprint arXiv:2111.02394, 2021.
- [12] Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou, Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, et al. PP-OCR: A practical ultra lightweight OCR system. arXiv preprint arXiv:2009.09941, 2020.
- [13] Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, and Yu-Gang Jiang. Context perception parallel decoder for scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- [14] Ethan Fetaya, Yonatan Lifshitz, Elad Aaron, and Shai Gordin. Restoration of fragmentary Babylonian texts using recurrent neural networks. Proceedings of the National Academy of Sciences (PNAS), 117(37):22743–22751, 2020.
- [15] Mathieu François and Véronique Eglin. Unsupervised post-OCR correction for noisy text in engineering documents. In Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR), 2023.
- [16] Shuhao Guan and Derek Greene. Advancing post-OCR correction: A comparative study of synthetic data. arXiv preprint arXiv:2408.02253, 2024.
- [17] Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng, Yudi Zhao, and Wei Shen. Self-supervised implicit glyph attention for text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15285–15294, 2023.
- [18] Rahima Khanam and Muhammad Hussain. YOLOv11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725, 2024.
- [19] Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. DocBank: A benchmark dataset for document layout analysis. In Proceedings of the 28th International Conference on Computational Linguistics, pages 949–960, 2020.
- [20] Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based optical character recognition with pre-trained models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 13094–13102, 2023.
- [21] Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, and Xiang Bai. Real-time scene text detection with differentiable binarization. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11474–11481, 2020.
- [22] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
- [23] Ayush Mittal, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein, and Daniel Lopresti. A new context-based method for restoring occluded text in natural scene images. In Document Analysis Systems: 14th IAPR International Workshop, DAS 2020, Wuhan, China, July 26–29, 2020, Proceedings 14, pages 466–480. Springer, 2020.
- [24] S. Mori, C. Y. Suen, and K. Yamamoto. Historical review of OCR research and development. Proceedings of the IEEE, 80(7):1029–1058, 1992.
- [25] Premkumar Natarajan, Issam Bazzi, Zhidong Lu, John Makhoul, and Richard Schwartz. Robust OCR of degraded documents. In Proceedings of the Fifth International Conference on Document Analysis and Recognition (ICDAR '99), pages 357–361. IEEE, 1999.
- [26] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.
- [27] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, 2002.
- [28] Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, and Sanket Biswas. LayeredDoc: Domain adaptive document restoration with a layer separation approach. In International Conference on Document Analysis and Recognition, pages 27–39. Springer, 2024.
- [29] Kunal Purkayastha, Shashwat Sarkar, Shivakumara Palaiahnakote, Umapada Pal, and Palash Ghosal. DATR: Domain agnostic text recognizer. In International Conference on Pattern Recognition, pages 220–235. Springer, 2025.
- [30] R. Sapkota, R. H. Cheppally, A. Sharda, and M. Karkee. YOLO26: Key architectural enhancements and performance benchmarking for real-time object detection. arXiv preprint arXiv:2509.25164, 2025.
- [31] Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304, 2017.
- [32] Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, and Kota Yamaguchi. Type-R: Automatically retouching typos for text-to-image generation. arXiv preprint arXiv:2411.18159, 2024.
- [33] Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, and Umapada Pal. DE-GAN: A conditional generative adversarial network for document enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1180–1191, 2020.
- [34] Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, and Umapada Pal. DocEnTr: An end-to-end document image enhancement transformer. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 1699–1705. IEEE, 2022.
- [35] Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gómez, and Dimosthenis Karatzas. Text-DIAE: A self-supervised degradation invariant autoencoder for text recognition and document enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
- [36] B. Su, S. Lu, and C. L. Tan. Robust document image binarization technique for degraded document images. IEEE Transactions on Image Processing, 22(4):1408–1417, 2013.
- [37] Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, and Mohit Bansal. Unifying vision, text, and layout for universal document processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19254–19264, 2023.
- [38] Qwen Team. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
- [39] Alan Thomas, Robert Gaizauskas, and Haiping Lu. Leveraging LLMs for post-OCR correction of historical newspapers. In Proceedings of the LT4HALA Workshop at LREC-COLING, pages 116–121, 2024.
- [40] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458, 2024.
- [41] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. YOLOv9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616, 2024.
- [42] Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, and Yongdong Zhang. Symmetrical linguistic feature distillation with CLIP for scene text recognition. In Proceedings of the 31st ACM International Conference on Multimedia, pages 509–518, 2023.
- [43] Jianjun Xu, Yuxin Wang, Hongtao Xie, and Yongdong Zhang. OTE: Exploring accurate scene text recognition using one token. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28327–28336, 2024.
- [44] Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, and Xing Zhang. DocDiff: Document enhancement via residual diffusion models. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), pages 2795–2806, 2023.
- [45] Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, and Xing Zhang. DocDiff: Document enhancement via residual diffusion models. In Proceedings of the 31st ACM International Conference on Multimedia, pages 2795–2806, 2023.
- [46] Muhammad Yaseen. What is YOLOv8: An in-depth exploration of the internal features of the next-generation object detector, 2024. Accessed: Sep 10, 2025.
- [47] Fangchen Yu, Yina Xie, Lei Wu, Yafei Wen, Guozhi Wang, Shuai Ren, Xiaoxin Chen, Jianfeng Mao, and Wenye Li. DocReal: Robust document dewarping of real-life images via attention-enhanced control point prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 665–674, 2024.
- [48] Li Yujian and Liu Bo. A normalized Levenshtein distance metric. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1091–1095, 2007.
- [49] Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, and Yu Zhou. TextCtrl: Diffusion-based scene text editing with prior guidance control. Advances in Neural Information Processing Systems, 37:138569–138594, 2025.
- [50] Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, and Yongdong Zhang. Linguistic More: Taking a further step toward efficient and accurate scene text recognition. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pages 1704–1712, 2023.
- [51] Boqiang Zhang, Hongtao Xie, Zuan Gao, and Yuxin Wang. Choose what you need: Disentangled representation learning for scene text recognition, removal and editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28358–28368, 2024.
- [52] Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, and Lianwen Jin. DocRes: A generalist model toward unifying document image restoration tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
- [53] Ling Zhang, Yinxiao He, Qing Zhang, Zheng Liu, Xiaolong Zhang, and Chunxia Xiao. Document image shadow removal guided by color-aware background. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1818–1827, 2023.
- [54] Yanxi Zhou, Shikai Zuo, Zhengxian Yang, Jinlong He, Jianwen Shi, and Rui Zhang. A review of document image enhancement based on document degradation problem. Applied Sciences, 13(13):7855, 2023.
- [55] Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, and Hui Xue. Text image inpainting via global structure-guided diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7775–7783, 2024.
DocRevive: A Unified Pipeline for Document Text Restoration (Supplementary Material)
- Dataset Construction Details (supplementary excerpt): This supplementary section provides the full construction details of the Occluded Pages Restoration Benchmark (OPRB). In the current generator, we choose N unique source pages per class level. We introduce a novel benchmark dataset called Occluded Pages Restoration Benchmark (OPRB) designed to evaluate document restoration u...
- Method Details, 10.1 Occlusion Detection and Blank Region Extraction (supplementary excerpt): Occlusion patches are first localized using a fine-tuned YOLOv9c detector [41] trained on the OPRB dataset. The benchmark contains six degradation classes: Black Ink, Burnt, Whitener, Dust, Scribble, and Stamp. Opaque classes (Black Ink, Burnt, Whitener) fully obscure the underlying text, t...
- Miscellaneous Experiments, 11.1 Comparison with Prior Document Restoration Methods (supplementary excerpt): We compare DocRevive against three prior methods on a subset of 498 images from OPRB (83 per occlusion type): DocDiff [45]; GSDM run standalone (our pipeline's inpainting module in isolation, without any text prediction or editing); and NAFNet [10], a strong general image ...