DocRevive: A Unified Pipeline for Document Text Restoration
Pith reviewed 2026-05-25 06:36 UTC · model grok-4.3
The pith
DocRevive combines OCR, masked language modeling, and diffusion models into a pipeline that restores text in damaged documents while preserving visual integrity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A pipeline that sequences state-of-the-art OCR for text detection and recognition, an occlusion detector for identifying degradation, inpainting models for semantically coherent reconstruction, and a diffusion-based module for seamless text reintegration can restore and reconstruct text in degraded documents while preserving visual integrity, as shown on the new OPRB synthetic dataset.
What carries the argument
The DocRevive unified pipeline that sequences OCR, occlusion detection, masked language modeling, inpainting, and diffusion-based text reintegration.
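The sequencing described above can be sketched as a chain of stages that pass a shared state object along. This is a minimal illustration only: `PageState`, `run_pipeline`, and every stage function are hypothetical names, and each stage returns fixed placeholder data instead of calling the OCR, detection, language, or diffusion models the paper uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class PageState:
    """Intermediate results handed from one stage to the next."""
    image: object  # page image; concrete type left open in this sketch
    words: List[Tuple[Box, str]] = field(default_factory=list)
    occlusions: List[Box] = field(default_factory=list)
    predictions: List[Tuple[Box, str]] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


Stage = Callable[[PageState], PageState]


def run_pipeline(state: PageState, stages: List[Stage]) -> PageState:
    """Apply the stages in order, recording which ones ran."""
    for stage in stages:
        state = stage(state)
        state.trace.append(stage.__name__)
    return state


# Hypothetical stand-ins: each returns fixed data instead of calling a model.
def recognize_text(state: PageState) -> PageState:
    state.words = [((0, 0, 40, 12), "annual"), ((90, 0, 140, 12), "report")]
    return state


def detect_occlusions(state: PageState) -> PageState:
    state.occlusions = [(45, 0, 85, 12)]
    return state


def predict_missing_text(state: PageState) -> PageState:
    # A masked language model would fill each occluded span from context.
    state.predictions = [(box, "financial") for box in state.occlusions]
    return state


def inpaint_background(state: PageState) -> PageState:
    return state  # would erase the occlusion from state.image


def reintegrate_text(state: PageState) -> PageState:
    return state  # would render predictions in matching font and size


STAGES = [recognize_text, detect_occlusions, predict_missing_text,
          inpaint_background, reintegrate_text]

result = run_pipeline(PageState(image=None), STAGES)
print(result.trace)
print(result.predictions)
```

Keeping each stage behind the same `PageState -> PageState` signature is one way to make the data flow between components explicit, so a single stage can be swapped without touching the others.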
If this is right
- Restored documents improve accuracy on downstream tasks such as information extraction and layout analysis.
- The OPRB dataset functions as a standard benchmark for comparing future text restoration methods.
- The UCSM metric supplies a finer-grained evaluation than edit distance alone by adding contextual predictability.
- Archival and preservation workflows gain an automated step for handling damaged pages.
Where Pith is reading between the lines
- The modular design suggests the pipeline could be inserted into existing document processing systems without major redesign.
- Performance gaps between synthetic and authentic historical documents would highlight the need for more varied training data.
- The same sequence of detection-plus-diffusion steps might apply to restoring scene text in natural photographs.
Load-bearing premise
The synthetic dataset of 30,078 degraded document images simulates diverse real-world degradation scenarios accurately enough to serve as a reliable benchmark for restoration performance.
What would settle it
If real-world degraded documents produce substantially lower UCSM scores or visibly mismatched text than the synthetic test results, the pipeline's claimed effectiveness would not hold.
Original abstract
In Document Understanding, the challenge of reconstructing damaged, occluded, or incomplete text remains a critical yet unexplored problem. Subsequent document understanding tasks can benefit from a document reconstruction process. In response, this paper presents a novel unified pipeline combining state-of-the-art Optical Character Recognition (OCR), advanced image analysis, masked language modeling, and diffusion-based models to restore and reconstruct text while preserving visual integrity. We create a synthetic dataset of 30{,}078 degraded document images that simulates diverse document degradation scenarios, setting a benchmark for restoration tasks. Our pipeline detects and recognizes text, identifies degradation with an occlusion detector, and uses an inpainting model for semantically coherent reconstruction. A diffusion-based module seamlessly reintegrates text, matching font, size, and alignment. To evaluate restoration quality, we propose a Unified Context Similarity Metric (UCSM), incorporating edit, semantic, and length similarities with a contextual predictability measure that penalizes deviations when the correct text is contextually obvious. Our work advances document restoration, benefiting archival research and digital preservation while setting a new standard for text reconstruction. The OPRB dataset and code are available on Hugging Face (https://huggingface.co/datasets/kpurkayastha/OPRB) and GitHub (https://github.com/kunalpurkayastha/DocRevive), respectively.
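To make the UCSM description concrete, here is a rough sketch of how edit, semantic, and length similarities might be combined with a contextual predictability term. The abstract does not give the formula, so everything here is an assumption: the function names, the weights, the form of the predictability term, and the choice to pass semantic similarity and predictability in as precomputed numbers in [0, 1].

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(pred: str, ref: str) -> float:
    """1 minus the edit distance normalized by the longer string."""
    longest = max(len(pred), len(ref))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, ref) / longest


def length_similarity(pred: str, ref: str) -> float:
    """Ratio of the shorter length to the longer length."""
    longest = max(len(pred), len(ref))
    return 1.0 if longest == 0 else min(len(pred), len(ref)) / longest


def ucsm_sketch(pred: str, ref: str, semantic_sim: float, predictability: float,
                weights=(0.3, 0.3, 0.2, 0.2)) -> float:
    """Hypothetical weighted sum; NOT the paper's definition of UCSM.

    semantic_sim: precomputed similarity in [0, 1] (e.g. from an embedding model).
    predictability: how obvious the reference is from context, in [0, 1].
    """
    w_edit, w_sem, w_len, w_pred = weights
    edit = edit_similarity(pred, ref)
    # Assumed form: the term drops only when the text was obvious AND wrong.
    pred_term = 1.0 - predictability * (1.0 - edit)
    return (w_edit * edit + w_sem * semantic_sim
            + w_len * length_similarity(pred, ref) + w_pred * pred_term)


print(ucsm_sketch("report", "report", semantic_sim=1.0, predictability=0.9))
print(ucsm_sketch("repent", "report", semantic_sim=0.3, predictability=0.9))
print(ucsm_sketch("repent", "report", semantic_sim=0.3, predictability=0.1))
```

In this sketch the same wrong prediction scores lower when the reference was highly predictable, which is the behaviour the abstract attributes to the contextual predictability measure.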
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents DocRevive, a unified pipeline for restoring damaged or occluded text in documents. It combines state-of-the-art OCR, image analysis for occlusion detection, masked language modeling for semantic coherence, and diffusion-based inpainting to reconstruct text while preserving visual properties such as font, size, and alignment. The work introduces the synthetic OPRB dataset of 30,078 degraded document images to benchmark restoration tasks and proposes the Unified Context Similarity Metric (UCSM), which combines edit, semantic, and length similarities with a contextual predictability penalty. The pipeline and dataset are claimed to set a new standard for document text reconstruction, with code and data released publicly.
Significance. If the central claims hold, the work could advance document restoration for archival and digital preservation applications by providing an integrated pipeline and an open benchmark. The public release of the OPRB dataset and code is a clear strength that enables reproducibility and follow-on research. However, the significance is currently limited by the lack of demonstrated fidelity between the synthetic degradations and real-world document conditions.
major comments (2)
- [Abstract / OPRB dataset section] The central claim that the pipeline 'sets a new standard' rests on quantitative results obtained exclusively on the synthetic OPRB dataset (Abstract and dataset description). No validation is reported that the generated degradations match the statistics of real archival documents (e.g., edge histograms, bleed-through distributions, or OCR error rates under non-uniform lighting). Without such evidence the benchmark results remain dataset-specific and do not yet support field-wide conclusions.
- [Evaluation / UCSM definition] The UCSM metric is presented as incorporating 'contextual predictability' to penalize deviations when text is obvious from context, yet no ablation or sensitivity analysis is shown for the weighting of its edit, semantic, length, and predictability components. This makes it difficult to assess whether UCSM provides a robust, non-circular evaluation of restoration quality beyond standard metrics.
minor comments (2)
- [Abstract] The abstract states the dataset size as '30{,}078' with an unusual comma placement; standardize to 30,078 throughout.
- [Pipeline overview] The pipeline diagram and component descriptions would benefit from explicit input/output specifications for each stage (OCR → occlusion detector → MLM → diffusion) to clarify data flow.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and indicate planned revisions.
Point-by-point responses
-
Referee: [Abstract / OPRB dataset section] The central claim that the pipeline 'sets a new standard' rests on quantitative results obtained exclusively on the synthetic OPRB dataset (Abstract and dataset description). No validation is reported that the generated degradations match the statistics of real archival documents (e.g., edge histograms, bleed-through distributions, or OCR error rates under non-uniform lighting). Without such evidence the benchmark results remain dataset-specific and do not yet support field-wide conclusions.
Authors: We acknowledge that the manuscript reports no quantitative statistical validation (such as edge histograms or bleed-through distributions) comparing the synthetic degradations to real archival documents. The OPRB dataset was designed to simulate representative degradation types drawn from the document analysis literature, but this design rationale is not accompanied by direct empirical matching in the current text. In revision we will add a subsection to the dataset description that details the degradation generation process, cites supporting references for the chosen degradation models, and explicitly notes the absence of real-world statistical validation as a limitation. We will also revise the abstract and conclusion to state that the results establish performance on a controlled synthetic benchmark rather than claiming a field-wide standard.
Revision: partial
-
Referee: [Evaluation / UCSM definition] The UCSM metric is presented as incorporating 'contextual predictability' to penalize deviations when text is obvious from context, yet no ablation or sensitivity analysis is shown for the weighting of its edit, semantic, length, and predictability components. This makes it difficult to assess whether UCSM provides a robust, non-circular evaluation of restoration quality beyond standard metrics.
Authors: The observation is correct: the manuscript defines UCSM as a linear combination of edit, semantic, length, and contextual predictability terms but provides no ablation or sensitivity study on the relative weights. In the revised manuscript we will add an ablation subsection to the evaluation that systematically varies each component weight, reports the resulting metric values on the OPRB test set, and compares against standard metrics (CER, semantic similarity) to demonstrate that the combined score is not circular and offers additional diagnostic value.
Revision: yes
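A weight sensitivity study of the kind discussed in this exchange could look roughly like the sketch below: sweep the four component weights over a grid that sums to one and count how often the ranking of two systems changes. The component scores are made-up numbers, and `weight_grid`, `combined`, and `mean_score` are hypothetical helpers, not anything from the paper's code.

```python
from itertools import product


def weight_grid(step: float = 0.25):
    """Yield (edit, semantic, length, predictability) weights that sum to 1."""
    n = round(1 / step)
    for a, b, c in product(range(n + 1), repeat=3):
        d = n - a - b - c
        if d >= 0:
            yield (a / n, b / n, c / n, d / n)


def combined(components, weights) -> float:
    """Linear combination of per-sample component scores."""
    return sum(w * c for w, c in zip(weights, components))


def mean_score(samples, weights) -> float:
    return sum(combined(s, weights) for s in samples) / len(samples)


# Made-up (edit, semantic, length, predictability) scores for two systems.
system_a = [(0.9, 0.8, 1.0, 0.7), (0.6, 0.7, 0.9, 0.5)]
system_b = [(0.7, 0.9, 0.8, 0.9), (0.8, 0.6, 1.0, 0.6)]

grid = list(weight_grid(0.25))
a_wins = sum(mean_score(system_a, w) > mean_score(system_b, w) for w in grid)
print(f"{len(grid)} weightings tried, system A ranks first in {a_wins}")
```

If the winner changes across a large share of the grid, conclusions drawn from a single weighting are fragile; if the ranking is stable, the combined score is less sensitive to the choice of weights.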
Circularity Check
No circularity in pipeline or metric proposal
full rationale
The manuscript describes an engineering pipeline (OCR + occlusion detection + MLM + diffusion inpainting) and introduces UCSM as a composite metric without any equations, fitted parameters, or predictions that reduce to their own inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems, no ansatzes are smuggled, and the synthetic OPRB dataset is presented as a created benchmark rather than a self-referential evaluation. The central claims rest on external model components and a new metric definition that does not tautologically reproduce its inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Synthetic data of 30,078 images can serve as a valid benchmark for real document restoration tasks
Reference graph
Works this paper leans on
[1] Yannis Assael, Thea Sommerschield, Brendan Shillingford, Mahyar Bordbar, John Pavlopoulos, Marita Chatzipanagiotou, Ion Androutsopoulos, Jonathan Prag, and Nando de Freitas. Restoring and attributing ancient texts using deep neural networks. Nature, 603(7900):280–283, 2022.
[2] Ayan Banerjee, Josep Lladós, Umapada Pal, and Anjan Dutta. Talediffusion: Multi-character story generation with dialogue rendering. arXiv preprint arXiv:2509.04123, 2025.
[3] Ayan Banerjee, Fernando Vilariño, and Josep Lladós. Craftgraffiti: Exploring human identity with custom graffiti art via facial-preserving diffusion models. arXiv preprint arXiv:2508.20640, 2025.
[4] Ayan Banerjee, Nityanand Mathur, Josep Llados, Umapada Pal, and Anjan Dutta. Craftsvg: Multi-object text-to-svg synthesis via layout guided diffusion. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2564–2574, 2026.
[5] Darwin Bautista and Rowel Atienza. Scene text recognition with permuted autoregressive sequence models. In European conference on computer vision, pages 178–196. Springer,
[6] Jens Bjerring-Hansen, Ross Deans Kristensen-McLachlan, Philip Diderichsen, and Dorte Haltrup Hansen. Mending fractured texts. a heuristic procedure for correcting ocr data. In CEUR Workshop Proceedings, pages 177–186. ceur workshop proceedings, 2022.
[7] Mike Cannon, Mike Fugate, Don R Hush, and Clint Scovel. Selecting a restoration technique to minimize ocr error. IEEE Transactions on Neural Networks, 14(3):478–490, 2003.
[8] Abhishek Chaurasia and Eugenio Culurciello. Linknet: Exploiting encoder representations for efficient semantic segmentation. In 2017 IEEE visual communications and image processing (VCIP), pages 1–4. IEEE, 2017.
[9] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. Textdiffuser: Diffusion models as text painters. In Advances in Neural Information Processing Systems (NeurIPS) 36, 2023.
[10] Liangyu Chen, Xiaojie Chu, Xiangyu Zhang, and Jian Sun. Simple baselines for image restoration. In European conference on computer vision, pages 17–33. Springer, 2022.
[11] Zhe Chen, Jiahao Wang, Wenhai Wang, Guo Chen, Enze Xie, Ping Luo, and Tong Lu. Fast: Faster arbitrarily-shaped text detector with minimalist kernel representation. arXiv preprint arXiv:2111.02394, 2021.
[12] Yuning Du, Chenxia Li, Ruoyu Guo, Xiaoting Yin, Weiwei Liu, Jun Zhou, Yifan Bai, Zilin Yu, Yehua Yang, Qingqing Dang, et al. Pp-ocr: A practical ultra lightweight ocr system. arXiv preprint arXiv:2009.09941, 2020.
[13] Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, and Yu-Gang Jiang. Context perception parallel decoder for scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence,
[14] Ethan Fetaya, Yonatan Lifshitz, Elad Aaron, and Shai Gordin. Restoration of fragmentary babylonian texts using recurrent neural networks. Proceedings of the National Academy of Sciences (PNAS), 117(37):22743–22751, 2020.
[15] Mathieu François and Véronique Eglin. Unsupervised post-ocr correction for noisy text in engineering documents. In Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR), 2023.
[16] Shuhao Guan and Derek Greene. Advancing post-ocr correction: A comparative study of synthetic data. arXiv preprint arXiv:2408.02253, 2024.
[17] Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng, Yudi Zhao, and Wei Shen. Self-supervised implicit glyph attention for text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15285–15294, 2023.
[18] Rahima Khanam and Muhammad Hussain. Yolov11: An overview of the key architectural enhancements. arXiv preprint arXiv:2410.17725, 2024.
[19] Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, and Ming Zhou. Docbank: A benchmark dataset for document layout analysis. In Proceedings of the 28th International Conference on Computational Linguistics, pages 949–960, 2020.
[20] Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based optical character recognition with pre-trained models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 13094–13102,
[21] Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, and Xiang Bai. Real-time scene text detection with differentiable binarization. In Proceedings of the AAAI conference on artificial intelligence, pages 11474–11481, 2020.
[22] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[23] Ayush Mittal, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein, and Daniel Lopresti. A new context-based method for restoring occluded text in natural scene images. In Document Analysis Systems: 14th IAPR International Workshop, DAS 2020, Wuhan, China, July 26–29, 2020, Proceedings 14, pages 466–480. Springer, 2020.
[24] S. Mori, C. Y. Suen, and K. Yamamoto. Historical review of OCR research and development. Proceedings of the IEEE, 80(7):1029–1058, 1992.
[25] Premkumar Natarajan, Issam Bazzi, Zhidong Lu, John Makhoul, and Richard Schwartz. Robust ocr of degraded documents. In Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR'99 (Cat. No. PR00318), pages 357–361. IEEE, 1999.
[26] N. Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.
[27] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318,
[28] Maria Pilligua, Nil Biescas, Javier Vazquez-Corral, Josep Lladós, Ernest Valveny, and Sanket Biswas. Layereddoc: Domain adaptive document restoration with a layer separation approach. In International Conference on Document Analysis and Recognition, pages 27–39. Springer, 2024.
[29] Kunal Purkayastha, Shashwat Sarkar, Shivakumara Palaiahnakote, Umapada Pal, and Palash Ghosal. Datr: Domain agnostic text recognizer. In International Conference on Pattern Recognition, pages 220–235. Springer, 2025.
[30] R Sapkota, RH Cheppally, A Sharda, and M Karkee. Yolo26: Key architectural enhancements and performance benchmarking for real-time object detection. arXiv preprint arXiv:2509.25164, 2025.
[31] Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304, 2017.
[32] Wataru Shimoda, Naoto Inoue, Daichi Haraguchi, Hayato Mitani, Seiichi Uchida, and Kota Yamaguchi. Type-r: Automatically retouching typos for text-to-image generation. arXiv preprint arXiv:2411.18159, 2024.
[33] Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, and Umapada Pal. De-GAN: a conditional generative adversarial network for document enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(3):1180–1191, 2020.
[34] Mohamed Ali Souibgui, Sanket Biswas, Sana Khamekhem Jemni, Yousri Kessentini, Alicia Fornés, Josep Lladós, and Umapada Pal. Docentr: An end-to-end document image enhancement transformer. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 1699–1705. IEEE, 2022.
[35] Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gómez, and Dimosthenis Karatzas. Text-DIAE: A self-supervised degradation invariant autoencoder for text recognition and document enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023.
[36] B. Su, S. Lu, and C. L. Tan. Robust document image binarization technique for degraded document images. IEEE Transactions on Image Processing, 22(4):1408–1417, 2013.
[37] Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, and Mohit Bansal. Unifying vision, text, and layout for universal document processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 19254–19264, 2023.
[38] Qwen Team. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
[39] Alan Thomas, Robert Gaizauskas, and Haiping Lu. Leveraging LLMs for post-ocr correction of historical newspapers. In Proceedings of the LT4HALA Workshop at LREC-COLING, pages 116–121, 2024.
[40] Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, and Guiguang Ding. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458,
[41] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. Yolov9: Learning what you want to learn using programmable gradient information. arXiv preprint arXiv:2402.13616, 2024.
[42] Zixiao Wang, Hongtao Xie, Yuxin Wang, Jianjun Xu, Boqiang Zhang, and Yongdong Zhang. Symmetrical linguistic feature distillation with clip for scene text recognition. In Proceedings of the 31st ACM international conference on multimedia, pages 509–518, 2023.
[43] Jianjun Xu, Yuxin Wang, Hongtao Xie, and Yongdong Zhang. Ote: exploring accurate scene text recognition using one token. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28327–28336, 2024.
[44] Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, and Xing Zhang. DocDiff: Document enhancement via residual diffusion models. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), pages 2795–2806,
[45] Zongyuan Yang, Baolin Liu, Yongping Xiong, Lan Yi, Guibin Wu, Xiaojun Tang, Ziqi Liu, Junjie Zhou, and Xing Zhang. Docdiff: Document enhancement via residual diffusion models. In Proceedings of the 31st ACM international conference on multimedia, pages 2795–2806, 2023.
[46] Muhammad Yaseen. What is yolov8: an in-depth exploration of the internal features of the next-generation object detector (2024). Accessed: Sep 10, 2025.
[47] Fangchen Yu, Yina Xie, Lei Wu, Yafei Wen, Guozhi Wang, Shuai Ren, Xiaoxin Chen, Jianfeng Mao, and Wenye Li. DocReal: Robust document dewarping of real-life images via attention-enhanced control point prediction. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 665–674, 2024.
[48] Li Yujian and Liu Bo. A normalized levenshtein distance metric. IEEE transactions on pattern analysis and machine intelligence, 29(6):1091–1095, 2007.
[49] Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, and Yu Zhou. Textctrl: Diffusion-based scene text editing with prior guidance control. Advances in Neural Information Processing Systems, 37:138569–138594, 2025.
[50] Boqiang Zhang, Hongtao Xie, Yuxin Wang, Jianjun Xu, and Yongdong Zhang. Linguistic more: Taking a further step toward efficient and accurate scene text recognition. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pages 1704–1712, 2023.
[51] Boqiang Zhang, Hongtao Xie, Zuan Gao, and Yuxin Wang. Choose what you need: Disentangled representation learning for scene text recognition removal and editing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 28358–28368, 2024.
[52] Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, and Lianwen Jin. DocRes: A generalist model toward unifying document image restoration tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[53] Ling Zhang, Yinxiao He, Qing Zhang, Zheng Liu, Xiaolong Zhang, and Chunxia Xiao. Document image shadow removal guided by color-aware background. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1818–1827, 2023.
[54] Yanxi Zhou, Shikai Zuo, Zhengxian Yang, Jinlong He, Jianwen Shi, and Rui Zhang. A review of document image enhancement based on document degradation problem. Applied Sciences, 13(13):7855, 2023.
[55] Shipeng Zhu, Pengfei Fang, Chenjie Zhu, Zuoyan Zhao, Qiang Xu, and Hui Xue. Text image inpainting via global structure-guided diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 7775–7783, 2024.
DocRevive: A Unified Pipeline for Document Text Restoration, Supplementary Material
Dataset Construction Details. This supplementary section provides the full construction details of the Occluded Pages Restoration Benchmark (OPRB). In the current generator, we choose N unique source pages per class-level. We introduce a novel benchmark dataset called Occluded Pages Restoration Benchmark (OPRB) designed to evaluate document restoration u...
Method Details. 10.1. Occlusion Detection and Blank Region Extraction. Occlusion patches are first localized using a fine-tuned YOLOv9c detector [41] trained on the OPRB dataset. The benchmark contains six degradation classes: Black Ink, Burnt, Whitener, Dust, Scribble, and Stamp. Opaque classes (Black Ink, Burnt, Whitener) fully obscure the underlying text, t...
Miscellaneous Experiments. 11.1. Comparison with Prior Document Restoration Methods. We compare DocRevive against three prior methods on a subset of 498 images from OPRB (83 per occlusion type): DocDiff [45]; GSDM (standalone), our pipeline's inpainting module run in isolation without any text prediction or editing; and NAFNet [10], a strong general image ...