Recognition: 2 theorem links · Lean Theorem
StyleTextGen: Style-Conditioned Multilingual Scene Text Generation
Pith reviewed 2026-05-15 04:52 UTC · model grok-4.3
The pith
StyleTextGen generates scene text that matches reference visual styles across languages using a dedicated dual-branch encoder.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
StyleTextGen learns to perceive and replicate visual text styles across different languages and writing systems. It does so through a dual-branch style encoder that yields robust multilingual representations from complex scenes, a text style consistency loss that improves coherence and visual quality, and a mask-guided inference strategy that ensures precise alignment, resulting in superior style consistency and cross-lingual generalization over prior methods.
What carries the argument
Dual-branch style encoder that isolates style modeling to produce robust multilingual text style representations from complex real-world backgrounds.
Load-bearing premise
The dual-branch style encoder and consistency loss can extract and maintain precise fine-grained text styles from complex backgrounds across languages without needing extra tuning or dataset changes.
What would settle it
Generated images on the StyleText-CE benchmark showing visible mismatches in stroke width, color, or texture between output and reference text in cross-lingual test cases would falsify the performance claim.
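This falsification test can be made mechanical rather than purely visual. Below is a minimal editorial sketch, not anything from the paper, that scores a generated text crop against its reference on two of the named attributes: mean text colour in Lab space and a crude stroke-width proxy derived from the distance transform. It assumes binary text masks are available from some text segmentation step.

import cv2
import numpy as np

def stroke_width_estimate(mask: np.ndarray) -> float:
    # mask: binary text mask (nonzero = text). The distance transform gives
    # each text pixel its distance to the nearest background pixel; for a
    # long stroke of width w the mean over the stroke is roughly w / 4.
    dist = cv2.distanceTransform((mask > 0).astype(np.uint8), cv2.DIST_L2, 5)
    fg = dist[mask > 0]
    return float(4.0 * fg.mean()) if fg.size else 0.0

def style_mismatch(gen_bgr, ref_bgr, gen_mask, ref_mask):
    # Mean colour of text pixels compared in Lab space, where Euclidean
    # distance roughly tracks perceptual colour difference.
    gen_lab = cv2.cvtColor(gen_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    ref_lab = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
    color_gap = np.linalg.norm(gen_lab[gen_mask > 0].mean(axis=0)
                               - ref_lab[ref_mask > 0].mean(axis=0))
    width_gap = abs(stroke_width_estimate(gen_mask)
                    - stroke_width_estimate(ref_mask))
    return {"color_gap_lab": float(color_gap),
            "stroke_width_gap_px": float(width_gap)}

Consistently large gaps on StyleText-CE's cross-lingual cases would be exactly the evidence that falsifies the performance claim.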
Original abstract
Style-conditioned scene text generation faces unique challenges in extracting precise text styles from complex backgrounds and maintaining fine-grained style consistency across characters, especially for multilingual scripts. We propose StyleTextGen, a novel framework that learns to perceive and replicate visual text styles across different languages and writing systems. Our approach features three key contributions: First, we introduce a dual-branch style encoder dedicated to style modeling, yielding robust multilingual text style representations in complex real-world scenes. Second, we design a text style consistency loss that enhances style coherence and improves overall visual quality. Third, we develop a mask-guided inference strategy that ensures precise style alignment between generated and reference text. To facilitate systematic evaluation, we construct StyleText-CE, a bilingual scene text style benchmark covering both monolingual and cross-lingual settings. Extensive experiments demonstrate that StyleTextGen significantly outperforms existing methods in style consistency and cross-lingual generalization, establishing new state-of-the-art performance in multilingual style-conditioned text generation.
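The abstract names a mask-guided inference strategy but not its mechanics. One plausible reading, offered here strictly as a hedged sketch, is latent blending during diffusion sampling: at every denoising step the background is re-pinned to the noised reference latents so that only the masked text region is actually synthesized. The unet and scheduler objects follow a diffusers-style API and stand in for whatever backbone StyleTextGen really uses.

import torch

@torch.no_grad()
def mask_guided_denoise(unet, scheduler, ref_latents, text_mask, cond, steps=30):
    # text_mask: (B, 1, h, w) at latent resolution; 1 marks the text region.
    latents = torch.randn_like(ref_latents)
    scheduler.set_timesteps(steps)
    for t in scheduler.timesteps:
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
        # Re-impose the reference outside the mask at the matching noise
        # level, confining generation to the text region.
        noised_ref = scheduler.add_noise(
            ref_latents, torch.randn_like(ref_latents), t)
        latents = text_mask * latents + (1 - text_mask) * noised_ref
    return latents

Whether the paper blends in latent space, pixel space, or elsewhere is left open by the abstract; the sketch shows only one standard possibility.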
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces StyleTextGen, a framework for style-conditioned multilingual scene text generation. It features a dual-branch style encoder for robust style representations from complex backgrounds, a text style consistency loss to improve coherence, and a mask-guided inference strategy for precise alignment. The authors construct the StyleText-CE bilingual benchmark for monolingual and cross-lingual evaluation and claim that the method significantly outperforms prior work in style consistency and cross-lingual generalization, establishing new state-of-the-art results.
Significance. If the empirical results hold, the work could advance scene text generation by addressing style extraction from real-world backgrounds and cross-lingual coherence, areas that remain challenging. The introduction of StyleText-CE as a dedicated benchmark for systematic evaluation is a potentially useful contribution that could support future research in multilingual settings.
major comments (1)
- [Abstract] The central claim that StyleTextGen 'significantly outperforms existing methods' and establishes 'new state-of-the-art performance' is presented without any quantitative metrics, error bars, ablation studies, or dataset statistics. The experiments section must supply concrete numbers (e.g., style similarity scores, FID, or user-study results), baseline comparisons, and statistical validation; without them the primary empirical assertion remains unverifiable even though it is load-bearing for the paper's contribution.
minor comments (2)
- [Abstract] The dual-branch style encoder and consistency loss are described at a high level; adding a brief architectural diagram or pseudocode would improve clarity and reproducibility (one hedged sketch of such pseudocode follows this list).
- [Abstract] Provide basic statistics for the StyleText-CE benchmark (number of images, text instances, languages covered, and train/test splits) to allow readers to assess its scope and difficulty.
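As a concrete illustration of the pseudocode requested in the second comment, here is a minimal PyTorch sketch of a dual-branch style encoder. The branch split (glyph structure from a greyscale view, colour and texture from the RGB view) and every module choice are assumptions made for this review, not StyleTextGen's published design.

import torch
import torch.nn as nn

class DualBranchStyleEncoder(nn.Module):
    # Hypothetical layout: one branch for glyph structure, one for
    # colour/texture, fused into a single style embedding.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.glyph_branch = nn.Sequential(      # structure, greyscale input
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.texture_branch = nn.Sequential(    # colour/texture, RGB input
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, ref_rgb: torch.Tensor) -> torch.Tensor:
        # ref_rgb: (B, 3, H, W) reference text crop.
        grey = ref_rgb.mean(dim=1, keepdim=True)
        style = torch.cat([self.glyph_branch(grey),
                           self.texture_branch(ref_rgb)], dim=-1)
        return self.fuse(style)                 # (B, dim) style embedding

Ten lines of this kind in the manuscript, with the real layer choices, would answer the reproducibility concern.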
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the abstract. We will revise the manuscript to make the central empirical claims more concrete by incorporating key quantitative metrics directly into the abstract while ensuring the experiments section provides full supporting details, including error bars, ablations, and statistical validation.
Point-by-point responses
- Referee: [Abstract] The central claim that StyleTextGen 'significantly outperforms existing methods' and establishes 'new state-of-the-art performance' is presented without any quantitative metrics, error bars, ablation studies, or dataset statistics. The experiments section must supply concrete numbers (e.g., style similarity scores, FID, or user-study results), baseline comparisons, and statistical validation; without them the primary empirical assertion remains unverifiable even though it is load-bearing for the paper's contribution.
Authors: We agree that the abstract would benefit from greater specificity. In the revised version we will add concrete numbers (e.g., style similarity score improvements of X points and FID reductions of Y points relative to the strongest baseline) while preserving the abstract's brevity. The experiments section already contains the requested elements: quantitative style-consistency and FID scores on StyleText-CE, direct comparisons against prior methods, ablation studies isolating the dual-branch encoder and text-style consistency loss, user-study results, and dataset statistics. We will further augment this section with error bars and statistical significance tests to strengthen verifiability.
Revision: yes
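The rebuttal promises style-similarity scores without naming the metric. A standard candidate, shown here purely as an assumed example, is a Gram-matrix distance over frozen VGG features in the spirit of Gatys et al. [10]; the same quantity could double as the paper's text style consistency loss at training time.

import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG16 features up to relu3_3 (index 16 in torchvision's feature
# list); the layer choice is illustrative. Inputs should be ImageNet-
# normalized; that step is omitted here for brevity.
_vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def gram(feat: torch.Tensor) -> torch.Tensor:
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_distance(gen: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # gen, ref: (B, 3, H, W) text crops in [0, 1]. Lower = more consistent.
    return F.mse_loss(gram(_vgg(gen)), gram(_vgg(ref)))

Reporting a number like this per benchmark split, with confidence intervals, is the concrete form the promised revision could take.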
Circularity Check
No significant circularity detected
full rationale
The paper presents an empirical architecture (dual-branch style encoder, consistency loss, mask-guided inference) evaluated on a newly constructed benchmark (StyleText-CE). No equations, derivations, or parameter-fitting steps are described that reduce a claimed prediction back to its own inputs by construction. The central claims rest on experimental outperformance rather than self-referential definitions or load-bearing self-citations that would force the result. The framework is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work in a circular manner.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Matched passage: "StyleText-CE benchmark... cross-lingual generalization". The relation between this passage and the cited Recognition theorem is unclear.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Tim Brooks, Aleksander Holynski, and Alexei A. Efros. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023.
- [2] Tianjiao Cao, Jiahao Lyu, Weichao Zeng, Weimin Mu, and Yu Zhou. The devil is in fine-tuning and long-tailed problems: A new benchmark for scene text detection. arXiv preprint arXiv:2505.15649, 2025.
- [3] Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, and Xinchao Wang. POSTA: A go-to framework for customized artistic poster generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28694–28704, 2025.
- [4] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. TextDiffuser-2: Unleashing the power of language models for text rendering. arXiv preprint arXiv:2311.16465, 2023.
- [5] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. TextDiffuser: Diffusion models as text painters. arXiv preprint arXiv:2305.10855, 2023.
- [6] Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, et al. InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24185–24198, 2024.
- [7] Yongkun Du, Zhineng Chen, Caiyan Jia, Xiaoting Yin, Chenxia Li, Yuning Du, and Yu-Gang Jiang. Context perception parallel decoder for scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(6):4668–4683, 2025.
- [8] Yongkun Du, Zhineng Chen, Yuchen Su, Caiyan Jia, and Yu-Gang Jiang. Instruction-guided scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47(4):2723–2738, 2025.
- [9] Zhengyao Fang, Pengyuan Lyu, Jingjing Wu, Chengquan Zhang, Jun Yu, Guangming Lu, and Wenjie Pei. Recognition-synergistic scene text editing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 13104–13113, 2025.
- [10] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2414–2423, 2016.
- [11] Tongkun Guan, Zining Wang, Pei Fu, Zhengtao Guo, Wei Shen, Kai Zhou, Tiezhu Yue, Chen Duan, Hao Sun, Qianyi Jiang, et al. A token-level text image foundation model for document understanding. arXiv preprint arXiv:2503.02304, 2025.
- [12] Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, pages 1501–1510, 2017.
- [13] Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, and Shiyu Chang. Improving diffusion models for scene text editing with dual encoders, 2023.
- [14] Justin Johnson, Alexandre Alahi, and Li Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision, pages 694–711. Springer, 2016.
- [15] Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu. BrushNet: A plug-and-play image inpainting model with decomposed dual-branch diffusion. In European Conference on Computer Vision, pages 150–168. Springer, 2024.
- [16] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
- [17] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119, 2020.
- [18] Praveen Krishnan, Rama Kovvuri, Guan Pang, Boris Vassilev, and Tal Hassner. TextStyleBrush: Transfer of text aesthetics from a single example. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(7):9122–9134, 2023.
- [19] Gihyun Kwon and Jong Chul Ye. CLIPstyler: Image style transfer with a single text condition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18062–18071, 2022.
- [20] Black Forest Labs. FLUX. https://github.com/black-forest-labs/flux, 2024.
- [21] Mingkun Lei, Xue Song, Beier Zhu, Hao Wang, and Chi Zhang. StyleStudio: Text-driven style transfer with selective control of style elements. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 23443–23452, 2025.
- [22] Chenxia Li, Weiwei Liu, Ruoyu Guo, Xiaoting Yin, Kaitao Jiang, Yongkun Du, Yuning Du, Lingfeng Zhu, Baohua Lai, Xiaoguang Hu, Dianhai Yu, and Yanjun Ma. PP-OCRv3: More attempts for the improvement of ultra lightweight OCR system. arXiv preprint arXiv:2206.03001, 2022.
- [23] Junnan Li, Dongxu Li, Silvio Savarese, and Steven C. H. Hoi. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint arXiv:2301.12597, 2023.
- [24] Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, and Yu Zhou. First creating backgrounds then rendering texts: A new paradigm for visual text blending. arXiv preprint arXiv:2410.10168, 2024.
- [25] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling, 2023.
- [26] Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, and Yuhui Yuan. Glyph-ByT5: A customized text encoder for accurate visual text rendering. arXiv preprint arXiv:2403.09622, 2024.
- [27] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019.
- [28] Jiahao Lyu, Wei Wang, Dongbao Yang, Jinwen Zhong, and Yu Zhou. Arbitrary reading order scene text spotter with local semantics guidance. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5919–5927, 2025.
- [29] Jian Ma, Yonglin Deng, Chen Chen, Nanyang Du, Haonan Lu, and Zhenyu Yang. GlyphDraw2: Automatic generation of complex glyph posters with diffusion models and large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5955–5963, 2025.
- [30] Yue Ma, Qingyan Bai, Hao Ouyang, Ka Leong Cheng, Qiuyu Wang, Hongyu Liu, Zichen Liu, Haofan Wang, Jingye Chen, Yujun Shen, et al. Calligrapher: Freestyle text image customization. arXiv preprint arXiv:2506.24123, 2025.
- [31] OpenAI. DALL·E 3. https://openai.com/index/dall-e-3/, 2023.
- [32] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In ICLR, 2024.
- [33] Yadong Qu, Qingfeng Tan, Hongtao Xie, Jianjun Xu, Yuxin Wang, and Yongdong Zhang. Exploring stroke-level modifications for scene text editing. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2119–2127, 2023.
- [34] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.
- [35] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, pages 8821–8831. PMLR, 2021.
- [36] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
- [37] Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. Semantic image inversion and editing using rectified stochastic differential equations. arXiv preprint arXiv:2410.10792, 2024.
- [38] Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, and Umapada Pal. STEFANN: Scene text editor using font adaptive neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13228–13237, 2020.
- [39] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kamyar Seyed Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.
- [40] Yan Shu, Hangui Lin, Yexin Liu, Yan Zhang, Gangyan Zeng, Yan Li, Yu Zhou, Ser-Nam Lim, Harry Yang, and Nicu Sebe. When semantics mislead vision: Mitigating large multimodal models hallucinations in scene text spotting and understanding. arXiv preprint arXiv:2506.05551, 2025.
- [41] Yan Shu, Weichao Zeng, Fangmin Zhao, Zeyu Chen, Zhenhang Li, Xiaomeng Yang, Yu Zhou, Paolo Rota, Xiang Bai, Lianwen Jin, et al. Visual text processing: A comprehensive review and unified evaluation. arXiv preprint arXiv:2504.21682, 2025.
- [42] Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin, Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, et al. SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features. arXiv preprint arXiv:2502.14786, 2025.
- [43] Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, and Xuansong Xie. AnyText: Multilingual visual text generation and editing. arXiv preprint, 2023.
- [44] Yuxiang Tuo, Yifeng Geng, and Liefeng Bo. AnyText2: Visual text generation and editing with customizable attributes, 2024.
- [45] Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. arXiv preprint arXiv:1603.03417, 2016.
- [46] Fu-Yun Wang, Ling Yang, Zhaoyang Huang, Mengdi Wang, and Hongsheng Li. Rectified diffusion: Straightness is not your need in rectified flow, 2024.
- [47] Tong Wang, Ting Liu, Xiaochao Qu, Chengjing Wu, Luoqi Liu, and Xiaolin Hu. GlyphMastero: A glyph encoder for high-fidelity scene text editing. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 28523–28532, 2025.
- [48] Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, and Xiang Bai. Editing text in the wild. In Proceedings of the 27th ACM International Conference on Multimedia, pages 1500–1508, 2019.
- [49] Yu Xie, Jielei Zhang, Pengyu Chen, Ziyue Wang, Weihang Wang, Longwen Gao, Peiyi Li, Huyang Sun, Qiang Zhang, Qian Qiao, et al. TextFlux: An OCR-free DiT model for high-fidelity multilingual scene text synthesis. arXiv preprint arXiv:2505.17778, 2025.
- [50] Qiangpeng Yang, Jun Huang, and Wei Lin. SwapText: Image based texts transfer in scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14700–14709, 2020.
- [51] Xiaomeng Yang, Zhi Qiao, and Yu Zhou. IPAD: Iterative, parallel, and diffusion-based network for scene text recognition. International Journal of Computer Vision, 133(8):5589–5609, 2025.
- [52] Yukang Yang, Dongnan Gui, Yuhui Yuan, Haisong Ding, Han Hu, and Kai Chen. GlyphControl: Glyph conditional control for visual text generation. arXiv preprint arXiv:2305.18259, 2023.
- [53] Zhoufaran Yang, Yan Shu, Jing Wang, Zhifei Yang, Yan Zhang, Yu Li, Keyang Lu, Gangyan Zeng, Shaohui Liu, Yu Zhou, et al. VidText: Towards comprehensive evaluation for video text understanding. arXiv preprint arXiv:2505.22810, 2025.
- [54] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint, 2023.
- [55] Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, Baocai Yin, Cong Liu, Bo Du, and Dacheng Tao. Hi-SAM: Marrying segment anything model for hierarchical text segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–16, 2024.
- [56] Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, and Yu Zhou. TextCtrl: Diffusion-based scene text editing with prior guidance control. Advances in Neural Information Processing Systems, 37:138569–138594, 2024.
- [57] Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10146–10156, 2023.
- [58] Qilong Zhangli, Jindong Jiang, Di Liu, Licheng Yu, Xiaoliang Dai, Ankit Ramchandani, Guan Pang, Dimitris N. Metaxas, and Praveen Krishnan. Layout agnostic scene text image synthesis with diffusion models, 2024.
- [59] Yiming Zhao and Zhouhui Lian. UDiffText: A unified framework for high-quality text synthesis in arbitrary images via character-aware diffusion models, 2023.
- [60] Tianlun Zheng, Zhineng Chen, Shancheng Fang, Hongtao Xie, and Yu-Gang Jiang. CDistNet: Perceiving multi-domain character distance for robust text recognition. International Journal of Computer Vision, 132(2):300–318, 2024.
- [61] Jianqun Zhou, Pengwen Dai, Yang Li, Manjiang Hu, and Xiaochun Cao. Explicitly-decoupled text transfer with the minimized background reconstruction for scene text editing. IEEE Transactions on Image Processing, 2024.