Editing Text in the Wild

Chengquan Zhang; Errui Ding; Jiaming Liu; Jingtuo Liu; Junyu Han; Liang Wu; Xiang Bai

arxiv: 1908.03047 · v1 · pith:HGXNZCNSnew · submitted 2019-08-08 · 💻 cs.CV

Liang Wu, Chengquan Zhang, Jiaming Liu, Junyu Han, Jingtuo Liu, Errui Ding, Xiang Bai

classification 💻 cs.CV

keywords text, module, image, background, images, source, conversion, edited

In this paper, we are interested in editing text in natural images, which aims to replace or modify a word in the source image with another one while maintaining its realistic look. This task is challenging, as the styles of both the background and the text need to be preserved so that the edited image is visually indistinguishable from the source image. Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: a text conversion module, a background inpainting module, and a fusion module. The text conversion module changes the text content of the source image into the target text while keeping the original text style. The background inpainting module erases the original text and fills the text region with appropriate texture. The fusion module combines the information from the two former modules and generates the edited text image. To our knowledge, this work is the first attempt to edit text in natural images at the word level. Both visual effects and quantitative results on synthetic and real-world (ICDAR 2013) datasets confirm the importance and necessity of the modular decomposition. We also conduct extensive experiments to validate the usefulness of our method in various real-world applications such as text image synthesis, augmented reality (AR) translation, and information hiding.
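
The data flow between the three modules described in the abstract can be sketched as below. This is only an illustrative sketch, not the SRNet implementation: every name (`edit_text`, `toy_convert`, `toy_inpaint`, `toy_fuse`) is hypothetical, and the learned networks are replaced by hand-written toy rules that assume a grayscale image whose text is darker than its background.

```python
from typing import Callable, List, Optional

Image = List[List[float]]                 # grayscale image, values in [0, 1]
TextLayer = List[List[Optional[float]]]   # None wherever there is no text


def edit_text(source: Image, target_mask: Image,
              convert: Callable[[Image, Image], TextLayer],
              inpaint: Callable[[Image], Image],
              fuse: Callable[[TextLayer, Image], Image]) -> Image:
    """Compose the three modules in the order the abstract describes."""
    styled_text = convert(source, target_mask)  # target text in the source style
    background = inpaint(source)                # source with original text erased
    return fuse(styled_text, background)        # combined edited image


# Toy stand-ins (assumptions, not the learned modules of the paper).
def toy_convert(source: Image, target_mask: Image) -> TextLayer:
    # "Style" here is just the darkest source intensity, taken as the text color.
    color = min(p for row in source for p in row)
    return [[color if m > 0.5 else None for m in row] for row in target_mask]


def toy_inpaint(source: Image) -> Image:
    # Treat pixels darker than 0.5 as text and fill them with the mean background.
    bg = [p for row in source for p in row if p >= 0.5]
    fill = sum(bg) / len(bg) if bg else 0.5
    return [[p if p >= 0.5 else fill for p in row] for row in source]


def toy_fuse(styled_text: TextLayer, background: Image) -> Image:
    # Paste the styled text over the text-free background.
    return [[t if t is not None else b for t, b in zip(t_row, b_row)]
            for t_row, b_row in zip(styled_text, background)]


# A vertical dark stroke in the source is replaced by a horizontal one.
source = [[0.75, 0.25, 0.75],
          [0.75, 0.25, 0.75]]
target_mask = [[1.0, 1.0, 1.0],
               [0.0, 0.0, 0.0]]
edited = edit_text(source, target_mask, toy_convert, toy_inpaint, toy_fuse)
```

Passing the modules as callables mirrors the modular decomposition the abstract emphasizes: each stage can be swapped or evaluated on its own, while `edit_text` fixes only the order in which their outputs are combined.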

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient Document Tampering Localization with Multi-Level Discrepancy Features and Unified DCT-Quantization Embedding
cs.CV 2026-06 unverdicted novelty 6.0

DiffNet achieves state-of-the-art cross-domain performance on human-made document tampering localization by combining RGB-DCT early fusion with multi-level discrepancy transformations and a frequency-index-aware DCT-q...