ChArtist: Generating Pictorial Charts with Unified Spatial and Subject Control
Pith reviewed 2026-05-15 12:07 UTC · model grok-4.3
The pith
ChArtist generates pictorial charts by combining skeleton-based spatial control with subject control from reference images in a diffusion model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing a skeleton-based spatial control representation that encodes only the data-encoding information of the chart, ChArtist allows a diffusion model to incorporate reference visuals flexibly without rigid outline constraints. Implemented via the Diffusion Transformer with adaptive position encoding and Spatially Gated Attention to manage the two controls, the approach produces pictorial charts that respect both chart structure and subject appearance. A dataset of 30,000 triplets supports fine-tuning, and a unified data accuracy metric quantifies faithfulness.
What carries the argument
Skeleton-based spatial control representation, which encodes only the data-encoding information of the chart to enable flexible incorporation of reference visuals.
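To make the idea of a skeleton that carries only data-encoding information concrete, here is a minimal sketch for a bar chart. It assumes the skeleton is a binary image with one single-pixel vertical stroke per data value, with stroke length proportional to the value. The function name `bar_skeleton`, the canvas size, and the stroke convention are illustrative assumptions; the review does not describe how the paper rasterizes its skeletons.

```python
import numpy as np

def bar_skeleton(values, height=64, width=64, margin=4):
    """Rasterize a hypothetical bar-chart skeleton: one vertical stroke per value.

    This is an illustrative convention, not the paper's definition.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0 or values.max() <= 0:
        raise ValueError("need at least one positive value")
    canvas = np.zeros((height, width), dtype=np.uint8)
    usable = height - 2 * margin      # pixel rows available to the tallest stroke
    baseline = height - margin - 1    # row index of the chart baseline
    # Evenly spaced column positions, one per data value.
    xs = np.linspace(margin, width - margin - 1, num=values.size).round().astype(int)
    # Stroke length in pixels is proportional to the data value.
    lengths = np.round(values / values.max() * usable).astype(int)
    for x, length in zip(xs, lengths):
        if length > 0:
            canvas[baseline - length + 1 : baseline + 1, x] = 1
    return canvas, xs, lengths

canvas, xs, lengths = bar_skeleton([1, 2, 4])
```

The strokes fix where the data lives and say nothing about outline or texture, which is the property the review credits with leaving room for the reference image.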
If this is right
- Pictorial charts can be produced automatically while preserving data accuracy without manual creative deformation.
- General image conditions like edge or depth maps are replaced by task-specific skeletons that better suit chart generation.
- The Spatially Gated Attention mechanism modulates how spatial and subject controls interact during generation.
- Fine-tuning on 30,000 triplets demonstrates a practical path for adapting pre-trained diffusion models to this domain.
- The unified data accuracy metric offers a quantitative way to evaluate faithfulness in generated pictorial charts.
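The review notes that the paper gives no formulation of Spatially Gated Attention, so the following is only one plausible reading, not the authors' mechanism. It assumes single-head attention and a per-query gate in [0, 1] that blends attention over spatial-control tokens with attention over subject tokens. The names `gated_control_attention`, `attend`, and `gate` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Plain scaled dot-product attention for a single head."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def gated_control_attention(q, spatial_kv, subject_kv, gate):
    """Blend two control streams per query token (assumed form, not the paper's).

    gate has shape (num_queries,); 1.0 means "follow the skeleton",
    0.0 means "follow the reference subject".
    """
    gate = np.clip(np.asarray(gate, dtype=float), 0.0, 1.0)[:, None]
    spatial_out = attend(q, *spatial_kv)
    subject_out = attend(q, *subject_kv)
    return gate * spatial_out + (1.0 - gate) * subject_out

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))                                  # 5 image tokens, dim 8
spatial = (rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))  # (keys, values)
subject = (rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
out = gated_control_attention(q, spatial, subject, gate=np.full(5, 0.5))
```

In this reading a gate derived from distance to the skeleton would let tokens near data-encoding positions favor structure while the rest favor appearance; whether the paper gates this way is not stated in the source.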
Where Pith is reading between the lines
- The skeleton approach could extend to generating other structured visuals such as diagrams or maps where data positions must stay fixed.
- Combining this control with real-time data feeds might allow dynamic updating of pictorial charts without retraining.
- The attention gating technique could apply to other multi-control generation tasks that mix structure and appearance.
Load-bearing premise
Encoding only the data positions in the skeleton provides enough structure to guide generation while leaving room for reference visuals to determine aesthetics without conflict.
What would settle it
Generated charts that systematically misalign data values with the input skeleton when measured by the proposed unified data accuracy metric would show the controls do not jointly maintain faithfulness.
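The test described above can be sketched with a stand-in score. The review does not define the paper's unified data accuracy metric, so `data_accuracy` below is a hypothetical substitute: it normalizes the intended values and the values measured from a generated chart by their respective maxima, then reports one minus the mean absolute difference.

```python
def data_accuracy(target_values, measured_values):
    """Hypothetical faithfulness score in [0, 1]; not the paper's metric.

    Both sequences are scaled by their own maximum so that pixel heights
    and raw data values become comparable proportions.
    """
    if len(target_values) == 0 or len(target_values) != len(measured_values):
        raise ValueError("inputs must be non-empty and of equal length")
    t_max = max(target_values)
    m_max = max(measured_values)
    if t_max <= 0 or m_max <= 0:
        raise ValueError("maximum of each input must be positive")
    errors = [abs(t / t_max - m / m_max)
              for t, m in zip(target_values, measured_values)]
    return max(0.0, 1.0 - sum(errors) / len(errors))

faithful = data_accuracy([1, 2, 4], [14, 28, 56])   # proportions match exactly
drifted = data_accuracy([1, 2, 4], [28, 28, 56])    # first bar drawn too tall
```

Under this stand-in, a systematic gap between `faithful`-style and `drifted`-style scores across many generated charts would be the kind of evidence the paragraph above asks for.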
Original abstract
A pictorial chart is an effective medium for visual storytelling, seamlessly integrating visual elements with data charts. However, creating such images is challenging because the flexibility of visual elements often conflicts with the rigidity of chart structures. This process thus requires a creative deformation that maintains both data faithfulness and visual aesthetics. Current methods that extract dense structural cues from natural images (e.g., edge or depth maps) are ill-suited as conditioning signals for pictorial chart generation. We present ChArtist, a domain-specific diffusion model for generating pictorial charts automatically, offering two distinct types of control: 1) spatial control that aligns well with the chart structure, and 2) subject-driven control that respects the visual characteristics of a reference image. To achieve this, we introduce a skeleton-based spatial control representation. This representation encodes only the data-encoding information of the chart, allowing for the easy incorporation of reference visuals without a rigid outline constraint. We implement our method based on the Diffusion Transformer (DiT) and leverage an adaptive position encoding mechanism to manage these two controls. We further introduce Spatially Gated Attention to modulate the interaction between spatial control and subject control. To support the fine-tuning of pre-trained models for this task, we created a large-scale dataset of 30,000 triplets (skeleton, reference image, pictorial chart). We also propose a unified data accuracy metric to evaluate the data faithfulness of the generated charts. We believe this work demonstrates that current generative models can achieve data-driven visual storytelling by moving beyond general-purpose conditions to task-specific representations. Project page: https://chartist-ai.github.io/.
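The abstract describes the training data as triplets of (skeleton, reference image, pictorial chart). A minimal sketch of such a record follows, assuming a single-channel skeleton aligned pixel-for-pixel with an RGB chart and a reference image of any size. The class name `ChartTriplet` and the shape conventions are assumptions, not the released dataset format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True, eq=False)
class ChartTriplet:
    """One hypothetical training example; field layout is an assumption."""
    skeleton: np.ndarray    # (H, W) binary map of data-encoding strokes
    reference: np.ndarray   # (h, w, 3) RGB reference image, any size
    chart: np.ndarray       # (H, W, 3) RGB target pictorial chart

    def __post_init__(self):
        if self.skeleton.ndim != 2:
            raise ValueError("skeleton must be a 2-D array")
        if self.reference.ndim != 3 or self.reference.shape[2] != 3:
            raise ValueError("reference must have shape (h, w, 3)")
        if self.chart.ndim != 3 or self.chart.shape[2] != 3:
            raise ValueError("chart must have shape (H, W, 3)")
        if self.chart.shape[:2] != self.skeleton.shape:
            raise ValueError("chart and skeleton must share height and width")

example = ChartTriplet(
    skeleton=np.zeros((64, 64), dtype=np.uint8),
    reference=np.zeros((32, 48, 3), dtype=np.uint8),
    chart=np.zeros((64, 64, 3), dtype=np.uint8),
)
```

Requiring the chart and skeleton to share a resolution reflects the assumption that the skeleton marks data positions directly in the target image.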
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ChArtist, a domain-specific diffusion model based on the Diffusion Transformer (DiT) for automatically generating pictorial charts. It proposes two controls: skeleton-based spatial control that encodes only data-encoding structure, and subject-driven control from reference images, modulated via Spatially Gated Attention and adaptive position encoding. The work includes creation of a 30,000-triplet dataset (skeleton, reference image, pictorial chart) and a unified data accuracy metric for evaluating faithfulness.
Significance. If the empirical claims hold, the task-specific skeleton representation and gated attention mechanism could advance controlled generation for data visualization by resolving conflicts between structural rigidity and aesthetic flexibility, providing a template for domain-adapted diffusion models beyond general-purpose conditioning signals.
major comments (2)
- [Abstract, §4] Abstract and §4 (Evaluation): The manuscript introduces the unified data accuracy metric and claims effective balance of data faithfulness with aesthetics, but reports no quantitative results, baseline comparisons, or ablation studies on the 30k dataset; this leaves the central claim that the skeleton control plus Spatially Gated Attention achieves the desired outcome without verification.
- [§3.2] §3.2 (Spatially Gated Attention): The mechanism is introduced to modulate interaction between spatial and subject controls, yet the paper provides no equations or pseudocode detailing the gating function, its integration with adaptive position encoding, or how it avoids the conflicts noted for dense cues (e.g., edge maps); this is load-bearing for the architecture's novelty.
minor comments (2)
- [Abstract] The project page link is given but no details on released code, dataset, or model weights are provided in the text, which would aid reproducibility.
- [§3] Notation for the skeleton representation and gated attention could be formalized with explicit equations rather than descriptive text to improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the paper to incorporate the suggested improvements.
- Referee: [Abstract, §4] Abstract and §4 (Evaluation): The manuscript introduces the unified data accuracy metric and claims effective balance of data faithfulness with aesthetics, but reports no quantitative results, baseline comparisons, or ablation studies on the 30k dataset; this leaves the central claim that the skeleton control plus Spatially Gated Attention achieves the desired outcome without verification.
  Authors: We acknowledge that the original manuscript presented the unified data accuracy metric and qualitative examples but lacked quantitative benchmarks, baseline comparisons, and ablations on the 30k dataset. In the revised version, we have expanded §4 to include these evaluations: we report numerical scores using the proposed metric, compare against relevant baselines, and provide ablation studies isolating the contributions of skeleton-based spatial control and Spatially Gated Attention. These additions directly verify the central claims regarding the balance of faithfulness and aesthetics.
  Revision: yes
- Referee: [§3.2] §3.2 (Spatially Gated Attention): The mechanism is introduced to modulate interaction between spatial and subject controls, yet the paper provides no equations or pseudocode detailing the gating function, its integration with adaptive position encoding, or how it avoids the conflicts noted for dense cues (e.g., edge maps); this is load-bearing for the architecture's novelty.
  Authors: We agree that the description of Spatially Gated Attention was insufficiently detailed. The revised §3.2 now includes the full mathematical formulation of the gating function, pseudocode for its computation and integration with adaptive position encoding, and an explicit explanation of how the mechanism selectively modulates features to avoid the rigidity conflicts inherent in dense conditioning signals such as edge maps. These additions clarify the architectural novelty.
  Revision: yes
Circularity Check
No significant circularity
The paper's core contributions consist of newly defined components (skeleton-based spatial control that encodes only data-encoding structure, adaptive position encoding to separate conditioning streams, and Spatially Gated Attention to modulate their interaction) whose definitions and interactions are introduced independently of any fitted parameters or target outputs. The 30k-triplet dataset and unified accuracy metric are presented as external support for training and evaluation rather than as self-referential predictions. No equations reduce a claimed result to its own inputs by construction, no load-bearing self-citations are invoked to justify uniqueness or ansatzes, and no known empirical patterns are merely renamed. The derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Pre-trained diffusion models can be effectively fine-tuned for domain-specific tasks using custom conditioning signals.
invented entities (2)
- Skeleton-based spatial control representation: no independent evidence
- Spatially Gated Attention: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: skeleton-based spatial control representation... encodes only the data-encoding information of the chart, allowing easy incorporation of reference visuals without a rigid outline constraint
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: Spatially-Gated Attention to modulate the interaction between spatial control and subject control
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.