Reason-SVG: Enhancing Structured Reasoning for Vector Graphics Generation with Reinforcement Learning
Pith reviewed 2026-05-19 12:56 UTC · model grok-4.3
The pith
Explicit design reasoning during training lets language models create more accurate vector graphics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reason-SVG introduces the Drawing-with-Thought paradigm in which the model must generate both SVG code and explicit design rationales. A first supervised stage on the SVGX-DwT-10k dataset builds basic reasoning ability, after which reinforcement learning with Group Relative Policy Optimization and a hybrid reward refines the outputs for structural validity, semantic alignment, and visual coherence, yielding measurable gains for both language models and vision-language models.
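The group-relative step that gives GRPO its name can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name, the `eps` constant, and the example reward values are assumptions, and it uses the population standard deviation where other implementations may use the sample one.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled completion's reward against its own group.

    `rewards` holds one scalar reward per completion sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical hybrid-reward scores for four SVG candidates from one prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Candidates scoring above the group mean receive positive advantages and those below receive negative ones, so no separate learned value function is needed to provide a baseline.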
What carries the argument
The Drawing-with-Thought (DwT) paradigm, in which the model produces both SVG code and explicit design rationales that guide generation.
If this is right
- Generated SVGs exhibit higher rates of structural validity with fewer broken paths or overlapping shapes.
- Semantic alignment improves so that the graphics more closely reflect the content of the input description.
- Visual coherence rises, producing results that appear more polished without direct pixel supervision.
- The same pipeline lifts performance on both pure language models and those that also process images.
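As a rough illustration of what an automated structural-validity check might look like, the sketch below tests only that the output is well-formed XML with an `<svg>` root and at least one child element. The function name and the criteria are assumptions made for illustration; the source does not specify how the paper scores structural validity, and this check does not detect overlapping shapes or malformed path data.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def is_structurally_valid(svg_text):
    """Return True if the text parses as XML, has an <svg> root, and is non-empty."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        # Unclosed or mismatched tags end up here.
        return False
    if root.tag not in ("svg", SVG_NS + "svg"):
        return False
    # Require at least one child element so an empty canvas does not pass.
    return len(root) > 0


good = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="5" cy="5" r="3"/></svg>'
broken = '<svg><path d="M0 0 L10 10"></svg>'  # <path> is never closed
```

A check of this kind is cheap enough to run on every sampled completion, which is why it is a natural first gate before any rendering-based scoring.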
Where Pith is reading between the lines
- The same explicit-reasoning training pattern could be applied to other structured code outputs such as HTML layouts or diagram specifications.
- Well-designed hybrid rewards might reduce the volume of human-labeled data needed for creative generation tasks.
- The stored rationales could later support interactive editing, where a user modifies the reasoning steps rather than the code directly.
Load-bearing premise
The hybrid reward function is assumed to correctly measure the presence of useful design reasoning together with structural, semantic, and visual quality without the model learning to exploit scoring loopholes.
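To make the premise concrete, here is one way a four-part composite reward could be combined. Everything specific in it is hypothetical: the weights, the `RewardParts` container, and the validity gate are assumptions of this sketch, since the source does not report the actual formulation or coefficients.

```python
from dataclasses import dataclass


@dataclass
class RewardParts:
    reasoning: float   # presence/effectiveness of the DwT rationale, in [0, 1]
    structural: float  # 1.0 if the SVG is structurally valid, else 0.0
    semantic: float    # prompt-image alignment score, in [0, 1]
    visual: float      # visual-quality score, in [0, 1]


# Hypothetical weights; the real coefficients are not given in the source.
WEIGHTS = {"reasoning": 0.2, "structural": 0.2, "semantic": 0.3, "visual": 0.3}


def hybrid_reward(parts, weights=WEIGHTS):
    """Weighted sum of the four components, gated on structural validity."""
    # Gate: a rationale attached to a broken SVG earns nothing, which closes
    # one obvious loophole (long rationales with unusable code).
    if parts.structural == 0.0:
        return 0.0
    return sum(weights[name] * getattr(parts, name) for name in weights)
```

The gate is a design choice of this sketch: without it, a policy could collect the reasoning term alone, which is the kind of scoring loophole the premise warns about.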
What would settle it
If a model trained with the full DwT-plus-RL pipeline produces SVGs whose rendered images match input prompts no better than a standard supervised baseline, as measured by human ratings or automated structural checks on a held-out prompt set, the benefit of the added reasoning stage would be refuted.
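One way to run such a comparison on a held-out prompt set is a paired sign-flip permutation test over per-prompt ratings. The sketch below is an assumption about how the test could be carried out, not a protocol from the paper; the function name and the ratings are hypothetical, and the exact enumeration is only practical for small prompt sets because it visits 2^n sign assignments.

```python
from itertools import product


def paired_permutation_pvalue(pipeline_scores, baseline_scores):
    """One-sided exact sign-flip test that the pipeline scores higher.

    Both lists hold one rating per held-out prompt, in the same prompt order.
    """
    diffs = [p - b for p, b in zip(pipeline_scores, baseline_scores)]
    observed = sum(diffs)
    count = 0
    total = 0
    # Under the null hypothesis the sign of each paired difference is arbitrary.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if sum(s * d for s, d in zip(signs, diffs)) >= observed:
            count += 1
    return count / total


# Hypothetical 1-5 human ratings on six held-out prompts.
pipeline = [4, 5, 4, 5, 4, 5]
baseline = [3, 3, 3, 3, 3, 3]
p_value = paired_permutation_pvalue(pipeline, baseline)
```

A large p-value on a sufficiently big held-out set would be the outcome described above, where the full pipeline matches prompts no better than the supervised baseline.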
Original abstract
Generating high-quality Scalable Vector Graphics (SVGs) is challenging for Large Language Models (LLMs), as it requires advanced reasoning for structural validity, semantic accuracy, and visual coherence, areas where current LLMs often struggle. In this work, we introduce Reason-SVG, a novel framework equipped with enhanced structured reasoning for SVG generation. Reason-SVG pioneers the "Drawing-with-Thought" (DwT) paradigm, in which models generate both SVG code and explicit design rationales. Reason-SVG follows a two-stage training strategy: First, Supervised Fine-Tuning (SFT) trains the LLM on the DwT paradigm to develop foundational reasoning abilities. Second, Reinforcement Learning (RL), utilizing Group Relative Policy Optimization (GRPO), empowers the model to generate both DwT and SVG rationales through refined, reward-driven reasoning. To enable reasoning-driven SVG generation, we design a Hybrid Reward function that evaluates the presence and effectiveness of DwT reasoning, along with structural validity, semantic alignment, and visual quality. We also introduce the SVGX-DwT-10k dataset, a high-quality corpus of 10k SVG-DwT pairs, where each SVG code is generated based on explicit DwT reasoning. By integrating DwT, SFT, and Hybrid Reward-guided RL, Reason-SVG significantly improves the performance of LLMs and VLMs in generating accurate and visually coherent SVGs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Reason-SVG, a framework for SVG generation that augments LLMs and VLMs with structured reasoning via the new 'Drawing-with-Thought' (DwT) paradigm. Models are trained to output both SVG code and explicit design rationales. Training proceeds in two stages: supervised fine-tuning (SFT) on the introduced SVGX-DwT-10k dataset of 10k SVG-DwT pairs, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) driven by a hybrid reward that scores DwT reasoning presence/effectiveness together with structural validity, semantic alignment, and visual quality. The central claim is that this pipeline yields significant gains in accurate and visually coherent SVG outputs.
Significance. If the empirical results hold with proper controls, the DwT paradigm and the accompanying dataset constitute a useful contribution toward interpretable, reasoning-augmented generation of structured graphics. The two-stage SFT-then-GRPO recipe is a standard template but is applied here to a new domain with a composite reward; credit is due for releasing the SVGX-DwT-10k corpus. The work sits at the intersection of vision-language models and controllable vector graphics, an area of growing practical interest.
major comments (2)
- [Abstract and §3 (Hybrid Reward)] The central performance claim rests on the hybrid reward correctly measuring and incentivizing genuine DwT reasoning rather than superficial rationales. No equations, weighting coefficients, or validation protocol for the reward components are supplied in the abstract or high-level description, leaving open the possibility of reward hacking that inflates metrics without improving SVG coherence.
- [§4 (Experiments)] The abstract asserts 'significant improvements' for both LLMs and VLMs yet supplies no quantitative results, baseline comparisons, or details on how visual quality is scored. Without these numbers and controls the load-bearing claim that DwT + SFT + GRPO is responsible for the gains cannot be evaluated.
minor comments (2)
- Define GRPO and all other acronyms on first use; clarify whether the visual-quality term in the hybrid reward is computed by an automated metric, an LLM judge, or human raters.
- Figure captions and dataset statistics should explicitly state the split sizes, diversity of SVG categories, and any filtering criteria applied to the 10k SVG-DwT pairs.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity and transparency as indicated.
- Referee: [Abstract and §3 (Hybrid Reward)] The central performance claim rests on the hybrid reward correctly measuring and incentivizing genuine DwT reasoning rather than superficial rationales. No equations, weighting coefficients, or validation protocol for the reward components are supplied in the abstract or high-level description, leaving open the possibility of reward hacking that inflates metrics without improving SVG coherence.
  Authors: We appreciate the referee's emphasis on reward transparency. While Section 3 describes the four components of the Hybrid Reward (DwT reasoning effectiveness, structural validity, semantic alignment, and visual quality), we acknowledge that explicit equations, specific weighting coefficients, and a validation protocol are not presented in the high-level description. To address concerns about potential reward hacking, we will revise Section 3 to include the mathematical formulations for each component, the weighting scheme used, and a description of how the reward was validated to prioritize substantive reasoning over superficial outputs. A concise summary of the reward formulation will also be added to the abstract.
  Revision: yes
- Referee: [§4 (Experiments)] The abstract asserts 'significant improvements' for both LLMs and VLMs yet supplies no quantitative results, baseline comparisons, or details on how visual quality is scored. Without these numbers and controls the load-bearing claim that DwT + SFT + GRPO is responsible for the gains cannot be evaluated.
  Authors: We agree that the abstract would be strengthened by including key quantitative evidence. The experiments in §4 report results across LLMs and VLMs with baseline comparisons, using metrics for structural validity, semantic alignment, and visual quality (the latter via automated metrics combined with human evaluation protocols detailed in the section). To make the central claims more evaluable from the abstract alone, we will revise the abstract to incorporate specific performance deltas and baseline references while maintaining conciseness. This revision will better substantiate the contributions of the DwT paradigm, SFT, and GRPO stages.
  Revision: yes
Circularity Check
No significant circularity in derivation chain
The paper presents an empirical training pipeline (DwT paradigm + SFT + GRPO RL with a newly designed hybrid reward) applied to a newly introduced dataset (SVGX-DwT-10k). No mathematical derivations, fitted parameters renamed as predictions, or self-citations are used to justify the central performance claims. All load-bearing elements are introduced as novel components whose effectiveness is asserted via training outcomes rather than reducing to prior inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can be effectively aligned to produce both reasoning text and code via SFT followed by RL with a composite reward.
invented entities (2)
- Drawing-with-Thought (DwT) paradigm: no independent evidence
- Hybrid Reward function: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "we design a Hybrid Reward function that evaluates the presence and effectiveness of DwT reasoning, along with structural validity, semantic alignment, and visual quality"
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, absolute_floor_iff_bare_distinguishability (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Drawing-with-Thought (DwT) paradigm, in which models generate both SVG code and explicit design rationales"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 5 Pith papers
- VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
  VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.
- Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
  Render-in-the-Loop reformulates SVG generation as a step-wise visual-context-aware process using self-feedback from rendered intermediate states, VSF training, and RaV inference to outperform baselines on MMSVGBench f...
- AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling
  AmodalSVG produces semantically separate and geometrically complete SVG layers from natural images by using VLM-guided semantic layer peeling for amodal completion followed by adaptive vectorization.
- Structural Evaluation Metrics for SVG Generation via Leave-One-Out Analysis
  Element-level leave-one-out analysis yields per-element quality scores and four structural metrics (purity, coverage, compactness, locality) that quantify SVG modularity and enable artifact detection.
- Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
  HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.