Reason-SVG: Enhancing Structured Reasoning for Vector Graphics Generation with Reinforcement Learning

Dong Xu; Jing Zhang; Qian Yu; Ximing Xing; Yandong Guan; Ziteng Xue

arxiv: 2505.24499 · v2 · submitted 2025-05-30 · 💻 cs.CV


Ximing Xing, Ziteng Xue, Yandong Guan, Jing Zhang, Dong Xu, Qian Yu

Pith reviewed 2026-05-19 12:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords SVG generation · structured reasoning · reinforcement learning · Drawing-with-Thought · large language models · vector graphics · hybrid reward · supervised fine-tuning


The pith

Explicit design reasoning during training lets language models create more accurate vector graphics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tries to establish that large language models can generate better Scalable Vector Graphics when they are trained to output explicit design thoughts alongside the code itself. The approach uses supervised fine-tuning on reasoned examples followed by reinforcement learning driven by a hybrid reward that scores both the reasoning quality and the final graphic's structure, semantics, and appearance. A reader would care because current models frequently produce invalid paths, wrong shapes, or visuals that do not match the prompt, limiting practical use in illustration and design. The work also supplies a 10,000-pair dataset of SVG code paired with its design rationale to make the method reproducible.

Core claim

Reason-SVG introduces the Drawing-with-Thought paradigm in which the model must generate both SVG code and explicit design rationales. A first supervised stage on the SVGX-DwT-10k dataset builds basic reasoning ability, after which reinforcement learning with Group Relative Policy Optimization and a hybrid reward refines the outputs for structural validity, semantic alignment, and visual coherence, yielding measurable gains for both language models and vision-language models.

What carries the argument

The Drawing-with-Thought (DwT) paradigm, in which the model produces both SVG code and explicit design rationales that guide generation.
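To make the paradigm concrete, here is a minimal sketch of how one DwT-style completion might be split into its rationale and its SVG, plus a binary "format" check of the kind a reward could use to test that reasoning is present at all. The `<think>...</think>` delimiter is an assumption borrowed from DeepSeek-R1-style prompting, not the paper's confirmed output template, and `split_dwt_output` and `format_reward` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# Assumed output template (not confirmed by the paper): the design rationale
# sits inside <think>...</think> and the SVG code follows it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
SVG_RE = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def split_dwt_output(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (rationale, svg_code); either part is None when missing."""
    think = THINK_RE.search(text)
    rationale = think.group(1).strip() if think else None
    # Search for SVG only after the rationale, so code quoted inside the
    # reasoning is not mistaken for the final answer.
    tail = text[think.end():] if think else text
    svg = SVG_RE.search(tail)
    return rationale, (svg.group(0) if svg else None)


def format_reward(text: str) -> float:
    """1.0 if a non-empty rationale and an SVG block are both present, else 0.0."""
    rationale, svg = split_dwt_output(text)
    return 1.0 if rationale and svg else 0.0


# Usage on a toy completion.
sample = (
    "<think>Concept: a red circle centred on a white canvas.</think>\n"
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="40" fill="red"/></svg>'
)
print(format_reward(sample))  # 1.0
```

A presence check like this only says that reasoning exists, not that it is useful; that gap is exactly what the load-bearing premise below is about.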

If this is right

Generated SVGs exhibit higher rates of structural validity with fewer broken paths or overlapping shapes.
Semantic alignment improves so that the graphics more closely reflect the content of the input description.
Visual coherence rises, producing results that appear more polished without direct pixel supervision.
The same pipeline lifts performance on both pure language models and those that also process images.
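"Structural validity" can be approximated cheaply without rendering. The sketch below is an illustrative heuristic, not the paper's structural term: it checks that the code parses as XML with an `<svg>` root and that each `<path>` `d` attribute starts with a moveto and uses only path-data characters. A character whitelist only approximates the SVG path grammar, so some malformed paths will still pass; `structural_validity` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Rough whitelist for path data: command letters, digits, signs, decimal
# points, exponents and separators. This does not check the full grammar.
PATH_CHARS_RE = re.compile(r"[MmLlHhVvCcSsQqTtAaZz0-9eE+\-.,\s]*")


def _local(tag: str) -> str:
    """Strip an XML namespace prefix, e.g. '{http://...}svg' -> 'svg'."""
    return tag.rsplit("}", 1)[-1]


def _path_ok(d) -> bool:
    if d is None:
        return False
    d = d.strip()
    # A path must start with a moveto and use only path-data characters.
    return bool(d) and d[0] in "Mm" and PATH_CHARS_RE.fullmatch(d) is not None


def structural_validity(svg_code: str) -> float:
    """Heuristic score in [0, 1]; 0.0 if the code is not well-formed SVG."""
    try:
        root = ET.fromstring(svg_code)
    except ET.ParseError:
        return 0.0  # truncated or otherwise unparseable markup
    if _local(root.tag) != "svg":
        return 0.0
    paths = [el for el in root.iter() if _local(el.tag) == "path"]
    if not paths:
        return 1.0  # well-formed, and no path data to check
    return sum(_path_ok(p.get("d")) for p in paths) / len(paths)


print(structural_validity(
    '<svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0 L10 10 Z"/></svg>'
))  # 1.0
```

Returning a fraction rather than a pass/fail flag gives partial credit when only some paths are broken, which may suit a reward signal; a strict pass/fail variant is a one-line change.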

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same explicit-reasoning training pattern could be applied to other structured code outputs such as HTML layouts or diagram specifications.
Well-designed hybrid rewards might reduce the volume of human-labeled data needed for creative generation tasks.
The stored rationales could later support interactive editing, where a user modifies the reasoning steps rather than the code directly.

Load-bearing premise

The hybrid reward function is assumed to correctly measure the presence of useful design reasoning together with structural, semantic, and visual quality without the model learning to exploit scoring loopholes.
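One way to read this premise in code: a weighted sum of the four components named in the abstract, gated so that unparseable SVG earns nothing however good the rationale looks. The weights, the [0, 1] normalization and the gate are all assumptions for illustration; the paper's actual formulation is not reproduced here, and `RewardParts` and `hybrid_reward` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RewardParts:
    """Component scores for one completion, each assumed to lie in [0, 1]."""
    dwt: float         # presence/quality of the design rationale
    structural: float  # e.g. parse and path-data checks
    semantic: float    # e.g. prompt/rendering similarity
    visual: float      # e.g. a rendering-quality score


# Hypothetical weights for illustration only.
WEIGHTS = {"dwt": 0.1, "structural": 0.3, "semantic": 0.4, "visual": 0.2}


def hybrid_reward(parts: RewardParts, weights: dict = WEIGHTS) -> float:
    """Normalized weighted sum of the components, gated on structural validity.

    The gate means a fluent rationale attached to unparseable code earns
    nothing, one simple guard against rewarding reasoning text on its own.
    """
    for name in weights:
        value = getattr(parts, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    if parts.structural == 0.0:
        return 0.0
    total = sum(weights.values())
    return sum(w * getattr(parts, name) for name, w in weights.items()) / total


print(round(hybrid_reward(RewardParts(dwt=0.0, structural=1.0, semantic=0.5, visual=0.5)), 3))  # 0.6
```

Keeping the reasoning weight small relative to the output-quality terms is one way to limit how much a policy can gain from rationale text alone, but whether any weighting resists exploitation is an empirical question.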

What would settle it

If a model trained with the full DwT-plus-RL pipeline produces SVGs whose rendered images match input prompts no better than a standard supervised baseline, as measured by human ratings or automated structural checks on a held-out prompt set, the benefit of the added reasoning stage would be refuted.
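The automated half of that test could be run as a paired comparison: both systems answer the same held-out prompts, and each output is marked pass or fail by a structural check. The sketch below applies an exact McNemar test to those paired outcomes; the choice of test is an editorial assumption, not a protocol from the paper, the function names are hypothetical, and the toy outcomes are made up for illustration, not results. Human ratings would need a different analysis.

```python
from math import comb
from typing import Sequence


def validity_rate(flags: Sequence[bool]) -> float:
    """Share of prompts whose output passed the structural check."""
    return sum(flags) / len(flags)


def exact_mcnemar_p(baseline: Sequence[bool], candidate: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired pass/fail outcomes.

    Only discordant prompts (one system passes, the other fails) carry
    information; under the null hypothesis each direction has probability 1/2.
    """
    if len(baseline) != len(candidate):
        raise ValueError("outcomes must be paired prompt by prompt")
    b = sum(1 for x, y in zip(baseline, candidate) if x and not y)  # baseline-only passes
    c = sum(1 for x, y in zip(baseline, candidate) if y and not x)  # candidate-only passes
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up toy outcomes on 20 prompts (not results from the paper).
baseline = [True] * 10 + [False] * 10
dwt = [True] * 10 + [True] * 8 + [False] * 2
print(validity_rate(baseline), validity_rate(dwt))  # 0.5 0.9
print(exact_mcnemar_p(baseline, dwt))  # 0.0078125
```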

Figures

Figures reproduced from arXiv: 2505.24499 by Dong Xu, Jing Zhang, Qian Yu, Ximing Xing, Yandong Guan, Ziteng Xue.

**Figure 1.** Overview of Reason-SVG. Reason-SVG incorporates structured reasoning through the Drawing-with-Thought (DwT) paradigm, enabling LLMs to synthesize SVGs guided by explicit visual planning and compositional logic. (a) DwT Reasoning Process: An example of the Drawing-with-Thought reasoning process, illustrating structured design decisions across stages such as conceptual design, preliminary design, and detaile…

**Figure 2.** Framework of Reason-SVG. The “Drawing-with-Thought” (DwT, Sec. 4.1) module guides the LLM through a step-by-step visual reasoning process to generate both the SVG code (O) and its corresponding design rationale (C). This process comprises the following stages: a) concept sketching, b) canvas planning, c) shape decomposition, d) coordinate calculation, e) styling and coloring, and f) final assembly. These r…

**Figure 3.** Qualitative results of Reason-SVG. For science diagrams, the model follows the instruction “drawing an SVG-format diagram following prompt” to generate structured plots and analytic charts. Across diverse SVG categories, including Science Diagram, UI/UX, and Complex Scene, Reason-SVG exhibits strong visual reasoning and structural understanding. The proposed DwT reasoning further enables more coherent layout…
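The six DwT stages listed in the Figure 2 caption suggest a simple coverage measure for the reasoning text. A minimal sketch, assuming (hypothetically) that each stage is introduced by its name as written in the caption; the paper's own DwT reward is not specified here, and a keyword check like this is exactly the kind of signal a policy could learn to game, so it illustrates the idea rather than a safeguard. `stage_coverage` and `min_words` are hypothetical names.

```python
import re

# Stage names from the Figure 2 caption; assuming (hypothetically) that the
# rationale introduces each stage by this exact name.
DWT_STAGES = [
    "concept sketching",
    "canvas planning",
    "shape decomposition",
    "coordinate calculation",
    "styling and coloring",
    "final assembly",
]


def stage_coverage(rationale: str, min_words: int = 5) -> float:
    """Fraction of DwT stages that appear in order with some content after them."""
    text = rationale.lower()
    positions = [text.find(stage) for stage in DWT_STAGES]
    covered = 0
    last = -1
    for i, pos in enumerate(positions):
        if pos <= last:  # stage missing (-1) or out of order
            continue
        # The stage body runs to the next later stage heading, or to the end.
        later = [p for p in positions[i + 1:] if p > pos]
        end = min(later) if later else len(text)
        body = text[pos + len(DWT_STAGES[i]):end]
        if len(re.findall(r"[a-z0-9]+", body)) >= min_words:
            covered += 1
        last = pos
    return covered / len(DWT_STAGES)


plan = (
    "Concept sketching: a lighthouse on a rocky shore at dusk. "
    "Canvas planning: 200 by 200 viewBox with a horizon at y 140. "
    "Shape decomposition: tower, lamp room, rocks, sea and sky bands. "
    "Coordinate calculation: tower rectangle spans x 90 to 110 upwards. "
    "Styling and coloring: warm orange sky, dark grey rocks, white tower. "
    "Final assembly: draw sky first, then sea, rocks, tower and lamp."
)
print(stage_coverage(plan))  # 1.0
```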

Original abstract

Generating high-quality Scalable Vector Graphics (SVGs) is challenging for Large Language Models (LLMs), as it requires advanced reasoning for structural validity, semantic accuracy, and visual coherence, areas where current LLMs often struggle. In this work, we introduce Reason-SVG, a novel framework equipped with enhanced structured reasoning for SVG generation. Reason-SVG pioneers the “Drawing-with-Thought” (DwT) paradigm, in which models generate both SVG code and explicit design rationales. Reason-SVG follows a two-stage training strategy: First, Supervised Fine-Tuning (SFT) trains the LLM on the DwT paradigm to develop foundational reasoning abilities. Second, Reinforcement Learning (RL), utilizing Group Relative Policy Optimization (GRPO), empowers the model to generate both DwT and SVG rationales through refined, reward-driven reasoning. To enable reasoning-driven SVG generation, we design a Hybrid Reward function that evaluates the presence and effectiveness of DwT reasoning, along with structural validity, semantic alignment, and visual quality. We also introduce the SVGX-DwT-10k dataset, a high-quality corpus of 10k SVG-DwT pairs, where each SVG code is generated based on explicit DwT reasoning. By integrating DwT, SFT, and Hybrid Reward-guided RL, Reason-SVG significantly improves the performance of LLMs and VLMs in generating accurate and visually coherent SVGs.
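GRPO is named but not unpacked in the abstract. The step that distinguishes it from PPO [35] is how advantages are computed: several completions are sampled per prompt, and each completion's reward is standardized against the rest of its group, so no learned critic is needed. A minimal sketch of that step only, following the formulation introduced in DeepSeekMath and used in DeepSeek-R1 [11]; the clipped policy-ratio objective and KL penalty are omitted, and `group_relative_advantages` is a hypothetical name.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for the G sampled completions of one prompt.

    Each reward is standardized against its own group's mean and standard
    deviation. Population std is used here; implementations differ on this
    detail. eps avoids division by zero when all rewards are equal.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by some reward (toy values).
print(group_relative_advantages([0.2, 0.5, 0.9, 0.4]))
```

Completions scoring above their group's mean get positive advantages and are reinforced; a group in which every completion scores the same yields zero advantage and contributes no gradient, which is one reason reward design matters so much in this setup.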

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

Reason-SVG adds a DwT reasoning step plus GRPO training for SVG code but the abstract gives no numbers or reward details to show it works.


The main thing to know is that this paper introduces a Drawing-with-Thought setup where the model outputs both explicit design reasoning and SVG code, trained first by supervised fine-tuning and then by GRPO reinforcement learning with a hybrid reward that scores reasoning presence, structural validity, semantic fit, and visual quality. They also release the SVGX-DwT-10k dataset of 10k such pairs. That combination is new relative to the SVG and reasoning papers they cite, and the two-stage recipe is laid out clearly enough to follow at a high level. The approach targets a real weakness in current LLMs when they try to produce structured vector output without just guessing at the code. The hybrid reward idea makes sense on paper as a way to push the model toward useful intermediate reasoning rather than direct code emission. The soft spot is that the abstract states significant gains without any quantitative results, baseline numbers, or equations for how the reward weights its components. Without those, it is impossible to tell whether the DwT part actually drives better SVGs or whether the policy is just learning to emit plausible-sounding text that scores well. The stress-test concern about reward hacking therefore lands because the description leaves the reward implementation and validation unspecified. This work is aimed at researchers who build LLM pipelines for design tools or controllable graphics generation. A reader interested in new training recipes for structured outputs could pick up the DwT framing and dataset construction even if the performance claims need the full paper to evaluate. I would send it to peer review so the experiments and reward details can be checked properly.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Reason-SVG, a framework for SVG generation that augments LLMs and VLMs with structured reasoning via the new 'Drawing-with-Thought' (DwT) paradigm. Models are trained to output both SVG code and explicit design rationales. Training proceeds in two stages: supervised fine-tuning (SFT) on the introduced SVGX-DwT-10k dataset of 10k SVG-DwT pairs, followed by reinforcement learning with Group Relative Policy Optimization (GRPO) driven by a hybrid reward that scores DwT reasoning presence/effectiveness together with structural validity, semantic alignment, and visual quality. The central claim is that this pipeline yields significant gains in accurate and visually coherent SVG outputs.

Significance. If the empirical results hold with proper controls, the DwT paradigm and the accompanying dataset constitute a useful contribution toward interpretable, reasoning-augmented generation of structured graphics. The two-stage SFT-then-GRPO recipe is a standard template but is applied here to a new domain with a composite reward; credit is due for releasing the SVGX-DwT-10k corpus. The work sits at the intersection of vision-language models and controllable vector graphics, an area of growing practical interest.

major comments (2)

[Abstract and §3] Hybrid Reward: the central performance claim rests on the hybrid reward correctly measuring and incentivizing genuine DwT reasoning rather than superficial rationales. No equations, weighting coefficients, or validation protocol for the reward components are supplied in the abstract or high-level description, leaving open the possibility of reward hacking that inflates metrics without improving SVG coherence.
[§4] Experiments: the abstract asserts 'significant improvements' for both LLMs and VLMs yet supplies no quantitative results, baseline comparisons, or details on how visual quality is scored. Without these numbers and controls the load-bearing claim that DwT + SFT + GRPO is responsible for the gains cannot be evaluated.
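One cheap probe of the reward-hacking concern is to check whether the rationale and the code agree on concrete details. The sketch below, an illustration and not part of the paper's reward, measures the share of hex colors named in the rationale that actually appear in the SVG; a policy emitting decorative reasoning could score low on it. The regex and `color_consistency` are hypothetical.

```python
import re

# Hex colors such as #fff or #87CEEB; named colors (e.g. "red") are ignored,
# and shorthand (#fff) and full (#ffffff) forms are not unified.
HEX_COLOR_RE = re.compile(r"#[0-9a-fA-F]{6}\b|#[0-9a-fA-F]{3}\b")


def color_consistency(rationale: str, svg_code: str) -> float:
    """Share of hex colors named in the rationale that the SVG actually uses.

    Returns 1.0 when the rationale names no hex colors (nothing to check).
    """
    planned = {c.lower() for c in HEX_COLOR_RE.findall(rationale)}
    if not planned:
        return 1.0
    used = {c.lower() for c in HEX_COLOR_RE.findall(svg_code)}
    return len(planned & used) / len(planned)


rationale = "Sky in #87CEEB, sun in #FFD700, grass in #228B22."
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect fill="#87ceeb" width="10" height="6"/>'
    '<circle fill="#ffd700" cx="8" cy="2" r="1"/></svg>'
)
print(color_consistency(rationale, svg))  # the grass color is never used
```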

minor comments (2)

Define GRPO and all other acronyms on first use; clarify whether the visual-quality term in the hybrid reward is computed by an automated metric, an LLM judge, or human raters.
Figure captions and dataset statistics should explicitly state the split sizes, diversity of SVG categories, and any filtering criteria applied to the 10k SVG-DwT pairs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve clarity and transparency as indicated.


Referee: [Abstract and §3] Hybrid Reward: the central performance claim rests on the hybrid reward correctly measuring and incentivizing genuine DwT reasoning rather than superficial rationales. No equations, weighting coefficients, or validation protocol for the reward components are supplied in the abstract or high-level description, leaving open the possibility of reward hacking that inflates metrics without improving SVG coherence.

Authors: We appreciate the referee's emphasis on reward transparency. While Section 3 describes the four components of the Hybrid Reward (DwT reasoning effectiveness, structural validity, semantic alignment, and visual quality), we acknowledge that the high-level description does not give explicit equations, specific weighting coefficients, or a validation protocol. To address concerns about potential reward hacking, we will revise Section 3 to include the mathematical formulation of each component, the weighting scheme used, and a description of how the reward was validated to favor substantive reasoning over superficial outputs. A concise summary of the reward formulation will also be added to the abstract. revision: yes
Referee: §4 (Experiments): the abstract asserts 'significant improvements' for both LLMs and VLMs yet supplies no quantitative results, baseline comparisons, or details on how visual quality is scored. Without these numbers and controls the load-bearing claim that DwT + SFT + GRPO is responsible for the gains cannot be evaluated.

Authors: We agree that the abstract would be strengthened by including key quantitative evidence. The experiments in §4 report results across LLMs and VLMs with baseline comparisons, using metrics for structural validity, semantic alignment, and visual quality (the latter via automated metrics combined with human evaluation protocols detailed in the section). To make the central claims more evaluable from the abstract alone, we will revise the abstract to incorporate specific performance deltas and baseline references while maintaining conciseness. This revision will better substantiate the contributions of the DwT paradigm, SFT, and GRPO stages. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents an empirical training pipeline (DwT paradigm + SFT + GRPO RL with a newly designed hybrid reward) applied to a newly introduced dataset (SVGX-DwT-10k). No mathematical derivations, fitted parameters renamed as predictions, or self-citations are used to justify the central performance claims. All load-bearing elements are introduced as novel components whose effectiveness is asserted via training outcomes rather than reducing to prior inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework rests on standard LLM fine-tuning assumptions plus the new DwT format and hybrid reward design. No free parameters are explicitly fitted in the abstract description. The main invented element is the DwT reasoning format itself.

axioms (1)

domain assumption: LLMs can be effectively aligned to produce both reasoning text and code via SFT followed by RL with a composite reward.
Invoked in the two-stage training strategy section of the abstract.

invented entities (2)

Drawing-with-Thought (DwT) paradigm · no independent evidence
purpose: To force explicit design rationales before SVG code generation
New format introduced to improve structural validity and semantic accuracy.
Hybrid Reward function · no independent evidence
purpose: To evaluate DwT presence, structural validity, semantic alignment, and visual quality
Composite reward designed specifically for this task.

pith-pipeline@v0.9.0 · 5795 in / 1353 out tokens · 22534 ms · 2026-05-19T12:56:33.516913+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

we design a Hybrid Reward function that evaluates the presence and effectiveness of DwT reasoning, along with structural validity, semantic alignment, and visual quality

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · absolute_floor_iff_bare_distinguishability · unclear

Relation between the paper passage and the cited Recognition theorem.

Drawing-with-Thought (DwT) paradigm, in which models generate both SVG code and explicit design rationales

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation
cs.CV · 2026-05 · unverdicted · novelty 7.0

VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.

Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback
cs.CV · 2026-04 · unverdicted · novelty 7.0

Render-in-the-Loop reformulates SVG generation as a step-wise visual-context-aware process using self-feedback from rendered intermediate states, VSF training, and RaV inference to outperform baselines on MMSVGBench f...

AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling
cs.CV · 2026-04 · unverdicted · novelty 7.0

AmodalSVG produces semantically separate and geometrically complete SVG layers from natural images by using VLM-guided semantic layer peeling for amodal completion followed by adaptive vectorization.

Structural Evaluation Metrics for SVG Generation via Leave-One-Out Analysis
cs.LG · 2026-04 · unverdicted · novelty 7.0

Element-level leave-one-out analysis yields per-element quality scores and four structural metrics (purity, coverage, compactness, locality) that quantify SVG modularity and enable artifact detection.

Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
cs.LG · 2026-04 · unverdicted · novelty 7.0

HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages · cited by 5 Pith papers · 8 internal anchors

[1] Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[2] Anthropic. Claude 3.5 Sonnet. https://www.anthropic.com/news/claude-3-5-sonnet, 2024.
[3] Anthropic. Claude 3.7 Sonnet and Claude Code. https://www.anthropic.com/news/claude-3-7-sonnet, 2025.
[4] Shuai Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Sibo Song, Kai Dang, Peng Wang, Shijie Wang, Jun Tang, et al. Qwen2.5-VL technical report. arXiv preprint arXiv:2502.13923, 2025.
[5] Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. DeepSVG: A hierarchical generative network for vector graphics animation. Advances in Neural Information Processing Systems (NeurIPS), 33:16351–16361, 2020.
[6] CourtBouillon. CairoSVG: A simple SVG converter based on Cairo. https://cairosvg.org/documentation/, 2024. Version 2.7.1 or later. Accessed: 2025-05-14.
[7] Zhiqing Cui, Jiahao Yuan, Hanqing Wang, Yanshu Li, Chenxu Du, and Zhenglong Ding. Draw with thought: Unleashing multimodal reasoning for scientific diagram generation. arXiv preprint arXiv:2504.09479, 2025.
[8] DeepMind. Gemini 2.5 Pro - best for coding and complex prompts. https://deepmind.google/technologies/gemini/pro/, 2024.
[9] Shuguang Dou, Xinyang Jiang, Lu Liu, Lu Ying, Caihua Shan, Yifei Shen, Xuanyi Dong, Yun Wang, Dongsheng Li, and Cairong Zhao. Hierarchically recognizing vector graphics and a new chart-based vector graphics dataset. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 46(12):7556–7573, 2024.
[10] Kevin Frans, Lisa Soros, and Olaf Witkowski. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[11] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
[12] David Ha and Douglas Eck. A neural representation of sketch drawings. In International Conference on Learning Representations (ICLR), 2018.
[13] Juncheng Hu, Ximing Xing, Jing Zhang, and Qian Yu. VectorPainter: Advanced stylized vector graphics synthesis using stroke-style priors. In IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2025.
[14] Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L Rosin, and Yu-Kun Lai. SuperSVG: Superpixel-based scalable vector graphics synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24892–24901, 2024.
[15] Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, and Ariel Shamir. Word-as-image for semantic typography. ACM Transactions on Graphics (TOG), 42(4), 2023.
[16] Ajay Jain, Amber Xie, and Pieter Abbeel. VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[17] Xinyang Jiang, Lu Liu, Caihua Shan, Yifei Shen, Xuanyi Dong, and Dongsheng Li. Recognizing vector graphics without rasterization. In Proceedings of the 35th International Conference on Neural Information Processing Systems (NeurIPS), Red Hook, NY, USA, 2021. Curran Associates Inc.
[18] Jinke Li, Jiarui Yu, Chenxing Wei, Hande Dong, Qiang Lin, Liangjing Yang, Zhicai Wang, and Yanbin Hao. UniSVG: A unified dataset for vector graphic understanding and generation with multimodal large language models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 13156–13163, 2025.
[19] Raymond Li, Loubna Ben allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia LI, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Joel Lamy-Poirier, Joao Monteiro, Nicolas Gontier, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Ben Lipkin, Muhtasham Oblokul... StarCoder: may the source be with you! Transactions on Machine Learning Research (TMLR), 2023.
[20] Tzu-Mao Li, Michal Lukáč, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics (TOG), 39(6):193:1–193:15, 2020.
[21] Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, et al. DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024.
[22] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.
[23] Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[24] Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer-wise image vectorization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16314–16323, 2022.
[25] Omar Moured, Morris Baumgarten-Egemole, Karin Müller, Alina Roitberg, Thorsten Schwarz, and Rainer Stiefelhagen. Chart4Blind: An intelligent interface for chart accessibility conversion. In Proceedings of the 29th International Conference on Intelligent User Interfaces, pages 504–514, 2024.
[26] Kunato Nishina and Yusuke Matsui. SVGEditBench: A benchmark dataset for quantitative assessment of LLM's SVG editing capabilities. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8142–8147,
[27] OpenAI. Introducing OpenAI o3 and o4-mini. https://openai.com/index/introducing-o3-and-o4-mini/, 2025.
[28] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick...
[29] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems (NeurIPS), 35:27730–27744, 2022.
[30] Sagi Polaczek, Yuval Alaluf, Elad Richardson, Yael Vinker, and Daniel Cohen-Or. NeuralSVG: An implicit representation for text-to-vector generation. arXiv preprint arXiv:2501.03992, 2025.
[31] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pages 8748–8763. PMLR, 2021.
[32] Pradyumna Reddy, Michael Gharbi, Michal Lukac, and Niloy J Mitra. Im2Vec: Synthesizing vector graphics without vector supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7342–7351, 2021.
[33] Juan A Rodriguez, Shubham Agarwal, Issam H Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, and Marco Pedersoli. StarVector: Generating scalable vector graphics code from images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[34] Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Rishav Pramanik, Aarash Feizi, Pascal Wichmann, Arnab Kumar Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli. Rendering-aware reinforcement learning for vector graphics generation. In The Thirty-ninth Annual Confere...
[35] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[36] I-Chao Shen and Bing-Yu Chen. ClipGen: A deep generative model for clipart vectorization and synthesis. IEEE Transactions on Visualization and Computer Graphics (TVCG), 28(12):4211–4224, 2022.
[37] Yiren Song, Xuning Shao, Kang Chen, Weidong Zhang, Zhongliang Jing, and Minzhe Li. CLIPVG: Text-guided image manipulation using differentiable vector graphics. In Proceedings of the Conference on Artificial Intelligence (AAAI), 2023.
[38] Huajie Tan, Yuheng Ji, Xiaoshuai Hao, Minglan Lin, Pengwei Wang, Zhongyuan Wang, and Shanghang Zhang. Reason-RFT: Reinforcement fine-tuning for visual reasoning. arXiv preprint arXiv:2503.20752, 2025.
[39] Zecheng Tang, Chenfei Wu, Zekai Zhang, Minheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, and Nan Duan. StrokeNUWA: Tokenizing strokes for vector graphic synthesis. In Proceedings of the 41st International Conference on Machine Learning (ICML). JMLR.org, 2024.
[40] Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michaël Gharbi, Oliver Wang, Alec Jacobson, and Evangelos Kalogerakis. VecFusion: Vector font generation with diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7943–7952, 2024.
[41]

Nivel: Neural implicit vector layers for text-to-vector generation

Vikas Thamizharasan, Difan Liu, Matthew Fisher, Nanxuan Zhao, Evangelos Kalogerakis, and Michal Lukac. Nivel: Neural implicit vector layers for text-to-vector generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4589–4597,

[42]

Yael Vinker, Ehsan Pajouheshgar, Jessica Y Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. Clipasso: Semantically-aware object sketching. ACM Transactions on Graphics (TOG), 41(4):1–11, 2022.

[43]

Yael Vinker, Yuval Alaluf, Daniel Cohen-Or, and Ariel Shamir. Clipascene: Scene sketching with different types and levels of abstraction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4146–4156, 2023.

[44]

Feiyu Wang, Zhiyuan Zhao, Yuandong Liu, Da Zhang, Junyu Gao, Hao Sun, and Xuelong Li. Svgen: Interpretable vector graphics generation with large language models. In Proceedings of the 33rd ACM International Conference on Multimedia, pages 9608–9617, 2025.

[45]

Haomin Wang, Jinhui Yin, Qi Wei, Wenguang Zeng, Lixin Gu, Shenglong Ye, Zhangwei Gao, Yaohui Wang, Yanting Zhang, Yuanqi Li, et al. Internsvg: Towards unified svg tasks with multimodal large language models. arXiv preprint arXiv:2510.11341, 2025.

[46]

Yizhi Wang and Zhouhui Lian. Deepvecfont: Synthesizing high-quality vector fonts via dual-modality learning. ACM Transactions on Graphics (TOG), 40(6), 2021.

[47]

Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A Smith, Daniel Khashabi, and Hannaneh Hajishirzi. Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560, 2022.

[48]

Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, and Jiaqi Wang. Unified multimodal chain-of-thought reward model through reinforcement fine-tuning. arXiv preprint arXiv:2505.03318, 2025.

[49]

Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, and Heng Ji. Visually descriptive language model for vector graphics reasoning. Transactions on Machine Learning Research.

[50]

Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. Iconshop: Text-guided vector icon synthesis with autoregressive transformers. ACM Trans. Graph., 42(6), 2023.

[51]

Ronghuan Wu, Wanchao Su, and Jing Liao. Chat2svg: Vector graphics generation with large language models and image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[52]

Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score: Better aligning text-to-image models with human preference. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2096–2105, 2023.

[53]

Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, and Dong Xu. DiffSketcher: Text guided vector sketch synthesis through latent diffusion models. In Advances in Neural Information Processing Systems (NeurIPS), 2023.

[54]

Ximing Xing, Juncheng Hu, Jing Zhang, Dong Xu, and Qian Yu. Svgfusion: Scalable text-to-svg generation via vector space diffusion. arXiv preprint arXiv:2412.10437, 2024.

[55]

Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, and Qian Yu. SVGDreamer: Text guided svg generation with diffusion model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4546–4555, 2024.

[56]

Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, and Qian Yu. Empowering llms to understand and generate complex vector graphics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[57]

Ximing Xing, Qian Yu, Chuang Wang, Haitao Zhou, Jing Zhang, and Dong Xu. SVGDreamer++: Advancing editability and diversity in text-guided svg generation. IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), pages 1–18, 2025.

[58]

Zhongzheng Xu and Emily Wall. Exploring the capability of llms in performing low-level visual analytic tasks on svg data visualizations. In 2024 IEEE Visualization and Visual Analytics (VIS), pages 126–130. IEEE, 2024.

[59]

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024.

[60]

Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Fukun Yin, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, and Yu-Gang Jiang. Omnisvg: A unified scalable vector graphics generation model. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.

[61]

Peiying Zhang, Nanxuan Zhao, and Jing Liao. Text-guided vector graphics customization. In SIGGRAPH Asia 2023 Conference Papers, New York, NY, USA, 2023. Association for Computing Machinery.

[62]

Peiying Zhang, Nanxuan Zhao, and Jing Liao. Text-to-vector generation with neural path representation. ACM Transactions on Graphics (TOG), 43(4):1–13, 2024.

[63]

Tong Zhang, Haoyang Liu, Peiyan Zhang, Yuxuan Cheng, and Haohan Wang. Beyond pixels: Exploring human-readable svg generation for simple images with vision language models. ArXiv, abs/2311.15543, 2023.

[64]

Yi-Fan Zhang, Xingyu Lu, Xiao Hu, Chaoyou Fu, Bin Wen, Tianke Zhang, Changyi Liu, Kaiyu Jiang, Kaibing Chen, Kaiyu Tang, et al. R1-reward: Training multimodal reward model through stable reinforcement learning. arXiv preprint arXiv:2505.02835, 2025.

[65]

Bocheng Zou, Mu Cai, Jianrui Zhang, and Yong Jae Lee. VGBench: Evaluating large language models on vector graphics understanding and generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3647–3659, Miami, Florida, USA, 2024. Association for Computational Linguistics.
