pith. machine review for the scientific record.

arxiv: 2604.10940 · v1 · submitted 2026-04-13 · 💻 cs.CV

Recognition: unknown

AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords amodal vectorization · semantic layer peeling · SVG · image vectorization · occlusion inpainting · vector graphics editing · VLM-guided decomposition

The pith

AmodalSVG reconstructs natural images into separate editable vector layers that include the full geometry of each object, even parts hidden by occlusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AmodalSVG as a way to convert photographs into vector graphics where every object appears as its own complete shape rather than stopping at visible edges. It does this by first breaking the image into semantic layers with the help of a vision-language model that guides progressive separation and uses inpainting to fill in occluded areas, then vectorizing each layer on its own. The result is a set of independent SVGs that preserve object identities and allow direct edits to individual elements without affecting the rest of the image. This addresses the limitation of standard vectorization tools that produce entangled and incomplete representations when objects overlap.
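The two-stage structure described above can be sketched with a deliberately tiny stand-in: flat-colored axis-aligned rectangles whose depth order is known from their color id, with a rectangle shape prior playing the role of the paper's VLM guidance and hybrid inpainting. Every name and rule here is illustrative, not the paper's actual method; it only shows how peeling plus completion yields amodal layers that each vectorize independently.

```python
import numpy as np

def peel_amodal_layers(image, bg=0):
    """Peel flat-colored layers front-to-back; complete each with a bbox prior.

    Toy depth rule: larger id = nearer. Toy amodal completion: a solid
    axis-aligned rectangle's full extent is the bounding box of its visible
    pixels (valid whenever each extreme row/column keeps one visible pixel),
    standing in for the paper's inpainting step.
    """
    layers = []
    img = image.copy()
    while True:
        ids = [c for c in np.unique(img) if c != bg]
        if not ids:
            return layers
        front = max(ids)
        ys, xs = np.where(img == front)
        amodal = np.zeros(img.shape, dtype=bool)
        amodal[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
        layers.append((front, amodal))
        img[img == front] = bg  # peel the layer off, exposing what it hid

def layer_to_svg_rect(layer_id, amodal):
    """Stage-2 stand-in: emit one SVG primitive per layer (real ALV uses many)."""
    ys, xs = np.where(amodal)
    x0, y0 = xs.min(), ys.min()
    return (f'<rect id="layer{layer_id}" x="{x0}" y="{y0}" '
            f'width="{xs.max() - x0 + 1}" height="{ys.max() - y0 + 1}"/>')
```

Even in this toy, the back rectangle's layer recovers its occluded pixels, so editing or repositioning the front layer leaves a complete object behind it, which is the editing property the paper targets.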

Core claim

AmodalSVG reformulates image vectorization as a two-stage process that first applies Semantic Layer Peeling to decompose an input image into amodally complete semantic layers through VLM-guided progressive decomposition and hybrid inpainting to recover occluded object appearances, then converts those layers into independent SVGs using Adaptive Layered Vectorization that allocates primitives according to an error budget.
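The error-budget mechanism in the second stage can be illustrated in one dimension, with piecewise-constant segments standing in for vector primitives. Only the control loop is meant to mirror ALV as summarized here (add primitives while reconstruction error exceeds the budget, then prune any the budget can spare); the fitting itself, the split rule, and the names are all toy choices, not the paper's algorithm.

```python
import numpy as np

def fit_with_budget(signal, err_budget, max_prims=64):
    """Grow and shrink a set of piecewise-constant 'primitives' to meet a budget.

    Returns cut points; segment [a, b) is one primitive fit by its mean.
    """
    cuts = [0, len(signal)]  # start with a single primitive

    def seg_sse(a, b):
        return ((signal[a:b] - signal[a:b].mean()) ** 2).sum()

    def sse():
        return sum(seg_sse(a, b) for a, b in zip(cuts, cuts[1:]))

    # Primitive addition: split the worst segment until error <= budget.
    while sse() > err_budget and len(cuts) - 1 < max_prims:
        a, b = max(zip(cuts, cuts[1:]), key=lambda s: seg_sse(*s))
        if b - a < 2:
            break
        cuts = sorted(set(cuts) | {(a + b) // 2})

    # Primitive pruning: drop any cut whose removal keeps error <= budget.
    for c in cuts[1:-1]:
        trial = [x for x in cuts if x != c]
        saved, cuts = cuts, trial
        if sse() > err_budget:
            cuts = saved  # pruning would bust the budget; keep the cut
    return cuts
```

The budget threshold is the one free parameter: tighten it and primitives are added; loosen it and pruning removes them, which is the fidelity-versus-complexity trade the core claim describes.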

What carries the argument

Semantic Layer Peeling (SLP), a VLM-guided strategy that progressively decomposes an image into semantically coherent layers while using hybrid inpainting to recover complete object appearances under occlusions.

If this is right

  • The resulting amodal layers support object-level editing directly in the vector domain.
  • SVGs become semantically organized rather than entangled across overlapping objects.
  • Geometric completeness is achieved for each object rather than only tracing visible pixels.
  • Independent vectorization of layers enables capabilities absent from prior modal vectorization methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Design software could incorporate amodal layers as a native editing mode so users adjust hidden parts of objects without redrawing.
  • The approach may extend naturally to video by propagating layer decompositions across frames to maintain consistent editable vectors over time.
  • Automated illustration pipelines could use the complete layers to generate variants where objects are rearranged or restyled while preserving vector editability.

Load-bearing premise

The vision-language model guided peeling step must produce accurate semantic separation and artifact-free inpainted appearances for occluded regions so that the subsequent vectorization remains high quality.

What would settle it

A controlled test set of images containing objects with known full geometries where the output amodal SVGs are checked for geometric fidelity in hidden regions and for whether manual edits to individual layers produce visually consistent results without introducing new artifacts.
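One concrete way to run such a check, assuming ground-truth full object masks are available: score each predicted amodal mask only on the pixels the input image never showed, where a modal method cannot succeed by construction. The function below is an evaluation-harness suggestion, not a metric the paper reports.

```python
import numpy as np

def hidden_region_iou(pred_amodal, gt_full, visible):
    """IoU between prediction and ground truth over occluded pixels only.

    pred_amodal: predicted full-object mask (bool array)
    gt_full:     ground-truth full-object mask, including hidden parts
    visible:     mask of object pixels actually visible in the input
    """
    hidden = gt_full & ~visible                # pixels the input never showed
    pred_hidden = pred_amodal & ~visible       # prediction restricted likewise
    inter = (pred_hidden & hidden).sum()
    union = pred_hidden.sum() + hidden.sum() - inter
    return 1.0 if union == 0 else inter / union
```

A score near 1.0 on objects with known geometry would support the amodal-completeness claim; the complementary editing test (mutate one layer, re-render, diff against the untouched regions) checks that per-layer edits stay artifact-free.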

Figures

Figures reproduced from arXiv: 2604.10940 by Anran Qi, Buyu Li, Dong Xu, Guotao Liang, Juncheng Hu, Qian Yu, Sheng Wang, Ziteng Xue.

Figure 1. AmodalSVG vectorizes a raster image into semantically decoupled and amodally complete vector layers (e.g., the unoccluded lighthouse and its shadow on the top left, and the full giraffe on the top right). The resulting vector layers enable object-level editing, such as Reposition, Recolor, Replace, Resize, Reorder, and Remove, directly in the vector domain while maintaining global visual consistency (b…
Figure 2. Overview of AmodalSVG. Our framework adopts a two-stage pipeline: raster-level semantic decoupling and completion, followed by layer-wise vectorization. (Left) For semantic decoupling and completion, Semantic Layer Peeling (SLP) iteratively identifies entities via VLMs to guide SAM segmentation. A hybrid inpainting mechanism then reconstitutes occluded regions to produce structurally integral raster l…
Figure 3. Qualitative comparison of SVG reconstruction fidelity.
Figure 4. Qualitative comparison of SVG layering quality.
Figure 5. Ablation of Semantic Layer Peeling (SLP) strategies.
Figure 6. Qualitative ablation of ALV. Content-aware initialization prevents convergence failures and facilitates structural recovery. Primitive addition helps capture high-frequency details (e.g., melon net), and primitive pruning removes redundant primitives for efficiency. Our full ALV achieves superior visual fidelity with an optimized primitive budget.
Figure 7. Structural editability and downstream applications.
Original abstract

We introduce AmodalSVG, a new framework for amodal image vectorization that produces semantically organized and geometrically complete SVG representations from natural images. Existing vectorization methods operate under a modal paradigm: tracing only visible pixels and disregarding occlusion. Consequently, the resulting SVGs are semantically entangled and geometrically incomplete, limiting SVG's structural editability. In contrast, AmodalSVG reconstructs full object geometries, including occluded regions, into independent, editable vector layers. To achieve this, AmodalSVG reformulates image vectorization as a two-stage framework, performing semantic decoupling and completion in the raster domain to produce amodally complete semantic layers, which are then independently vectorized. In the first stage, we introduce Semantic Layer Peeling (SLP), a VLM-guided strategy that progressively decomposes an image into semantically coherent layers. By hybrid inpainting, SLP recovers complete object appearances under occlusions, enabling explicit semantic decoupling. To vectorize these layers efficiently, we propose Adaptive Layered Vectorization (ALV), which dynamically modulates the primitive budget via an error-budget-driven adjustment mechanism. Extensive experiments demonstrate that AmodalSVG significantly outperforms prior methods in visual fidelity. Moreover, the resulting amodal layers enable object-level editing directly in the vector domain, capabilities not supported by existing vectorization approaches. Code will be released upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces AmodalSVG, a two-stage framework for amodal image vectorization from natural images. It first applies Semantic Layer Peeling (SLP), a VLM-guided progressive decomposition with hybrid inpainting to recover complete object appearances under occlusion and produce semantically decoupled raster layers; these are then independently vectorized via Adaptive Layered Vectorization (ALV), which modulates primitive budgets using an error-budget threshold. The result is a set of independent, geometrically complete SVG layers that support object-level editing in the vector domain, unlike existing modal vectorization methods.

Significance. If the central claims hold, the work would be significant for extending vector graphics beyond visible surfaces, enabling new editing workflows in design and graphics applications. The SLP and ALV components introduce a novel pipeline for semantic decoupling and adaptive vectorization; the planned code release would further strengthen reproducibility.

major comments (2)
  1. [Abstract] Abstract: the claim that AmodalSVG 'significantly outperforms prior methods in visual fidelity' and enables 'capabilities not supported by existing vectorization approaches' is unsupported by any quantitative metrics, baselines, ablation results, or dataset details, which are load-bearing for the central claim of amodal completeness and editability.
  2. [SLP stage] SLP stage: the VLM-guided semantic decoupling and hybrid inpainting for occluded regions lacks any reported metrics isolating completion quality (e.g., amodal mask IoU, perceptual consistency of inpainted textures, or error propagation into ALV), leaving the independence and geometric completeness of the resulting layers under-supported given known VLM hallucination risks under heavy occlusion.
minor comments (1)
  1. Define all acronyms (VLM, SLP, ALV) at first use and ensure consistent notation for the error-budget threshold across the method description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications on where the supporting evidence appears in the paper and noting where revisions will be made to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that AmodalSVG 'significantly outperforms prior methods in visual fidelity' and enables 'capabilities not supported by existing vectorization approaches' is unsupported by any quantitative metrics, baselines, ablation results, or dataset details, which are load-bearing for the central claim of amodal completeness and editability.

    Authors: The abstract is a concise summary of the paper's contributions and conclusions. The quantitative support for these claims—including comparisons to prior vectorization methods on visual fidelity metrics (e.g., PSNR, SSIM, LPIPS), ablation studies on the SLP and ALV components, baseline implementations, and dataset details—is provided in full in Section 4 (Experiments) and Section 3 (Datasets and Implementation). Editability is demonstrated through both qualitative object-level editing examples and a user study in Section 4.3. We do not believe the abstract itself requires expansion given typical length constraints, as the load-bearing evidence resides in the body of the paper. revision: no

  2. Referee: [SLP stage] SLP stage: the VLM-guided semantic decoupling and hybrid inpainting for occluded regions lacks any reported metrics isolating completion quality (e.g., amodal mask IoU, perceptual consistency of inpainted textures, or error propagation into ALV), leaving the independence and geometric completeness of the resulting layers under-supported given known VLM hallucination risks under heavy occlusion.

    Authors: We acknowledge that isolated metrics for the SLP stage would provide stronger support. The current manuscript reports end-to-end results and qualitative layer visualizations in Section 4.2, which show improved semantic independence and geometric completeness relative to modal baselines. However, we did not report standalone amodal completion metrics such as mask IoU or texture perceptual scores, partly because standard datasets lack amodal ground truth. We will add a dedicated ablation subsection with perceptual metrics (e.g., FID for inpainted regions) and explicit discussion of hallucination mitigation via the hybrid inpainting strategy. This revision will be incorporated. revision: yes

Circularity Check

0 steps flagged

No circularity in the proposed AmodalSVG pipeline

Full rationale

The paper describes a two-stage algorithmic framework (SLP for VLM-guided semantic decomposition plus hybrid inpainting, followed by ALV for error-budget-driven vectorization) rather than any mathematical derivation chain, first-principles result, or prediction. No equations, fitted parameters renamed as outputs, self-definitional constructs, or load-bearing self-citations appear in the abstract or described method. The central claims rest on the empirical performance of the pipeline components, which are presented as novel but externally motivated techniques, not reductions to their own inputs by construction. This is a standard non-circular method paper.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 2 invented entities

The framework rests on the assumption that current VLMs and inpainting techniques can produce sufficiently accurate semantic layers and completions; no free parameters are explicitly named in the abstract, but the primitive budget modulation in ALV implies tunable thresholds.

free parameters (1)
  • error budget threshold
    Used to dynamically adjust primitive count during vectorization of each layer.
axioms (1)
  • domain assumption: VLMs can guide accurate progressive semantic decomposition and occlusion recovery via hybrid inpainting
    Invoked as the core mechanism of Semantic Layer Peeling.
invented entities (2)
  • Semantic Layer Peeling (SLP) no independent evidence
    purpose: Progressively decompose image into semantically coherent layers while recovering occluded regions
    New VLM-guided strategy introduced for the first stage.
  • Adaptive Layered Vectorization (ALV) no independent evidence
    purpose: Vectorize each amodal layer with dynamic primitive budget control
    New mechanism for the second stage to balance fidelity and complexity.

pith-pipeline@v0.9.0 · 5557 in / 1262 out tokens · 33403 ms · 2026-05-10T16:32:43.477412+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

    cs.CV 2026-05 unverdicted novelty 7.0

    VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.

Reference graph

Works this paper leans on

60 extracted references · 9 canonical work pages · cited by 1 Pith paper · 3 internal anchors
