IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Botian Shi; Daocheng Fu; Hairong Zhang; Hongbin Zhou; Jiaxin Ai; Licheng Wen; Nianchen Deng; Pinlong Cai; Shu Zou; Siqi Li

arxiv: 2606.13368 · v1 · pith:P4ENQTEKnew · submitted 2026-06-11 · 💻 cs.AI · cs.CV

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Tao Hu , Jiaxin Ai , Licheng Wen , Xueheng Li , Shu Zou , Siqi Li , Nianchen Deng , Xinyu Cai

show 7 more authors

Hongbin Zhou Pinlong Cai Daocheng Fu Yu Yang Hairong Zhang Botian Shi Xuemeng Yang

This is my paper

Pith reviewed 2026-06-27 06:33 UTC · model grok-4.3

classification 💻 cs.AI cs.CV

keywords CAD generationmultimodal agentiterative refinementcode generationgeometric precisionreinforcement learningengineering drawingsclosed-loop interaction

0 comments

The pith

IterCAD frames CAD generation as closed-loop multi-turn agent interaction with an executable sandbox to enable iterative refinement from drawings or text.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents IterCAD as a multimodal agent that treats CAD tasks as repeated interactions inside a code-executing sandbox rather than single-pass outputs. It covers drawing-to-code, text-to-code, and editing by first building a data pipeline that produces engineering drawings, editing sequences, and interaction traces from industrial-style features. The agent then receives progressive supervised fine-tuning followed by geometry-aware reinforcement learning that masks invalid prefixes. A new benchmark suite measures both whether code runs and how closely the resulting geometry matches targets using a tolerance-recall curve. Experiments indicate the resulting system produces more executable code, tighter geometric matches, and stronger handling of successive edits than prior one-shot methods.

Core claim

IterCAD is a unified multimodal agent framework for closed-loop interactive CAD generation and editing formulated as multi-turn interaction between the agent and an executable CAD sandbox. The approach rests on a data synthesis pipeline that creates standard-compliant multi-view drawings, complex code-editing tasks, and high-fidelity trajectories, followed by progressive supervised fine-tuning and geometry-aware reinforcement learning with viable-prefix masking. Evaluation on the introduced IterCAD-Bench uses the Chamfer Distance Tolerance-Recall curve and its AUC-TR metric to show higher code executability, geometric precision, and iterative refinement ability than existing approaches.

What carries the argument

the closed-loop multi-turn interaction between a multimodal agent and an executable CAD sandbox, trained first by progressive SFT then by geometry-aware RL with viable-prefix masking

If this is right

Code generated by the agent runs successfully at higher rates on CAD interpreters.
Final shapes lie closer to target geometry across tolerance levels tracked by the CD-TR curve.
Multi-turn editing sessions converge faster and with fewer invalid steps than open-loop baselines.
The AUC-TR metric supplies a single scalar that jointly scores validity and precision without discarding invalid outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sandbox-loop structure could be applied to other parametric modeling environments that expose an execution API.
Adding sensor feedback from physical prototypes into the reward signal would turn the loop into a full digital-twin controller.
Extending the synthesis pipeline to assemblies of multiple parts would test whether the agent can maintain consistency across linked components.

Load-bearing premise

The data synthesis pipeline produces drawings, editing tasks, and interaction trajectories that match the distribution and difficulty of real industrial CAD work.

What would settle it

A head-to-head test on a held-out collection of actual engineer CAD sessions in which IterCAD requires the same or more refinement turns than one-shot baselines to reach an executable design of target geometry.

Figures

Figures reproduced from arXiv: 2606.13368 by Botian Shi, Daocheng Fu, Hairong Zhang, Hongbin Zhou, Jiaxin Ai, Licheng Wen, Nianchen Deng, Pinlong Cai, Shu Zou, Siqi Li, Tao Hu, Xinyu Cai, Xueheng Li, Xuemeng Yang, Yu Yang.

**Figure 2.** Figure 2: Overview of the IterCAD framework. IterCAD formulates interactive CAD generation and editing as a [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Data curation pipeline for IterCAD. The pipeline first constructs three categories of high-quality CAD [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: CD-TR Curve on IterCAD-Draw bench. Benchmarks. We evaluate multi-turn generation across: 1) IterCAD-Bench: Our proposed suite with 1K drawing and 200 editing tasks; 2) Text2CAD Bench [14]: 8, 046 multimodal parts with text specifications; and 3) CADPrompt Bench [34]: 200 expert instructions for zero-shot text-to-CAD synthesis. Evaluation Metrics. Performance is assessed via a multi-dimensional metric suit… view at source ↗

**Figure 5.** Figure 5: Representative samples from the IterCAD-Draw benchmark across two difficulty levels, showcasing multi-view engineering drawings paired with ground-truth 3D geometries. Complexity increases from simple extruded profiles (Easy-level) to parts requiring advanced operations such as shells, fillets, and through-cuts (Hard-level). Csrc. For each corrupted instance, we pair it with a concise design-change instruc… view at source ↗

**Figure 6.** Figure 6: Representative samples from the IterCAD-Edit benchmark. Each pair shows the source code (left), the editing instruction (middle), and the target code after modification (right), illustrating diverse edit operations including feature addition, Boolean subtraction, and parametric adjustment [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: Top-20 CadQuery API operation distribution in the [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗

**Figure 8.** Figure 8: Average number of interaction turns during RL training. GSPO alone (blue) rapidly [PITH_FULL_IMAGE:figures/full_fig_p017_8.png] view at source ↗

**Figure 9.** Figure 9: Qualitative comparison on representative [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

**Figure 10.** Figure 10: Drawing-to-code self-correction case. Starting from a dimensioned engineering drawing, [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

**Figure 11.** Figure 11: Text-to-CAD self-correction case. The initial code creates an offset cylinder and misses the [PITH_FULL_IMAGE:figures/full_fig_p020_11.png] view at source ↗

**Figure 12.** Figure 12: Instruction-based CAD editing case. Starting from an existing rounded base, IterCAD [PITH_FULL_IMAGE:figures/full_fig_p021_12.png] view at source ↗

**Figure 13.** Figure 13: Unified generation-and-editing example. IterCAD first reconstructs a base plate from [PITH_FULL_IMAGE:figures/full_fig_p021_13.png] view at source ↗

**Figure 14.** Figure 14: System prompt for IterCAD code generation and editing. [PITH_FULL_IMAGE:figures/full_fig_p022_14.png] view at source ↗

read the original abstract

Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing. We formulate the task as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing. To support this, we develop a data synthesis pipeline incorporating advanced industrial manufacturing features to generate standard-compliant multi-view engineering drawings, complex code-editing tasks, and high-fidelity interaction trajectories. We optimize the agent via progressive SFT followed by geometry-aware reinforcement learning with viable-prefix masking to enhance code executability and geometric fidelity. Finally, we introduce the IterCAD-Bench evaluation suite and propose the Chamfer Distance Tolerance-Recall (CD-TR) curve alongside its AUC-TR metric, establishing a survivor-bias-free standard that unifies code validity and geometric precision. Extensive experiments demonstrate that IterCAD achieves highly competitive performance across multiple benchmarks, significantly outperforming existing approaches in both code executability and geometric precision, while exhibiting superior capabilities in closed-loop iterative refinement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

IterCAD packages an iterative multimodal agent for CAD with some fresh pieces, but the abstract leaves the performance claims and benchmark fidelity uncheckable.

read the letter

The main takeaway is that this paper shifts CAD generation toward closed-loop interaction in a sandbox, covering drawing-to-code, text-to-code, and editing in one agent setup. That framing matches real design work better than one-shot methods.

What is actually new is the single agent formulation across those three tasks, the data synthesis pipeline that adds industrial manufacturing features, viable-prefix masking during the RL stage, and the CD-TR curve plus AUC-TR metric meant to combine executability with geometric precision without survivor bias. The IterCAD-Bench is presented as a new evaluation suite built from that pipeline.

The paper does a reasonable job naming the mismatch between existing open-loop tools and iterative practice, and the components like the masking trick and the tolerance-recall curve are specific enough to be worth looking at.

The soft spots sit in the evidence. No numbers, baselines, or ablations appear in the abstract, so the claims of significant outperformance on executability and precision cannot be assessed. The benchmark is generated internally, and the description gives no quantitative comparison of the synthetic drawings, editing sequences, or trajectories against real industrial CAD corpora. If the generated data turns out simpler or cleaner than actual practice, the reported gains could be tied to that distribution. The stress-test concern about pipeline fidelity lands directly on what is visible here.

This is for researchers working on multimodal agents or code generation for engineering design tools. A reader focused on practical applications in manufacturing might extract the task breakdown or the evaluation idea.

I would send it for peer review. The domain is concrete and the iterative angle is worth testing, provided the full paper supplies the missing experiments and validation.

Referee Report

2 major / 2 minor

Summary. The paper presents IterCAD, a multimodal agent for closed-loop CAD generation and editing formulated as multi-turn interaction with an executable sandbox across Drawing-to-Code, Text-to-Code, and Interactive Editing tasks. It introduces a data synthesis pipeline using industrial manufacturing features to create standard-compliant multi-view drawings, code-editing tasks, and interaction trajectories; optimizes the agent via progressive supervised fine-tuning followed by geometry-aware reinforcement learning with viable-prefix masking; proposes the IterCAD-Bench suite together with the Chamfer Distance Tolerance-Recall (CD-TR) curve and AUC-TR metric; and reports that the resulting system significantly outperforms prior methods on code executability and geometric precision while showing stronger closed-loop iterative refinement.

Significance. If the performance claims hold under rigorous validation, the work would advance automated CAD by shifting from open-loop one-shot generation to interactive, closed-loop refinement that better matches manufacturing practice. The CD-TR/AUC-TR metric is a constructive contribution that avoids survivor bias by jointly penalizing invalid code and geometric deviation. The combination of SFT and geometry-aware RL with masking is a reasonable technical approach for improving executability.

major comments (2)

[Abstract, §3] Abstract and §3 (Data Synthesis Pipeline): the headline claims of outperformance on executability, geometric precision, and iterative refinement rest entirely on IterCAD-Bench, which is generated by the described pipeline. No quantitative validation (feature-distribution statistics, tolerance usage, editing-sequence length distributions, or comparison to any real industrial CAD corpus) is supplied to establish that the synthetic data are representative; without this, the reported gains on CD-TR/AUC-TR and closed-loop metrics could be artifacts of the benchmark construction rather than genuine capability.
[§4, Tables 2-3] §4 (Experiments) and Table 2/3: the paper states that IterCAD “significantly outperforming existing approaches,” yet the abstract and available description provide no ablation isolating the contribution of viable-prefix masking versus standard RL, nor any statistical significance tests or variance estimates across the multiple benchmarks; these omissions make it impossible to determine whether the claimed superiority is robust or sensitive to post-hoc choices in the synthetic data.

minor comments (2)

[§4.2] Notation for the CD-TR curve and AUC-TR should be defined with an explicit equation (e.g., recall at tolerance τ) rather than left to prose, to allow direct reproduction.
[Figure 4] Figure captions for the interaction trajectories should include the exact number of turns and the success criterion used in the closed-loop evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment below and indicate planned revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract, §3] Abstract and §3 (Data Synthesis Pipeline): the headline claims of outperformance on executability, geometric precision, and iterative refinement rest entirely on IterCAD-Bench, which is generated by the described pipeline. No quantitative validation (feature-distribution statistics, tolerance usage, editing-sequence length distributions, or comparison to any real industrial CAD corpus) is supplied to establish that the synthetic data are representative; without this, the reported gains on CD-TR/AUC-TR and closed-loop metrics could be artifacts of the benchmark construction rather than genuine capability.

Authors: We agree that explicit quantitative validation of the synthetic data's alignment with real industrial distributions would strengthen the claims. The pipeline is designed around standard-compliant industrial manufacturing features (e.g., GD&T tolerances, multi-view projections, and feature-based modeling), but the current manuscript does not include feature-distribution histograms, tolerance-usage statistics, or sequence-length comparisons. In the revision we will add these analyses on the generated corpus and, where feasible, contrast them against publicly available CAD datasets to reduce the risk that performance gains are benchmark-specific artifacts. revision: yes
Referee: [§4, Tables 2-3] §4 (Experiments) and Table 2/3: the paper states that IterCAD “significantly outperforming existing approaches,” yet the abstract and available description provide no ablation isolating the contribution of viable-prefix masking versus standard RL, nor any statistical significance tests or variance estimates across the multiple benchmarks; these omissions make it impossible to determine whether the claimed superiority is robust or sensitive to post-hoc choices in the synthetic data.

Authors: We acknowledge the absence of these controls. The manuscript reports overall gains but does not isolate viable-prefix masking from standard RL nor supply run-to-run variance or significance tests. In the revised version we will add an ablation table comparing the full geometry-aware RL with viable-prefix masking against a standard RL baseline, together with standard deviations across multiple random seeds and paired statistical significance tests (e.g., Wilcoxon or t-tests) on the CD-TR/AUC-TR metrics. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper is an empirical system description with no equations or derivations. It reports performance on multiple benchmarks (including self-introduced IterCAD-Bench generated via the described pipeline) after SFT and RL training. No load-bearing step reduces claimed results to fitted parameters, self-citations, or inputs by construction. The work is self-contained against external benchmarks and does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted or audited; the ledger is left empty pending full text access.

pith-pipeline@v0.9.1-grok · 5797 in / 1261 out tokens · 25163 ms · 2026-06-27T06:33:08.746229+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 1 canonical work pages

[1]

Comparing 3d cad models: uses, methods, tools and perspectives.Computer-Aided Design and Applications, 9(6):771–794, 2012

Antoine Brière-Côté, Louis Rivest, and Roland Maranzana. Comparing 3d cad models: uses, methods, tools and perspectives.Computer-Aided Design and Applications, 9(6):771–794, 2012

2012
[2]

Penev, Bryan Weissinger, M

AU, Jeremy Wright, thebluedirt, Marcus Boyd, Lorenz, Innovations Technology Solutions, Hasan Yavuz ÖZDERYA, Bruno Agostini, Jojain, Michael Greminger, Seth Fischer, Justin Buchanan, cactrot, huskier, Ruben, iulianOnofrei (U-lee aan), Miguel Sánchez de León Peque, Martin Budden, Hecatron, Peter Boin, Wink Saville, Pavel M. Penev, Bryan Weissinger, M. Greys...

work page doi:10.5281/zenodo.10513848 2024
[3]

Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities.arXiv preprint arXiv:2505.06507, 2025

Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities.arXiv preprint arXiv:2505.06507, 2025

arXiv 2025
[4]

Cad translator: An effective drive for text to 3d parametric computer-aided design generative modeling

Xueyang Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Cad translator: An effective drive for text to 3d parametric computer-aided design generative modeling. InProceedings of the 32nd ACM International Conference on Multimedia, pages 8461–8470, 2024

2024
[5]

Clarify before you draw: Proactive agents for robust text-to-cad generation.arXiv preprint arXiv:2602.03045, 2026

Bo Yuan, Zelin Zhao, Petr Molodyk, Bin Hu, and Yongxin Chen. Clarify before you draw: Proactive agents for robust text-to-cad generation.arXiv preprint arXiv:2602.03045, 2026

Pith/arXiv arXiv 2026
[6]

Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks

Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022

arXiv 2022
[7]

Cadsmith: Multi-agent cad generation with programmatic geometric validation.arXiv preprint arXiv:2603.26512, 2026

Jesse Barkley, Rumi Loghmani, and Amir Barati Farimani. Cadsmith: Multi-agent cad generation with programmatic geometric validation.arXiv preprint arXiv:2603.26512, 2026

arXiv 2026
[8]

Cme-cad: Heterogeneous collaborative multi-expert reinforcement learning for cad code generation.arXiv preprint arXiv:2512.23333, 2025

Ke Niu, Haiyang Yu, Zhuofan Chen, Zhengtao Yao, Weitao Jia, Xiaodong Ge, Jingqun Tang, Benlei Cui, Bin Li, and Xiangyang Xue. Cme-cad: Heterogeneous collaborative multi-expert reinforcement learning for cad code generation.arXiv preprint arXiv:2512.23333, 2025

arXiv 2025
[9]

Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.Advances in Neural Information Processing Systems, 38:59765–59789, 2026

Yandong Guan, Xilin Wang, Ximing Xing, Jing Zhang, Dong Xu, and Qian Yu. Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.Advances in Neural Information Processing Systems, 38:59765–59789, 2026

2026
[10]

cadrille: Multi-modal cad recon- struction with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna V orontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad recon- struction with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

arXiv 2025
[11]

Cadreasoner: Iterative program editing for cad reverse engineering.arXiv preprint arXiv:2603.29847, 2026

Soslan Kabisov, Vsevolod Kirichuk, Andrey V olkov, Gennadii Savrasov, Marina Barannikov, Anton Konushin, Andrey Kuznetsov, and Dmitrii Zhemchuzhnikov. Cadreasoner: Iterative program editing for cad reverse engineering.arXiv preprint arXiv:2603.29847, 2026

arXiv 2026
[12]

Cad-judge: Toward efficient morphological grading and verification for text-to-cad generation

Zheyuan Zhou, Jiayi Han, Liang Du, Naiyu Fang, Lemiao Qiu, and Shuyou Zhang. Cad-judge: Toward efficient morphological grading and verification for text-to-cad generation. InICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1021–1025. IEEE, 2026

2026
[13]

Gift: Bootstrapping image-to-cad program synthesis via geometric feedback.arXiv preprint arXiv:2603.27448, 2026

Giorgio Giannone, Anna Clare Doris, Amin Heyrani Nobari, Kai Xu, Akash Srivastava, and Faez Ahmed. Gift: Bootstrapping image-to-cad program synthesis via geometric feedback.arXiv preprint arXiv:2603.27448, 2026

arXiv 2026
[14]

Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

Mohammad S Khan, Sankalp Sinha, Talha U Sheikh, Didier Stricker, Sk A Ali, and Muhammad Z Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

2024
[15]

Deepcad: A deep generative network for computer-aided design models

Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. InProceedings of the IEEE/CVF international conference on computer vision, pages 6772–6782, 2021

2021
[16]

Secad-net: Self-supervised cad reconstruction by learning sketch-extrude operations

Pu Li, Jianwei Guo, Xiaopeng Zhang, and Dong-Ming Yan. Secad-net: Self-supervised cad reconstruction by learning sketch-extrude operations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16826, 2023. 10

2023
[17]

Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transac- tions on Graphics (TOG), 37(6):1–16, 2018

Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar- Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transac- tions on Graphics (TOG), 37(6):1–16, 2018

2018
[18]

Capri-net: Learning compact cad shapes with adaptive primitive assembly

Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang. Capri-net: Learning compact cad shapes with adaptive primitive assembly. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11768–11778, 2022

2022
[19]

Solidgen: An autoregressive model for direct b-rep synthesis.arXiv preprint arXiv:2203.13944, 2022

Pradeep Kumar Jayaraman, Joseph G Lambourne, Nishkrit Desai, Karl DD Willis, Aditya Sanghi, and Nigel JW Morris. Solidgen: An autoregressive model for direct b-rep synthesis.arXiv preprint arXiv:2203.13944, 2022

arXiv 2022
[20]

Brepgen: A b-rep generative diffusion model with structured latent geometry.ACM Transactions on Graphics (TOG), 43(4):1–14, 2024

Xiang Xu, Joseph Lambourne, Pradeep Jayaraman, Zhengqing Wang, Karl Willis, and Yasutaka Furukawa. Brepgen: A b-rep generative diffusion model with structured latent geometry.ACM Transactions on Graphics (TOG), 43(4):1–14, 2024

2024
[21]

Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation

Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18563–18573, 2025

2025
[22]

Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek.arXiv preprint arXiv:2505.17702, 2025

Xueyang Li, Jiahao Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek.arXiv preprint arXiv:2505.17702, 2025

arXiv 2025
[23]

From intent to execution: Multimodal chain-of-thought reinforcement learning for precise cad code generation

Ke Niu, Haiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, and Xiangyang Xue. From intent to execution: Multimodal chain-of-thought reinforcement learning for precise cad code generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 8160–8167, 2026

2026
[24]

Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences.ACM Transactions on Graphics (TOG), 40(4):1–24, 2021

Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences.ACM Transactions on Graphics (TOG), 40(4):1–24, 2021

2021
[25]

Cad-editor: A locate-then-infill framework with automated training data synthesis for text-based cad editing.arXiv preprint arXiv:2502.03997, 2025

Yu Yuan, Shizhao Sun, Qi Liu, and Jiang Bian. Cad-editor: A locate-then-infill framework with automated training data synthesis for text-based cad editing.arXiv preprint arXiv:2502.03997, 2025

arXiv 2025
[26]

Pr-cad: Progressive refinement for unified controllable and faithful text-to-cad generation with large language models.arXiv preprint arXiv:2604.19773, 2026

Jiyuan An, Jiachen Zhao, Fan Chen, Liner Yang, Zhenghao Liu, Hongyan Wang, Weihua An, Meishan Zhang, and Erhong Yang. Pr-cad: Progressive refinement for unified controllable and faithful text-to-cad generation with large language models.arXiv preprint arXiv:2604.19773, 2026

Pith/arXiv arXiv 2026
[27]

Caddesigner: Conceptual design of cad models based on general-purpose agent.arXiv preprint arXiv:2508.01031, 2025

Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, and Peng Du. Caddesigner: Conceptual design of cad models based on general-purpose agent.arXiv preprint arXiv:2508.01031, 2025

Pith/arXiv arXiv 2025
[28]

Toolcad: Exploring tool-using large language models in text-to-cad generation with reinforcement learning.arXiv preprint arXiv:2604.07960, 2026

Yifei Gong, Xing Wu, Wenda Liu, and Kang Tu. Toolcad: Exploring tool-using large language models in text-to-cad generation with reinforcement learning.arXiv preprint arXiv:2604.07960, 2026

Pith/arXiv arXiv 2026
[29]

Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Pith/arXiv arXiv 2025
[30]

Group sequence policy optimization.arXiv preprint arXiv:2507.18071, 2025

Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, et al. Group sequence policy optimization.arXiv preprint arXiv:2507.18071, 2025

Pith/arXiv arXiv 2025
[31]

Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning.arXiv preprint arXiv:2512.24330, 2025

Yong Xien Chng, Tao Hu, Wenwen Tong, Xueheng Li, Jiandong Chen, Haojia Yu, Jiefan Lu, Hewei Guo, Hanming Deng, Chengjun Xie, et al. Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning.arXiv preprint arXiv:2512.24330, 2025

arXiv 2025
[32]

Swift: a scalable lightweight infrastructure for fine-tuning

Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, et al. Swift: a scalable lightweight infrastructure for fine-tuning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 29733–29735, 2025

2025
[33]

Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

2026
[34]

Generating cad code with vision-language models for 3d designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Generating cad code with vision-language models for 3d designs. InInternational Conference on Learning Representations, volume 2025, pages 52236–52262, 2025

2025
[35]

sketch-and-extrude

OpenAI. Gpt-5.https://openai.com/gpt-5, 2025. 11 Appendix Contents A Related Work 12 A.1 CAD Representations and Generation . . . . . . . . . . . . . . . . . . . . . . . . 12 A.2 Multi-Turn CAD Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 B CAD Pairs Construction 13 B.1 Drawing-Code Pairs. . . . . . . . . . . . . . . . . . . . ....

arXiv 2025
[36]

2.Text: modeling instructions, dimensional constraints, or edit requests

Technical Drawing Image: orthographic projections such as Front, Top, Side, and ISO views with dimensions. 2.Text: modeling instructions, dimensional constraints, or edit requests. 3.Existing Code: a CadQuery script that should be preserved or modified when possible. Objective.Create or edit a 3D model that satisfies the user request. Output Format.Always...
[37]

– Plan: define the origin, workplanes, sketch sequence, Boolean operations, and key dimen- sions

Structure the<thinking>process strictly based on the provided inputs: •If no feedback is provided, such as initial generation or completely new instructions: – Requirement Analysis: break down visual or textual inputs into CadQuery features. – Plan: define the origin, workplanes, sketch sequence, Boolean operations, and key dimen- sions. •If feedback or e...
[38]

Code Implementation Rules

If feedback explicitly confirms that the 3D model is correct with no remaining issues, briefly state the assessment in<thinking></thinking>, then output<DONE>. Code Implementation Rules. • Use Python as the programming language. • Use CadQuery withimport cadquery as cq. • Assign the final result to variabler. • If scaling is needed, define scale_factor an...

[1] [1]

Comparing 3d cad models: uses, methods, tools and perspectives.Computer-Aided Design and Applications, 9(6):771–794, 2012

Antoine Brière-Côté, Louis Rivest, and Roland Maranzana. Comparing 3d cad models: uses, methods, tools and perspectives.Computer-Aided Design and Applications, 9(6):771–794, 2012

2012

[2] [2]

Penev, Bryan Weissinger, M

AU, Jeremy Wright, thebluedirt, Marcus Boyd, Lorenz, Innovations Technology Solutions, Hasan Yavuz ÖZDERYA, Bruno Agostini, Jojain, Michael Greminger, Seth Fischer, Justin Buchanan, cactrot, huskier, Ruben, iulianOnofrei (U-lee aan), Miguel Sánchez de León Peque, Martin Budden, Hecatron, Peter Boin, Wink Saville, Pavel M. Penev, Bryan Weissinger, M. Greys...

work page doi:10.5281/zenodo.10513848 2024

[3] [3]

Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities.arXiv preprint arXiv:2505.06507, 2025

Haoyang Xie and Feng Ju. Text-to-cadquery: A new paradigm for cad generation with scalable large model capabilities.arXiv preprint arXiv:2505.06507, 2025

arXiv 2025

[4] [4]

Cad translator: An effective drive for text to 3d parametric computer-aided design generative modeling

Xueyang Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Cad translator: An effective drive for text to 3d parametric computer-aided design generative modeling. InProceedings of the 32nd ACM International Conference on Multimedia, pages 8461–8470, 2024

2024

[5] [5]

Clarify before you draw: Proactive agents for robust text-to-cad generation.arXiv preprint arXiv:2602.03045, 2026

Bo Yuan, Zelin Zhao, Petr Molodyk, Bin Hu, and Yongxin Chen. Clarify before you draw: Proactive agents for robust text-to-cad generation.arXiv preprint arXiv:2602.03045, 2026

Pith/arXiv arXiv 2026

[6] [6]

Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks

Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022

arXiv 2022

[7] [7]

Cadsmith: Multi-agent cad generation with programmatic geometric validation.arXiv preprint arXiv:2603.26512, 2026

Jesse Barkley, Rumi Loghmani, and Amir Barati Farimani. Cadsmith: Multi-agent cad generation with programmatic geometric validation.arXiv preprint arXiv:2603.26512, 2026

arXiv 2026

[8] [8]

Cme-cad: Heterogeneous collaborative multi-expert reinforcement learning for cad code generation.arXiv preprint arXiv:2512.23333, 2025

Ke Niu, Haiyang Yu, Zhuofan Chen, Zhengtao Yao, Weitao Jia, Xiaodong Ge, Jingqun Tang, Benlei Cui, Bin Li, and Xiangyang Xue. Cme-cad: Heterogeneous collaborative multi-expert reinforcement learning for cad code generation.arXiv preprint arXiv:2512.23333, 2025

arXiv 2025

[9] [9]

Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.Advances in Neural Information Processing Systems, 38:59765–59789, 2026

Yandong Guan, Xilin Wang, Ximing Xing, Jing Zhang, Dong Xu, and Qian Yu. Cad-coder: Text-to-cad generation with chain-of-thought and geometric reward.Advances in Neural Information Processing Systems, 38:59765–59789, 2026

2026

[10] [10]

cadrille: Multi-modal cad recon- struction with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

Maksim Kolodiazhnyi, Denis Tarasov, Dmitrii Zhemchuzhnikov, Alexander Nikulin, Ilya Zisman, Anna V orontsova, Anton Konushin, Vladislav Kurenkov, and Danila Rukhovich. cadrille: Multi-modal cad recon- struction with online reinforcement learning.arXiv preprint arXiv:2505.22914, 2025

arXiv 2025

[11] [11]

Cadreasoner: Iterative program editing for cad reverse engineering.arXiv preprint arXiv:2603.29847, 2026

Soslan Kabisov, Vsevolod Kirichuk, Andrey V olkov, Gennadii Savrasov, Marina Barannikov, Anton Konushin, Andrey Kuznetsov, and Dmitrii Zhemchuzhnikov. Cadreasoner: Iterative program editing for cad reverse engineering.arXiv preprint arXiv:2603.29847, 2026

arXiv 2026

[12] [12]

Cad-judge: Toward efficient morphological grading and verification for text-to-cad generation

Zheyuan Zhou, Jiayi Han, Liang Du, Naiyu Fang, Lemiao Qiu, and Shuyou Zhang. Cad-judge: Toward efficient morphological grading and verification for text-to-cad generation. InICASSP 2026-2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1021–1025. IEEE, 2026

2026

[13] [13]

Gift: Bootstrapping image-to-cad program synthesis via geometric feedback.arXiv preprint arXiv:2603.27448, 2026

Giorgio Giannone, Anna Clare Doris, Amin Heyrani Nobari, Kai Xu, Akash Srivastava, and Faez Ahmed. Gift: Bootstrapping image-to-cad program synthesis via geometric feedback.arXiv preprint arXiv:2603.27448, 2026

arXiv 2026

[14] [14]

Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

Mohammad S Khan, Sankalp Sinha, Talha U Sheikh, Didier Stricker, Sk A Ali, and Muhammad Z Afzal. Text2cad: Generating sequential cad designs from beginner-to-expert level text prompts.Advances in Neural Information Processing Systems, 37:7552–7579, 2024

2024

[15] [15]

Deepcad: A deep generative network for computer-aided design models

Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. InProceedings of the IEEE/CVF international conference on computer vision, pages 6772–6782, 2021

2021

[16] [16]

Secad-net: Self-supervised cad reconstruction by learning sketch-extrude operations

Pu Li, Jianwei Guo, Xiaopeng Zhang, and Dong-Ming Yan. Secad-net: Self-supervised cad reconstruction by learning sketch-extrude operations. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16816–16826, 2023. 10

2023

[17] [17]

Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transac- tions on Graphics (TOG), 37(6):1–16, 2018

Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar- Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees.ACM Transac- tions on Graphics (TOG), 37(6):1–16, 2018

2018

[18] [18]

Capri-net: Learning compact cad shapes with adaptive primitive assembly

Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang. Capri-net: Learning compact cad shapes with adaptive primitive assembly. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11768–11778, 2022

2022

[19] [19]

Solidgen: An autoregressive model for direct b-rep synthesis.arXiv preprint arXiv:2203.13944, 2022

Pradeep Kumar Jayaraman, Joseph G Lambourne, Nishkrit Desai, Karl DD Willis, Aditya Sanghi, and Nigel JW Morris. Solidgen: An autoregressive model for direct b-rep synthesis.arXiv preprint arXiv:2203.13944, 2022

arXiv 2022

[20] [20]

Brepgen: A b-rep generative diffusion model with structured latent geometry.ACM Transactions on Graphics (TOG), 43(4):1–14, 2024

Xiang Xu, Joseph Lambourne, Pradeep Jayaraman, Zhengqing Wang, Karl Willis, and Yasutaka Furukawa. Brepgen: A b-rep generative diffusion model with structured latent geometry.ACM Transactions on Graphics (TOG), 43(4):1–14, 2024

2024

[21] [21]

Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation

Jiahao Li, Weijian Ma, Xueyang Li, Yunzhong Lou, Guichun Zhou, and Xiangdong Zhou. Cad-llama: leveraging large language models for computer-aided design parametric 3d model generation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18563–18573, 2025

2025

[22] [22]

Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek.arXiv preprint arXiv:2505.17702, 2025

Xueyang Li, Jiahao Li, Yu Song, Yunzhong Lou, and Xiangdong Zhou. Seek-cad: A self-refined generative modeling for 3d parametric cad using local inference via deepseek.arXiv preprint arXiv:2505.17702, 2025

arXiv 2025

[23] [23]

From intent to execution: Multimodal chain-of-thought reinforcement learning for precise cad code generation

Ke Niu, Haiyang Yu, Zhuofan Chen, Mengyang Zhao, Teng Fu, Bin Li, and Xiangyang Xue. From intent to execution: Multimodal chain-of-thought reinforcement learning for precise cad code generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 8160–8167, 2026

2026

[24] [24]

Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences.ACM Transactions on Graphics (TOG), 40(4):1–24, 2021

Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences.ACM Transactions on Graphics (TOG), 40(4):1–24, 2021

2021

[25] [25]

Cad-editor: A locate-then-infill framework with automated training data synthesis for text-based cad editing.arXiv preprint arXiv:2502.03997, 2025

Yu Yuan, Shizhao Sun, Qi Liu, and Jiang Bian. Cad-editor: A locate-then-infill framework with automated training data synthesis for text-based cad editing.arXiv preprint arXiv:2502.03997, 2025

arXiv 2025

[26] [26]

Pr-cad: Progressive refinement for unified controllable and faithful text-to-cad generation with large language models.arXiv preprint arXiv:2604.19773, 2026

Jiyuan An, Jiachen Zhao, Fan Chen, Liner Yang, Zhenghao Liu, Hongyan Wang, Weihua An, Meishan Zhang, and Erhong Yang. Pr-cad: Progressive refinement for unified controllable and faithful text-to-cad generation with large language models.arXiv preprint arXiv:2604.19773, 2026

Pith/arXiv arXiv 2026

[27] [27]

Caddesigner: Conceptual design of cad models based on general-purpose agent.arXiv preprint arXiv:2508.01031, 2025

Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, and Peng Du. Caddesigner: Conceptual design of cad models based on general-purpose agent.arXiv preprint arXiv:2508.01031, 2025

Pith/arXiv arXiv 2025

[28] [28]

Toolcad: Exploring tool-using large language models in text-to-cad generation with reinforcement learning.arXiv preprint arXiv:2604.07960, 2026

Yifei Gong, Xing Wu, Wenda Liu, and Kang Tu. Toolcad: Exploring tool-using large language models in text-to-cad generation with reinforcement learning.arXiv preprint arXiv:2604.07960, 2026

Pith/arXiv arXiv 2026

[29] [29]

Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

Pith/arXiv arXiv 2025

[30] [30]

Group sequence policy optimization.arXiv preprint arXiv:2507.18071, 2025

Chujie Zheng, Shixuan Liu, Mingze Li, Xiong-Hui Chen, Bowen Yu, Chang Gao, Kai Dang, Yuqiong Liu, Rui Men, An Yang, et al. Group sequence policy optimization.arXiv preprint arXiv:2507.18071, 2025

Pith/arXiv arXiv 2025

[31] [31]

Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning.arXiv preprint arXiv:2512.24330, 2025

Yong Xien Chng, Tao Hu, Wenwen Tong, Xueheng Li, Jiandong Chen, Haojia Yu, Jiefan Lu, Hewei Guo, Hanming Deng, Chengjun Xie, et al. Sensenova-mars: Empowering multimodal agentic reasoning and search via reinforcement learning.arXiv preprint arXiv:2512.24330, 2025

arXiv 2025

[32] [32]

Swift: a scalable lightweight infrastructure for fine-tuning

Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, et al. Swift: a scalable lightweight infrastructure for fine-tuning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 29733–29735, 2025

2025

[33] [33]

Qwen3.5: Accelerating productivity with native multimodal agents, February 2026

Qwen Team. Qwen3.5: Accelerating productivity with native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5

2026

[34] [34]

Generating cad code with vision-language models for 3d designs

Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Haider Zaidi, Megan Langwasser, Wei Xu, and Matthew Gombolay. Generating cad code with vision-language models for 3d designs. InInternational Conference on Learning Representations, volume 2025, pages 52236–52262, 2025

2025

[35] [35]

sketch-and-extrude

OpenAI. Gpt-5.https://openai.com/gpt-5, 2025. 11 Appendix Contents A Related Work 12 A.1 CAD Representations and Generation . . . . . . . . . . . . . . . . . . . . . . . . 12 A.2 Multi-Turn CAD Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 B CAD Pairs Construction 13 B.1 Drawing-Code Pairs. . . . . . . . . . . . . . . . . . . . ....

arXiv 2025

[36] [36]

2.Text: modeling instructions, dimensional constraints, or edit requests

Technical Drawing Image: orthographic projections such as Front, Top, Side, and ISO views with dimensions. 2.Text: modeling instructions, dimensional constraints, or edit requests. 3.Existing Code: a CadQuery script that should be preserved or modified when possible. Objective.Create or edit a 3D model that satisfies the user request. Output Format.Always...

[37] [37]

– Plan: define the origin, workplanes, sketch sequence, Boolean operations, and key dimen- sions

Structure the<thinking>process strictly based on the provided inputs: •If no feedback is provided, such as initial generation or completely new instructions: – Requirement Analysis: break down visual or textual inputs into CadQuery features. – Plan: define the origin, workplanes, sketch sequence, Boolean operations, and key dimen- sions. •If feedback or e...

[38] [38]

Code Implementation Rules

If feedback explicitly confirms that the 3D model is correct with no remaining issues, briefly state the assessment in<thinking></thinking>, then output<DONE>. Code Implementation Rules. • Use Python as the programming language. • Use CadQuery withimport cadquery as cq. • Assign the final result to variabler. • If scaling is needed, define scale_factor an...