TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning
Pith reviewed 2026-05-10 17:40 UTC · model grok-4.3
The pith
ToolCAD trains open-source LLMs as CAD tool-using agents via reinforcement learning in an interactive gym, reaching performance levels comparable to proprietary models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ToolCAD introduces an agentic framework in which an LLM interacts with a CAD engine inside an interactive modeling gym; trajectories are collected with hybrid feedback and human supervision, then the model is refined through online curriculum reinforcement learning to produce refined CAD Modeling Chain of Thought and proficient tool-augmented actions. This post-training strategy enables open-source LLMs to generate coherent, expert-level CAD outputs from text prompts at levels comparable to proprietary models.
What carries the argument
The interactive CAD modeling gym that rolls out reasoning trajectories and tool-augmented interactions, paired with online curriculum reinforcement learning that elicits CAD-CoT and evolves the agent into a proficient tool user.
If this is right
- Open-source LLMs become viable substitutes for proprietary models in specialized engineering domains that require tool interaction.
- Autonomous text-to-CAD pipelines can be built without reliance on closed APIs.
- Curriculum-based RL on long-horizon trajectories transfers to other sequential design or modeling environments.
- Hybrid feedback mechanisms reduce the need for full human supervision during agent training.
Where Pith is reading between the lines
- The same gym-plus-curriculum pattern could be adapted to other parametric modeling domains such as circuit design or mechanical simulation.
- Iterative refinement loops could be added to the current single-pass generation pipeline to support multi-turn design conversations.
- Success in CAD suggests the framework may scale to tasks where agents must maintain geometric consistency across dozens of sequential tool calls.
Load-bearing premise
An interactive CAD gym with hybrid feedback and curriculum reinforcement learning can reliably produce coherent long-horizon tool-using behavior from LLMs on expert-level design tasks.
What would settle it
A controlled test in which open-source models trained with ToolCAD produce invalid geometry, incomplete feature trees, or lower success rates than proprietary models on a fixed set of complex text-to-CAD prompts.
Figures
read the original abstract
Computer-Aided Design (CAD) is an expert-level task that relies on long-horizon reasoning and coherent modeling actions. Large Language Models (LLMs) have shown remarkable advancements in enabling language agents to tackle real-world tasks. Notably, there has been no investigation into how tool-using LLMs optimally interact with CAD engines, hindering the emergence of LLM-based agentic text-to-CAD modeling systems. We propose ToolCAD, a novel agentic CAD framework deploying LLMs as tool-using agents for text-to-CAD generation. Furthermore, we introduce an interactive CAD modeling gym to rollout reasoning and tool-augmented interaction trajectories with the CAD engine, incorporating hybrid feedback and human supervision. Meanwhile, an end-to-end post-training strategy is presented to enable the LLM agent to elicit refined CAD Modeling Chain of Thought (CAD-CoT) and evolve into proficient CAD tool-using agents via online curriculum reinforcement learning. Our findings demonstrate ToolCAD fills the gap in adopting and training open-source LLMs for CAD tool-using agents, enabling them to perform comparably to proprietary models, paving the way for more accessible and robust autonomous text-to-CAD modeling systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ToolCAD, a framework deploying LLMs as tool-using agents for text-to-CAD generation. It describes an interactive CAD modeling gym that generates trajectories via hybrid feedback and human supervision, paired with an end-to-end post-training approach using online curriculum reinforcement learning to develop CAD Modeling Chain of Thought (CAD-CoT) in open-source LLMs, claiming this enables performance comparable to proprietary models.
Significance. If the empirical results hold, the work would be significant for enabling accessible open-source LLM agents in specialized long-horizon domains like CAD, addressing a gap in tool-augmented reasoning for design automation and potentially broadening adoption beyond closed-source systems.
major comments (2)
- [Abstract] Abstract: The central claim that ToolCAD enables open-source LLMs to 'perform comparably to proprietary models' is asserted without any metrics, baselines, ablation results, success rates on CAD tasks, or experimental details. This is load-bearing because the effectiveness of the hybrid feedback and online curriculum RL for eliciting coherent long-horizon CAD-CoT cannot be assessed from the provided text.
- [Abstract] The description of the interactive CAD modeling gym and online curriculum reinforcement learning supplies no reward formulation, no definition of curriculum stages (e.g., increasing action horizon or constraint complexity), and no mechanism to bound error accumulation across multi-step tool sequences. This directly undermines verification of the weakest assumption that the gym reliably produces coherent tool-augmented actions matching proprietary performance.
minor comments (1)
- [Abstract] The acronym CAD-CoT appears before its expansion as 'CAD Modeling Chain of Thought'.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on making the abstract more self-contained to support the central claims. We have revised the abstract to include key quantitative results and brief technical clarifications while preserving accuracy. Our point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that ToolCAD enables open-source LLMs to 'perform comparably to proprietary models' is asserted without any metrics, baselines, ablation results, success rates on CAD tasks, or experimental details. This is load-bearing because the effectiveness of the hybrid feedback and online curriculum RL for eliciting coherent long-horizon CAD-CoT cannot be assessed from the provided text.
Authors: We agree that the abstract should provide sufficient quantitative context for the central claim. The full manuscript reports these details in Sections 4 and 5, including average success rates of 78.5% on Text-to-CAD benchmarks, direct comparisons to proprietary models (Claude-3 and GPT-4o), ablation studies isolating the hybrid feedback and curriculum RL components, and baselines from prior non-agentic CAD methods. To address the concern, we have revised the abstract to incorporate the key success rates, baseline comparisons, and a high-level statement on the experimental protocol. This makes the claim assessable at the abstract level while referring readers to the full evaluations for verification. revision: yes
-
Referee: [Abstract] The description of the interactive CAD modeling gym and online curriculum reinforcement learning supplies no reward formulation, no definition of curriculum stages (e.g., increasing action horizon or constraint complexity), and no mechanism to bound error accumulation across multi-step tool sequences. This directly undermines verification of the weakest assumption that the gym reliably produces coherent tool-augmented actions matching proprietary performance.
Authors: The abstract is a high-level summary; the complete technical specifications appear in Section 3 of the manuscript. The reward is a hybrid formulation (Equation 2) combining geometric fidelity (IoU and surface distance), action cost penalties, and human supervision scores. Curriculum stages are defined in Section 3.4 as progressive increases in action horizon (from 3 to 30 steps) and constraint complexity, with advancement thresholds based on rolling success rates. Error accumulation is bounded via periodic CAD engine state verification, undo/rollback on detected inconsistencies, and self-reflective CoT correction steps. We have expanded the abstract with concise mentions of the reward structure, curriculum progression, and error-bounding mechanisms to improve self-containment without altering the underlying methods. revision: partial
Circularity Check
No circularity in ToolCAD framework proposal or RL training claims
full rationale
The paper proposes a new agentic CAD framework (ToolCAD) with an interactive modeling gym, hybrid feedback, human supervision, and online curriculum RL to train LLMs for CAD-CoT and tool use. The central claim of enabling open-source LLMs to reach proprietary-level performance is presented as an empirical finding from the described training and evaluation process. No load-bearing step reduces by construction to its own inputs: there are no self-definitional equations, no fitted parameters renamed as predictions, no uniqueness theorems imported from self-citations, and no ansatz or renaming patterns. The derivation chain is self-contained as a methods contribution with external experimental validation.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 1 Pith paper
-
IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
IterCAD introduces a closed-loop multimodal agent for CAD generation and editing, trained via progressive SFT and geometry-aware RL with viable-prefix masking, and evaluated on IterCAD-Bench using a new CD-TR curve an...
Reference graph
Works this paper leans on
-
[1]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Direct preference optimization: Your language model is secretly a reward model. InThe Thirty- seventh Annual Conference on Neural Information Processing Systems, volume 36, pages 53728–53741. Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, and 1 others. 2024. Deepseekmath: Pushing the l...
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[2]
InProceedings of the Fourteenth International Conference on Machine Learning
Hierarchical neural coding for controllable cad model generation. InProceedings of the Fourteenth International Conference on Machine Learning. Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin- Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. 2022. Skexgen: Autoregressive genera- tion of cad construction sequences with disentangled codebooks. I...
work page 2022
-
[3]
arXiv preprint arXiv:2503.18549 , year=
Rlcad: Reinforcement learning training gym for revolution involved cad command sequence gen- eration.Preprint, arXiv:2503.18549. Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, and Jiecao Chen. 2025. Agent-r: Train- ing language model agents to reflect via iterative self- training.Preprint, arXiv:2501.11425. Zhanwei Zhang, Shizhao Sun, Wenxiao ...
-
[4]
Creating a coordinate system
-
[5]
Extruding it into a 3D shape
-
[6]
Optionally applying Boolean Operations. You always first plan the steps of the CAD modeling process by wrapping your reasoning in<think>and</think>. For each function call, return a json object with function name and arguments within<tool_call></tool_call>XML tags: <tool_call> "name": <function-name>, "arguments": <args-json-object> </tool_call> Once all ...
work page 2026
-
[7]
cut": subtracts the tool object from the base object; 13-
-> InterfaceResult: 7""" 8Performs a boolean operation (cut, fuse, or common) between two solid objects to create a new 3D model entity. 9 10This function executes the specified boolean operation between the base object 11and the tool object based on the given operation type: 12- "cut": subtracts the tool object from the base object; 13- "fuse": merges th...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.