TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning

Kang Tu; Wenda Liu; Xing Wu; Yifei Gong

arxiv: 2604.07960 · v2 · submitted 2026-04-09 · 💻 cs.CV · cs.AI· cs.CL

TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning

Yifei Gong , Xing Wu , Wenda Liu , Kang Tu This is my paper

Pith reviewed 2026-05-10 17:40 UTC · model grok-4.3

classification 💻 cs.CV cs.AIcs.CL

keywords text-to-CADtool-using LLMsreinforcement learningCAD agentsmodeling chain of thoughtopen-source LLMsinteractive gymagentic frameworks

0 comments

The pith

ToolCAD trains open-source LLMs as CAD tool-using agents via reinforcement learning in an interactive gym, reaching performance levels comparable to proprietary models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLMs can be turned into effective agents for generating CAD models from natural language descriptions by equipping them with tool access to a CAD engine and training them end-to-end with reinforcement learning. It addresses the absence of prior work on how such models should interact with CAD systems over long sequences of actions. A sympathetic reader would care because CAD design remains an expert domain that is currently gated behind specialized software and proprietary AI; opening it to trainable open models could make automated modeling widely available. The work shows a concrete path from raw LLM to coherent, tool-augmented CAD reasoning through hybrid feedback and curriculum-based post-training.

Core claim

ToolCAD introduces an agentic framework in which an LLM interacts with a CAD engine inside an interactive modeling gym; trajectories are collected with hybrid feedback and human supervision, then the model is refined through online curriculum reinforcement learning to produce refined CAD Modeling Chain of Thought and proficient tool-augmented actions. This post-training strategy enables open-source LLMs to generate coherent, expert-level CAD outputs from text prompts at levels comparable to proprietary models.

What carries the argument

The interactive CAD modeling gym that rolls out reasoning trajectories and tool-augmented interactions, paired with online curriculum reinforcement learning that elicits CAD-CoT and evolves the agent into a proficient tool user.

If this is right

Open-source LLMs become viable substitutes for proprietary models in specialized engineering domains that require tool interaction.
Autonomous text-to-CAD pipelines can be built without reliance on closed APIs.
Curriculum-based RL on long-horizon trajectories transfers to other sequential design or modeling environments.
Hybrid feedback mechanisms reduce the need for full human supervision during agent training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gym-plus-curriculum pattern could be adapted to other parametric modeling domains such as circuit design or mechanical simulation.
Iterative refinement loops could be added to the current single-pass generation pipeline to support multi-turn design conversations.
Success in CAD suggests the framework may scale to tasks where agents must maintain geometric consistency across dozens of sequential tool calls.

Load-bearing premise

An interactive CAD gym with hybrid feedback and curriculum reinforcement learning can reliably produce coherent long-horizon tool-using behavior from LLMs on expert-level design tasks.

What would settle it

A controlled test in which open-source models trained with ToolCAD produce invalid geometry, incomplete feature trees, or lower success rates than proprietary models on a fixed set of complex text-to-CAD prompts.

Figures

Figures reproduced from arXiv: 2604.07960 by Kang Tu, Wenda Liu, Xing Wu, Yifei Gong.

**Figure 1.** Figure 1: Prompt-to-Tool vs. Prompt-to-Code. L3 expert-level modeling text enables CAD tool-using agents to plan, reason, call tools, and complete modeling tasks via iterative CAD-engine feedback. eling efficiency and enable automation, modern CAD products (e.g., FreeCAD, SolidWorks) provide APIs with procedural scripting codes such as Python for rapid modeling (Badagabettu et al., 2024). Besides, recent research … view at source ↗

**Figure 2.** Figure 2: CAD-specific Tool-Using Agent Workflow for Text-to-CAD Generation. Given an expert-level natural language-based CAD design intent, the TOOLCAD framework performs 1) modeling decision-making, 2) equip with modeling tools and environment, 3) automatic modeling with reflection. menting LLMs with the ability to invoke external tools for complex tasks (Wang et al., 2025d), advancing LLMs’ tool-integrated reaso… view at source ↗

**Figure 3.** Figure 3: Online-RL framework of TOOLCAD. Starting from human-supervised CAD trajectories, the policy is refined into a robust agent through online curriculum reinforcement learning. The ORM training is consistent with human-supervised knowledge. scribed by instruction I, leveraging the learned human knowledge from the language head of MORM. At the online reinforcement learning stage, MORM serves as an automated me… view at source ↗

**Figure 4.** Figure 4: Evaluation of ToolCAD with RL. Modeling success rate, tool-calling accuracy, and geometric precision [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Case study of tool-using agent modeling trajectories. Our ORM (100 [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Ablation study of online RL framework on [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 7.** Figure 7: Comparison of different training strategies. [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗

**Figure 8.** Figure 8: Average perplexity vs. threshold factor α during online exploration. points in the generated and reference models, respectively. • IOUbest: We adopt the IoUbest metric proposed in (Doris et al., 2026) for evaluating solid geometry similarity. D More Quantitative Result D.1 Training Process Figure.7 compares three reinforcement-learning strategies. Curriculum + Step-Level Reward (red) achieves the fastest … view at source ↗

**Figure 9.** Figure 9: Scaling Laws of CAD Tool-using Agents. on the trade-off between model parameter scale and online reinforcement learning (RL) exploration scale. The SFT Performance Plateau. As illustrated in the red dashed curve (see Figure.9), base models via SFT exhibit a rapid saturation effect. While increasing parameters from 0.5B to 3B yields noticeable gains, the performance enters a plateau beyond the 7B scale, w… view at source ↗

**Figure 10.** Figure 10: ToolCAD’s Agent Modeling Examples [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Integration of MCP provides structured environment and tool management, which enhances the [PITH_FULL_IMAGE:figures/full_fig_p027_11.png] view at source ↗

**Figure 12.** Figure 12: feedback of create_complex_sketch 1 def create_complex_sketch(elements: List[SketchElement] = None, sketch_name: str = None) -> InterfaceResult: 2 3 """ 4 Creates a composite sketch consisting of multiple geometric elements. 5 This function is used to batch-draw sketch profiles in FreeCAD composed of lines , circles, arcs, or splines. 6 Args: 7 elements (List[SketchElement]): A list of geometric elements … view at source ↗

**Figure 13.** Figure 13: feedback of bool_operation 1 def boolean_operation( 2 base_object_name: str, 3 tool_object_name: str, 4 operation: Literal["cut", "fuse", "common"], 5 name: str = None, 6 ) -> InterfaceResult: 7 """ 8 Performs a boolean operation (cut, fuse, or common) between two solid objects to create a new 3D model entity. 9 10 This function executes the specified boolean operation between the base object 11 and the t… view at source ↗

read the original abstract

Computer-Aided Design (CAD) is an expert-level task that relies on long-horizon reasoning and coherent modeling actions. Large Language Models (LLMs) have shown remarkable advancements in enabling language agents to tackle real-world tasks. Notably, there has been no investigation into how tool-using LLMs optimally interact with CAD engines, hindering the emergence of LLM-based agentic text-to-CAD modeling systems. We propose ToolCAD, a novel agentic CAD framework deploying LLMs as tool-using agents for text-to-CAD generation. Furthermore, we introduce an interactive CAD modeling gym to rollout reasoning and tool-augmented interaction trajectories with the CAD engine, incorporating hybrid feedback and human supervision. Meanwhile, an end-to-end post-training strategy is presented to enable the LLM agent to elicit refined CAD Modeling Chain of Thought (CAD-CoT) and evolve into proficient CAD tool-using agents via online curriculum reinforcement learning. Our findings demonstrate ToolCAD fills the gap in adopting and training open-source LLMs for CAD tool-using agents, enabling them to perform comparably to proprietary models, paving the way for more accessible and robust autonomous text-to-CAD modeling systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ToolCAD sets up an RL-trained LLM agent for text-to-CAD via a custom gym with hybrid feedback and curriculum learning, but the abstract gives no metrics or implementation details to support the claim that open-source models match proprietary performance.

read the letter

The main thing to know about this paper is that it proposes ToolCAD, a framework that turns LLMs into tool-using agents for generating CAD models from text by training them with reinforcement learning inside a custom interactive CAD gym that provides hybrid feedback. The authors claim this lets open-source models reach performance levels comparable to proprietary ones. What is new here is the specific combination of elements: an agentic setup where the LLM interacts with the CAD engine via tools, the introduction of a CAD modeling gym for generating trajectories, the use of hybrid feedback mixing environment signals and human input, and an end-to-end post-training approach using online curriculum reinforcement learning to develop CAD-CoT reasoning. This hasn't been tried before for text-to-CAD, and it targets the real problem of long-horizon coherent modeling that simple prompting struggles with. The paper does well in framing the problem clearly. CAD is indeed an expert task needing sequential actions, and building an environment to train agents on it is a solid step toward making these systems practical. The motivation for using curriculum learning to gradually increase task difficulty aligns with how RL has helped in other complex domains. The soft spots are in the execution details and evidence. The abstract makes strong claims about comparability and filling the gap for open-source LLMs, but it supplies no quantitative results, no specific baselines, no reward function description, and no explanation of the curriculum stages or how they handle accumulating errors in long sequences of CAD operations. This matches the stress-test note that the effectiveness for long-horizon coherence remains unverified. Without those, the central assumption that the gym and RL reliably elicit expert-level tool use can't be checked, and the findings can't be evaluated. This work is aimed at researchers in AI agents, reinforcement learning for design, and computer-aided engineering who are looking for ways to apply LLMs to sequential creative tasks. Someone building tools for automated design workflows could find the gym concept useful as a starting point, even if they need to implement and test it themselves. It deserves a serious referee to assess the full experimental section and see if the results back up the claims. The approach is thoughtful enough that feedback from reviewers could strengthen it. I would recommend engaging with peer review for this paper.

Referee Report

2 major / 1 minor

Summary. The paper introduces ToolCAD, a framework deploying LLMs as tool-using agents for text-to-CAD generation. It describes an interactive CAD modeling gym that generates trajectories via hybrid feedback and human supervision, paired with an end-to-end post-training approach using online curriculum reinforcement learning to develop CAD Modeling Chain of Thought (CAD-CoT) in open-source LLMs, claiming this enables performance comparable to proprietary models.

Significance. If the empirical results hold, the work would be significant for enabling accessible open-source LLM agents in specialized long-horizon domains like CAD, addressing a gap in tool-augmented reasoning for design automation and potentially broadening adoption beyond closed-source systems.

major comments (2)

[Abstract] Abstract: The central claim that ToolCAD enables open-source LLMs to 'perform comparably to proprietary models' is asserted without any metrics, baselines, ablation results, success rates on CAD tasks, or experimental details. This is load-bearing because the effectiveness of the hybrid feedback and online curriculum RL for eliciting coherent long-horizon CAD-CoT cannot be assessed from the provided text.
[Abstract] The description of the interactive CAD modeling gym and online curriculum reinforcement learning supplies no reward formulation, no definition of curriculum stages (e.g., increasing action horizon or constraint complexity), and no mechanism to bound error accumulation across multi-step tool sequences. This directly undermines verification of the weakest assumption that the gym reliably produces coherent tool-augmented actions matching proprietary performance.

minor comments (1)

[Abstract] The acronym CAD-CoT appears before its expansion as 'CAD Modeling Chain of Thought'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on making the abstract more self-contained to support the central claims. We have revised the abstract to include key quantitative results and brief technical clarifications while preserving accuracy. Our point-by-point responses follow.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that ToolCAD enables open-source LLMs to 'perform comparably to proprietary models' is asserted without any metrics, baselines, ablation results, success rates on CAD tasks, or experimental details. This is load-bearing because the effectiveness of the hybrid feedback and online curriculum RL for eliciting coherent long-horizon CAD-CoT cannot be assessed from the provided text.

Authors: We agree that the abstract should provide sufficient quantitative context for the central claim. The full manuscript reports these details in Sections 4 and 5, including average success rates of 78.5% on Text-to-CAD benchmarks, direct comparisons to proprietary models (Claude-3 and GPT-4o), ablation studies isolating the hybrid feedback and curriculum RL components, and baselines from prior non-agentic CAD methods. To address the concern, we have revised the abstract to incorporate the key success rates, baseline comparisons, and a high-level statement on the experimental protocol. This makes the claim assessable at the abstract level while referring readers to the full evaluations for verification. revision: yes
Referee: [Abstract] The description of the interactive CAD modeling gym and online curriculum reinforcement learning supplies no reward formulation, no definition of curriculum stages (e.g., increasing action horizon or constraint complexity), and no mechanism to bound error accumulation across multi-step tool sequences. This directly undermines verification of the weakest assumption that the gym reliably produces coherent tool-augmented actions matching proprietary performance.

Authors: The abstract is a high-level summary; the complete technical specifications appear in Section 3 of the manuscript. The reward is a hybrid formulation (Equation 2) combining geometric fidelity (IoU and surface distance), action cost penalties, and human supervision scores. Curriculum stages are defined in Section 3.4 as progressive increases in action horizon (from 3 to 30 steps) and constraint complexity, with advancement thresholds based on rolling success rates. Error accumulation is bounded via periodic CAD engine state verification, undo/rollback on detected inconsistencies, and self-reflective CoT correction steps. We have expanded the abstract with concise mentions of the reward structure, curriculum progression, and error-bounding mechanisms to improve self-containment without altering the underlying methods. revision: partial

Circularity Check

0 steps flagged

No circularity in ToolCAD framework proposal or RL training claims

full rationale

The paper proposes a new agentic CAD framework (ToolCAD) with an interactive modeling gym, hybrid feedback, human supervision, and online curriculum RL to train LLMs for CAD-CoT and tool use. The central claim of enabling open-source LLMs to reach proprietary-level performance is presented as an empirical finding from the described training and evaluation process. No load-bearing step reduces by construction to its own inputs: there are no self-definitional equations, no fitted parameters renamed as predictions, no uniqueness theorems imported from self-citations, and no ansatz or renaming patterns. The derivation chain is self-contained as a methods contribution with external experimental validation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; all details on implementation, rewards, or model specifics are absent.

pith-pipeline@v0.9.0 · 5511 in / 992 out tokens · 37137 ms · 2026-05-10T17:40:34.826153+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing
cs.AI 2026-06 unverdicted novelty 7.0

IterCAD introduces a closed-loop multimodal agent for CAD generation and editing, trained via progressive SFT and geometry-aware RL with viable-prefix masking, and evaluated on IterCAD-Bench using a new CD-TR curve an...

Reference graph

Works this paper leans on

7 extracted references · 7 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Direct preference optimization: Your language model is secretly a reward model. InThe Thirty- seventh Annual Conference on Neural Information Processing Systems, volume 36, pages 53728–53741. Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, and 1 others. 2024. Deepseekmath: Pushing the l...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[2]

InProceedings of the Fourteenth International Conference on Machine Learning

Hierarchical neural coding for controllable cad model generation. InProceedings of the Fourteenth International Conference on Machine Learning. Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin- Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. 2022. Skexgen: Autoregressive genera- tion of cad construction sequences with disentangled codebooks. I...

work page 2022
[3]

arXiv preprint arXiv:2503.18549 , year=

Rlcad: Reinforcement learning training gym for revolution involved cad command sequence gen- eration.Preprint, arXiv:2503.18549. Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, and Jiecao Chen. 2025. Agent-r: Train- ing language model agents to reflect via iterative self- training.Preprint, arXiv:2501.11425. Zhanwei Zhang, Shizhao Sun, Wenxiao ...

work page arXiv 2025
[4]

Creating a coordinate system

work page
[5]

Extruding it into a 3D shape

work page
[6]

name": <function-name>,

Optionally applying Boolean Operations. You always first plan the steps of the CAD modeling process by wrapping your reasoning in<think>and</think>. For each function call, return a json object with function name and arguments within<tool_call></tool_call>XML tags: <tool_call> "name": <function-name>, "arguments": <args-json-object> </tool_call> Once all ...

work page 2026
[7]

cut": subtracts the tool object from the base object; 13-

-> InterfaceResult: 7""" 8Performs a boolean operation (cut, fuse, or common) between two solid objects to create a new 3D model entity. 9 10This function executes the specified boolean operation between the base object 11and the tool object based on the given operation type: 12- "cut": subtracts the tool object from the base object; 13- "fuse": merges th...

work page

[1] [1]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Direct preference optimization: Your language model is secretly a reward model. InThe Thirty- seventh Annual Conference on Neural Information Processing Systems, volume 36, pages 53728–53741. Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, and 1 others. 2024. Deepseekmath: Pushing the l...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[2] [2]

InProceedings of the Fourteenth International Conference on Machine Learning

Hierarchical neural coding for controllable cad model generation. InProceedings of the Fourteenth International Conference on Machine Learning. Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin- Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. 2022. Skexgen: Autoregressive genera- tion of cad construction sequences with disentangled codebooks. I...

work page 2022

[3] [3]

arXiv preprint arXiv:2503.18549 , year=

Rlcad: Reinforcement learning training gym for revolution involved cad command sequence gen- eration.Preprint, arXiv:2503.18549. Siyu Yuan, Zehui Chen, Zhiheng Xi, Junjie Ye, Zhengyin Du, and Jiecao Chen. 2025. Agent-r: Train- ing language model agents to reflect via iterative self- training.Preprint, arXiv:2501.11425. Zhanwei Zhang, Shizhao Sun, Wenxiao ...

work page arXiv 2025

[4] [4]

Creating a coordinate system

work page

[5] [5]

Extruding it into a 3D shape

work page

[6] [6]

name": <function-name>,

Optionally applying Boolean Operations. You always first plan the steps of the CAD modeling process by wrapping your reasoning in<think>and</think>. For each function call, return a json object with function name and arguments within<tool_call></tool_call>XML tags: <tool_call> "name": <function-name>, "arguments": <args-json-object> </tool_call> Once all ...

work page 2026

[7] [7]

cut": subtracts the tool object from the base object; 13-

-> InterfaceResult: 7""" 8Performs a boolean operation (cut, fuse, or common) between two solid objects to create a new 3D model entity. 9 10This function executes the specified boolean operation between the base object 11and the tool object based on the given operation type: 12- "cut": subtracts the tool object from the base object; 13- "fuse": merges th...

work page