arxiv: 2602.03045 · v2 · pith:TDPHGDNHnew · submitted 2026-02-03 · 💻 cs.LG

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

classification 💻 cs.LG
keywords agent · proactive · clarification · models · specification · agentic · ambiguous · before

Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural-language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to follow user instructions reactively and hallucinate dimensions when the text is ambiguous. To address this, we propose ProCAD, a proactive agentic framework for text-to-CadQuery generation that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9% and lowering the invalidity ratio from 4.8% to 0.9%. Our code and datasets are publicly available at https://github.com/BoYuanVisionary/Pro-CAD.
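
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes a symmetric Chamfer distance computed on squared Euclidean distances between sampled point clouds, and an invalidity ratio defined as the fraction of generated programs that fail to yield valid geometry. The abstract does not give the exact definitions, so the function names `chamfer_distance` and `invalidity_ratio` and both formulas are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence

import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Assumption: squared Euclidean distances, averaged per direction and summed.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # Mean nearest-neighbour squared distance from a to b, plus from b to a.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def invalidity_ratio(executed_ok: Sequence[bool]) -> float:
    """Fraction of generated programs that did not produce valid geometry."""
    if len(executed_ok) == 0:
        raise ValueError("need at least one evaluated program")
    failures = sum(1 for ok in executed_ok if not ok)
    return failures / len(executed_ok)
```

For example, `chamfer_distance` applied to two identical point clouds returns 0.0, and `invalidity_ratio([True, True, False, True])` returns 0.25.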

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

    cs.AI 2026-06 unverdicted novelty 7.0

    IterCAD introduces a closed-loop multimodal agent for CAD generation and editing, trained via progressive SFT and geometry-aware RL with viable-prefix masking, and evaluated on IterCAD-Bench using a new CD-TR curve an...

  2. P3D-Bench: Benchmarking MLLMs for Parametric 3D Generation and Structural Reasoning

    cs.CV 2026-06 unverdicted novelty 7.0

    P3D-Bench is a benchmark with three task families that scores MLLMs on generating executable parametric 3D programs, finding failures in precise geometry and part assembly.