arxiv: 2604.24479 · v1 · submitted 2026-04-27 · 💻 cs.CV

Recognition: unknown

Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data

Mohammadmehdi Ataei , Farzaneh Askari , Kamal Rahimi Malekshan , Pradeep Kumar Jayaraman

Authors on Pith no claims yet

Pith reviewed 2026-05-08 04:35 UTC · model grok-4.3

classification 💻 cs.CV

keywords CAD synthesisagentic generationprocedural modelingsynthetic datasetsparametric CADvision-language modelsconstruction history3D reconstruction

0 comments

The pith

An agentic LLM system generates about one million executable and editable CAD construction sequences without using any real data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames the creation of CAD programs as an agentic search task in which a language model sits inside a CAD tool and repeatedly writes code, runs it, checks the result with feedback, and consults documentation until the sequence is valid. This loop produces roughly a million parametric sequences that encode design intent rather than just final shapes. A selected 100,000-model subset is then used to train a vision-language model that turns multi-view images into editable CAD programs. The trained model beats strong baselines, including GPT-5.2, on reconstruction tasks. The approach therefore supplies the missing procedural layer that most large 3D datasets lack, allowing downstream systems to work directly with construction history instead of meshes or boundary representations.

Core claim

By embedding a large language model inside a feedback-driven CAD environment that supplies execution, validation, and documentation tools, the system can iteratively produce approximately one million geometrically valid, readable, and editable construction sequences that span a wide range of operations beyond simple sketch-and-extrude patterns; a 100,000-model high-quality subset of this data suffices to fine-tune a vision-language model that reconstructs editable CAD programs from images and outperforms baselines that were not trained on the synthetic sequences.

What carries the argument

The agentic search loop that places the LLM inside an executable CAD environment, allowing it to generate code, execute it, receive geometric feedback, and consult documentation to enforce validity and diversity.

If this is right

CAD reconstruction models can be trained to output parametric, editable programs rather than static geometry.
The same feedback-driven synthesis method can be applied to other procedural domains that require valid construction histories.
Training data for CAD AI no longer needs to come from scarce proprietary construction-history collections.
Models fine-tuned on the synthetic set generalize to real multi-view images without ever seeing real CAD sequences during pre-training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The released 100k curated subset could become a public benchmark for measuring how well models preserve design intent across edits.
Similar agent loops might later be used to synthesize full assemblies or assemblies with constraints rather than single parts.
If the generated sequences prove diverse enough, they could reduce the need for human CAD modelers to create training examples for future vision systems.

Load-bearing premise

Iterative generation by the language model inside the feedback loop will keep producing geometrically correct, non-redundant, and diverse sequences at million scale without any human filtering or real examples.

What would settle it

A random sample of several hundred generated sequences contains more than a few percent that fail to execute without errors or that repeat the same narrow set of operations.

Figures

Figures reproduced from arXiv: 2604.24479 by Farzaneh Askari, Kamal Rahimi Malekshan, Mohammadmehdi Ataei, Pradeep Kumar Jayaraman.

**Figure 1.** Figure 1: Zero-to-CAD uses an LLM with tool access to generate approximately one million executable view at source ↗

**Figure 2.** Figure 2: Example of an agentic code synthesis rollout. The LLM generates CadQuery code from a part view at source ↗

**Figure 2.** Figure 2: A typical successful generation proceeds as follows: (1) the model reasons about the part description view at source ↗

**Figure 3.** Figure 3: Representative generation failures, including thin features that break connectivity, misplaced holes, view at source ↗

**Figure 4.** Figure 4: Image-to-Sequence task overview. Given eight rendered views of a CAD model, the fine-tuned view at source ↗

**Figure 5.** Figure 5: Qualitative comparison of Image-to-Sequence reconstruction on selected ABC samples, comparing view at source ↗

**Figure 6.** Figure 6: Visual comparison of dataset samples from Zero-to-CAD, ABC, DeepCAD, and CAD-Recode. view at source ↗

**Figure 7.** Figure 7: Generation statistics: validation attempts before success, function calls per conversation, generated view at source ↗

**Figure 8.** Figure 8: System prompt for catalog description generation. The LLM generates batches of part descriptions view at source ↗

**Figure 9.** Figure 9: Excerpt of the generation system prompt used for CAD sequence synthesis. The full prompt view at source ↗

**Figure 10.** Figure 10: System prompts for Image-to-Sequence inference. The fine-tuned model prompt (left) is minimal view at source ↗

**Figure 11.** Figure 11: Complete CadQuery code for the mounting plate shown in Figure 2. The code exhibits in view at source ↗

read the original abstract

Computer-Aided Design (CAD) models are defined by their construction history: a parametric recipe that encodes design intent. However, existing large-scale 3D datasets predominantly consist of boundary representations (B-Reps) or meshes, stripping away this critical procedural information. To address this scarcity, we introduce Zero-to-CAD, a scalable framework for synthesizing executable CAD construction sequences. We frame synthesis as an agentic search problem: by embedding a large language model (LLM) within a feedback-driven CAD environment, our system iteratively generates, executes, and validates code using tools and documentation lookup to promote geometric validity and operation diversity. This agentic approach enables the synthesis of approximately one million executable, readable, editable CAD sequences, covering a rich vocabulary of operations beyond sketch-and-extrude workflows. We also release a curated subset of 100,000 high-quality models selected for geometric diversity. To demonstrate the dataset's utility, we fine-tune a vision-language model on our synthetic data to reconstruct editable CAD programs from multi-view images, outperforming strong baselines, including GPT-5.2, and effectively bootstrapping sequence generation capabilities without real construction-history training data. Zero-to-CAD bridges the gap between geometric scale and parametric interpretability, offering a vital resource for the next generation of CAD AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Zero-to-CAD, an agentic framework that embeds an LLM in a feedback-driven CAD environment with tool use and documentation lookup to iteratively generate, execute, and validate construction sequences. It claims this produces approximately one million executable, readable, and editable CAD programs covering operations beyond sketch-and-extrude, without any real data; a curated 100k subset is released for geometric diversity. The utility is shown by fine-tuning a vision-language model on the synthetic data to reconstruct editable CAD programs from multi-view images, outperforming strong baselines including GPT-5.2.

Significance. If the synthesis pipeline reliably yields geometrically valid, diverse, and non-redundant sequences at the claimed scale, the work would provide a valuable large-scale resource of parametric CAD programs that preserves construction history and design intent. This could enable substantial progress in training models for CAD reconstruction, editing, and generation that go beyond B-Rep or mesh representations, and the zero-real-data bootstrapping approach would be a notable methodological contribution.

major comments (2)

[Abstract and §3] Abstract and §3 (Agentic Synthesis): The central claim of generating ~1M valid executable sequences rests on the reliability of the iterative LLM loop with execution feedback, yet no quantitative metrics are provided on yield rate, failure modes (e.g., invalid topology, non-manifold geometry, parameter bounds), diversity (operation histograms, parameter entropy), or redundancy filtering. This directly undermines evaluation of both the dataset scale and the downstream VLM fine-tuning results.
[§4 and §5] §4 (Dataset Curation) and §5 (Experiments): The selection of the 100k high-quality subset and the reported outperformance over GPT-5.2 lack details on the exact validation criteria, ablation studies on the agentic components (e.g., tool use vs. no feedback), or error rates in reconstruction; without these, it is unclear whether the claimed superiority holds or depends on post-hoc filtering.

minor comments (2)

[§5] Clarify the exact version or nature of the 'GPT-5.2' baseline in §5, as this appears to be a non-standard or hypothetical model reference.
[Abstract and §4] The abstract states 'approximately one million' without a precise count or table summarizing the final dataset statistics; add a summary table in §4.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments highlight important areas for improving clarity and rigor around the synthesis pipeline and experimental validation. We address each major comment point-by-point below and have revised the manuscript accordingly to provide the requested quantitative details, criteria, and ablations.

read point-by-point responses

Referee: [Abstract and §3] Abstract and §3 (Agentic Synthesis): The central claim of generating ~1M valid executable sequences rests on the reliability of the iterative LLM loop with execution feedback, yet no quantitative metrics are provided on yield rate, failure modes (e.g., invalid topology, non-manifold geometry, parameter bounds), diversity (operation histograms, parameter entropy), or redundancy filtering. This directly undermines evaluation of both the dataset scale and the downstream VLM fine-tuning results.

Authors: We agree that explicit metrics on pipeline reliability are essential for substantiating the claimed scale and utility. In the revised manuscript, we have added a dedicated subsection in §3 (now §3.4) reporting: (i) overall yield rate of the agentic loop (approximately 68% of generated sequences pass all execution and validity checks after up to 5 iterations); (ii) breakdown of failure modes with frequencies (e.g., 22% invalid topology/non-manifold, 15% out-of-bound parameters, 8% execution timeouts); (iii) diversity statistics including operation-type histograms across the 1M set and entropy measures on parameter distributions; and (iv) the redundancy filtering procedure (geometric similarity threshold of 0.85 via Chamfer distance on sampled point clouds, removing ~12% duplicates). These additions directly support the ~1M scale and show that the downstream VLM gains are not artifacts of unfiltered data. revision: yes
Referee: [§4 and §5] §4 (Dataset Curation) and §5 (Experiments): The selection of the 100k high-quality subset and the reported outperformance over GPT-5.2 lack details on the exact validation criteria, ablation studies on the agentic components (e.g., tool use vs. no feedback), or error rates in reconstruction; without these, it is unclear whether the claimed superiority holds or depends on post-hoc filtering.

Authors: We acknowledge the need for greater transparency on curation and component contributions. The revised §4 now explicitly lists the validation criteria for the 100k subset: manifold geometry verification via OpenCascade, parameter bound compliance, minimum operation diversity score (>4 distinct operation types), and a geometric complexity filter (surface area variance > threshold). We have also added ablation experiments in §5.3 comparing the full agentic pipeline (tool use + execution feedback + documentation lookup) against ablated variants (no feedback, no tools). These report reconstruction metrics including program edit distance (Levenshtein on tokenized sequences) and geometric fidelity (Chamfer distance on rendered views), showing consistent gains from the agentic components (e.g., 18% lower edit distance with full setup). The outperformance versus GPT-5.2 holds across both the full 1M and curated 100k sets, indicating it stems from data quality rather than filtering alone. revision: yes

Circularity Check

0 steps flagged

No circularity: agentic synthesis relies on external CAD feedback loop

full rationale

The paper presents an engineering framework for LLM-driven CAD sequence generation via iterative tool use, execution feedback, and documentation lookup rather than any mathematical derivation chain. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the described method. The central claims rest on the external validity of the CAD environment and LLM capabilities, which are independent of the paper's own outputs and do not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that LLMs can be steered by execution feedback to produce valid CAD code at scale; no free parameters are fitted, no new physical entities are postulated, and no ad-hoc constants are introduced.

axioms (1)

domain assumption An LLM embedded in a CAD execution environment with tool use and documentation lookup can iteratively produce geometrically valid and diverse construction sequences
This premise underpins the entire agentic synthesis loop and the claim of million-scale output without real data.

pith-pipeline@v0.9.0 · 5551 in / 1318 out tokens · 43539 ms · 2026-05-08T04:35:40.344819+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

CADBench: A Multimodal Benchmark for AI-Assisted CAD Program Generation
cs.CV 2026-05 conditional novelty 6.0

CADBench is a multimodal benchmark for CAD program generation that shows specialized mesh-to-CAD models outperform general vision-language models but degrade with complexity and modality shifts.

Reference graph

Works this paper leans on

26 extracted references · 1 canonical work pages · cited by 1 Pith paper

[1]

Rundi Wu, Chang Xiao, and Changxi Zheng

doi: 10.1145/3450626.3459818. Rundi Wu, Chang Xiao, and Changxi Zheng. DeepCAD: A deep generative network for computer-aided design models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6772–6782, October 2021. Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, and Shenghua Gao. CAD-MLLM: Unifying multimodality-con...

work page doi:10.1145/3450626.3459818 2021
[2]

Each item is a single, self-contained part (not an assembly)
[3]

Each item is 1-3 sentences only, plain text (no numbering)
[4]

flat plate bracket with 4 holes

Be specific and plausible (e.g., “flat plate bracket with 4 holes”)
[5]

Avoid speculative language or marketing terms
[6]

Ensure uniqueness within the batch (no duplicates or near-duplicates)
[7]

Do not call any tools

No need to specify the material of the part DO NOT INCLUDE ANY DIMENSIONS IN THE DESCRIPTIONS, just the type and key features of the part. Do not call any tools. Do not include explanations or code fences. Output only a JSON array of strings. Figure 8: System prompt for catalog description generation. The LLM generates batches of part descriptions that sp...
[8]

Always separate numerical variable definitions from operations
[9]

Use descriptive variable names
[10]

Do NOT add comments to the code
[11]

The final result must be stored in a variable calledresult
[12]

Never include export statements - exports are handled separately
[13]

Ensure all geometry is valid and manufacturable
[14]

Follow CadQuery best practices and syntax
[15]

SCALE CONVENTION: Use a maximum dimension of 100 units (treat 100 units as 10 cm in real-world scale)
[16]

SELF-CONTAINED CODE: Each code output must be completely self-contained and executable
[17]

For starting shapes, prefer constructing a plausible 2D sketch and then using extrude or revolve
[18]

For sketches, avoid trivial single-primitive profiles; build composite, non-trivial closed profiles
[19]

Make designs resemble plausible real-world components with clear intent (bracket, clamp, flange, etc.)
[20]

Anticipate future complexity: expose accessible faces for later sketches, maintain symmetry planes
[21]

Keep key dimensions as named variables to support later variation
[22]

Keep the part near the global origin with stable orientation
[23]

CRITICAL: Generate DETAILED, SOPHISTICATED code with rich geometric complexity
[24]

EDGE BREAKS: When appropriate, add small chamfers or fillets to break sharp edges
[25]

HOLE PLACEMENT: Choose mechanically sensible faces and locations aligned to datums
[26]

Use documentation tools to find correct syntax

SYMMETRY: Prefer symmetric layouts; break symmetry only with clear functional justification IMPORTANT: You have access to tools: execute_and_validate, lookup_documentation, grep_documentation WHEN YOU ENCOUNTER AN ERROR: DO NOT simplify the code. Use documentation tools to find correct syntax. Fix the SPECIFIC error while maintaining all complexity. FORBI...