arxiv: 2604.10992 · v2 · submitted 2026-04-13 · 💻 cs.CV

ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation

Yuan Shui, Yandong Guan, Zhanwei Zhang, Juncheng Hu, Jing Zhang, Dong Xu, Qian Yu

Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords articulated CAD · multi-agent system · code generation · CAD assembly · text-to-CAD · image-to-CAD · connector prediction

The pith

A training-free multi-agent system generates editable articulated CAD assemblies from text or images by predicting connectors early.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ArtiCAD as a way to create multi-part, movable CAD models directly from high-level descriptions without any model training. It splits the work across four agents that handle design, code generation, assembly, and review, with the key step of defining attachment points and joint parameters at the very start rather than after geometry exists. Validation steps and a rollback mechanism catch errors at code or design level, while an accumulating experience store lets the system improve on repeated tasks. The result is usable output for conceptual design, physical builds, and AI training data export.

Core claim

ArtiCAD is the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. It divides the task among Design, Generation, Assembly, and Review agents, predicts assembly relationships via a Connector during the initial design stage to bypass LLM spatial reasoning limits, applies validation and cross-stage rollback for error correction, and maintains a self-evolving experience store for ongoing improvement.
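
The control flow implied by this claim, with code-level retries inside a stage and design-level rollback across stages, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `(ok, error_level)` return convention, the retry budgets, and the choice to treat every assembly failure as a design-level error are all assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical convention: a stage returns (ok, error_level), where
# error_level is "code" or "design" when ok is False.
Stage = Callable[[dict], Tuple[bool, str]]


def run_with_rollback(design: Callable[[int], dict], generate: Stage, assemble: Stage,
                      max_design_rounds: int = 3, max_code_retries: int = 2) -> List[str]:
    """Return a trace of the stages visited (illustrative only)."""
    trace: List[str] = []
    for d in range(max_design_rounds):
        plan = design(d)  # the plan would carry components and connectors
        trace.append(f"design#{d}")
        rollback_to_design = False
        for c in range(max_code_retries + 1):
            ok, level = generate(plan)
            trace.append(f"generate#{c}:{'ok' if ok else level}")
            if ok:
                break
            if level == "design":
                rollback_to_design = True  # cross-stage rollback
                break
        else:
            # Code-level retries exhausted: escalate to a fresh design.
            rollback_to_design = True
        if rollback_to_design:
            continue
        ok, level = assemble(plan)
        trace.append(f"assemble:{'ok' if ok else level}")
        if ok:
            trace.append("review")
            return trace
        # Assumption of this sketch: an assembly failure triggers a redesign.
    trace.append("failed")
    return trace


# Toy stages: the first design is flawed, the second one succeeds.
def demo_design(round_idx: int) -> dict:
    return {"version": round_idx}


def demo_generate(plan: dict) -> Tuple[bool, str]:
    return (plan["version"] > 0, "design")


def demo_assemble(plan: dict) -> Tuple[bool, str]:
    return (True, "")


demo_trace = run_with_rollback(demo_design, demo_generate, demo_assemble)
```

In the toy run, the first generation attempt reports a design-level error, so the loop returns to the design stage instead of retrying code against a flawed plan.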

What carries the argument

The Connector object, which explicitly records attachment points and joint parameters and is predicted in the design stage before any geometry code is written.
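
To make the Connector idea concrete, here is a minimal sketch of what such a record could look like. It is not the paper's definition. Every field name, the validation rules, and the set of joint labels are assumptions made for illustration; the paper uses five joint types, of which only prismatic and revolute are named in the material reproduced on this page, and "fixed" is added here purely as a placeholder.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical subset of joint labels; "fixed" is an assumption of this sketch.
JOINT_TYPES = {"fixed", "revolute", "prismatic"}


@dataclass(frozen=True)
class Connector:
    """Illustrative record of one attachment, decided before geometry exists."""
    name: str
    parent_part: str
    child_part: str
    parent_point: Vec3             # attachment point in the parent's local frame
    child_point: Vec3              # attachment point in the child's local frame
    joint_type: str
    axis: Vec3 = (0.0, 0.0, 1.0)   # motion axis in the shared joint frame
    lower: Optional[float] = None  # motion limits, if the joint moves
    upper: Optional[float] = None

    def __post_init__(self) -> None:
        if self.joint_type not in JOINT_TYPES:
            raise ValueError(f"unknown joint type: {self.joint_type}")
        if self.parent_part == self.child_part:
            raise ValueError("a connector must join two different parts")
        if all(abs(c) < 1e-12 for c in self.axis):
            raise ValueError("joint axis must be non-zero")
        if self.lower is not None and self.upper is not None and self.lower > self.upper:
            raise ValueError("lower limit exceeds upper limit")


hinge = Connector("door_hinge", "cabinet", "door",
                  (0.0, 200.0, 0.0), (0.0, 0.0, 0.0),
                  "revolute", lower=0.0, upper=1.57)
payload = json.dumps(asdict(hinge))  # serialisable, so agents can pass it along
```

Because the record is validated on construction, a malformed connector would be rejected at design time rather than discovered after geometry code has been generated.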

If this is right

Requirement-driven conceptual design becomes possible for products with moving parts.
Generated assemblies can be exported for physical prototyping workflows.
URDF export supplies ready training assets for embodied AI simulation.
Repeated use improves future outputs through the self-evolving experience store.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The early-connector pattern could apply to other code-generation tasks that need spatial or relational structure.
Combining the rollback mechanism with external CAD validation libraries would allow fully automatic repair loops.
The experience store could be seeded with domain-specific templates to accelerate adoption in narrow industries.
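
The seeding idea in the last point can be sketched with a small partitioned store. This is an editorial illustration under stated assumptions: the class and field names are hypothetical, and the token-overlap (Jaccard) ranking is a stand-in chosen for brevity, since the retrieval method the paper actually uses is not described in the material reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experience:
    task: str      # the request that was solved
    lesson: str    # what to reuse next time
    score: float   # quality score, e.g. as assigned by a reviewing step


@dataclass
class ExperienceStore:
    """Hypothetical store partitioned by task category."""
    partitions: Dict[str, List[Experience]] = field(default_factory=dict)

    def record(self, category: str, exp: Experience) -> None:
        self.partitions.setdefault(category, []).append(exp)

    def retrieve(self, category: str, query: str, k: int = 2) -> List[Experience]:
        q = set(query.lower().split())

        def similarity(exp: Experience) -> float:
            t = set(exp.task.lower().split())
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        # Rank by word overlap first, then by recorded score.
        ranked = sorted(self.partitions.get(category, []),
                        key=lambda e: (similarity(e), e.score), reverse=True)
        return ranked[:k]


store = ExperienceStore()
# Seeding with domain templates before any task has been run.
store.record("articulated", Experience("cabinet with hinged door",
                                       "place the hinge axis along the door edge", 0.9))
store.record("articulated", Experience("drawer on sliding rail",
                                       "use a prismatic joint along the rail", 0.8))
best = store.retrieve("articulated", "hinged door cabinet", k=1)
```

A store seeded this way returns useful hints on the first request in a domain, and later runs append their own cases through the same `record` call.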

Load-bearing premise

Large language and vision models, when given agent roles and early connector instructions, will produce correct geometry code and joint settings without spatial reasoning errors.

What would settle it

Run a prompt requiring two parts to join at a precise offset or angle; inspect whether the generated model contains a valid joint parameter and remains editable in a CAD tool without manual fixes.
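
The geometric part of this check can be sketched numerically. Figure 2 of the paper describes assembly as deterministic frame alignment; under that reading, the child part's placement is the rigid transform that maps its connector frame onto the parent's connector frame. The frame representation (rotation matrix plus translation), the helper names, and the example numbers below are assumptions of this sketch.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]
Vec = List[float]
Frame = Tuple[Mat, Vec]  # (3x3 rotation, translation)


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def compose(f: Frame, g: Frame) -> Frame:
    """Rigid transform that applies g first, then f."""
    (rf, tf), (rg, tg) = f, g
    return matmul(rf, rg), [x + y for x, y in zip(matvec(rf, tg), tf)]


def inverse(f: Frame) -> Frame:
    r, t = f
    rt = [[r[j][i] for j in range(3)] for i in range(3)]  # transpose
    return rt, [-x for x in matvec(rt, t)]


def rot_z(deg: float) -> Mat:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def child_placement(parent_conn: Frame, child_conn: Frame) -> Frame:
    """Placement T with T * child_conn == parent_conn (parent assumed at the origin)."""
    return compose(parent_conn, inverse(child_conn))


def frames_match(f: Frame, g: Frame, tol: float = 1e-9) -> bool:
    (rf, tf), (rg, tg) = f, g
    rot_ok = all(abs(rf[i][j] - rg[i][j]) <= tol for i in range(3) for j in range(3))
    return rot_ok and all(abs(a - b) <= tol for a, b in zip(tf, tg))


# Parent connector at (10, 0, 5); child connector at (0, 0, -2), rotated 90 degrees about z.
parent_conn: Frame = (rot_z(0.0), [10.0, 0.0, 5.0])
child_conn: Frame = (rot_z(90.0), [0.0, 0.0, -2.0])
placement = child_placement(parent_conn, child_conn)
aligned = compose(placement, child_conn)  # child connector expressed in the assembly
```

If the requested offset or angle is encoded in the connector frames, the test reduces to `frames_match(aligned, parent_conn)`; whether the resulting file stays editable in a CAD tool still has to be checked by hand.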

Figures

Figures reproduced from arXiv: 2604.10992 by Dong Xu, Jing Zhang, Juncheng Hu, Qian Yu, Yandong Guan, Yuan Shui, Zhanwei Zhang.

**Figure 1.** Top: CAD assemblies generated by ArtiCAD across three task categories: Static, Articulated, and Industrial. All outputs are editable. Bottom: An example application. Given a user requirement, ArtiCAD generates an articulated CAD assembly with functional components (e.g., an enclosure, rods/handles, and player pieces). The components are then fabricated using a 3D printer (Bambu Lab P1S) and assembled int…

**Figure 2.** Early vs. late assembly relationship prediction. Top: early prediction (ours) specifies connectors at design time; assembly reduces to deterministic frame alignment. Bottom: deferring connection decisions to the assembly stage forces a second planning pass that must parse all generated code, infer coordinate systems, and resolve cross-part dimensions, a task with long context and high failure rate.

**Figure 3.** The five core kinematic joint types utilized in ArtiCAD. Each joint connects two parts at a shared coordinate frame; the specific degrees of freedom (DOF) constraints determine the allowed relative motion, which is subsequently resolved by FreeCAD's Assembly solver.

**Figure 4.** Overview of the ArtiCAD pipeline. A Design Agent decomposes multimodal input into components and connectors; Generation Agents generate per-part FreeCAD scripts through a generate–execute–repair loop with VLM validation; a deterministic Assembly Agent aligns parts and verifies the result via VLM and LLM judges; a Review Agent scores the output and records the case into the partitioned experience store. Cro…

**Figure 5.** Qualitative results comparing ArtiCAD with Single-VLM Loop on our bench.

**Figure 6.** Qualitative comparisons between ArtiCAD and SINGAPO, Articulate-Anything, and PAct on the ACD dataset. Black arrows indicate prismatic (translational) joints, and red arrows indicate revolute (rotational) joints.

**Figure 7.** URDF export verification for embodied AI applications. Top: exported assemblies loaded in Robot Viewer. Bottom: the same models with joint coordinate frames visualized. The exported URDFs preserve the intended joint structure, axis directions, and motion limits.
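
For readers unfamiliar with URDF, the sketch below shows how one typed joint with an axis and motion limits is written in that format, using only the Python standard library. It is an illustration of the file format, not ArtiCAD's exporter: the function name, the part names, and the numeric values are hypothetical, and the `effort` and `velocity` attributes are placeholder values included because URDF expects them on the limits of revolute and prismatic joints.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def fmt(values: Sequence[float]) -> str:
    """Space-separated numbers, as URDF attributes expect."""
    return " ".join(f"{v:g}" for v in values)


def build_urdf(robot_name: str, parent: str, child: str, joint_name: str,
               joint_type: str, origin_xyz: Sequence[float],
               axis_xyz: Sequence[float], lower: float, upper: float) -> str:
    robot = ET.Element("robot", name=robot_name)
    ET.SubElement(robot, "link", name=parent)
    ET.SubElement(robot, "link", name=child)
    joint = ET.SubElement(robot, "joint", name=joint_name, type=joint_type)
    ET.SubElement(joint, "parent", link=parent)
    ET.SubElement(joint, "child", link=child)
    # Joint frame relative to the parent link (metres, radians).
    ET.SubElement(joint, "origin", xyz=fmt(origin_xyz), rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz=fmt(axis_xyz))
    # effort and velocity are placeholders in this sketch.
    ET.SubElement(joint, "limit", lower=f"{lower:g}", upper=f"{upper:g}",
                  effort="1", velocity="1")
    return ET.tostring(robot, encoding="unicode")


urdf_text = build_urdf("cabinet", "body", "door", "door_hinge", "revolute",
                       origin_xyz=(0.0, 0.2, 0.0), axis_xyz=(0.0, 0.0, 1.0),
                       lower=0.0, upper=1.57)
```

Geometry (`visual` and `collision` elements on each link) is omitted; a real export would also need mesh references for the generated parts.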

Original abstract

Parametric Computer-Aided Design (CAD) of articulated assemblies is essential for product development, yet generating these multi-part, movable models from high-level descriptions remains unexplored. To address this, we propose ArtiCAD, the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. Our system divides this complex task among four specialized agents: Design, Generation, Assembly, and Review. One of our key insights is to predict assembly relationships during the initial design stage rather than the assembly stage. By utilizing a Connector that explicitly defines attachment points and joint parameters, ArtiCAD determines these relationships before geometry generation, effectively bypassing the limited spatial reasoning capabilities of current LLMs and VLMs. To further ensure high-quality outputs, we introduce validation steps in the generation and assembly stages, accompanied by a cross-stage rollback mechanism that accurately isolates and corrects design- and code-level errors. Additionally, a self-evolving experience store accumulates design knowledge to continuously improve performance on future tasks. Extensive evaluations on three datasets (ArtiCAD-Bench, CADPrompt, and ACD) validate the effectiveness of our approach. We further demonstrate the applicability of ArtiCAD in requirement-driven conceptual design, physical prototyping, and the generation of embodied AI training assets through URDF export.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ArtiCAD's multi-agent setup with early Connector prediction is a reasonable engineering attempt at articulated CAD generation, but the bypass of spatial reasoning looks shaky and the results need scrutiny.

ArtiCAD presents a multi-agent approach to creating articulated CAD models from natural language or images, using four agents and an upfront Connector to specify joints and attachments. What stands out as new is the training-free setup with early assembly relationship prediction via the Connector, combined with validation, cross-stage rollback, and a self-evolving experience store. This targets the gap in generating editable, movable assemblies, unlike previous single-agent or non-articulated CAD generators. The paper does well in outlining a structured workflow that divides the task logically and incorporates error correction mechanisms to handle LLM shortcomings.

The soft spot is the reliance on the Connector to bypass spatial reasoning. Predicting attachment points and joint parameters from text or a single image still calls for 3D spatial understanding and collision avoidance, which the authors identify as a limitation of current LLMs and VLMs. An incorrect Connector would propagate errors downstream, and the rollback might only address local issues rather than overall kinematic consistency. No ablations are mentioned that isolate the Connector's accuracy or compare against later-stage prediction, so the effectiveness of this bypass remains unclear. The evaluations on ArtiCAD-Bench, CADPrompt, and ACD are referenced, but without quantitative details or failure analyses, it's difficult to assess real-world performance.

This paper would interest researchers in AI for design, robotics simulation, and multi-agent systems. A reader focused on practical applications of LLMs in engineering could pick up useful ideas on agent specialization and iterative refinement. It deserves a serious referee because the core idea is original and the problem has clear applications, though the results section will need close examination. I would recommend putting it through peer review rather than desk rejecting it.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces ArtiCAD, a training-free multi-agent system with Design, Generation, Assembly, and Review agents that generates editable, parametric, articulated CAD assemblies from text or image inputs. The central technical claim is that early prediction of a Connector (attachment points and joint parameters) during the Design stage bypasses the spatial-reasoning limitations of current LLMs and VLMs; this is augmented by cross-stage validation, rollback for error correction, and a self-evolving experience store. Evaluations are reported on ArtiCAD-Bench, CADPrompt, and ACD, with demonstrations for conceptual design, physical prototyping, and URDF export for embodied AI.

Significance. If the bypass mechanism and end-to-end correctness hold, the work would constitute a practical engineering advance in automated CAD generation by enabling complex articulated models without task-specific training or fine-tuning. The multi-agent decomposition, explicit Connector abstraction, and iterative rollback are reusable ideas that could transfer to other parametric design tasks; the URDF export path is a concrete strength for downstream robotics applications.

major comments (3)

§3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference, the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.
§4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.
§5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.

minor comments (3)

[§3.1] The Connector data structure is introduced informally; a concise formal specification (fields, constraints, serialization) would improve reproducibility.
[Figures 3-5] Figure captions and legends should explicitly label which elements correspond to predicted Connectors versus generated geometry.
[§3.4] A few sentences clarifying how the self-evolving experience store is initialized and updated (e.g., what constitutes a successful experience) would remove ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which helps us strengthen the presentation of our contributions. We respond to each major comment below, agreeing where the evaluation can be improved and outlining specific revisions.

Referee: §3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference, the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.

Authors: We agree that the manuscript would benefit from more direct evidence for the bypass claim. The current argument rests on the architectural choice of specifying Connectors (attachment points and joint parameters) explicitly in the Design stage before any geometry is generated, which is intended to avoid implicit 3D spatial reasoning during assembly. However, we did not provide a head-to-head comparison against a late-prediction baseline. In the revised manuscript we will add such a comparison, measuring success rates when Connector prediction is performed early versus deferred to the Assembly stage.

Revision: yes

Referee: §4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.

Authors: The referee is correct that the existing ablations do not isolate the timing of Connector prediction. We will add a new ablation study that separately reports Connector prediction accuracy and directly compares the early-prediction pipeline against a late-prediction variant. This will allow readers to assess whether early placement contributes to avoiding global kinematic inconsistencies beyond what validation and rollback can correct.

Revision: yes

Referee: §5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.

Authors: We acknowledge the value of more granular failure analysis. The current section reports aggregate success rates and selected qualitative examples but does not provide systematic per-category breakdowns for articulated-motion errors. In the revision we will include error breakdowns by category (joint-axis misalignment, inter-part collisions, kinematic inconsistency, etc.) together with additional qualitative cases that illustrate both the failures and the corrections performed by the rollback mechanism.

Revision: yes

Circularity Check

0 steps flagged

No circularity: engineering system with no derivations or fitted predictions

The paper presents ArtiCAD as a training-free multi-agent architecture (Design, Generation, Assembly, Review agents plus Connector prediction and rollback) for text/image-to-articulated-CAD. No equations, parameters, or first-principles derivations appear; the early-Connector insight is an explicit design choice to address stated LLM spatial-reasoning limits rather than a result derived from or reducing to its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. Validation occurs on external datasets (ArtiCAD-Bench, CADPrompt, ACD), rendering the construction self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the unproven premise that LLMs/VLMs can execute the assigned agent roles reliably once the Connector abstraction is introduced; no free parameters are named, but the Connector itself functions as an invented structuring device whose effectiveness is asserted rather than derived.

axioms (1)

domain assumption: Current large language and vision-language models possess sufficient code-generation and planning capability to implement the four-agent workflow when provided with the Connector abstraction.
Invoked in the description of how the Design agent predicts relationships before geometry generation.

invented entities (2)

Connector (no independent evidence)
purpose: Explicitly defines attachment points and joint parameters to bypass LLM spatial reasoning limits.
New data structure introduced to front-load assembly decisions; no independent evidence of its sufficiency is supplied in the abstract.
Self-evolving experience store (no independent evidence)
purpose: Accumulates design knowledge to improve future performance.
Memory mechanism whose update rules and retrieval are not detailed; effectiveness asserted without external validation.

pith-pipeline@v0.9.0 · 5537 in / 1500 out tokens · 47673 ms · 2026-05-10T15:32:34.592695+00:00 · methodology

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

[1] Alrashedy, K., Tambwekar, P., Zaidi, Z.H., Langwasser, M., Xu, W., Gombolay, M.: Generating CAD code with vision-language models for 3D designs. In: Proc. Int. Conf. Learn. Represent. (2025)
[2] Anthropic: Claude Opus 4.6 system card. https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf (February 2026), system card listed as February 2026. Accessed: 2026-03-05
[3] Asai, A., Wu, Z., Wang, Y., Sil, A., Hajishirzi, H.: Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In: Proc. Int. Conf. Learn. Represent. (2024)
[4] CadQuery Contributors: CadQuery: A python parametric CAD scripting framework based on OCCT. https://github.com/CadQuery/cadquery (2024), accessed: 2026-02-17
[5] Cao, Z., Hong, F., Chen, Z., Pan, L., Liu, Z.: Simulation-ready physical 3D assets from single image. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2026)
[6] Chen, C., Wei, J., Chen, T., Zhang, C., Yang, X., Zhang, S., Yang, B., Foo, C.S., Lin, G., Huang, Q., Liu, F.: CADCrafter: Generating computer-aided design models from unconstrained images. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 11073–11082 (2025)
[7] Chen, D., Chen, R., Zhang, S., Liu, Y., Wang, Y., Zhou, H., Zhang, Q., Zhou, P., Wan, Y., Sun, L.: MLLM-as-a-judge: Assessing multimodal LLM-as-a-judge with vision-language benchmark. In: Proc. Int. Conf. Mach. Learn. pp. 6562–6595 (2024)
[8] Chen, X., Lin, M., Schärli, N., Zhou, D.: Teaching large language models to self-debug. In: Proc. Int. Conf. Learn. Represent. (2024)
[9] Dupont, E., Cherenkova, K., Mallis, D., Gusev, G., Kacem, A., Aouada, D.: TransCAD: A hierarchical transformer for CAD sequence inference from point clouds. In: Proc. Eur. Conf. Comput. Vis. pp. 19–36 (2024)
[10] Elistratov, M., Barannikov, M., Ivanov, G., Khrulkov, V., Konushin, A., Kuznetsov, A., Zhemchuzhnikov, D.: Cadevolve: Creating realistic cad via program evolution (2026)
[11] Fan, R., He, F., Liu, Y., Song, Y., Fan, L., Yan, X.: A parametric and feature-based CAD dataset to support human-computer interaction for advanced 3D shape learning. Integrated Computer-Aided Engineering 32 (2025)
[12] Fan, Z.: Robot viewer: A web-based URDF visualizer. https://github.com/fan-ziqi/robot_viewer (2024), accessed: 2026-03-10
[13] FreeCAD Community: FreeCAD: Your own 3D parametric modeler. https://www.freecad.org/ (2024), version 1.0. Accessed: 2026-02-17
[14] Google DeepMind: Gemini 3 Flash model card. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Flash-Model-Card.pdf (December 2025), published December 2025. Accessed: 2026-03-05
[15] Google DeepMind: Gemini 3 Pro model card. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf (December 2025), model card update: December 2025. Accessed: 2026-03-05
[16] Govindarajan, P., Baldelli, D., Pathak, J., Fournier, Q., Chandar, S.: CADmium: Fine-tuning code language models for text-driven sequential CAD design. Trans. Mach. Learn. Res. (2026)
[17] Guan, Y., Wang, X., Ming, X., Zhang, J., Xu, D., Yu, Q.: CAD-Coder: Text-to-CAD generation with chain-of-thought and geometric reward. arXiv preprint arXiv:2505.19713 (2025)
[18] Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., Zhou, L., Ran, C., Xiao, L., Wu, C., Schmidhuber, J.: MetaGPT: Meta programming for a multi-agent collaborative framework. In: Proc. Int. Conf. Learn. Represent. (2024)
[19] Iliash, D., Jiang, H., Zhang, Y., Savva, M., Chang, A.X.: S2O: Static to openable enhancement for articulated 3D objects. In: Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (2026)
[20] Jiang, Z., Hsu, C.C., Zhu, Y.: Ditto: Building digital twins of articulated objects from interaction. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 5616–5626 (2022)
[21] Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)
[22] Jones, B., Hildreth, D., Chen, D., Baran, I., Kim, V.G., Schulz, A.: AutoMate: A dataset and learning approach for automatic mating of CAD assemblies. ACM Trans. Graph. 40(6), 1–18 (2021)
[23] Khan, M.S., Dupont, E., Ali, S.A., Cherenkova, K., Kacem, A., Aouada, D.: CAD-SIGNet: CAD language inference from point clouds using layer-wise sketch instance guided attention. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 4713–4722 (2024)
[24] Khan, M.S., Sinha, S., Uddin, T., Stricker, D., Ali, S.A., Afzal, M.Z.: Text2CAD: Generating sequential CAD designs from beginner-to-expert level text prompts. In: Adv. Neural Inform. Process. Syst. vol. 37, pp. 7552–7579 (2024)
[25] Kolodiazhnyi, M., Tarasov, D., Zhemchuzhnikov, D., Nikulin, A., Zisman, I., Vorontsova, A., Konushin, A., Kurenkov, V., Rukhovich, D.: Cadrille: Multi-modal CAD reconstruction with online reinforcement learning. arXiv preprint arXiv:2505.22914 (2025)
[26] Le, L., Xie, J., Liang, W., Wang, H.J., Yang, Y., Ma, Y.J., Vedder, K., Krishna, A., Jayaraman, D., Eaton, E.: Articulate-anything: Automatic modeling of articulated objects via a vision-language foundation model. In: Proc. Int. Conf. Learn. Represent. (2025)
[27] Le, T., Nguyen, K., Huang, B., Ta, T.D., Nguyen, A.: Cadknitter: Compositional cad generation from text and geometry guidance (2025)
[28] Lei, J., Deng, C., Shen, W.B., Guibas, L.J., Daniilidis, K.: NAP: Neural 3D articulated object prior. In: Adv. Neural Inform. Process. Syst. vol. 36 (2023)
[29] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Adv. Neural Inform. Process. Syst. vol. 33, pp. 9459–9474 (2020)
[30] Li, J., Ma, W., Li, X., Lou, Y., Zhou, G., Zhou, X.: CAD-llama: Leveraging large language models for computer-aided design parametric 3D model generation. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 18563–18573 (2025)
[31] Li, X., Li, J., Song, Y., Lou, Y., Zhou, X.: Seek-CAD: A self-refined generative modeling for 3D parametric CAD using local inference via DeepSeek. arXiv preprint arXiv:2505.17702 (2025)
[32] Liang, F., Zhao, H., Quan, Y., Fang, W., Shi, C.: Customizing graph neural network for CAD assembly recommendation. In: Proc. ACM SIGKDD Conf. Knowl. Discov. Data Mining. pp. 1746–1757 (2024)
[33] Liu, J., Mahdavi-Amiri, A., Savva, M.: PARIS: Part-level reconstruction and motion analysis for articulated objects. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 352–363 (2023)
[34] Liu, J., Zhan, D., Wang, Q., Shao, P., Liu, S., Kuo, T.Y., Savva, M.: SINGAPO: Single image controlled generation of articulated parts in objects. In: Proc. Int. Conf. Learn. Represent. (2025)
[35] Liu, Q., Yao, X., Zhang, S., Deng, Y., Liu, G., Liu, Z., Jia, K.: PAct: Part-decomposed single-view articulated object generation. arXiv preprint arXiv:2602.14965 (2026)
[36] Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., Zhu, C.: G-Eval: NLG evaluation using GPT-4 with better human alignment. In: Proc. Conf. Empirical Methods Natural Language Process. pp. 2511–2522 (2023)
[37] Liu, Y., Jia, B., Lu, R., Ni, J., Zhu, S.C., Huang, S.: Building interactable replicas of complex articulated objects via Gaussian splatting. In: Proc. Int. Conf. Learn. Represent. (2025)
[38] Lv, C., Bao, J.: Cadinstruct: A multimodal dataset for natural language-guided cad program synthesis. Computer-Aided Design 188, 103926 (2025). https://doi.org/10.1016/j.cad.2025.103926
[39] Mo, K., Zhu, S., Chang, A.X., Yi, L., Tripathi, S., Guibas, L.J., Su, H.: PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2019)
[40] NVIDIA: Nvidia isaac sim. https://developer.nvidia.com/isaac/sim, accessed: 2026-04-12
[41] OpenAI: Update to GPT-5 system card: GPT-5.2. https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf (December 2025), published December 11, 2025. Accessed: 2026-03-05
[42] OpenSCAD Contributors: OpenSCAD: The programmers solid 3D CAD modeller. https://github.com/openscad/openscad (2024), accessed: 2026-02-17
[43] Preintner, T., Yuan, W., König, A., Bäck, T., Raponi, E., van Stein, N.: EvoCAD: Evolutionary CAD code generation with vision language models. arXiv preprint arXiv:2510.11631 (2025)
[44] Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., Yang, C., Chen, W., Su, Y., Cong, X., et al.: ChatDev: Communicative agents for software development. In: Proc. 62nd Annu. Meet. Assoc. Comput. Linguist. pp. 15174–15186 (2024)
[45] Rukhovich, D., Dupont, E., Mallis, D., Cherenkova, K., Kacem, A., Aouada, D.: CAD-recode: Reverse engineering CAD code from point clouds. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 9801–9811 (2025)
[46] Shen, L., Zhang, S., Li, H., Yang, P., Huang, Z., Zhang, Z., Zhao, H.: GaussianArt: Unified modeling of geometry and motion for articulated objects. In: Proc. Int. Conf. 3D Vision (3DV) (2026)
[47] Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: Language agents with verbal reinforcement learning. In: Adv. Neural Inform. Process. Syst. vol. 36, pp. 8634–8652 (2023)
[48] Wang, S., Chen, C., Le, X., Xu, Q., Xu, L., Zhang, Y., Yang, J.: CAD-GPT: Synthesising CAD construction sequence with spatial reasoning-enhanced multimodal LLMs. In: Proc. AAAI Conf. Artif. Intell. vol. 39, pp. 7880–7888 (2025)
[49] Wang, X., Chen, Y., Yuan, L., Zhang, Y., Li, Y., Peng, H., Ji, H.: Executable code actions elicit better LLM agents. In: Proc. Int. Conf. Mach. Learn. (2024)
[50] Willis, K.D., Jayaraman, P.K., Chu, H., Tian, Y., Li, Y., Grandi, D., Sanghi, A., Tran, L., Lambourne, J.G., Solar-Lezama, A., Matusik, W.: JoinABLe: Learning bottom-up assembly of parametric CAD joints. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 15828–15839 (2022)
[51] Willis, K.D., Pu, Y., Luo, J., Chu, H., Du, T., Lambourne, J.G., Solar-Lezama, A., Matusik, W.: Fusion 360 gallery: A dataset and environment for programmatic CAD construction from human design sequences. ACM Trans. Graph. 40(4), 1–24 (2021)
[52] Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al.: AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In: First Conference on Language Modeling (2024)
[53] Wu, R., Xiao, C., Zheng, C.: DeepCAD: A deep generative network for computer-aided design models. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 6772–6782 (2021)
[54] Xiang, F., Qin, Y., Mo, K., Xia, Y., Zhu, H., Liu, F., Liu, M., Jiang, H., Yuan, Y., Wang, H., Yi, L., Chang, A.X., Guibas, L.J., Su, H.: SAPIEN: A simulated part-based interactive environment. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2020)
[55] Xie, H., Ju, F.: Text-to-CadQuery: A new paradigm for CAD generation with scalable large model capabilities. arXiv preprint arXiv:2505.06507 (2025)
[56] Xu, J., Wang, C., Zhao, Z., Liu, W., Ma, Y., Gao, S.: CAD-MLLM: Unifying multimodality-conditioned CAD generation with MLLM. arXiv preprint arXiv:2411.04954 (2024)
[57] Xu, X., Willis, K.D., Lambourne, J.G., Cheng, C.Y., Jayaraman, P.K., Furukawa, Y.: SkexGen: Autoregressive generation of CAD construction sequences with disentangled codebooks. In: Proc. Int. Conf. Mach. Learn. pp. 24698–24724 (2022)
[58] Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K.R., Cao, Y.: ReAct: Synergizing reasoning and acting in language models. In: Proc. Int. Conf. Learn. Represent. (2023)
[59] Yuan, Z., Lan, H., Zou, Q., Zhao, J.: 3D-PreMise: Can large language models generate 3D shapes with sharp features and parametric control? arXiv preprint arXiv:2401.06437 (2024)
[60] Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., Zhang, H., Gonzalez, J.E., Stoica, I.: Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In: Adv. Neural Inform. Process. Syst. (2023)
[61] Zhou, Z., Han, J., Du, L., Fang, N., Qiu, L., Zhang, S.: CAD-Judge: Toward efficient morphological grading and verification for text-to-CAD generation. arXiv preprint arXiv:2508.04002 (2025)