ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation
Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3
The pith
A training-free multi-agent system generates editable articulated CAD assemblies from text or images by predicting part connectors in the design stage, before any geometry code is written.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ArtiCAD is the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. It divides the task among Design, Generation, Assembly, and Review agents, predicts assembly relationships via a Connector during the initial design stage to bypass LLM spatial reasoning limits, applies validation and cross-stage rollback for error correction, and maintains a self-evolving experience store for ongoing improvement.
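The control flow in this claim (staged agents, validation, and cross-stage rollback) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the stage names follow the paper, but the `Failure` type, the `run_pipeline` function, the retry budget, and the rule that a failed validation names the stage to return to are all hypothetical. The toy stages stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Failure:
    """Hypothetical validation result: which stage to re-run, and why."""
    rollback_to: str
    reason: str


# A stage reads and updates a shared state dict; it returns None on success.
Stage = Callable[[dict], Optional[Failure]]


def run_pipeline(stages: list[tuple[str, Stage]], state: dict, max_retries: int = 3) -> bool:
    """Run stages in order; on failure, jump back to the stage the failure blames."""
    names = [name for name, _ in stages]
    i, retries = 0, 0
    while i < len(stages):
        name, stage = stages[i]
        failure = stage(state)
        if failure is None:
            i += 1
            continue
        retries += 1
        if retries > max_retries:
            return False
        # Keep the reason so the re-run stage can react to it.
        state.setdefault("feedback", []).append((name, failure.reason))
        i = names.index(failure.rollback_to)
    return True


# Toy stages (assumptions): the design stage fixes its axis once it sees feedback.
def design(state):
    state["axis"] = (0, 0, 1) if state.get("feedback") else (0, 0, 0)
    return None


def generation(state):
    state["code"] = "# geometry code placeholder"
    return None


def assembly(state):
    if state["axis"] == (0, 0, 0):
        return Failure(rollback_to="design", reason="joint axis has zero length")
    return None


def review(state):
    return None


state = {}
ok = run_pipeline(
    [("design", design), ("generation", generation), ("assembly", assembly), ("review", review)],
    state,
)
```

In this toy run the assembly check blames the design stage, so the loop restarts from design rather than regenerating geometry for a flawed plan. That is the point of isolating design-level from code-level errors.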
What carries the argument
The Connector object, which explicitly records attachment points and joint parameters and is predicted in the design stage before any geometry code is written.
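To make the idea concrete, here is one possible shape for such a record. The paper states only that a Connector holds attachment points and joint parameters. Every field name, the set of joint types, and the `validate` and `to_json` helpers below are assumptions made for illustration.

```python
import json
import math
from dataclasses import dataclass, asdict

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Connector:
    """Hypothetical record of one joint between two parts (field names are assumptions)."""
    parent: str                      # part that stays fixed
    child: str                       # part that moves relative to the parent
    parent_point: Vec3               # attachment point in the parent's local frame
    child_point: Vec3                # attachment point in the child's local frame
    joint_type: str                  # "fixed", "revolute", or "prismatic"
    axis: Vec3 = (0.0, 0.0, 1.0)     # joint axis in the parent's local frame
    lower: float = 0.0               # lower motion limit (radians or length units)
    upper: float = 0.0               # upper motion limit

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the record is consistent."""
        problems = []
        if self.joint_type not in {"fixed", "revolute", "prismatic"}:
            problems.append(f"unknown joint type: {self.joint_type}")
        if self.parent == self.child:
            problems.append("parent and child must differ")
        if self.joint_type != "fixed":
            if math.hypot(*self.axis) < 1e-9:
                problems.append("axis must be non-zero for a moving joint")
            if self.lower > self.upper:
                problems.append("lower limit exceeds upper limit")
        return problems

    def to_json(self) -> str:
        """Serialize to JSON so the record can be passed between agents."""
        return json.dumps(asdict(self))


hinge = Connector("base", "lid", (0.0, 20.0, 10.0), (0.0, 0.0, 0.0),
                  "revolute", axis=(1.0, 0.0, 0.0), lower=0.0, upper=1.57)
bad = Connector("a", "a", (0.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                "revolute", axis=(0.0, 0.0, 0.0), lower=1.0, upper=0.0)
```

Because the record is plain data, it can be checked for internal consistency before any geometry exists. Note that these checks cover only internal consistency; whether the attachment points are spatially sensible is a separate question.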
If this is right
- Requirement-driven conceptual design becomes possible for products with moving parts.
- Generated assemblies can be exported for physical prototyping workflows.
- URDF export supplies ready training assets for embodied AI simulation.
- Repeated use improves future outputs through the self-evolving experience store.
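For the URDF export mentioned in the list above, the mapping from a joint description to URDF can be sketched with the Python standard library. The element and attribute names (`robot`, `link`, `joint`, `parent`, `child`, `origin`, `axis`, `limit`) are standard URDF. The function name `joint_to_urdf`, the helper `fmt`, the example part names, and the placeholder `effort` and `velocity` values are assumptions, and nothing here reflects how ArtiCAD itself performs the export.

```python
import xml.etree.ElementTree as ET


def fmt(values):
    """Format a 3-vector as the space-separated string URDF expects."""
    return " ".join(f"{v:g}" for v in values)


def joint_to_urdf(name, joint_type, parent, child, origin_xyz, axis_xyz, lower, upper):
    """Build a URDF <joint> element from plain joint parameters."""
    joint = ET.Element("joint", name=name, type=joint_type)
    ET.SubElement(joint, "parent", link=parent)
    ET.SubElement(joint, "child", link=child)
    ET.SubElement(joint, "origin", xyz=fmt(origin_xyz), rpy="0 0 0")
    if joint_type in ("revolute", "prismatic"):
        ET.SubElement(joint, "axis", xyz=fmt(axis_xyz))
        # URDF requires effort and velocity on limited joints; 1.0 is a placeholder.
        ET.SubElement(joint, "limit", lower=str(lower), upper=str(upper),
                      effort="1.0", velocity="1.0")
    return joint


# Hypothetical two-part assembly: a box base with a hinged lid (units in meters).
robot = ET.Element("robot", name="box_with_lid")
for link_name in ("base", "lid"):
    ET.SubElement(robot, "link", name=link_name)
robot.append(joint_to_urdf("lid_hinge", "revolute", "base", "lid",
                           (0.0, 0.02, 0.01), (1.0, 0.0, 0.0), 0.0, 1.57))
urdf_text = ET.tostring(robot, encoding="unicode")
```

A complete export would also attach visual and collision meshes to each link; the sketch covers only the kinematic skeleton.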
Where Pith is reading between the lines
- The early-connector pattern could apply to other code-generation tasks that need spatial or relational structure.
- Combining the rollback mechanism with external CAD validation libraries would allow fully automatic repair loops.
- The experience store could be seeded with domain-specific templates to accelerate adoption in narrow industries.
Load-bearing premise
Large language and vision models, when given agent roles and early connector instructions, will produce correct geometry code and joint settings without spatial reasoning errors.
What would settle it
Run a prompt requiring two parts to join at a precise offset or angle; inspect whether the generated model contains a valid joint parameter and remains editable in a CAD tool without manual fixes.
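The numeric half of this test can be automated. The sketch below compares a requested offset and joint axis against the values found in a generated model. The function names, tolerances, and example numbers are assumptions, and extracting the generated values from an actual CAD file is left out. Editability in a CAD tool still needs manual inspection.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3-vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.degrees(math.acos(cos_angle))


def joint_matches_request(requested_offset, requested_axis,
                          generated_origin, generated_axis,
                          pos_tol=1e-3, ang_tol_deg=0.5):
    """Return (ok, position_error, angle_error_deg) for one joint."""
    pos_err = math.dist(requested_offset, generated_origin)
    ang_err = angle_between_deg(requested_axis, generated_axis)
    return pos_err <= pos_tol and ang_err <= ang_tol_deg, pos_err, ang_err


# Hypothetical request: join at offset (0, 25, 10) with the axis tilted 30 degrees from z.
tilt = math.radians(30.0)
requested_axis = (0.0, math.sin(tilt), math.cos(tilt))

exact = joint_matches_request((0.0, 25.0, 10.0), requested_axis,
                              (0.0, 25.0, 10.0), requested_axis)
untilted = joint_matches_request((0.0, 25.0, 10.0), requested_axis,
                                 (0.0, 25.0, 10.0), (0.0, 0.0, 1.0))
```

A model that ignores the requested tilt fails the angle check by the full 30 degrees, which is the kind of error this test is meant to surface.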
Original abstract
Parametric Computer-Aided Design (CAD) of articulated assemblies is essential for product development, yet generating these multi-part, movable models from high-level descriptions remains unexplored. To address this, we propose ArtiCAD, the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. Our system divides this complex task among four specialized agents: Design, Generation, Assembly, and Review. One of our key insights is to predict assembly relationships during the initial design stage rather than the assembly stage. By utilizing a Connector that explicitly defines attachment points and joint parameters, ArtiCAD determines these relationships before geometry generation, effectively bypassing the limited spatial reasoning capabilities of current LLMs and VLMs. To further ensure high-quality outputs, we introduce validation steps in the generation and assembly stages, accompanied by a cross-stage rollback mechanism that accurately isolates and corrects design- and code-level errors. Additionally, a self-evolving experience store accumulates design knowledge to continuously improve performance on future tasks. Extensive evaluations on three datasets (ArtiCAD-Bench, CADPrompt, and ACD) validate the effectiveness of our approach. We further demonstrate the applicability of ArtiCAD in requirement-driven conceptual design, physical prototyping, and the generation of embodied AI training assets through URDF export.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ArtiCAD, a training-free multi-agent system with Design, Generation, Assembly, and Review agents that generates editable, parametric, articulated CAD assemblies from text or image inputs. The central technical claim is that early prediction of a Connector (attachment points and joint parameters) during the Design stage bypasses the spatial-reasoning limitations of current LLMs and VLMs; this is augmented by cross-stage validation, rollback for error correction, and a self-evolving experience store. Evaluations are reported on ArtiCAD-Bench, CADPrompt, and ACD, with demonstrations for conceptual design, physical prototyping, and URDF export for embodied AI.
Significance. If the bypass mechanism and end-to-end correctness hold, the work would constitute a practical engineering advance in automated CAD generation by enabling complex articulated models without task-specific training or fine-tuning. The multi-agent decomposition, explicit Connector abstraction, and iterative rollback are reusable ideas that could transfer to other parametric design tasks; the URDF export path is a concrete strength for downstream robotics applications.
major comments (3)
- [§3.2] §3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference—the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.
- [§4.3] §4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.
- [§5] §5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.
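The per-category breakdown requested in the last comment is simple to produce once each failed sample carries a label. The sketch below is illustrative only: the function name is hypothetical, the category labels are taken from the referee's examples, and the records are made-up placeholders, not results from the paper.

```python
from collections import Counter


def failure_breakdown(results):
    """Map each failure category to its share of all evaluated samples.

    `results` is an iterable of (sample_id, category) pairs, where category
    is None for a successful sample.
    """
    total = 0
    counts = Counter()
    for _, category in results:
        total += 1
        if category is not None:
            counts[category] += 1
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.most_common()}


# Placeholder records for illustration only (not data from the paper).
records = [
    ("s1", None),
    ("s2", "joint_axis_misalignment"),
    ("s3", "inter_part_collision"),
    ("s4", "joint_axis_misalignment"),
]
breakdown = failure_breakdown(records)
```

Reporting the same table before and after rollback would show which categories the mechanism actually repairs.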
minor comments (3)
- [§3.1] The Connector data structure is introduced informally; a concise formal specification (fields, constraints, serialization) would improve reproducibility.
- [Figures 3-5] Figure captions and legends should explicitly label which elements correspond to predicted Connectors versus generated geometry.
- [§3.4] A few sentences clarifying how the self-evolving experience store is initialized and updated (e.g., what constitutes a successful experience) would remove ambiguity.
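The last comment asks how the experience store is initialized and updated. One plausible reading, offered purely as an assumption, is an initially empty store that records a lesson only when a task succeeds and retrieves the most similar past tasks for a new prompt. The class names, the success-only rule, and the token-overlap (Jaccard) similarity below are all hypothetical; the paper's store may work differently, for example with embedding search.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str     # description of the task that was solved
    lesson: str   # reusable design knowledge extracted from it


def _tokens(text):
    return set(text.lower().split())


@dataclass
class ExperienceStore:
    """Hypothetical store: starts empty, grows only from successful runs."""
    entries: list[Experience] = field(default_factory=list)

    def record(self, task, lesson, succeeded):
        # Assumption: only successful runs count as experience.
        if succeeded:
            self.entries.append(Experience(task, lesson))

    def retrieve(self, query, k=2):
        """Return up to k entries ranked by Jaccard overlap with the query."""
        q = _tokens(query)

        def score(entry):
            t = _tokens(entry.task)
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        ranked = sorted(self.entries, key=score, reverse=True)
        return [entry for entry in ranked[:k] if score(entry) > 0]


store = ExperienceStore()
store.record("laptop with hinged lid", "place hinge axis along the back edge", True)
store.record("drawer cabinet with sliding drawer",
             "use a prismatic joint along the slide direction", True)
store.record("broken attempt", "ignored", False)
best = store.retrieve("box with hinged lid", k=1)
```

Under this reading, what counts as a successful experience is exactly the open question the referee raises: the `succeeded` flag has to come from somewhere, most naturally the Review agent.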
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which helps us strengthen the presentation of our contributions. We respond to each major comment below, agreeing where the evaluation can be improved and outlining specific revisions.
Referee: [§3.2] §3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference—the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.
Authors: We agree that the manuscript would benefit from more direct evidence for the bypass claim. The current argument rests on the architectural choice of specifying Connectors (attachment points and joint parameters) explicitly in the Design stage before any geometry is generated, which is intended to avoid implicit 3D spatial reasoning during assembly. However, we did not provide a head-to-head comparison against a late-prediction baseline. In the revised manuscript we will add such a comparison, measuring success rates when Connector prediction is performed early versus deferred to the Assembly stage.
Revision: yes
Referee: [§4.3] §4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.
Authors: The referee is correct that the existing ablations do not isolate the timing of Connector prediction. We will add a new ablation study that separately reports Connector prediction accuracy and directly compares the early-prediction pipeline against a late-prediction variant. This will allow readers to assess whether early placement contributes to avoiding global kinematic inconsistencies beyond what validation and rollback can correct.
Revision: yes
Referee: [§5] §5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.
Authors: We acknowledge the value of more granular failure analysis. The current section reports aggregate success rates and selected qualitative examples but does not provide systematic per-category breakdowns for articulated-motion errors. In the revision we will include error breakdowns by category (joint-axis misalignment, inter-part collisions, kinematic inconsistency, etc.) together with additional qualitative cases that illustrate both the failures and the corrections performed by the rollback mechanism.
Revision: yes
Circularity Check
No circularity: engineering system with no derivations or fitted predictions
The paper presents ArtiCAD as a training-free multi-agent architecture (Design, Generation, Assembly, Review agents plus Connector prediction and rollback) for text/image-to-articulated-CAD. No equations, parameters, or first-principles derivations appear; the early-Connector insight is an explicit design choice to address stated LLM spatial-reasoning limits rather than a result derived from or reducing to its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. Validation occurs on external datasets (ArtiCAD-Bench, CADPrompt, ACD), rendering the construction self-contained and non-circular.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Current large language and vision-language models possess sufficient code-generation and planning capability to implement the four-agent workflow when provided with the Connector abstraction.
invented entities (2)
- Connector: no independent evidence
- Self-evolving experience store: no independent evidence