pith. machine review for the scientific record.

arxiv: 2604.10992 · v2 · submitted 2026-04-13 · 💻 cs.CV

ArtiCAD: Articulated CAD Assembly Design via Multi-Agent Code Generation

Pith reviewed 2026-05-10 15:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords articulated CAD · multi-agent system · code generation · CAD assembly · text-to-CAD · image-to-CAD · connector prediction

The pith

A training-free multi-agent system generates editable articulated CAD assemblies from text or images by predicting connectors early.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents ArtiCAD as a way to create multi-part, movable CAD models directly from high-level descriptions without any model training. It splits the work across four agents that handle design, code generation, assembly, and review, with the key step of defining attachment points and joint parameters at the very start rather than after geometry exists. Validation steps and a rollback mechanism catch errors at code or design level, while an accumulating experience store lets the system improve on repeated tasks. The outputs are usable for conceptual design, physical prototyping, and export as training assets for embodied AI.

Core claim

ArtiCAD is the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. It divides the task among Design, Generation, Assembly, and Review agents, predicts assembly relationships via a Connector during the initial design stage to bypass LLM spatial reasoning limits, applies validation and cross-stage rollback for error correction, and maintains a self-evolving experience store for ongoing improvement.
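To make the control flow in this claim concrete, here is a minimal sketch of a staged pipeline with cross-stage rollback. It is an illustration only: the stage contract, the names `run_pipeline` and `demo`, and the step budget are assumptions made for this note, not the paper's implementation.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stage contract (not from the paper): a stage mutates the shared
# state and returns (ok, blame), where blame names the stage to roll back to.
Stage = Callable[[dict], Tuple[bool, Optional[str]]]


def run_pipeline(stages: Dict[str, Stage], order: List[str], state: dict,
                 max_steps: int = 20) -> List[str]:
    """Run stages in order; on failure, resume from the stage that is blamed."""
    trace: List[str] = []
    i = 0
    while i < len(order):
        if len(trace) >= max_steps:
            raise RuntimeError("step budget exhausted without a valid result")
        name = order[i]
        trace.append(name)
        ok, blame = stages[name](state)
        if ok:
            i += 1
        elif blame is not None:
            i = order.index(blame)  # cross-stage rollback
        # otherwise the same stage is retried (local repair)
    return trace


def demo() -> List[str]:
    state = {"design_version": 0}

    def design(s):
        s["design_version"] += 1
        return True, None

    def generation(s):
        return True, None

    def assembly(s):
        # Pretend the first design carries a connector error that only
        # surfaces at assembly time, so the design stage is blamed.
        if s["design_version"] < 2:
            return False, "design"
        return True, None

    def review(s):
        return True, None

    stages = {"design": design, "generation": generation,
              "assembly": assembly, "review": review}
    return run_pipeline(stages, ["design", "generation", "assembly", "review"], state)
```

In `demo()`, the assembly stage rejects the first design, so the trace revisits design and generation before reaching review. A real system would need a policy for deciding which stage to blame, which is the part the referee report below asks the authors to evidence.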

What carries the argument

The Connector object, which explicitly records attachment points and joint parameters and is predicted in the design stage before any geometry code is written.
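One way to picture such an object is the sketch below. The field names, the `problems` method, and the restriction to two joint types are assumptions for illustration; the paper describes five joint types, of which only revolute and prismatic are named on this page, and its actual Connector schema is not reproduced here.

```python
from dataclasses import dataclass
import math

# Only the two joint types named on this page; the paper's full set is not listed here.
KNOWN_JOINT_TYPES = {"revolute", "prismatic"}


@dataclass(frozen=True)
class Connector:
    """Hypothetical record of one attachment between two parts."""
    parent: str                          # part that carries the joint frame
    child: str                           # part that moves relative to the parent
    origin: tuple[float, float, float]   # attachment point in the parent's coordinates
    axis: tuple[float, float, float]     # rotation or translation axis
    joint_type: str
    lower: float                         # motion limits (radians or length units)
    upper: float

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the record is plausible."""
        issues = []
        if self.joint_type not in KNOWN_JOINT_TYPES:
            issues.append(f"unknown joint type: {self.joint_type}")
        if math.hypot(*self.axis) < 1e-9:
            issues.append("axis has zero length")
        if self.lower > self.upper:
            issues.append("lower limit exceeds upper limit")
        if self.parent == self.child:
            issues.append("parent and child are the same part")
        return issues
```

For example, `Connector("body", "lid", (0, 0, 50), (1, 0, 0), "revolute", 0.0, 1.57).problems()` returns an empty list. Checks of this kind are cheap to run before any geometry code exists, which is the point of predicting connectors early, but they only catch malformed records, not spatially wrong ones.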

If this is right

  • Requirement-driven conceptual design becomes possible for products with moving parts.
  • Generated assemblies can be exported for physical prototyping workflows.
  • URDF export supplies ready training assets for embodied AI simulation.
  • Repeated use improves future outputs through the self-evolving experience store.
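For the URDF point in the list above, the sketch below shows roughly what one exported joint looks like. The element and attribute names follow the standard URDF joint format; the helper `urdf_joint`, its defaults for `effort` and `velocity`, and the example values are assumptions, not the paper's exporter.

```python
import xml.etree.ElementTree as ET


def _vec(values):
    """Format a 3-vector the way URDF expects: space-separated numbers."""
    return " ".join(f"{v:g}" for v in values)


def urdf_joint(name, joint_type, parent, child, xyz, axis, lower, upper,
               effort=1.0, velocity=1.0):
    """Build a single URDF <joint> element (hypothetical helper)."""
    joint = ET.Element("joint", name=name, type=joint_type)
    ET.SubElement(joint, "parent", link=parent)
    ET.SubElement(joint, "child", link=child)
    # Joint frame relative to the parent link; orientation left at identity here.
    ET.SubElement(joint, "origin", xyz=_vec(xyz), rpy="0 0 0")
    ET.SubElement(joint, "axis", xyz=_vec(axis))
    ET.SubElement(joint, "limit", lower=f"{lower:g}", upper=f"{upper:g}",
                  effort=f"{effort:g}", velocity=f"{velocity:g}")
    return joint


if __name__ == "__main__":
    hinge = urdf_joint("lid_hinge", "revolute", "body", "lid",
                       (0, 0, 0.05), (1, 0, 0), 0.0, 1.57)
    print(ET.tostring(hinge, encoding="unicode"))
```

A full export would also need `<link>` elements with meshes and inertial data, which this sketch leaves out.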

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The early-connector pattern could apply to other code-generation tasks that need spatial or relational structure.
  • Combining the rollback mechanism with external CAD validation libraries would allow fully automatic repair loops.
  • The experience store could be seeded with domain-specific templates to accelerate adoption in narrow industries.

Load-bearing premise

Large language and vision models, when given agent roles and early connector instructions, will produce correct geometry code and joint settings without spatial reasoning errors.

What would settle it

Run a prompt requiring two parts to join at a precise offset or angle; inspect whether the generated model contains a valid joint parameter and remains editable in a CAD tool without manual fixes.
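A check of this kind can be phrased as a frame-alignment test. The sketch below assumes each connector is a rigid coordinate frame and that assembly places the child part so that its connector frame coincides with the parent's; the function names and the restriction to rotations about z are simplifications for illustration, and the paper itself delegates joint resolution to FreeCAD's Assembly solver.

```python
import math


def rigid(angle_z_deg, tx, ty, tz):
    """4x4 homogeneous transform: rotation about z followed by a translation."""
    c = math.cos(math.radians(angle_z_deg))
    s = math.sin(math.radians(angle_z_deg))
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def inverse_rigid(m):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(rt[i][k] * m[k][3] for k in range(3)) for i in range(3)]
    return [rt[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def child_placement(parent_connector_world, child_connector_local):
    """World pose of the child part that makes both connector frames coincide."""
    return matmul(parent_connector_world, inverse_rigid(child_connector_local))


def frames_match(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(4) for j in range(4))
```

With a parent connector at `rigid(30, 10, 0, 5)` and a child connector at `rigid(-15, 0, 2, 0)` in the child's own coordinates, `child_placement` yields a pose whose product with the child connector reproduces the parent frame. Comparing that pose against the placement found in a generated model is one way to test the requested offset or angle without opening a CAD tool.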

Figures

Figures reproduced from arXiv: 2604.10992 by Dong Xu, Jing Zhang, Juncheng Hu, Qian Yu, Yandong Guan, Yuan Shui, Zhanwei Zhang.

Figure 1
Figure 1: Top: CAD Assemblies generated by ArtiCAD across three task categories: Static, Articulated, and Industrial. All outputs are editable. Bottom: An example application. Given a user requirement, ArtiCAD generates an articulated CAD assembly with functional components (e.g., an enclosure, rods/handles, and player pieces). The components are then fabricated using a 3D printer (Bambu Lab P1S) and assembled int…
Figure 2
Figure 2: Early vs. late assembly relationship prediction. Top: early prediction (ours) specifies connectors at design time; assembly reduces to deterministic frame alignment. Bottom: deferring connection decisions to assembly stage forces a second planning pass that must parse all generated code, infer coordinate systems, and resolve cross-part dimensions, a task with long context and high failure rate. stage. Simil…
Figure 3
Figure 3: The five core kinematic joint types utilized in ArtiCAD. Each joint connects two parts at a shared coordinate frame; the specific degrees of freedom (DOF) constraints determine the allowed relative motion, which is subsequently resolved by FreeCAD’s Assembly solver. Representation (B-rep) solid. This solid comprises a set of topological entities Ti (e.g., faces, edges, vertices). As will be detailed in Sec…
Figure 4
Figure 4: Overview of the ArtiCAD pipeline. A Design Agent decomposes multimodal input into components and connectors; Generation Agents generate per-part FreeCAD scripts through a generate–execute–repair loop with VLM validation; a deterministic Assembly Agent aligns parts and verifies the result via VLM and LLM judges; a Review Agent scores the output and records the case into the partitioned experience store. Cro…
Figure 5
Figure 5: Qualitative results comparing ArtiCAD with Single-VLM Loop on our bench. 5.4 Comparison with Articulated Object Methods We compare ArtiCAD against three representative articulated object methods on the ACD dataset [19]: first, SINGAPO [34] predicts part attributes and kinematics from a single image via diffusion, subsequently assembling the object through mesh retrieval; second, Articulate-Anything [26] em…
Figure 6
Figure 6: Qualitative comparisons between ArtiCAD and SINGAPO, Articulate-Anything, and PAct on the ACD dataset. Black arrows indicate prismatic (translational) joints, and red arrows indicate revolute (rotational) joints. as a hollowed-out component, whereas the baseline often collapses it into a solid block, ignoring expected manufacturing structures. Furthermore, for articulated objects, our results exhibit more…
Figure 7
Figure 7: URDF export verification for embodied AI applications. Top: exported assemblies loaded in Robot Viewer. Bottom: the same models with joint coordinate frames visualized. The exported URDFs preserve the intended joint structure, axis directions, and motion limits. 6 Applications Since ArtiCAD generates parametric assemblies with typed joints and motion limits, its outputs serve use cases beyond static 3D co…
Original abstract

Parametric Computer-Aided Design (CAD) of articulated assemblies is essential for product development, yet generating these multi-part, movable models from high-level descriptions remains unexplored. To address this, we propose ArtiCAD, the first training-free multi-agent system capable of generating editable, articulated CAD assemblies directly from text or images. Our system divides this complex task among four specialized agents: Design, Generation, Assembly, and Review. One of our key insights is to predict assembly relationships during the initial design stage rather than the assembly stage. By utilizing a Connector that explicitly defines attachment points and joint parameters, ArtiCAD determines these relationships before geometry generation, effectively bypassing the limited spatial reasoning capabilities of current LLMs and VLMs. To further ensure high-quality outputs, we introduce validation steps in the generation and assembly stages, accompanied by a cross-stage rollback mechanism that accurately isolates and corrects design- and code-level errors. Additionally, a self-evolving experience store accumulates design knowledge to continuously improve performance on future tasks. Extensive evaluations on three datasets (ArtiCAD-Bench, CADPrompt, and ACD) validate the effectiveness of our approach. We further demonstrate the applicability of ArtiCAD in requirement-driven conceptual design, physical prototyping, and the generation of embodied AI training assets through URDF export.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces ArtiCAD, a training-free multi-agent system with Design, Generation, Assembly, and Review agents that generates editable, parametric, articulated CAD assemblies from text or image inputs. The central technical claim is that early prediction of a Connector (attachment points and joint parameters) during the Design stage bypasses the spatial-reasoning limitations of current LLMs and VLMs; this is augmented by cross-stage validation, rollback for error correction, and a self-evolving experience store. Evaluations are reported on ArtiCAD-Bench, CADPrompt, and ACD, with demonstrations for conceptual design, physical prototyping, and URDF export for embodied AI.

Significance. If the bypass mechanism and end-to-end correctness hold, the work would constitute a practical engineering advance in automated CAD generation by enabling complex articulated models without task-specific training or fine-tuning. The multi-agent decomposition, explicit Connector abstraction, and iterative rollback are reusable ideas that could transfer to other parametric design tasks; the URDF export path is a concrete strength for downstream robotics applications.

major comments (3)
  1. [§3.2] §3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference—the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.
  2. [§4.3] §4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.
  3. [§5] §5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.
minor comments (3)
  1. [§3.1] The Connector data structure is introduced informally; a concise formal specification (fields, constraints, serialization) would improve reproducibility.
  2. [Figures 3-5] Figure captions and legends should explicitly label which elements correspond to predicted Connectors versus generated geometry.
  3. [§3.4] A few sentences clarifying how the self-evolving experience store is initialized and updated (e.g., what constitutes a successful experience) would remove ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which helps us strengthen the presentation of our contributions. We respond to each major comment below, agreeing where the evaluation can be improved and outlining specific revisions.

Point-by-point responses
  1. Referee: [§3.2] §3.2 (Design Agent) and §3.1 (Connector definition): The assertion that early Connector prediction 'effectively bypasses' spatial-reasoning failures is not substantiated. Determining collision-free attachment points and kinematic parameters from text or a single image still requires 3D spatial inference—the exact capability the paper states current VLMs lack. No evidence is given that the Design agent succeeds at this step where later-stage assembly would fail.

    Authors: We agree that the manuscript would benefit from more direct evidence for the bypass claim. The current argument rests on the architectural choice of specifying Connectors (attachment points and joint parameters) explicitly in the Design stage before any geometry is generated, which is intended to avoid implicit 3D spatial reasoning during assembly. However, we did not provide a head-to-head comparison against a late-prediction baseline. In the revised manuscript we will add such a comparison, measuring success rates when Connector prediction is performed early versus deferred to the Assembly stage. revision: yes

  2. Referee: [§4.3] §4.3 (Ablation studies) and §5 (Quantitative results): No ablation isolates Connector prediction accuracy from overall success rate, nor compares early versus late Connector prediction. Without this, the load-bearing claim that the Design-stage placement is the key enabler remains untested; downstream validation/rollback can only detect local syntactic or geometric errors, not global kinematic inconsistency.

    Authors: The referee is correct that the existing ablations do not isolate the timing of Connector prediction. We will add a new ablation study that separately reports Connector prediction accuracy and directly compares the early-prediction pipeline against a late-prediction variant. This will allow readers to assess whether early placement contributes to avoiding global kinematic inconsistencies beyond what validation and rollback can correct. revision: yes

  3. Referee: [§5] §5 (Evaluation and failure analysis): The reported metrics on the three datasets are not accompanied by per-category error breakdowns or qualitative failure cases for articulated motion (e.g., joint axis misalignment, inter-part collisions after assembly). This makes it impossible to assess whether the rollback mechanism actually corrects the spatial issues the authors identify as central.

    Authors: We acknowledge the value of more granular failure analysis. The current section reports aggregate success rates and selected qualitative examples but does not provide systematic per-category breakdowns for articulated-motion errors. In the revision we will include error breakdowns by category (joint-axis misalignment, inter-part collisions, kinematic inconsistency, etc.) together with additional qualitative cases that illustrate both the failures and the corrections performed by the rollback mechanism. revision: yes
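The per-category breakdown promised in the last response amounts to a simple tally over evaluated cases. The sketch below is an assumption about how such a table could be computed; the function name and the category labels in the usage note are placeholders, not results or code from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def failure_breakdown(outcomes: Iterable[Optional[str]]) -> Dict[str, float]:
    """Map each failure category to its share of all evaluated cases.

    Each outcome is None for a success or a category label for a failure.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return {}
    counts = Counter(o for o in outcomes if o is not None)
    return {category: n / len(outcomes) for category, n in counts.items()}
```

Called on eight placeholder outcomes, five `None` values plus two `"joint_axis_misalignment"` and one `"inter_part_collision"`, it returns shares of 0.25 and 0.125. Reporting the same tally before and after rollback would show which categories the mechanism actually repairs.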

Circularity Check

0 steps flagged

No circularity: engineering system with no derivations or fitted predictions

Full rationale

The paper presents ArtiCAD as a training-free multi-agent architecture (Design, Generation, Assembly, Review agents plus Connector prediction and rollback) for text/image-to-articulated-CAD. No equations, parameters, or first-principles derivations appear; the early-Connector insight is an explicit design choice to address stated LLM spatial-reasoning limits rather than a result derived from or reducing to its own inputs. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing manner. Validation occurs on external datasets (ArtiCAD-Bench, CADPrompt, ACD), rendering the construction self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

The central claim rests on the unproven premise that LLMs/VLMs can execute the assigned agent roles reliably once the Connector abstraction is introduced; no free parameters are named, but the Connector itself functions as an invented structuring device whose effectiveness is asserted rather than derived.

axioms (1)
  • domain assumption: Current large language and vision-language models possess sufficient code-generation and planning capability to implement the four-agent workflow when provided with the Connector abstraction.
    Invoked in the description of how the Design agent predicts relationships before geometry generation.
invented entities (2)
  • Connector (no independent evidence)
    purpose: Explicitly defines attachment points and joint parameters to bypass LLM spatial reasoning limits.
    New data structure introduced to front-load assembly decisions; no independent evidence of its sufficiency is supplied in the abstract.
  • Self-evolving experience store (no independent evidence)
    purpose: Accumulates design knowledge to improve future performance.
    Memory mechanism whose update rules and retrieval are not detailed; effectiveness asserted without external validation.
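Because the update and retrieval rules are not detailed, any concrete picture of the store is guesswork. The sketch below is one hedged possibility: the partitioning by task category, the score threshold, and the word-overlap retrieval are all assumptions chosen for brevity, and `ExperienceStore` is a hypothetical name.

```python
from collections import defaultdict


class ExperienceStore:
    """Hypothetical partitioned store of reviewed cases."""

    def __init__(self, min_score=0.5):
        self.min_score = min_score
        # partition name -> list of (prompt, lesson, score)
        self._partitions = defaultdict(list)

    def record(self, partition, prompt, lesson, score):
        """Keep a case only if its review score clears the threshold."""
        if score < self.min_score:
            return False
        self._partitions[partition].append((prompt, lesson, score))
        return True

    def retrieve(self, partition, prompt, k=3):
        """Return up to k lessons whose prompts share the most words with the query."""
        query = set(prompt.lower().split())

        def similarity(entry):
            words = set(entry[0].lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        ranked = sorted(self._partitions.get(partition, []), key=similarity, reverse=True)
        return [lesson for _, lesson, _ in ranked[:k]]
```

A production system would more plausibly use embedding similarity than word overlap; the point here is only that "what counts as a successful experience" reduces to an explicit threshold and an explicit ranking rule, both of which the ledger entry above flags as unspecified.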

pith-pipeline@v0.9.0 · 5537 in / 1500 out tokens · 47673 ms · 2026-05-10T15:32:34.592695+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 1 internal anchor

  1. [1]

    Alrashedy, K., Tambwekar, P., Zaidi, Z.H., Langwasser, M., Xu, W., Gombolay, M.: Generating CAD code with vision-language models for 3D designs. In: Proc. Int. Conf. Learn. Represent. (2025)

  2. [2]

    Anthropic: Claude Opus 4.6 system card. https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf (February 2026), system card listed as February 2026. Accessed: 2026-03-05

  3. [3]

    Asai, A., Wu, Z., Wang, Y., Sil, A., Hajishirzi, H.: Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In: Proc. Int. Conf. Learn. Represent. (2024)

  4. [4]

    CadQuery Contributors: CadQuery: A python parametric CAD scripting framework based on OCCT. https://github.com/CadQuery/cadquery (2024), accessed: 2026-02-17

  5. [5]

    Cao, Z., Hong, F., Chen, Z., Pan, L., Liu, Z.: Simulation-ready physical 3D assets from single image. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2026)

  6. [6]

    Chen, C., Wei, J., Chen, T., Zhang, C., Yang, X., Zhang, S., Yang, B., Foo, C.S., Lin, G., Huang, Q., Liu, F.: CADCrafter: Generating computer-aided design models from unconstrained images. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 11073–11082 (2025)

  7. [7]

    Chen, D., Chen, R., Zhang, S., Liu, Y., Wang, Y., Zhou, H., Zhang, Q., Zhou, P., Wan, Y., Sun, L.: MLLM-as-a-judge: Assessing multimodal LLM-as-a-judge with vision-language benchmark. In: Proc. Int. Conf. Mach. Learn. pp. 6562–6595 (2024)

  8. [8]

    Chen, X., Lin, M., Schärli, N., Zhou, D.: Teaching large language models to self-debug. In: Proc. Int. Conf. Learn. Represent. (2024)

  9. [9]

    Dupont, E., Cherenkova, K., Mallis, D., Gusev, G., Kacem, A., Aouada, D.: TransCAD: A hierarchical transformer for CAD sequence inference from point clouds. In: Proc. Eur. Conf. Comput. Vis. pp. 19–36 (2024)

  10. [10]

    Elistratov, M., Barannikov, M., Ivanov, G., Khrulkov, V., Konushin, A., Kuznetsov, A., Zhemchuzhnikov, D.: Cadevolve: Creating realistic cad via program evolution (2026)

  11. [11]

    Fan, R., He, F., Liu, Y., Song, Y., Fan, L., Yan, X.: A parametric and feature-based CAD dataset to support human-computer interaction for advanced 3D shape learning. Integrated Computer-Aided Engineering 32 (2025)

  12. [12]

    Fan, Z.: Robot viewer: A web-based URDF visualizer. https://github.com/fan-ziqi/robot_viewer (2024), accessed: 2026-03-10

  13. [13]

    FreeCAD Community: FreeCAD: Your own 3D parametric modeler. https://www.freecad.org/ (2024), version 1.0. Accessed: 2026-02-17

  14. [14]

    Google DeepMind: Gemini 3 Flash model card. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Flash-Model-Card.pdf (December 2025), published December 2025. Accessed: 2026-03-05

  15. [15]

    Google DeepMind: Gemini 3 Pro model card. https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-Pro-Model-Card.pdf (December 2025), model card update: December 2025. Accessed: 2026-03-05

  16. [16]

    Govindarajan, P., Baldelli, D., Pathak, J., Fournier, Q., Chandar, S.: CADmium: Fine-tuning code language models for text-driven sequential CAD design. Trans. Mach. Learn. Res. (2026)

  17. [17]

    Guan, Y., Wang, X., Ming, X., Zhang, J., Xu, D., Yu, Q.: CAD-coder: Text-to-CAD generation with chain-of-thought and geometric reward. arXiv preprint arXiv:2505.19713 (2025)

  18. [18]

    Hong, S., Zhuge, M., Chen, J., Zheng, X., Cheng, Y., Wang, J., Zhang, C., Wang, Z., Yau, S.K.S., Lin, Z., Zhou, L., Ran, C., Xiao, L., Wu, C., Schmidhuber, J.: MetaGPT: Meta programming for a multi-agent collaborative framework. In: Proc. Int. Conf. Learn. Represent. (2024)

  19. [19]

    Iliash, D., Jiang, H., Zhang, Y., Savva, M., Chang, A.X.: S2O: Static to openable enhancement for articulated 3D objects. In: Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (2026)

  20. [20]

    Jiang, Z., Hsu, C.C., Zhu, Y.: Ditto: Building digital twins of articulated objects from interaction. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 5616–5626 (2022)

  21. [21]

    Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)

  22. [22]

    Jones, B., Hildreth, D., Chen, D., Baran, I., Kim, V.G., Schulz, A.: AutoMate: A dataset and learning approach for automatic mating of CAD assemblies. ACM Trans. Graph. 40(6), 1–18 (2021)

  23. [23]

    Khan, M.S., Dupont, E., Ali, S.A., Cherenkova, K., Kacem, A., Aouada, D.: CAD-SIGNet: CAD language inference from point clouds using layer-wise sketch instance guided attention. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 4713–4722 (2024)

  24. [24]

    Khan, M.S., Sinha, S., Uddin, T., Stricker, D., Ali, S.A., Afzal, M.Z.: Text2CAD: Generating sequential CAD designs from beginner-to-expert level text prompts. In: Adv. Neural Inform. Process. Syst. vol. 37, pp. 7552–7579 (2024)

  25. [25]

    Kolodiazhnyi, M., Tarasov, D., Zhemchuzhnikov, D., Nikulin, A., Zisman, I., Vorontsova, A., Konushin, A., Kurenkov, V., Rukhovich, D.: Cadrille: Multi-modal CAD reconstruction with online reinforcement learning. arXiv preprint arXiv:2505.22914 (2025)

  26. [26]

    Le, L., Xie, J., Liang, W., Wang, H.J., Yang, Y., Ma, Y.J., Vedder, K., Krishna, A., Jayaraman, D., Eaton, E.: Articulate-anything: Automatic modeling of articulated objects via a vision-language foundation model. In: Proc. Int. Conf. Learn. Represent. (2025)

  27. [27]

    Le, T., Nguyen, K., Huang, B., Ta, T.D., Nguyen, A.: Cadknitter: Compositional cad generation from text and geometry guidance (2025)

  28. [28]

    Lei, J., Deng, C., Shen, W.B., Guibas, L.J., Daniilidis, K.: NAP: Neural 3D articulated object prior. In: Adv. Neural Inform. Process. Syst. vol. 36 (2023)

  29. [29]

    Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., et al.: Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Adv. Neural Inform. Process. Syst. vol. 33, pp. 9459–9474 (2020)

  30. [30]

    Li, J., Ma, W., Li, X., Lou, Y., Zhou, G., Zhou, X.: CAD-llama: Leveraging large language models for computer-aided design parametric 3D model generation. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 18563–18573 (2025)

  31. [31]

    Li, X., Li, J., Song, Y., Lou, Y., Zhou, X.: Seek-CAD: A self-refined generative modeling for 3D parametric CAD using local inference via DeepSeek. arXiv preprint arXiv:2505.17702 (2025)

  32. [32]

    Liang, F., Zhao, H., Quan, Y., Fang, W., Shi, C.: Customizing graph neural network for CAD assembly recommendation. In: Proc. ACM SIGKDD Conf. Knowl. Discov. Data Mining. pp. 1746–1757 (2024)

  33. [33]

    Liu, J., Mahdavi-Amiri, A., Savva, M.: PARIS: Part-level reconstruction and motion analysis for articulated objects. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 352–363 (2023)

  34. [34]

    Liu, J., Zhan, D., Wang, Q., Shao, P., Liu, S., Kuo, T.Y., Savva, M.: SINGAPO: Single image controlled generation of articulated parts in objects. In: Proc. Int. Conf. Learn. Represent. (2025)

  35. [35]

    Liu, Q., Yao, X., Zhang, S., Deng, Y., Liu, G., Liu, Z., Jia, K.: PAct: Part-decomposed single-view articulated object generation. arXiv preprint arXiv:2602.14965 (2026)

  36. [36]

    Liu, Y., Iter, D., Xu, Y., Wang, S., Xu, R., Zhu, C.: G-Eval: NLG evaluation using GPT-4 with better human alignment. In: Proc. Conf. Empirical Methods Natural Language Process. pp. 2511–2522 (2023)

  37. [37]

    Liu, Y., Jia, B., Lu, R., Ni, J., Zhu, S.C., Huang, S.: Building interactable replicas of complex articulated objects via Gaussian splatting. In: Proc. Int. Conf. Learn. Represent. (2025)

  38. [38]

    Lv, C., Bao, J.: Cadinstruct: A multimodal dataset for natural language-guided cad program synthesis. Computer-Aided Design 188, 103926 (2025). https://doi.org/10.1016/j.cad.2025.103926

  39. [39]

    Mo, K., Zhu, S., Chang, A.X., Yi, L., Tripathi, S., Guibas, L.J., Su, H.: PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2019)

  40. [40]

    NVIDIA: Nvidia isaac sim. https://developer.nvidia.com/isaac/sim, accessed: 2026-04-12

  41. [41]

    OpenAI: Update to GPT-5 system card: GPT-5.2. https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944f8d/oai_5_2_system-card.pdf (December 2025), published December 11, 2025. Accessed: 2026-03-05

  42. [42]

    OpenSCAD Contributors: OpenSCAD: The programmers solid 3D CAD modeller. https://github.com/openscad/openscad (2024), accessed: 2026-02-17

  43. [43]

    Preintner, T., Yuan, W., König, A., Bäck, T., Raponi, E., van Stein, N.: EvoCAD: Evolutionary CAD code generation with vision language models. arXiv preprint arXiv:2510.11631 (2025)

  44. [44]

    Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., Yang, C., Chen, W., Su, Y., Cong, X., et al.: ChatDev: Communicative agents for software development. In: Proc. 62nd Annu. Meet. Assoc. Comput. Linguist. pp. 15174–15186 (2024)

  45. [45]

    Rukhovich, D., Dupont, E., Mallis, D., Cherenkova, K., Kacem, A., Aouada, D.: CAD-recode: Reverse engineering CAD code from point clouds. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 9801–9811 (2025)

  46. [46]

    Shen, L., Zhang, S., Li, H., Yang, P., Huang, Z., Zhang, Z., Zhao, H.: GaussianArt: Unified modeling of geometry and motion for articulated objects. In: Proc. Int. Conf. 3D Vision (3DV) (2026)

  47. [47]

    Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: Language agents with verbal reinforcement learning. In: Adv. Neural Inform. Process. Syst. vol. 36, pp. 8634–8652 (2023)

  48. [48]

    Wang, S., Chen, C., Le, X., Xu, Q., Xu, L., Zhang, Y., Yang, J.: CAD-GPT: Synthesising CAD construction sequence with spatial reasoning-enhanced multimodal LLMs. In: Proc. AAAI Conf. Artif. Intell. vol. 39, pp. 7880–7888 (2025)

  49. [49]

    Wang, X., Chen, Y., Yuan, L., Zhang, Y., Li, Y., Peng, H., Ji, H.: Executable code actions elicit better LLM agents. In: Proc. Int. Conf. Mach. Learn. (2024)

  50. [50]

    Willis, K.D., Jayaraman, P.K., Chu, H., Tian, Y., Li, Y., Grandi, D., Sanghi, A., Tran, L., Lambourne, J.G., Solar-Lezama, A., Matusik, W.: JoinABLe: Learning bottom-up assembly of parametric CAD joints. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. pp. 15828–15839 (2022)

  51. [51]

    Willis, K.D., Pu, Y., Luo, J., Chu, H., Du, T., Lambourne, J.G., Solar-Lezama, A., Matusik, W.: Fusion 360 gallery: A dataset and environment for programmatic CAD construction from human design sequences. ACM Trans. Graph. 40(4), 1–24 (2021)

  52. [52]

    Wu, Q., Bansal, G., Zhang, J., Wu, Y., Li, B., Zhu, E., Jiang, L., Zhang, X., Zhang, S., Liu, J., et al.: AutoGen: Enabling next-gen LLM applications via multi-agent conversations. In: First Conference on Language Modeling (2024)

  53. [53]

    Wu, R., Xiao, C., Zheng, C.: DeepCAD: A deep generative network for computer-aided design models. In: Proc. IEEE/CVF Int. Conf. Comput. Vis. pp. 6772–6782 (2021)

  54. [54]

    Xiang, F., Qin, Y., Mo, K., Xia, Y., Zhu, H., Liu, F., Liu, M., Jiang, H., Yuan, Y., Wang, H., Yi, L., Chang, A.X., Guibas, L.J., Su, H.: SAPIEN: A simulated part-based interactive environment. In: Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog. (2020)

  55. [55]

    Xie, H., Ju, F.: Text-to-CadQuery: A new paradigm for CAD generation with scalable large model capabilities. arXiv preprint arXiv:2505.06507 (2025)

  56. [56]

    Xu, J., Wang, C., Zhao, Z., Liu, W., Ma, Y., Gao, S.: CAD-MLLM: Unifying multimodality-conditioned CAD generation with MLLM. arXiv preprint arXiv:2411.04954 (2024)

  57. [57]

    Xu, X., Willis, K.D., Lambourne, J.G., Cheng, C.Y., Jayaraman, P.K., Furukawa, Y.: SkexGen: Autoregressive generation of CAD construction sequences with disentangled codebooks. In: Proc. Int. Conf. Mach. Learn. pp. 24698–24724 (2022)

  58. [58]

    Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K.R., Cao, Y.: ReAct: Synergizing reasoning and acting in language models. In: Proc. Int. Conf. Learn. Represent. (2023)

  59. [59]

    Yuan, Z., Lan, H., Zou, Q., Zhao, J.: 3D-PreMise: Can large language models generate 3D shapes with sharp features and parametric control? arXiv preprint arXiv:2401.06437 (2024)

  60. [60]

    Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., Zhang, H., Gonzalez, J.E., Stoica, I.: Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In: Adv. Neural Inform. Process. Syst. (2023)

  61. [61]

    Zhou, Z., Han, J., Du, L., Fang, N., Qiu, L., Zhang, S.: CAD-Judge: Toward efficient morphological grading and verification for text-to-CAD generation. arXiv preprint arXiv:2508.04002 (2025)