arXiv: 2605.04003 · v1 · submitted 2026-05-05 · cs.MA · cs.AI · cs.IR

Physics-Grounded Multi-Agent Architecture for Traceable, Risk-Aware Human-AI Decision Support in Manufacturing

Christopher Miller, Danny Hoang, David Gorsich, Farhad Imani, Matthew P. Castanier, Nasir Mannan, Ruby ElKharboutly, Ryan Matthiessen

Pith reviewed 2026-05-07 03:31 UTC · model grok-4.3

classification: cs.MA, cs.AI, cs.IR

keywords: multi-agent systems, decision support, manufacturing, digital twin, CNC machining, traceability, risk-aware AI, aerospace components

The pith

MAKA multi-agent architecture improves tool execution success by up to 87.5 percentage points in manufacturing workflows while enabling traceable compensations that simulations predict will reduce surface deviations by an order of magnitude

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MAKA, a multi-agent system for human-AI decision support in high-precision manufacturing. It separates the workflow into intent routing, quantitative tool use, knowledge retrieval, and a critic agent that verifies physical plausibility and traceability before human review. This structure is tested on compensating deviations in machined titanium rotor blades using fused simulation and inspection data. A sympathetic reader cares because current LLMs struggle with reliable multi-step numerical tasks in safety-critical settings, and MAKA demonstrates concrete gains in execution reliability and error reduction within digital twin environments. The approach grounds AI outputs in physics-based models rather than relying on text generation alone.

Core claim

MAKA decomposes high-stakes manufacturing decisions into specialized agents for routing, analysis, retrieval and verification. The critic enforces physical plausibility, safety bounds and complete provenance. On a Ti-6Al-4V blade testbed fusing path-tracking errors, cutting forces, deflections and 3D scans from 16 parts, the system breaks deviations into pathing, wear, compliance and variability components. Benchmarks show up to 87.5 percentage point gains in successful multi-step tool execution over single-model baselines. Digital twin what-if studies indicate traceable compensations can shrink predicted surface deviations from order 10^{-2} inches to about ±10^{-3} inches across most of a

What carries the argument

The multi-agent knowledge analysis (MAKA) architecture, consisting of intent routing, tools-only quantitative analysis, knowledge graph retrieval, and critic-based verification that enforces physical plausibility, safety bounds, and provenance completeness.
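
The critic step can be pictured as a gate between the analysis agents and the human reviewer. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the `Recommendation` fields, the `MAX_OFFSET_IN` bound, and the `critic_review` function are hypothetical names, and the bound value is illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical safety bound on a single compensation offset, in inches.
# The value is illustrative and does not come from the paper.
MAX_OFFSET_IN = 0.02


@dataclass
class Recommendation:
    """A candidate compensation from the analysis agents (hypothetical schema)."""
    offset_in: float                # proposed toolpath offset, inches
    predicted_deviation_in: float   # predicted residual deviation magnitude, inches
    sources: list = field(default_factory=list)  # provenance: tool calls, scans, KG nodes


def critic_review(rec: Recommendation) -> list:
    """Return a list of objections; an empty list means the candidate may reach the human."""
    objections = []
    # Physical plausibility: a deviation magnitude cannot be negative or NaN.
    if not rec.predicted_deviation_in >= 0.0:
        objections.append("implausible predicted deviation")
    # Safety bound: reject offsets outside the allowed envelope.
    if abs(rec.offset_in) > MAX_OFFSET_IN:
        objections.append("offset exceeds safety bound")
    # Provenance completeness: every recommendation must cite at least one source.
    if not rec.sources:
        objections.append("missing provenance")
    return objections


ok = Recommendation(offset_in=0.004, predicted_deviation_in=0.001, sources=["scan:blade_07"])
bad = Recommendation(offset_in=0.05, predicted_deviation_in=0.001)
print(critic_review(ok))
print(critic_review(bad))
```

Returning a list of objections rather than a single boolean keeps the rejection reasons available for the audit trail that the human reviewer sees.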

Load-bearing premise

The critic-based verification step can reliably enforce physical plausibility, safety bounds, and provenance completeness across complex multi-step workflows without missing edge cases or real-world discrepancies.

What would settle it

Performing actual CNC machining of blades using the MAKA-recommended compensations and measuring the resulting surface deviations to determine if they match the simulated reduction from 10^{-2} to 10^{-3} inches.
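
A physical test of this kind comes down to asking how much of the measured surface falls inside the simulated ±10^{-3} in band. The following is a hedged sketch of that comparison; the function name `fraction_within_band` and the sample deviations are made up for illustration and are not data from the paper.

```python
def fraction_within_band(deviations_in, band_in=1e-3):
    """Fraction of measured surface points whose deviation lies within +/- band_in inches."""
    if not deviations_in:
        raise ValueError("no measurements supplied")
    inside = sum(1 for d in deviations_in if abs(d) <= band_in)
    return inside / len(deviations_in)


# Made-up deviations (inches) standing in for a post-machining 3D scan.
measured = [0.0004, -0.0009, 0.0012, 0.0002, -0.0031, 0.0007, -0.0005, 0.0008]
share = fraction_within_band(measured)
print(f"{share:.0%} of points within +/-0.001 in")
```

A measured share well below the simulated one would indicate that the digital twin misses real process effects such as tool wear or fixture compliance.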

Figures

Figures reproduced from arXiv: 2605.04003 by Christopher Miller, Danny Hoang, David Gorsich, Farhad Imani, Matthew P. Castanier, Nasir Mannan, Ruby ElKharboutly, Ryan Matthiessen.

**Figure 1.** Multi-agent knowledge framework for CNC manufacturing. The Central, Knowledge Graph, Analysis, and …

**Figure 2.** Illustration of the experimental workflow from part design, to manufacturing simulation, manufacturing using …

**Figure 3.** Experimental setup where (a) the spindle is set at an angle of …

**Figure 4.** Comparison between MAKA and a single large language model in terms of pass rate (%) for correct tool …

**Figure 5.** Critic agent ablation across six base models comparing critic-enabled and no-critic execution. (a) Average …

**Figure 6.** Knowledge graph evaluation across six base models using paired runs with and without KG retrieval. (a) …

**Figure 7.** Deflection results. (a) The location where force calculations are applied. (b) Blade deflection for a spindle …

**Figure 8.** Successive MAKA-driven compensations for toolpath adjustment, deflection-focused parameter tuning, and …

Original abstract

High-precision CNC machining of free-form aerospace components requires bounded compensations informed by inspection, simulation, and process knowledge. Off-the-shelf large language model (LLM) assistants can generate text, but they do not reliably execute risk-constrained multi-step numerical workflows or provide auditable provenance for high-stakes decisions. We present multi-agent knowledge analysis (MAKA), a human-in-the-loop decision-support architecture that separates intent routing, tools-only quantitative analysis, knowledge graph retrieval, and critic-based verification that enforces physical plausibility, safety bounds, and provenance completeness before recommendations are surfaced for human approval. MAKA is instantiated on a Ti-6Al-4V rotor blade machining testbed by fusing virtual-machining path-tracking error fields, cutting-force and deflection simulations, and scan-based 3D inspection deviation maps from 16 blades. The analysis decomposes deviation into an evidence-linked pathing component, a drift-based wear proxy capturing systematic evolution across parts, a residual systematic compliance term, and a variability proxy for instability-aware escalation. In a three-level tool-orchestration benchmark (single-step through $\geq 3$-step stateful sequences), MAKA improves successful tool execution by up to 87.5 percentage points relative to an unstructured single-model interaction pattern with identical tool access. Digital twin what-if studies show MAKA can coordinate traceable compensation candidates that reduce predicted surface deviation from order $10^{-2}$ in to approximately $\pm 10^{-3}$ in over most of the blade within the simulation environment, providing a pre-deployment verification signal for risk-aware human decision-making.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MAKA gives a clean four-agent split for traceable machining decisions and shows solid simulation gains, but the performance numbers rest on virtual benchmarks with no real-part confirmation.

MAKA splits the workflow into intent routing, tools-only analysis, knowledge-graph retrieval, and a critic that checks physical plausibility plus provenance before anything reaches the human. They instantiate it on Ti-6Al-4V blade data by feeding virtual-machining path errors, force simulations, and 3-D scan maps from 16 blades into a four-part deviation breakdown: pathing, drift-based wear proxy, compliance term, and variability proxy. The architecture is a direct response to the limits of raw LLMs on multi-step numerical tasks in high-stakes manufacturing.
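
The four-part breakdown can be made concrete with a small sketch. The paper's exact definitions are not given in this summary, so everything below is an assumption chosen for illustration: the pathing term is taken as a known simulated error, the wear proxy as the least-squares slope of the remaining deviation across part order, the compliance term as the fitted intercept, and the variability proxy as the scatter around that fit. The function name `decompose` and the synthetic numbers are hypothetical.

```python
from statistics import mean, pstdev


def decompose(measured, pathing):
    """Split per-part deviations at one surface point into four illustrative terms.

    measured: deviation of each part at this point, in machining order (inches)
    pathing:  simulated path-tracking error at this point (inches), same for all parts
    """
    residual = [m - pathing for m in measured]      # remove the pathing component
    idx = list(range(len(measured)))
    i_bar, r_bar = mean(idx), mean(residual)
    # Least-squares slope of residual vs. part index: the assumed wear-drift proxy.
    slope = (sum((i - i_bar) * (r - r_bar) for i, r in zip(idx, residual))
             / sum((i - i_bar) ** 2 for i in idx))
    intercept = r_bar - slope * i_bar               # assumed systematic compliance term
    scatter = [r - (intercept + slope * i) for i, r in zip(idx, residual)]
    return {
        "pathing": pathing,
        "wear_drift_per_part": slope,
        "compliance": intercept,
        "variability": pstdev(scatter),             # assumed instability proxy
    }


# Synthetic data: 0.002 in pathing error, 0.005 in compliance, 0.0005 in/part drift.
parts = [0.002 + 0.005 + 0.0005 * k for k in range(16)]
result = decompose(parts, pathing=0.002)
print(result)
```

On this noise-free synthetic input the fit recovers the drift and compliance values used to build it, and the variability term is close to zero; real scan data would leave a nonzero scatter.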

Referee Report

3 major / 2 minor

Summary. The manuscript presents MAKA, a physics-grounded multi-agent architecture for human-in-the-loop decision support in high-precision CNC machining of free-form aerospace components such as Ti-6Al-4V rotor blades. It separates intent routing, quantitative tool analysis, knowledge graph retrieval, and critic-based verification to enforce physical plausibility, safety, and provenance. The architecture is evaluated on a testbed fusing virtual-machining simulations and 3D scan data from 16 blades, claiming up to 87.5 percentage point improvement in successful tool execution over an unstructured single-model baseline in a three-level benchmark, and simulated deviation reduction from order 10^{-2} in to ±10^{-3} in via traceable compensations.

Significance. If the reported gains in tool orchestration and deviation prediction hold under real-world conditions, MAKA could provide a valuable framework for traceable, risk-aware AI assistance in manufacturing, particularly by decomposing deviations into interpretable components (pathing, wear proxy, compliance, variability) and enforcing critic verification. The use of digital twins for what-if studies and the focus on provenance are strengths that align with needs in safety-critical domains.

major comments (3)

[Abstract] Abstract: The abstract reports an 87.5 percentage point improvement in successful tool execution and deviation reduction to approximately ±10^{-3} in, but supplies no information on the number of trials, statistical tests, baseline implementation details, or how the three-level benchmark (single-step through ≥3-step stateful sequences) was designed and executed. This information is load-bearing for assessing the central performance claim.
[Digital Twin Studies] Digital twin what-if studies: The deviation decomposition into pathing, drift-based wear proxy, residual systematic compliance term, and variability proxy, along with the predicted reduction from order 10^{-2} in to ±10^{-3} in, is performed entirely within the same virtual-machining and scan-based simulation environment used to train the agents. No post-machining 3-D scan data, force measurements, or surface-metrology results from blades actually cut under MAKA recommendations are presented, so the critic-enforced compensations are never stress-tested against real process noise, tool wear, or fixture compliance.
[Architecture Description] Critic-based verification: The architecture asserts that the critic step reliably enforces physical plausibility, safety bounds, and provenance completeness across complex multi-step workflows. This is central to the risk-aware human decision support claim, yet the manuscript provides only benchmark and simulation assertions without detailed validation against edge cases or real-world outcomes.
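
The first major comment asks for trial counts and statistical tests behind the percentage-point gain. As a rough sketch of what such reporting could look like, the code below computes a gain in percentage points with a simple Wald interval for the difference of two pass rates. The function name `pass_rate_gain` and the trial counts are hypothetical; the counts were chosen only to reproduce an 87.5-point gap and are not taken from the paper.

```python
from math import sqrt


def pass_rate_gain(pass_a, n_a, pass_b, n_b, z=1.96):
    """Gain of system B over system A in percentage points, with a Wald 95% interval."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    diff = p_b - p_a
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)


# Hypothetical counts: baseline passes 1 of 8 trials, multi-agent system passes 8 of 8.
gain, low, high = pass_rate_gain(pass_a=1, n_a=8, pass_b=8, n_b=8)
print(f"gain = {gain:.1f} pp, 95% CI [{low:.1f}, {high:.1f}]")
```

With so few trials the Wald interval is wide and its upper end exceeds 100 points, which is one reason trial counts matter and why an exact or Wilson-type interval would be preferable for small samples.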

minor comments (2)

[Methods] The terms 'drift-based wear proxy', 'residual systematic compliance term', and 'variability proxy' are introduced in the abstract without explicit mathematical definitions or equations; including these in the methods section would improve clarity and reproducibility.
[Related Work] Ensure that prior work on multi-agent systems for manufacturing and digital twin applications in CNC is adequately cited to contextualize the novelty of the MAKA architecture.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. We agree with several points regarding the need for additional details and clarifications, and we will incorporate revisions to address them. The work is primarily simulation-based, which limits certain real-world validations at this stage.

Referee: [Abstract] Abstract: The abstract reports an 87.5 percentage point improvement in successful tool execution and deviation reduction to approximately ±10^{-3} in, but supplies no information on the number of trials, statistical tests, baseline implementation details, or how the three-level benchmark (single-step through ≥3-step stateful sequences) was designed and executed. This information is load-bearing for assessing the central performance claim.

Authors: We acknowledge that the abstract, as a concise summary, does not include these experimental details. The full manuscript provides the benchmark design in Section 4, including the use of 16 blades for calibration and multiple sequence trials for the three levels. However, to make the abstract more informative, we will revise it to briefly mention the benchmark structure (three levels with stateful sequences), the baseline (unstructured LLM with tool access), and note that improvements are statistically significant based on repeated trials. We will also ensure the number of trials and any statistical measures are highlighted. (Revision: yes)
Referee: [Digital Twin Studies] Digital twin what-if studies: The deviation decomposition into pathing, drift-based wear proxy, residual systematic compliance term, and variability proxy, along with the predicted reduction from order 10^{-2} in to ±10^{-3} in, is performed entirely within the same virtual-machining and scan-based simulation environment used to train the agents. No post-machining 3-D scan data, force measurements, or surface-metrology results from blades actually cut under MAKA recommendations are presented, so the critic-enforced compensations are never stress-tested against real process noise, tool wear, or fixture compliance.

Authors: The referee correctly identifies that the what-if studies and deviation reductions are evaluated within the digital twin simulation, which incorporates real scan data from 16 blades but does not include new physical machining under MAKA-guided compensations. This is a deliberate design choice to enable safe, repeatable exploration of compensation strategies and risk assessment prior to deployment. We will revise the manuscript to more explicitly discuss this as a limitation of the current study and outline future work involving physical validation on the CNC testbed. The decomposition and critic enforcement are validated against the existing scan data and simulations. (Revision: partial)
Referee: [Architecture Description] Critic-based verification: The architecture asserts that the critic step reliably enforces physical plausibility, safety bounds, and provenance completeness across complex multi-step workflows. This is central to the risk-aware human decision support claim, yet the manuscript provides only benchmark and simulation assertions without detailed validation against edge cases or real-world outcomes.

Authors: We agree that more detailed validation of the critic component would strengthen the claims. The current evaluation shows its impact through improved tool execution success in the multi-step benchmark and consistent deviation predictions in the digital twin studies. In the revision, we will add a new subsection or appendix detailing specific edge cases tested (e.g., conflicting tool recommendations, boundary condition violations) and how the critic handles them, including failure modes observed in simulations. We will also expand the discussion on real-world applicability. (Revision: yes)

Standing simulated objections (not resolved)

Real-world post-machining validation data from blades machined using MAKA recommendations remain absent, because conducting new physical experiments is outside the scope of the current simulation-focused study.

Circularity Check

0 steps flagged

No circularity: empirical gains are direct baseline comparisons in simulation

The paper defines the MAKA multi-agent architecture independently (intent routing, tools-only analysis, knowledge graph, critic verification) and then reports performance via explicit comparison to an unstructured single-model baseline using identical tools, plus digital-twin what-if studies on scanned blades. No equations, fitted parameters renamed as predictions, self-citations as load-bearing premises, or ansatz smuggling appear in the derivation. The 87.5 pp success improvement and simulated deviation reduction are generated by applying the architecture inside the same virtual environment used for evaluation, but this is a standard empirical protocol rather than a definitional reduction; the architecture itself is not derived from the benchmark outcomes.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 4 invented entities

The central claim rests on the premise that separating agent functions plus critic verification produces traceable, physically plausible outputs, and that the four-component deviation decomposition is both valid and evidence-linked; these are introduced without independent external validation or first-principles derivation.

axioms (2)

domain assumption: The multi-agent separation into intent routing, tools-only quantitative analysis, knowledge-graph retrieval, and critic verification reliably enforces physical plausibility, safety bounds, and provenance completeness.
This is the core design premise of MAKA as stated in the abstract.

domain assumption: Machining deviations can be decomposed into an evidence-linked pathing component, a drift-based wear proxy, a residual systematic compliance term, and a variability proxy.
Explicitly presented as the analysis method applied to the 16-blade testbed data.

invented entities (4)

MAKA multi-agent architecture (no independent evidence)
purpose: To deliver traceable, risk-aware human-AI decision support in manufacturing
Newly proposed system whose performance is the central claim.

drift-based wear proxy (no independent evidence)
purpose: To capture systematic evolution of deviations across multiple parts
Introduced as one of the four deviation components without external corroboration.

residual systematic compliance term (no independent evidence)
purpose: To account for material bending and deflection effects
Part of the proposed four-component decomposition.

variability proxy (no independent evidence)
purpose: To represent instability for escalation decisions
Introduced as the fourth component of the deviation analysis.

