arxiv: 2512.23292 · v3 · pith:JDWG4XQWnew · submitted 2025-12-29 · 💻 cs.AI · cs.LG

Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control

Pith reviewed 2026-05-21 16:53 UTC · model grok-4.3

classification 💻 cs.AI cs.LG
keywords nuclear reactor control · agentic physical AI · domain-specific foundation models · physics-based validation · synthetic data scaling · closed-loop reliability · policy distillation · actuation strategy

The pith

Scaling synthetic nuclear reactor scenarios from 1,000 to 100,000 examples lets a 360-million-parameter model achieve reliable closed-loop control by rejecting most options and locking onto one actuation strategy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that general-purpose AI models fall short for physical control tasks because they imitate perceptual patterns instead of ensuring safe physical outcomes. It proposes a shift to compact language models trained as Agentic Physical AI, where optimization comes from validating executed actions against physics constraints in simulation rather than from pattern matching. Scaling the synthetic dataset for nuclear reactor control produces clear gains in reliability under nominal conditions, including sharp drops in performance variance and an emergent focus on a single effective strategy. This pathway could matter for building trustworthy controllers in safety-critical domains if the simulation results hold up.

Core claim

Training a 360-million-parameter language model on synthetic nuclear reactor control scenarios, with dataset size scaled from 10^3 to 10^5 examples, produces strong gains in closed-loop reliability under nominal simulated conditions. These include variance collapse by a factor of approximately 500 and smooth stabilization at strict tolerances. Despite balanced exposure to four actuation families, the model autonomously rejects roughly 70 percent of the training distribution and concentrates 95 percent of runtime execution on a single-bank strategy. This emergent policy distillation occurs without reinforcement learning or reward engineering and is driven solely by outcome-level success under physical execution.

What carries the argument

Agentic Physical AI: compact language models whose policy optimization is driven by physics-based validation of executed actions rather than by perceptual inference or imitation.
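
The contrast between outcome validation and imitation can be made concrete with a small sketch. Everything in it is a hypothetical stand-in: `toy_plant` is a one-line linear response, not KOMODO or any reactor model, and the names `validate_outcome` and `filter_by_outcome` and the 5% default tolerance are assumptions chosen for illustration, not the paper's implementation.

```python
from typing import Callable, List, Tuple


def toy_plant(power: float, rod_delta: float, gain: float = -0.5) -> float:
    """Hypothetical stand-in for a simulator: power responds linearly to rod motion."""
    return power + gain * rod_delta


def validate_outcome(final_power: float, target: float, tol: float = 0.05) -> bool:
    """Accept only if the executed outcome lands within a relative tolerance of the target."""
    return abs(final_power - target) <= tol * abs(target)


def filter_by_outcome(
    candidates: List[float],
    power: float,
    target: float,
    plant: Callable[[float, float], float] = toy_plant,
    tol: float = 0.05,
) -> List[Tuple[float, float]]:
    """Execute each candidate action and keep those whose outcome passes validation."""
    kept = []
    for rod_delta in candidates:
        final_power = plant(power, rod_delta)
        if validate_outcome(final_power, target, tol):
            kept.append((rod_delta, final_power))
    return kept


if __name__ == "__main__":
    # Start at 100 (arbitrary units) and aim for 90; an action survives only
    # if its simulated result is within 5% of the target.
    print(filter_by_outcome([0.0, 10.0, 20.0, 40.0], power=100.0, target=90.0))
```

The point of the sketch is that the acceptance test never looks at how plausible an action appears, only at where the plant ends up after the action is executed.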

If this is right

  • Larger training sets induce variance collapse and stabilize execution behavior within the sampled distribution.
  • The model develops an emergent preference for one actuation family even when all four are presented equally during training.
  • Reliability improves steeply but smoothly once dataset size passes a threshold, replacing high-variance tail excursions with consistent performance.
  • Outcome-level physical validation alone suffices to produce policy distillation without any reinforcement learning step.
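
One way to quantify the gap between a balanced training mix and a concentrated runtime policy is the KL divergence between the two label distributions, together with the share taken by the dominant family. The sketch below is a minimal illustration under stated assumptions: the family labels `a` to `d` and all counts are made-up placeholders, not the paper's data, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Turn a sequence of actuation-family labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def kl_divergence(q: Dict[str, float], p: Dict[str, float]) -> float:
    """KL(q || p) in nats. Every label with q > 0 must also appear in p with p > 0."""
    return sum(qk * math.log(qk / p[name]) for name, qk in q.items() if qk > 0)


def dominant_share(q: Dict[str, float]) -> Tuple[str, float]:
    """Return the most frequent family and the fraction of actions it accounts for."""
    name = max(q, key=q.get)
    return name, q[name]


if __name__ == "__main__":
    # Placeholder data: four equally represented families in training,
    # and a runtime log that leans heavily on one of them.
    training = distribution(["a", "b", "c", "d"] * 250)
    runtime = distribution(["a"] * 950 + ["b"] * 30 + ["c"] * 10 + ["d"] * 10)
    print("KL(runtime || training) in nats:", kl_divergence(runtime, training))
    print("dominant family and share:", dominant_share(runtime))
```

A divergence of zero would mean the runtime policy mirrors the training mix; larger values indicate stronger departure from it.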

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same scaling pattern appears in other physical domains, domain-specific compact models could become a practical alternative to general foundation models for control tasks.
  • The autonomous rejection of most training options may reflect an internal filtering mechanism that identifies and avoids physically risky paths based on execution outcomes.
  • Testing the model on scenarios that deliberately violate the original training distribution could reveal whether the distilled strategy remains robust or collapses outside the sampled regime.

Load-bearing premise

The synthetic nuclear reactor control scenarios used for training and validation accurately reflect the dynamics, constraints, and safety requirements of actual reactors so that success in simulation implies reliable real-world control performance.

What would settle it

Deploy the trained model on a higher-fidelity simulator that injects realistic unmodeled effects such as sensor noise or unexpected reactivity transients and measure whether closed-loop safety violations remain near zero or increase sharply.
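
The shape of such a test can be sketched with a toy closed loop. This is only an illustration of the measurement procedure: the scalar plant, the proportional controller, the gain `k_p`, the noise levels, and the 5% tolerance are all assumptions, and none of it models a reactor or the paper's controller.

```python
import random


def run_episode(target: float, noise_std: float, rng: random.Random,
                steps: int = 50, k_p: float = 0.3) -> float:
    """Toy loop: a proportional controller acts on a noisy reading of a scalar 'power'.

    Returns the relative terminal error |power - target| / |target|.
    """
    power = 1.0
    for _ in range(steps):
        measured = power + rng.gauss(0.0, noise_std)  # injected sensor noise
        power += k_p * (target - measured)
    return abs(power - target) / abs(target)


def violation_rate(noise_std: float, tol: float = 0.05,
                   episodes: int = 500, seed: int = 0) -> float:
    """Fraction of episodes whose relative terminal error exceeds the tolerance."""
    rng = random.Random(seed)
    errors = [run_episode(rng.uniform(0.5, 1.5), noise_std, rng)
              for _ in range(episodes)]
    return sum(e > tol for e in errors) / episodes


if __name__ == "__main__":
    # Sweep the injected noise level and report how often the tolerance is violated.
    for sigma in (0.0, 0.05, 0.1, 0.2):
        print(f"noise_std={sigma:.2f}  violation_rate={violation_rate(sigma):.3f}")
```

In a real study the toy loop would be replaced by the trained model driving a higher-fidelity simulator; the quantity of interest is how the violation rate moves as unmodeled effects are dialed up.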

Figures

Figures reproduced from arXiv: 2512.23292 by Jay Yoo, Kazuma Kobayashi, Sajedul Talukder, Samrendra Roy, Seid Koric, Souvik Chakraborty, Syed Bahauddin Alam, Yoon Pyo Lee.

Figure 1. Integrated framework for Agentic Physical AI in nuclear reactor control. Three interconnected paradigms form a coherent system: (Left) Agentic AI: the compact 360M-parameter model optimizes runtime policy away from balanced training distribution (KL divergence increases with scale: 0.18→0.31 nats), concentrating 76% of actions on single_b2 strategies despite only 30% training frequency, and deploying britt…
Figure 2. Experimental workflow for Agentic Physical AI in nuclear reactor control. The pipeline consists of three stages: (Stage 1) Data generation: KOMODO simulator generates synthetic corpora at three scales (1K, 10K, 100K) with balanced actuation families (single-bank, simultaneous, sequential) to prevent trivial dataset bias. (Stage 2) Two-phase curriculum training: SmolLM2-360M backbone undergoes Phase 1 conti…
Figure 3. Scaling of validation success and regime robustness with dataset size. a. Validation success rates across tolerance bands show a sharp improvement between 10K and 100K, revealing the emergence of a stable, high-precision control policy, with sub-1% accuracy jumping from 26.2 to 92%. b. Performance stratified by power-change bins shows that the 100K model achieves regime-consistent precision that is absent …
Figure 4. Terminal power error distributions across dataset scales. a. CDF curves highlight tail-risk collapse at 100K scale. b. Violin plots demonstrate narrowing uncertainty and emergence of a stable, low-variance control policy.
Figure 5. Benchmarking Agentic AI against classical PID and direct learning baselines. a. Overall success rates (±5% tolerance). The Proposed (100K) model achieves 97.4% success, significantly outperforming the naive PID baseline (43.8%) which is limited by single-bank saturation. b. Success rates stratified by power change magnitude. PID performance collapses in the Large regime due to physical limits,…
Figure 6. Runtime actuation patterns and success rates across model scales. a. The proportional distribution of actuation patterns in the 1K, 10K, and 100K training datasets, engineered to be balanced. b. The distribution of actuation patterns executed during 2,000 simulator-run evaluations for each model scale. The pronounced divergence between (a) and (b), particularly the twofold enrichment of single-bank strateg…
Figure 7. Validation case count and success rate by actuation pattern and model scale. Each panel shows the total number of runtime attempts and the number of successful cases (±5% tolerance) for each actuation pattern, revealing how scaling induces strategic specialization. The evolution from uniform low success (1K) to selective excellence (10K) to near-perfect discrimination (100K) demonstrates that …
Figure 8. Distribution of severe failures (greater than 10% error) across actuation patterns and dataset scales. Scaling from 1K to 100K induces a collapse of catastrophic outliers, a prerequisite for agent-level reliability in safety-critical domains.
Figure 9. Performance comparison of generalization (PyRK) and architectural extension (Variable Window) models. a. Parsing success rates reveal that the Adapter Only approach fails to maintain valid syntax (48%), whereas PyRK and Extended models achieve 100%. b. Validation success rates across tolerance bands show PyRK’s superior precision, while the Extended model trades strict accuracy for flexible time-window han…
Original abstract

The prevailing paradigm in AI for physical systems (scaling general-purpose foundation models toward universal multimodal reasoning) confronts a fundamental barrier at the control interface. Recent benchmarks show that even frontier vision-language models achieve only 50–53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility by violating physical constraints. This input unfaithfulness is not a scaling deficiency but a structural limitation: perception-centric architectures optimize parameter-space imitation, whereas safety-critical control demands outcome-space guarantees over executed actions. Here, we present a fundamentally different pathway "toward" domain-specific foundation models by introducing compact language models operating as Agentic Physical AI, in which policy optimization is driven by physics-based validation rather than perceptual inference. We train a 360-million-parameter model on synthetic nuclear reactor control scenarios, scaling the dataset from 10^3 to 10^5 examples. Scaling induces strong improvements in closed-loop reliability under nominal simulated conditions, with a steep but smooth gain at strict tolerances: small-scale systems exhibit high-variance imitation with severe tail excursions, while large-scale models undergo variance collapse (approximately 500times reduction), stabilizing execution-level behavior within the sampled distribution. Despite balanced exposure to four actuation families, the model autonomously rejects approximately 70% of the training distribution, concentrating 95% of runtime execution on a single-bank strategy. This emergent policy distillation arises without reinforcement learning or reward engineering, driven solely by outcome-level success under physical execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Agentic Physical AI as compact language models for domain-specific foundation models in nuclear reactor control. A 360-million-parameter model is trained on synthetic scenarios with dataset scaling from 10^3 to 10^5 examples. Scaling is claimed to produce strong gains in closed-loop reliability under nominal simulated conditions, including ~500x variance reduction, stabilization within the distribution, and emergent policy distillation (autonomous rejection of ~70% of the training distribution with 95% concentration on a single-bank strategy) driven solely by physics-based outcome validation rather than RL or perceptual imitation.

Significance. If the simulation accurately captures reactor dynamics and the scaling results generalize, the approach could provide a useful alternative to general foundation models for safety-critical physical control by prioritizing outcome-space guarantees. The reported variance collapse and autonomous policy concentration without reward engineering represent potentially interesting empirical observations in AI for physical systems.

major comments (2)
  1. [Abstract and Results] The quantitative claims of approximately 500 times variance reduction, 70% distribution rejection, and 95% concentration on single-bank actuation are presented without accompanying methods, statistical baselines, error analysis, ablation studies, or verification that these effects arise from the proposed physics-validation mechanism rather than simulator-specific artifacts or data properties.
  2. [Experimental setup and validation sections] All closed-loop reliability, variance reduction, and policy distillation results are obtained exclusively inside a nominal synthetic simulator. No cross-validation against real plant data, no injection of model mismatch or unmodeled effects, and no off-nominal test regimes are reported. This is load-bearing for the central claim that dataset scaling produces reliable agentic physical control.
minor comments (2)
  1. [Abstract] Abstract: '500times' is a typographical error and should read '500 times'.
  2. [Introduction] The term 'Agentic Physical AI' is introduced without a precise formal definition or comparison to related concepts in control theory or agentic systems.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major point below with clarifications and indicate where revisions will be made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Results] The quantitative claims of approximately 500 times variance reduction, 70% distribution rejection, and 95% concentration on single-bank actuation are presented without accompanying methods, statistical baselines, error analysis, ablation studies, or verification that these effects arise from the proposed physics-validation mechanism rather than simulator-specific artifacts or data properties.

    Authors: The variance reduction is computed as the ratio of standard deviations in closed-loop setpoint tracking error between the 10^3 and 10^5 scale models across 1000 held-out episodes, with distribution rejection and concentration measured by the fraction of runtime actions falling outside the original training support and the share allocated to the dominant single-bank policy. We will add an appendix containing statistical baselines (random policy and non-scaled model), bootstrap error estimates, and an ablation that removes the physics-outcome filter to isolate its contribution versus data or simulator properties. revision: yes

  2. Referee: [Experimental setup and validation sections] All closed-loop reliability, variance reduction, and policy distillation results are obtained exclusively inside a nominal synthetic simulator. No cross-validation against real plant data, no injection of model mismatch or unmodeled effects, and no off-nominal test regimes are reported. This is load-bearing for the central claim that dataset scaling produces reliable agentic physical control.

    Authors: We agree the results are limited to nominal synthetic conditions and have revised the Discussion to state this restriction explicitly. The synthetic simulator was chosen to enable controlled scaling experiments that isolate physics-validation effects. Real-plant cross-validation is not feasible at present owing to regulatory and proprietary barriers on operational nuclear data; we therefore cannot supply mismatch or off-nominal results in the current revision. revision: partial
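
The metric described in the first simulated response (a ratio of tracking-error standard deviations between the small-scale and large-scale models, with bootstrap error estimates) can be sketched as follows. This is a minimal illustration under stated assumptions: the error samples are synthetic Gaussian placeholders rather than the paper's data, the percentile bootstrap is one common choice among several, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sample_std(xs: List[float]) -> float:
    """Unbiased-variance sample standard deviation."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def std_ratio(errors_small: List[float], errors_large: List[float]) -> float:
    """Ratio of error spread: small-data model over large-data model."""
    return sample_std(errors_small) / sample_std(errors_large)


def bootstrap_ci(errors_small: List[float], errors_large: List[float],
                 n_boot: int = 500, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the ratio, resampling episodes with replacement."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        resampled_small = rng.choices(errors_small, k=len(errors_small))
        resampled_large = rng.choices(errors_large, k=len(errors_large))
        ratios.append(std_ratio(resampled_small, resampled_large))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Placeholder tracking errors for 1000 held-out episodes per model.
    rng = random.Random(42)
    small_model = [rng.gauss(0.0, 5.0) for _ in range(1000)]
    large_model = [rng.gauss(0.0, 0.5) for _ in range(1000)]
    print("std ratio:", std_ratio(small_model, large_model))
    print("95% bootstrap interval:", bootstrap_ci(small_model, large_model))
```

Reporting the interval alongside the point estimate would let a reader judge whether a headline reduction factor is stable across resamples of the evaluation episodes.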

standing simulated objections not resolved
  • Cross-validation against real nuclear plant data or injection of unmodeled dynamics cannot be performed in this study due to access, safety, and regulatory constraints.

Circularity Check

0 steps flagged

No circularity: empirical scaling observations in simulation are self-contained

full rationale

The manuscript reports an empirical scaling experiment: a 360M-parameter model is trained on synthetic nuclear reactor control scenarios with dataset size increased from 10^3 to 10^5 examples, after which closed-loop reliability metrics (variance reduction, policy concentration) are measured inside the same nominal simulator. No equations, derivations, or first-principles predictions appear in the provided text. The observed improvements are direct experimental outcomes rather than quantities fitted to a subset and then relabeled as predictions. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the central claims. Because the results consist of measured performance deltas under stated simulation conditions, they do not reduce to their inputs by construction and remain independent of the circularity patterns enumerated in the guidelines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claims rest on the assumption that synthetic scenarios provide faithful physics validation and that observed simulation success will transfer; no explicit free parameters or invented entities beyond the high-level paradigm are detailed in the abstract.

axioms (1)
  • domain assumption: Synthetic nuclear reactor scenarios sufficiently capture real dynamics and safety constraints for policy transfer
    All training and evaluation occur in simulation with success defined by physical execution outcomes.
invented entities (1)
  • Agentic Physical AI (no independent evidence)
    purpose: Compact language models whose policy optimization is driven by physics-based validation instead of perceptual inference
    New term and pathway introduced to contrast with general foundation model scaling.

pith-pipeline@v0.9.0 · 5823 in / 1293 out tokens · 54411 ms · 2026-05-21T16:53:10.829055+00:00 · methodology


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 10 internal anchors

  1. [1]

    Imron, Zuhair, P

    M. Imron, Zuhair, P. M. Udiyani, and T. M. Sembiring. Development of open reactor simulator KOMODO based on open-source platform.Journal of Physics: Conference Series, 1198(2):022049, 2019

  2. [2]

    Kathryn D. Huff. PyRK: A python package for nuclear reactor kinetics, 2020. Available at https://github. com/pyrk/pyrk

  3. [3]

    Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems.Scientific Reports, 14(1):2101, 2024

    Kazuma Kobayashi and Syed Bahauddin Alam. Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems.Scientific Reports, 14(1):2101, 2024. Published in Nature Portfolio

  4. [4]

    Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators.npj Materials Degradation, 9(1):13, 2025

    Raisa Bentay Hossain, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Diab Abueidda, and Syed Bahauddin Alam. Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators.npj Materials Degradation, 9(1):13, 2025. 1,400× speedup over CFD for thermal-hydraulic predictions

  5. [5]

    A model predictive controller for the core power control system of a lead-cooled fast reactor.Frontiers in Energy Research, 10:893528, 2022

    Yuxiang Hu, Li Liang, Li Chen, and Wen Zeng. A model predictive controller for the core power control system of a lead-cooled fast reactor.Frontiers in Energy Research, 10:893528, 2022

  6. [6]

    Model predictive power control of a heat pipe cooled reactor.Frontiers in Energy Research, 10:984007, 2023

    Jiasheng Huang, Pu Sun, and Shujie Pu. Model predictive power control of a heat pipe cooled reactor.Frontiers in Energy Research, 10:984007, 2023

  7. [7]

    K. M. Mostafa et al. Improved intelligent model predictive controller for the nuclear power reactor system. Kerntechnik, 89:764–773, 2024

  8. [8]

    G. Lee, S. J. Lee, and C. Lee. A convolutional neural network model for abnormality diagnosis in a nuclear power plant.Applied Soft Computing, 99:106874, 2021

  9. [9]

    Y . H. Chae, C. Lee, S. M. Han, and P. H. Seong. Graph neural network based multiple accident diagnosis in nuclear power plants: Data optimization to represent the system configuration.Nuclear Engineering and Technology, 54:2859–2870, 2022

  10. [10]

    Kazuma Kobayashi and Syed Bahauddin Alam. Explainable, interpretable, and trustworthy ai for an intelligent digital twin: A case study on remaining useful life.Engineering Applications of Artificial Intelligence, 129:107620, 2024. 38 Agentic Physical AI toward a Domain-Specific Foundation ModelPREPRINT

  11. [11]

    Sensor degradation in nuclear reactor pressure vessels: The overlooked factor in remaining useful life prediction.npj Materials Degradation, 8(1):71, 2024

    Raisa Bentay Hossain, Kazuma Kobayashi, and Syed Bahauddin Alam. Sensor degradation in nuclear reactor pressure vessels: The overlooked factor in remaining useful life prediction.npj Materials Degradation, 8(1):71, 2024

  12. [12]

    Al Rashdan et al

    A. Al Rashdan et al. Scalable methods to automate manual work management activities using artificial intelligence. Nuclear Engineering and Technology, 2024

  13. [13]

    Large language model agent for nuclear reactor operation assistance.Nuclear Engineering and Technology, page 103842, 2025

    Yoonpyo Lee, Jemin Cha, Yong Yu, and Seung Gyu Kim. Large language model agent for nuclear reactor operation assistance.Nuclear Engineering and Technology, page 103842, 2025

  14. [14]

    Executive order on genesis mission for AI-powered scientific discovery

    The White House. Executive order on genesis mission for AI-powered scientific discovery. Presidential Executive Order, November 2025. Available athttps://www.whitehouse.gov

  15. [15]

    Genesis mission fact sheet: Accelerating scientific breakthroughs through AI and foundation models, November 2025

    The White House Office of Science and Technology Policy. Genesis mission fact sheet: Accelerating scientific breakthroughs through AI and foundation models, November 2025. Available at https://www.whitehouse. gov/ostp

  16. [16]

    Crawford, Syed Bahauddin Alam, Marta D’Elia, Krishna Garikipati, Shirley Ho, Scott H

    Dona L. Crawford, Syed Bahauddin Alam, Marta D’Elia, Krishna Garikipati, Shirley Ho, Scott H. Holan, Michael Kearns, Petros Koumoutsakos, Brian Kulis, Daniel I. Meiron, and Nathaniel Trask.Foundation Models for Scientific Discovery and Innovation: Opportunities Across the Department of Energy and the Scientific Enterprise

  17. [17]

    Possibilities of reinforcement learning for nuclear power plants: evidence on current applications and beyond.Nuclear Engineering and Technology, 56:1959–1974, 2024

    Aobo Gong, Yu Chen, Jia Zhang, and Xiaoyu Li. Possibilities of reinforcement learning for nuclear power plants: evidence on current applications and beyond.Nuclear Engineering and Technology, 56:1959–1974, 2024

  18. [18]

    Radaideh

    Luke Tunkle, Karrar Abdulraheem, Lixiang Lin, and Majid I. Radaideh. Nuclear microreactor transient and load-following control with deep reinforcement learning.Energy Conversion and Management: X, page 101090, 2025

  19. [19]

    Radaideh et al

    Majid I. Radaideh et al. Multistep criticality search and power shaping in nuclear microreactors with deep reinforcement learning.Nuclear Science and Engineering, pages 1–13, 2025

  20. [20]

    Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602:414–419, 2022

    Jonas Degrave et al. Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602:414–419, 2022

  21. [21]

    A neural network predictive control method for power control of small pressurized water reactors.Annals of nuclear energy, 169:108946, 2022

    Kai Xiao, Qiaofeng Wu, Jie Chen, Xiaofei Pu, Ying Zhang, and Pengcheng Yang. A neural network predictive control method for power control of small pressurized water reactors.Annals of nuclear energy, 169:108946, 2022

  22. [22]

    Ying Yin, Zhijun Yuan, Bo Pang, Yu Xiao, and Yong Deng. Design and assessment of a core-power controller for a lithium-cooled space nuclear reactor based on the concept of fuzzy model predictive control.Frontiers in Energy Research, 10:1067892, 2023

  23. [23]

    QuantiPhy: A quantitative benchmark evaluating physical reasoning abilities of vision-language models.arXiv preprint arXiv:2512.19526, 2024

    Puyin Li, Tiange Xiang, Ella Mao, Shirley Wei, Xinye Chen, Adnan Masood, Fei-Fei Li, and Ehsan Adeli. QuantiPhy: A quantitative benchmark evaluating physical reasoning abilities of vision-language models.arXiv preprint arXiv:2512.19526, 2024

  24. [24]

    PhysBench: Benchmarking and enhancing vision-language models for physical world understanding.arXiv preprint arXiv:2501.16411, 2025

    Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, and Yue Wang. PhysBench: Benchmarking and enhancing vision-language models for physical world understanding.arXiv preprint arXiv:2501.16411, 2025. ICLR 2025

  25. [25]

    Star: A benchmark for situated reasoning in real-world videos.arXiv preprint arXiv:2405.09711, 2024

    Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, and Chuang Gan. STAR: A benchmark for situated reasoning in real-world videos.arXiv preprint arXiv:2405.09711, 2024

  26. [26]

    Tenenbaum, and Chuang Gan

    Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, and Chuang Gan. ContPhy: Continuum physical concept learning and reasoning from videos. InInternational Conference on Machine Learning. PMLR, 2024

  27. [27]

    Videophy-2: A challenging action-centric physical commonsense evaluation in video generation.arXiv preprint arXiv:2503.06800, 2025

    Hritik Bansal, Clark Peng, Yonatan Bitton, Roman Goldenberg, Aditya Grover, and Kai-Wei Chang. VideoPhy- 2: A challenging action-centric physical commonsense evaluation in video generation.arXiv preprint arXiv:2503.06800, 2025

  28. [28]

    Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Zhiming Ding, and Bo Zheng

    Xinrun Xu, Pi Bu, Ye Wang, Börje F. Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Zhiming Ding, and Bo Zheng. DeepPHY: Benchmarking agentic vision-language models on physical reasoning.arXiv preprint arXiv:2508.05405, 2025

  29. [29]

    Tenenbaum

    Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum. CLEVRER: Collision events for video representation and reasoning. InInternational Conference on Learning Representations, 2020

  30. [30]

    The Rise and Potential of Large Language Model Based Agents: A Survey

    Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, 39 Agentic Physical AI toward a Domain-Specific Foundation ModelPREPRINT Yicheng Zhao, Wen Yi, Shihan Zhang, Tao Gui, Qi Zhang, and Xuanjing Huang. The...

  31. [31]

    Practices for governing agentic AI systems.arXiv preprint arXiv:2310.11842, 2023

    Yonadav Shavit, Sandhini Agarwal, Miles Brundage, Steven Adler, Cullen O’Keefe, Gillian Hadfield, Noam Kolt, Laura Weidinger, Markus Anderljung, Rumman Chowdhury, Iason Gabriel, Alan Krendl, Tahu Kukutai, Jonas Schuett, Mona Sloane, Bryce Wiernik, and Jack Clark. Practices for governing agentic AI systems.arXiv preprint arXiv:2310.11842, 2023

  32. [32]

    Large language models for robotics: A survey.arXiv preprint arXiv:2408.03543, 2024

    Fanlong Zeng, Weiju Gan, Yongbin Wang, Ning Liu, and Xiaojun Gao. Large language models for robotics: A survey.arXiv preprint arXiv:2408.03543, 2024

  33. [33]

    Generalized grounded temporal reasoning for robot instruction following by combining large pre-trained models

    Riya Arora, Niveditha Narendranath, Aman Tambi, Sandeep S Zachariah, Souvik Chakraborty, and Rohan Paul. Generalized grounded temporal reasoning for robot instruction following by combining large pre-trained models. arXiv preprint arXiv:2410.07494, 2024

  34. [34]

    Phyplan: Generalizable and rapid physical task planning with physics informed skill networks for robot manipulators.arXiv preprint arXiv:2406.00001, 2024

    Mudit Chopra, Abhinav Barnawal, Harshil Vagadia, Tamajit Banerjee, Shreshth Tuli, Souvik Chakraborty, and Rohan Paul. Phyplan: Generalizable and rapid physical task planning with physics informed skill networks for robot manipulators.arXiv preprint arXiv:2406.00001, 2024

  35. [35]

    A Survey on Vision-Language-Action Models for Embodied AI

    Yueen Ma, Shuyu Wang, Botian Liu, Xiao Li, Yukun Chen, Shijie Xu, Haoqiang Xu, Hao Zhu, Yu Qiao, and Yong Wang. A survey on vision-language-action models for embodied AI.arXiv preprint arXiv:2405.14093, 2024

  36. [36]

    A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

    Yue Li, Zhizheng Wang, Yu Xiang, et al. A survey on vision-language-action models: An action tokenization perspective.arXiv preprint arXiv:2507.01925, 2025

  37. [37]

    Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

    Yunhao Kim, Dongyoon Lee, Jaewon Park, et al. Large VLM-based vision-language-action models for robotic manipulation: A survey.arXiv preprint arXiv:2508.13073, 2024

  38. [38]

    Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. PaLM-E: An embodied ...

  39. [39]

    RT-2: Vision-language-action models transfer web knowledge to robotic control

    Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning, pages 2165–2183. PMLR, 2023

  40. [40]

    RELAP5-3D code manual volume i: Code structure, system models, and solution methods

    RELAP5-3D Code Development Team. RELAP5-3D code manual volume i: Code structure, system models, and solution methods. Technical Report INL/MIS-15-36723 Rev. 4.5, Idaho National Laboratory, 2021

  41. [41]

    Improved generalization with deep neural operators for engineering systems: Path towards digital twin.Engineering Applications of Artificial Intelligence, 131:107844, 2024

    Kazuma Kobayashi, James Daniell, and Syed Bahauddin Alam. Improved generalization with deep neural operators for engineering systems: Path towards digital twin.Engineering Applications of Artificial Intelligence, 131:107844, 2024

  42. [42]

    SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

    Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíˇcek, Agustín Piqueres Lajarín, Vaibhav Srivastav, et al. Smollm2: When smol goes big–data-centric training of a small language model.arXiv preprint arXiv:2502.02737, 2025

  43. [43]

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, et al. Phi-3 technical report: A highly capable language model locally on your phone.arXiv preprint arXiv:2404.14219, 2024

  44. [44]

    Qwen2.5 Technical Report

    Qwen Team. Qwen2.5: A party of foundation models.arXiv preprint arXiv:2412.15115, 2024

  45. [45]

    Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. InInternational Conference on Learning Represen- tations, 2022

  46. [46]

    QLoRA: Efficient finetuning of quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems, volume 36, 2024

  47. [47]

    A survey on efficient training of transformers

    Bowen Zhu, Jing Jiao, and Takashi Hayashi. A survey on efficient training of transformers. arXiv preprint arXiv:2302.01107, 2024

  48. [48]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2019

  49. [49]

    Numeracy for language models: Evaluating and improving their ability to predict numbers

    Georgios P. Spithourakis and Sebastian Riedel. Numeracy for language models: Evaluating and improving their ability to predict numbers. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 2104–2115, 2018

  50. [50]

    Do NLP models know numbers? probing numeracy in embeddings

    Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, and Matt Gardner. Do NLP models know numbers? Probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pages 5307–5315, 2019

  51. [51]

    Curriculum learning

    Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. Proceedings of the 26th International Conference on Machine Learning, pages 41–48, 2009

  52. [52]

    Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, 2020

  53. [53]

    Language models are few-shot learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020

  54. [55]

    Artificial intelligence for science in quantum, atomistic, and continuum systems

    Nicholas Baker, Alfredo Alexander-Katz, Sidney Yip, Sauri K. Jammalamadaka, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. arXiv preprint arXiv:2307.08423, 2023

  55. [56]

    AI for science: Accelerating discovery and prediction

    Michael Schmidt, Hod Lipson, and Max Tegmark. AI for science: Accelerating discovery and prediction. Science, 384:eadm9526, 2024

  56. [57]

    Aurora: A foundation model of the atmosphere

    Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Vaughan, Johannes Brandstetter, Patrick Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Salinas, Shruti Gupta, Ankur Kumar, Clara Edwards, Freddie Kalaitzis, Daniel Robinson, Ilia Shumailov, Rose Archibald, Matthew Chantry, et al. Aurora: A foundation model of the atmosphere. arXiv prepr...

  57. [58]

    MatterGen: A generative model for inorganic materials design

    Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Ziyan Shysheya, Jonathan Crabbé, Zhilong Yao, Tuan Anh Nguyen, Serina Schulz, Sarah Lewis Edwards, Nicholas Dyer, Carly Fitzsimons, Felix Fischer, Muratahan Aykol, et al. MatterGen: A generative model for inorganic materials design. arXiv preprint arXiv:2312.03687, 2025

  58. [59]

    Scaling deep learning for materials discovery

    Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624:80–85, 2024

  59. [60]

    MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures

    Han Fu, Zhiao Lin, et al. MatterSim: A deep learning atomistic model across elements, temperatures and pressures. arXiv preprint arXiv:2405.04967, 2024

  60. [61]

    Chemical language models for drug discovery

    Francesca Grisoni and Gisbert Schneider. Chemical language models for drug discovery. Nature Machine Intelligence, 6:111–120, 2024

  61. [62]

    GPT-based models for molecular property prediction and drug discovery

    Xiaohui Wang, Xinru Chen, Jinzhe Gao, and Zhiqiang Liu. GPT-based models for molecular property prediction and drug discovery. Briefings in Bioinformatics, 25(2):bbad518, 2024

  62. [63]

    Integrating scientific knowledge with machine learning for engineering and environmental systems

    Jared D. Willard, Xiaowei Jia, Shaoming Xu, Michael S. Steinbach, and Vipin Kumar. Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Computing Surveys, 55(4):1–37, 2024

  63. [64]

    Scientific foundation models

    Kexin Huang, Tianfan Xiao, Huan Li, and Yang Liu. Scientific foundation models. arXiv preprint arXiv:2406.03265, 2024

  64. [65]

    Artificial intelligence for aerospace engineering

    Aaron Taylor, Indranil Chakraborty, and Jojo Moolayil. Artificial intelligence for aerospace engineering. Progress in Aerospace Sciences, 142:100966, 2024

  65. [66]

    Machine learning for control of cyber-physical systems

    Mohammad Alizadeh, Yisong Wang, and Zhong Liu. Machine learning for control of cyber-physical systems. Annual Reviews in Control, 57:100932, 2024

  66. [67]

    Modeling of reactor kinetics and dynamics

    Matthew Johnson, Scott Lucas, and Pavel Tsvetkov. Modeling of reactor kinetics and dynamics. Technical report, Idaho National Lab. (INL), Idaho Falls, ID (United States), 2010

  67. [68]

    On the Opportunities and Risks of Foundation Models

    Rishi Bommasani. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021

  68. [69]

    Large language model agent for nuclear reactor operation assistance

    Yoon Pyo Lee, Joowon Cha, Yonggyun Yu, and Seung Geun Kim. Large language model agent for nuclear reactor operation assistance. Nuclear Engineering and Technology, page 103842, 2025

  69. [70]

    M. Xian, T. Wang, S. Zhang, F. Xu, and Z. Ma. A knowledge-informed large language model framework for US nuclear power plant shutdown initiating event classification for probabilistic risk assessment. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 2024

  70. [71]

    O. H. Kwon et al. Sentiment analysis of the United States public support of nuclear power on social media using large language models. Renewable and Sustainable Energy Reviews, 200:114570, 2024

  71. [72]

    Y. Sun, H. Tsuruta, M. Kumagai, and K. Kurosaki. Japanese online discourse on nuclear energy using YouTube-based topic modeling combined with LLM sentiment analysis. Journal of Nuclear Science and Technology, 2025

  72. [73]

    Exploring the role of large language models in radiation emergency response

    Anirudh Chandra and Abinash Chakraborty. Exploring the role of large language models in radiation emergency response. Journal of Radiological Protection, 44(1):011510, 2024

  73. [74]

    A. J. Dave, T. N. Nguyen, and R. B. Vilim. Integrating LLMs for explainable fault diagnosis in complex systems. arXiv preprint arXiv:2402.06695, 2024