Agentic Physical AI toward a Domain-Specific Foundation Model for Nuclear Reactor Control

Jay Yoo; Kazuma Kobayashi; Sajedul Talukder; Samrendra Roy; Seid Koric; Souvik Chakraborty; Syed Bahauddin Alam; Yoon Pyo Lee

arxiv: 2512.23292 · v3 · pith:JDWG4XQWnew · submitted 2025-12-29 · 💻 cs.AI · cs.LG

Pith reviewed 2026-05-21 16:53 UTC · model grok-4.3

keywords: nuclear reactor control · agentic physical AI · domain-specific foundation models · physics-based validation · synthetic data scaling · closed-loop reliability · policy distillation · actuation strategy

The pith

Scaling synthetic nuclear reactor scenarios from 1,000 to 100,000 examples lets a 360-million-parameter model achieve reliable closed-loop control by rejecting most options and locking onto one actuation strategy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that general-purpose AI models fall short for physical control tasks because they imitate perceptual patterns instead of ensuring safe physical outcomes. It proposes a shift to compact language models trained as Agentic Physical AI, where optimization comes from validating executed actions against physics constraints in simulation rather than from pattern matching. Scaling the synthetic dataset for nuclear reactor control produces clear gains in reliability under nominal conditions, including sharp drops in performance variance and an emergent focus on a single effective strategy. This pathway could matter for building trustworthy controllers in safety-critical domains if the simulation results hold up.

Core claim

Training a 360-million-parameter language model on synthetic nuclear reactor control scenarios, with dataset size scaled from 10^3 to 10^5 examples, produces strong gains in closed-loop reliability under nominal simulated conditions. These include variance collapse by a factor of approximately 500 and smooth stabilization at strict tolerances. Despite balanced exposure to four actuation families, the model autonomously rejects roughly 70 percent of the training distribution and concentrates 95 percent of runtime execution on a single-bank strategy. This emergent policy distillation occurs without reinforcement learning or reward engineering and is driven solely by outcome-level success under physical execution.

What carries the argument

Agentic Physical AI: compact language models whose policy optimization is driven by physics-based validation of executed actions rather than by perceptual inference or imitation.
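
As a rough illustration of what validating executed actions against physics could look like, the sketch below accepts a proposed action only when a simulator rollout ends within tolerance of the target power. Everything here is a placeholder invented for illustration: `toy_simulate`, the `rod_delta` action format, and the default 5% tolerance are assumptions, and the paper's actual simulator interface is not shown on this page.

```python
from typing import Callable, Dict, List

Action = Dict[str, float]  # hypothetical action format, e.g. {"rod_delta": 10.0}


def toy_simulate(start_power: float, action: Action) -> float:
    """Stand-in plant: each unit of rod insertion lowers power by 2 points.

    Purely illustrative; this is not a reactor model.
    """
    return start_power - 2.0 * action["rod_delta"]


def validate_by_outcome(
    start_power: float,
    target_power: float,
    action: Action,
    simulate: Callable[[float, Action], float],
    rel_tol: float = 0.05,  # assumed tolerance
) -> bool:
    """Accept an action only if the simulated terminal power is within rel_tol of target."""
    terminal = simulate(start_power, action)
    return abs(terminal - target_power) <= rel_tol * abs(target_power)


def filter_candidates(
    start_power: float,
    target_power: float,
    candidates: List[Action],
    simulate: Callable[[float, Action], float] = toy_simulate,
) -> List[Action]:
    """Keep the candidates whose executed outcome passes validation."""
    return [
        a for a in candidates
        if validate_by_outcome(start_power, target_power, a, simulate)
    ]


candidates = [{"rod_delta": 5.0}, {"rod_delta": 10.0}, {"rod_delta": 20.0}]
accepted = filter_candidates(100.0, 80.0, candidates)
print(accepted)
```

The point of the sketch is only that the acceptance test looks at the executed outcome, not at how plausible the action text appears.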

If this is right

Larger training sets induce variance collapse and stabilize execution behavior within the sampled distribution.
The model develops an emergent preference for one actuation family even when all four are presented equally during training.
Reliability improves steeply but smoothly once dataset size passes a threshold, replacing high-variance tail excursions with consistent performance.
Outcome-level physical validation alone suffices to produce policy distillation without any reinforcement learning step.
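
The reliability claims above are stated in terms of success at tolerance bands and the rate of severe failures. A minimal sketch of how such rates could be computed from terminal power errors follows; the 1%, 5%, and 10% bands mirror the thresholds named in the figure captions, while the function names and the sample errors are made up for illustration.

```python
from typing import Dict, Iterable, Sequence


def success_rates(
    errors_pct: Iterable[float],
    bands_pct: Sequence[float] = (1.0, 5.0, 10.0),
) -> Dict[float, float]:
    """Fraction of runs whose absolute terminal error falls within each band."""
    errs = [abs(e) for e in errors_pct]
    return {band: sum(e <= band for e in errs) / len(errs) for band in bands_pct}


def severe_failure_rate(errors_pct: Iterable[float], threshold_pct: float = 10.0) -> float:
    """Fraction of runs whose absolute terminal error exceeds the threshold."""
    errs = [abs(e) for e in errors_pct]
    return sum(e > threshold_pct for e in errs) / len(errs)


# Made-up terminal power errors in percent, for illustration only.
errors = [0.2, -0.8, 3.0, 4.5, -12.0, 0.5, 7.0, 25.0]
rates = success_rates(errors)
severe = severe_failure_rate(errors)
print(rates, severe)
```

Reporting the strict band and the severe-failure tail side by side is what separates a policy that is accurate on average from one without catastrophic outliers.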

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the same scaling pattern appears in other physical domains, domain-specific compact models could become a practical alternative to general foundation models for control tasks.
The autonomous rejection of most training options may reflect an internal filtering mechanism that identifies and avoids physically risky paths based on execution outcomes.
Testing the model on scenarios that deliberately violate the original training distribution could reveal whether the distilled strategy remains robust or collapses outside the sampled regime.

Load-bearing premise

The synthetic nuclear reactor control scenarios used for training and validation accurately reflect the dynamics, constraints, and safety requirements of actual reactors so that success in simulation implies reliable real-world control performance.

What would settle it

Deploy the trained model on a higher-fidelity simulator that injects realistic unmodeled effects such as sensor noise or unexpected reactivity transients and measure whether closed-loop safety violations remain near zero or increase sharply.
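
The shape of such a stress test can be sketched with a toy loop: inject Gaussian sensor noise into the measurement the controller sees and count how often the final power misses the target. This is a hedged sketch under loud assumptions: the first-order toy plant, the `proportional_controller` stand-in, the 5% tolerance, and all numeric settings are hypothetical, and none of it represents the paper's model or a reactor simulator.

```python
import random
from typing import Callable

Controller = Callable[[float, float], float]  # (measured, target) -> power change


def proportional_controller(measured: float, target: float) -> float:
    """Hypothetical stand-in for the trained policy."""
    return 0.5 * (target - measured)


def run_episode(controller: Controller, start: float, target: float,
                steps: int, noise_std: float, rng: random.Random) -> float:
    """Toy closed loop: the commanded change is applied directly to power."""
    power = start
    for _ in range(steps):
        measured = power + rng.gauss(0.0, noise_std)  # noisy sensor reading
        power += controller(measured, target)
    return power


def violation_rate(controller: Controller, noise_std: float, episodes: int = 200,
                   tol_pct: float = 5.0, seed: int = 0) -> float:
    """Fraction of episodes ending outside the tolerance band around the target."""
    rng = random.Random(seed)
    start, target, steps = 50.0, 80.0, 20
    violations = 0
    for _ in range(episodes):
        final = run_episode(controller, start, target, steps, noise_std, rng)
        if abs(final - target) / target * 100.0 > tol_pct:
            violations += 1
    return violations / episodes


clean_rate = violation_rate(proportional_controller, noise_std=0.0)
noisy_rate = violation_rate(proportional_controller, noise_std=10.0)
print(clean_rate, noisy_rate)
```

In a real study the toy plant would be replaced by the higher-fidelity simulator and the stand-in controller by the trained model; the comparison of violation rates with and without the injected effect is the part that carries over.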

Figures

Figures reproduced from arXiv: 2512.23292 by Jay Yoo, Kazuma Kobayashi, Sajedul Talukder, Samrendra Roy, Seid Koric, Souvik Chakraborty, Syed Bahauddin Alam, Yoon Pyo Lee.

**Figure 1.** Integrated framework for Agentic Physical AI in nuclear reactor control. Three interconnected paradigms form a coherent system: (Left) Agentic AI: the compact 360M-parameter model optimizes runtime policy away from balanced training distribution (KL divergence increases with scale: 0.18→0.31 nats), concentrating 76% of actions on single_b2 strategies despite only 30% training frequency, and deploying britt…

**Figure 2.** Experimental workflow for Agentic Physical AI in nuclear reactor control. The pipeline consists of three stages: (Stage 1) Data generation: KOMODO simulator generates synthetic corpora at three scales (1K, 10K, 100K) with balanced actuation families (single-bank, simultaneous, sequential) to prevent trivial dataset bias. (Stage 2) Two-phase curriculum training: SmolLM2-360M backbone undergoes Phase 1 conti…

**Figure 3.** Scaling of validation success and regime robustness with dataset size. a. Validation success rates across tolerance bands show a sharp improvement between 10K and 100K, revealing the emergence of a stable, high-precision control policy, with sub-1% accuracy jumping from 26.2% to 92%. b. Performance stratified by power-change bins shows that the 100K model achieves regime-consistent precision that is absent …

**Figure 4.** Terminal power error distributions across dataset scales. a. CDF curves highlight tail-risk collapse at 100K scale. b. Violin plots demonstrate narrowing uncertainty and emergence of a stable, low-variance control policy.

**Figure 5.** Benchmarking Agentic AI against classical PID and direct learning baselines. a. Overall success rates (±5% tolerance). The Proposed (100K) model achieves 97.4% success, significantly outperforming the naive PID baseline (43.8%) which is limited by single-bank saturation. b. Success rates stratified by power change magnitude. PID performance collapses in the Large regime due to physical limits,…

**Figure 6.** Runtime actuation patterns and success rates across model scales. a. The proportional distribution of actuation patterns in the 1K, 10K, and 100K training datasets, engineered to be balanced. b. The distribution of actuation patterns executed during 2,000 simulator-run evaluations for each model scale. The pronounced divergence between (a) and (b), particularly the twofold enrichment of single-bank strateg…

**Figure 7.** Validation case count and success rate by actuation pattern and model scale. Each panel shows the total number of runtime attempts and the number of successful cases (±5% tolerance) for each actuation pattern, revealing how scaling induces strategic specialization. The evolution from uniform low success (1K) to selective excellence (10K) to near-perfect discrimination (100K) demonstrates that …

**Figure 8.** Distribution of severe failures (greater than 10% error) across actuation patterns and dataset scales. Scaling from 1K to 100K induces a collapse of catastrophic outliers, a prerequisite for agent-level reliability in safety-critical domains.

**Figure 9.** Performance comparison of generalization (PyRK) and architectural extension (Variable Window) models. a. Parsing success rates reveal that the Adapter Only approach fails to maintain valid syntax (48%), whereas PyRK and Extended models achieve 100%. b. Validation success rates across tolerance bands show PyRK’s superior precision, while the Extended model trades strict accuracy for flexible time-window han…
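
The captions for Figures 1 and 6 describe the gap between the balanced training distribution and the runtime actuation distribution in terms of KL divergence (in nats) and enrichment. A minimal sketch of both quantities for categorical distributions is given below. The direction KL(runtime || training), the family names, and the example shares are assumptions for illustration; the page does not state which direction the paper uses, and the numbers are not taken from it.

```python
import math
from typing import Dict


def kl_divergence_nats(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats for categorical distributions given as dicts."""
    total = 0.0
    for key, p_k in p.items():
        if p_k == 0.0:
            continue  # a zero-probability term contributes nothing
        q_k = q.get(key, 0.0)
        if q_k == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_k * math.log(p_k / q_k)  # natural log gives nats
    return total


def enrichment(runtime: Dict[str, float], training: Dict[str, float]) -> Dict[str, float]:
    """Ratio of runtime share to training share for each family seen in training."""
    return {k: runtime.get(k, 0.0) / v for k, v in training.items() if v > 0.0}


# Hypothetical shares: balanced training, concentrated runtime.
training = {"family_a": 0.25, "family_b": 0.25, "family_c": 0.25, "family_d": 0.25}
runtime = {"family_a": 0.70, "family_b": 0.10, "family_c": 0.10, "family_d": 0.10}

kl = kl_divergence_nats(runtime, training)
ratios = enrichment(runtime, training)
print(round(kl, 4), ratios)
```

An enrichment ratio above 1 marks a family the runtime policy favors relative to its training frequency; a ratio near 0 marks one it has effectively abandoned.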

Original abstract

The prevailing paradigm in AI for physical systems (scaling general-purpose foundation models toward universal multimodal reasoning) confronts a fundamental barrier at the control interface. Recent benchmarks show that even frontier vision-language models achieve only 50-53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility by violating physical constraints. This input unfaithfulness is not a scaling deficiency but a structural limitation: perception-centric architectures optimize parameter-space imitation, whereas safety-critical control demands outcome-space guarantees over executed actions. Here, we present a fundamentally different pathway "toward" domain-specific foundation models by introducing compact language models operating as Agentic Physical AI, in which policy optimization is driven by physics-based validation rather than perceptual inference. We train a 360-million-parameter model on synthetic nuclear reactor control scenarios, scaling the dataset from 10^3 to 10^5 examples. Scaling induces strong improvements in closed-loop reliability under nominal simulated conditions, with a steep but smooth gain at strict tolerances: small-scale systems exhibit high-variance imitation with severe tail excursions, while large-scale models undergo variance collapse (approximately 500times reduction), stabilizing execution-level behavior within the sampled distribution. Despite balanced exposure to four actuation families, the model autonomously rejects approximately 70% of the training distribution, concentrating 95% of runtime execution on a single-bank strategy. This emergent policy distillation arises without reinforcement learning or reward engineering, driven solely by outcome-level success under physical execution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Scaling synthetic nuclear control data produces sharp variance collapse and policy concentration in simulation, but the results stay untested against real reactor dynamics.

The main thing here is that scaling the dataset from 10^3 to 10^5 synthetic examples for a 360-million-parameter model drives a roughly 500-fold variance reduction, makes the model reject about 70% of the training distribution, and concentrates 95% of execution on a single-bank actuation strategy. This emerges from physics-based outcome validation alone, without reinforcement learning or explicit rewards, and the small-to-large scale contrast in tail behavior is reported clearly enough to notice.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Agentic Physical AI as compact language models for domain-specific foundation models in nuclear reactor control. A 360-million-parameter model is trained on synthetic scenarios with dataset scaling from 10^3 to 10^5 examples. Scaling is claimed to produce strong gains in closed-loop reliability under nominal simulated conditions, including ~500x variance reduction, stabilization within the distribution, and emergent policy distillation (autonomous rejection of ~70% of the training distribution with 95% concentration on a single-bank strategy) driven solely by physics-based outcome validation rather than RL or perceptual imitation.

Significance. If the simulation accurately captures reactor dynamics and the scaling results generalize, the approach could provide a useful alternative to general foundation models for safety-critical physical control by prioritizing outcome-space guarantees. The reported variance collapse and autonomous policy concentration without reward engineering represent potentially interesting empirical observations in AI for physical systems.

major comments (2)

[Abstract and Results] The quantitative claims of approximately 500 times variance reduction, 70% distribution rejection, and 95% concentration on single-bank actuation are presented without accompanying methods, statistical baselines, error analysis, ablation studies, or verification that these effects arise from the proposed physics-validation mechanism rather than simulator-specific artifacts or data properties.
[Experimental setup and validation sections] All closed-loop reliability, variance reduction, and policy distillation results are obtained exclusively inside a nominal synthetic simulator. No cross-validation against real plant data, no injection of model mismatch or unmodeled effects, and no off-nominal test regimes are reported. This is load-bearing for the central claim that dataset scaling produces reliable agentic physical control.

minor comments (2)

[Abstract] '500times' is a typographical error and should read '500 times'.
[Introduction] The term 'Agentic Physical AI' is introduced without a precise formal definition or comparison to related concepts in control theory or agentic systems.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments. We address each major point below with clarifications and indicate where revisions will be made to the manuscript.

Referee: [Abstract and Results] The quantitative claims of approximately 500 times variance reduction, 70% distribution rejection, and 95% concentration on single-bank actuation are presented without accompanying methods, statistical baselines, error analysis, ablation studies, or verification that these effects arise from the proposed physics-validation mechanism rather than simulator-specific artifacts or data properties.

Authors: The variance reduction is computed as the ratio of standard deviations in closed-loop setpoint tracking error between the 10^3 and 10^5 scale models across 1000 held-out episodes, with distribution rejection and concentration measured by the fraction of runtime actions falling outside the original training support and the share allocated to the dominant single-bank policy. We will add an appendix containing statistical baselines (random policy and non-scaled model), bootstrap error estimates, and an ablation that removes the physics-outcome filter to isolate its contribution versus data or simulator properties. revision: yes
Referee: [Experimental setup and validation sections] All closed-loop reliability, variance reduction, and policy distillation results are obtained exclusively inside a nominal synthetic simulator. No cross-validation against real plant data, no injection of model mismatch or unmodeled effects, and no off-nominal test regimes are reported. This is load-bearing for the central claim that dataset scaling produces reliable agentic physical control.

Authors: We agree the results are limited to nominal synthetic conditions and have revised the Discussion to state this restriction explicitly. The synthetic simulator was chosen to enable controlled scaling experiments that isolate physics-validation effects. Real-plant cross-validation is not feasible at present owing to regulatory and proprietary barriers on operational nuclear data; we therefore cannot supply mismatch or off-nominal results in the current revision. revision: partial
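
The first response describes the variance-reduction figure as a ratio of standard deviations of tracking error between two model scales, to be reported with bootstrap error estimates. A hedged sketch of that computation follows. The percentile bootstrap, the function names, and the synthetic Gaussian errors are assumptions chosen for illustration; the simulated rebuttal does not specify the bootstrap variant, and no data from the paper is used.

```python
import math
import random
from typing import List, Tuple


def sample_std(xs: List[float]) -> float:
    """Sample standard deviation with the n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def std_ratio(small_scale: List[float], large_scale: List[float]) -> float:
    """Ratio of tracking-error spread: small-data model over large-data model."""
    return sample_std(small_scale) / sample_std(large_scale)


def bootstrap_ci(small_scale: List[float], large_scale: List[float],
                 n_boot: int = 500, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the std ratio (episodes resampled with replacement)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        s = rng.choices(small_scale, k=len(small_scale))
        l = rng.choices(large_scale, k=len(large_scale))
        ratios.append(std_ratio(s, l))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic tracking errors for 1000 episodes per model (illustrative only).
data_rng = random.Random(42)
errors_small = [data_rng.gauss(0.0, 10.0) for _ in range(1000)]
errors_large = [data_rng.gauss(0.0, 0.5) for _ in range(1000)]

ratio = std_ratio(errors_small, errors_large)
ci_low, ci_high = bootstrap_ci(errors_small, errors_large)
print(ratio, (ci_low, ci_high))
```

Note that a ratio of standard deviations and a ratio of variances differ by a square, so a reported reduction factor should state which of the two it refers to.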

standing simulated objections not resolved

Cross-validation against real nuclear plant data or injection of unmodeled dynamics cannot be performed in this study due to access, safety, and regulatory constraints.

Circularity Check

0 steps flagged

No circularity: empirical scaling observations in simulation are self-contained

The manuscript reports an empirical scaling experiment: a 360M-parameter model is trained on synthetic nuclear reactor control scenarios with dataset size increased from 10^3 to 10^5 examples, after which closed-loop reliability metrics (variance reduction, policy concentration) are measured inside the same nominal simulator. No equations, derivations, or first-principles predictions appear in the provided text. The observed improvements are direct experimental outcomes rather than quantities fitted to a subset and then relabeled as predictions. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked to justify the central claims. Because the results consist of measured performance deltas under stated simulation conditions, they do not reduce to their inputs by construction and remain independent of the circularity patterns enumerated in the guidelines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the assumption that synthetic scenarios provide faithful physics validation and that observed simulation success will transfer; no explicit free parameters or invented entities beyond the high-level paradigm are detailed in the abstract.

axioms (1)

domain assumption: Synthetic nuclear reactor scenarios sufficiently capture real dynamics and safety constraints for policy transfer.
All training and evaluation occur in simulation with success defined by physical execution outcomes.

invented entities (1)

Agentic Physical AI (no independent evidence)
purpose: Compact language models whose policy optimization is driven by physics-based validation instead of perceptual inference.
New term and pathway introduced to contrast with general foundation model scaling.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 10 internal anchors

[1]

Imron, Zuhair, P

M. Imron, Zuhair, P. M. Udiyani, and T. M. Sembiring. Development of open reactor simulator KOMODO based on open-source platform.Journal of Physics: Conference Series, 1198(2):022049, 2019

work page 2019
[2]

Kathryn D. Huff. PyRK: A python package for nuclear reactor kinetics, 2020. Available at https://github. com/pyrk/pyrk

work page 2020
[3]

Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems.Scientific Reports, 14(1):2101, 2024

Kazuma Kobayashi and Syed Bahauddin Alam. Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems.Scientific Reports, 14(1):2101, 2024. Published in Nature Portfolio

work page 2024
[4]

Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators.npj Materials Degradation, 9(1):13, 2025

Raisa Bentay Hossain, Farid Ahmed, Kazuma Kobayashi, Seid Koric, Diab Abueidda, and Syed Bahauddin Alam. Virtual sensing-enabled digital twin framework for real-time monitoring of nuclear systems leveraging deep neural operators.npj Materials Degradation, 9(1):13, 2025. 1,400× speedup over CFD for thermal-hydraulic predictions

work page 2025
[5]

A model predictive controller for the core power control system of a lead-cooled fast reactor.Frontiers in Energy Research, 10:893528, 2022

Yuxiang Hu, Li Liang, Li Chen, and Wen Zeng. A model predictive controller for the core power control system of a lead-cooled fast reactor.Frontiers in Energy Research, 10:893528, 2022

work page 2022
[6]

Model predictive power control of a heat pipe cooled reactor.Frontiers in Energy Research, 10:984007, 2023

Jiasheng Huang, Pu Sun, and Shujie Pu. Model predictive power control of a heat pipe cooled reactor.Frontiers in Energy Research, 10:984007, 2023

work page 2023
[7]

K. M. Mostafa et al. Improved intelligent model predictive controller for the nuclear power reactor system. Kerntechnik, 89:764–773, 2024

work page 2024
[8]

G. Lee, S. J. Lee, and C. Lee. A convolutional neural network model for abnormality diagnosis in a nuclear power plant.Applied Soft Computing, 99:106874, 2021

work page 2021
[9]

Y . H. Chae, C. Lee, S. M. Han, and P. H. Seong. Graph neural network based multiple accident diagnosis in nuclear power plants: Data optimization to represent the system configuration.Nuclear Engineering and Technology, 54:2859–2870, 2022

work page 2022
[10]

Kazuma Kobayashi and Syed Bahauddin Alam. Explainable, interpretable, and trustworthy ai for an intelligent digital twin: A case study on remaining useful life.Engineering Applications of Artificial Intelligence, 129:107620, 2024. 38 Agentic Physical AI toward a Domain-Specific Foundation ModelPREPRINT

work page 2024
[11]

Sensor degradation in nuclear reactor pressure vessels: The overlooked factor in remaining useful life prediction.npj Materials Degradation, 8(1):71, 2024

Raisa Bentay Hossain, Kazuma Kobayashi, and Syed Bahauddin Alam. Sensor degradation in nuclear reactor pressure vessels: The overlooked factor in remaining useful life prediction.npj Materials Degradation, 8(1):71, 2024

work page 2024
[12]

Al Rashdan et al

A. Al Rashdan et al. Scalable methods to automate manual work management activities using artificial intelligence. Nuclear Engineering and Technology, 2024

work page 2024
[13]

Large language model agent for nuclear reactor operation assistance.Nuclear Engineering and Technology, page 103842, 2025

Yoonpyo Lee, Jemin Cha, Yong Yu, and Seung Gyu Kim. Large language model agent for nuclear reactor operation assistance.Nuclear Engineering and Technology, page 103842, 2025

work page 2025
[14]

Executive order on genesis mission for AI-powered scientific discovery

The White House. Executive order on genesis mission for AI-powered scientific discovery. Presidential Executive Order, November 2025. Available athttps://www.whitehouse.gov

work page 2025
[15]

Genesis mission fact sheet: Accelerating scientific breakthroughs through AI and foundation models, November 2025

The White House Office of Science and Technology Policy. Genesis mission fact sheet: Accelerating scientific breakthroughs through AI and foundation models, November 2025. Available at https://www.whitehouse. gov/ostp

work page 2025
[16]

Crawford, Syed Bahauddin Alam, Marta D’Elia, Krishna Garikipati, Shirley Ho, Scott H

Dona L. Crawford, Syed Bahauddin Alam, Marta D’Elia, Krishna Garikipati, Shirley Ho, Scott H. Holan, Michael Kearns, Petros Koumoutsakos, Brian Kulis, Daniel I. Meiron, and Nathaniel Trask.Foundation Models for Scientific Discovery and Innovation: Opportunities Across the Department of Energy and the Scientific Enterprise

work page
[17]

Possibilities of reinforcement learning for nuclear power plants: evidence on current applications and beyond.Nuclear Engineering and Technology, 56:1959–1974, 2024

Aobo Gong, Yu Chen, Jia Zhang, and Xiaoyu Li. Possibilities of reinforcement learning for nuclear power plants: evidence on current applications and beyond.Nuclear Engineering and Technology, 56:1959–1974, 2024

work page 1959
[18]

Radaideh

Luke Tunkle, Karrar Abdulraheem, Lixiang Lin, and Majid I. Radaideh. Nuclear microreactor transient and load-following control with deep reinforcement learning.Energy Conversion and Management: X, page 101090, 2025

work page 2025
[19]

Radaideh et al

Majid I. Radaideh et al. Multistep criticality search and power shaping in nuclear microreactors with deep reinforcement learning.Nuclear Science and Engineering, pages 1–13, 2025

work page 2025
[20]

Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602:414–419, 2022

Jonas Degrave et al. Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602:414–419, 2022

work page 2022
[21]

A neural network predictive control method for power control of small pressurized water reactors.Annals of nuclear energy, 169:108946, 2022

Kai Xiao, Qiaofeng Wu, Jie Chen, Xiaofei Pu, Ying Zhang, and Pengcheng Yang. A neural network predictive control method for power control of small pressurized water reactors.Annals of nuclear energy, 169:108946, 2022

work page 2022
[22]

Ying Yin, Zhijun Yuan, Bo Pang, Yu Xiao, and Yong Deng. Design and assessment of a core-power controller for a lithium-cooled space nuclear reactor based on the concept of fuzzy model predictive control.Frontiers in Energy Research, 10:1067892, 2023

work page 2023
[23]

QuantiPhy: A quantitative benchmark evaluating physical reasoning abilities of vision-language models.arXiv preprint arXiv:2512.19526, 2024

Puyin Li, Tiange Xiang, Ella Mao, Shirley Wei, Xinye Chen, Adnan Masood, Fei-Fei Li, and Ehsan Adeli. QuantiPhy: A quantitative benchmark evaluating physical reasoning abilities of vision-language models.arXiv preprint arXiv:2512.19526, 2024

work page arXiv 2024
[24]

PhysBench: Benchmarking and enhancing vision-language models for physical world understanding.arXiv preprint arXiv:2501.16411, 2025

Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, and Yue Wang. PhysBench: Benchmarking and enhancing vision-language models for physical world understanding.arXiv preprint arXiv:2501.16411, 2025. ICLR 2025

work page arXiv 2025
[25]

Star: A benchmark for situated reasoning in real-world videos.arXiv preprint arXiv:2405.09711, 2024

Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B. Tenenbaum, and Chuang Gan. STAR: A benchmark for situated reasoning in real-world videos.arXiv preprint arXiv:2405.09711, 2024

work page arXiv 2024
[26]

Tenenbaum, and Chuang Gan

Zhicheng Zheng, Xin Yan, Zhenfang Chen, Jingzhou Wang, Qin Zhi Eddie Lim, Joshua B. Tenenbaum, and Chuang Gan. ContPhy: Continuum physical concept learning and reasoning from videos. InInternational Conference on Machine Learning. PMLR, 2024

work page 2024
[27]

Videophy-2: A challenging action-centric physical commonsense evaluation in video generation.arXiv preprint arXiv:2503.06800, 2025

Hritik Bansal, Clark Peng, Yonatan Bitton, Roman Goldenberg, Aditya Grover, and Kai-Wei Chang. VideoPhy- 2: A challenging action-centric physical commonsense evaluation in video generation.arXiv preprint arXiv:2503.06800, 2025

work page arXiv 2025
[28]

Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Zhiming Ding, and Bo Zheng

Xinrun Xu, Pi Bu, Ye Wang, Börje F. Karlsson, Ziming Wang, Tengtao Song, Qi Zhu, Jun Song, Zhiming Ding, and Bo Zheng. DeepPHY: Benchmarking agentic vision-language models on physical reasoning.arXiv preprint arXiv:2508.05405, 2025

work page arXiv 2025
[29]

Tenenbaum

Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum. CLEVRER: Collision events for video representation and reasoning. InInternational Conference on Learning Representations, 2020

work page 2020
[30]

The Rise and Potential of Large Language Model Based Agents: A Survey

Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, 39 Agentic Physical AI toward a Domain-Specific Foundation ModelPREPRINT Yicheng Zhao, Wen Yi, Shihan Zhang, Tao Gui, Qi Zhang, and Xuanjing Huang. The...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[31]

Practices for governing agentic AI systems.arXiv preprint arXiv:2310.11842, 2023

Yonadav Shavit, Sandhini Agarwal, Miles Brundage, Steven Adler, Cullen O’Keefe, Gillian Hadfield, Noam Kolt, Laura Weidinger, Markus Anderljung, Rumman Chowdhury, Iason Gabriel, Alan Krendl, Tahu Kukutai, Jonas Schuett, Mona Sloane, Bryce Wiernik, and Jack Clark. Practices for governing agentic AI systems.arXiv preprint arXiv:2310.11842, 2023

work page arXiv 2023
[32]

Large language models for robotics: A survey.arXiv preprint arXiv:2408.03543, 2024

Fanlong Zeng, Weiju Gan, Yongbin Wang, Ning Liu, and Xiaojun Gao. Large language models for robotics: A survey.arXiv preprint arXiv:2408.03543, 2024

work page arXiv 2024
[33]

Generalized grounded temporal reasoning for robot instruction following by combining large pre-trained models

Riya Arora, Niveditha Narendranath, Aman Tambi, Sandeep S Zachariah, Souvik Chakraborty, and Rohan Paul. Generalized grounded temporal reasoning for robot instruction following by combining large pre-trained models. arXiv preprint arXiv:2410.07494, 2024

work page arXiv 2024
[34]

Phyplan: Generalizable and rapid physical task planning with physics informed skill networks for robot manipulators.arXiv preprint arXiv:2406.00001, 2024

Mudit Chopra, Abhinav Barnawal, Harshil Vagadia, Tamajit Banerjee, Shreshth Tuli, Souvik Chakraborty, and Rohan Paul. Phyplan: Generalizable and rapid physical task planning with physics informed skill networks for robot manipulators.arXiv preprint arXiv:2406.00001, 2024

work page arXiv 2024
[35]

A Survey on Vision-Language-Action Models for Embodied AI

Yueen Ma, Shuyu Wang, Botian Liu, Xiao Li, Yukun Chen, Shijie Xu, Haoqiang Xu, Hao Zhu, Yu Qiao, and Yong Wang. A survey on vision-language-action models for embodied AI.arXiv preprint arXiv:2405.14093, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[36]

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Yue Li, Zhizheng Wang, Yu Xiang, et al. A survey on vision-language-action models: An action tokenization perspective.arXiv preprint arXiv:2507.01925, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[37]

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

Yunhao Kim, Dongyoon Lee, Jaewon Park, et al. Large VLM-based vision-language-action models for robotic manipulation: A survey.arXiv preprint arXiv:2508.13073, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[38]

Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. PaLM-E: An embodied ...

work page 2023
[39]

RT-2: Vision-language-action models transfer web knowledge to robotic control

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. InConference on Robot Learning, pages 2165–2183. PMLR, 2023

work page 2023
[40]

RELAP5-3D code manual volume i: Code structure, system models, and solution methods

RELAP5-3D Code Development Team. RELAP5-3D code manual volume i: Code structure, system models, and solution methods. Technical Report INL/MIS-15-36723 Rev. 4.5, Idaho National Laboratory, 2021

work page 2021
[41]

Improved generalization with deep neural operators for engineering systems: Path towards digital twin.Engineering Applications of Artificial Intelligence, 131:107844, 2024

Kazuma Kobayashi, James Daniell, and Syed Bahauddin Alam. Improved generalization with deep neural operators for engineering systems: Path towards digital twin.Engineering Applications of Artificial Intelligence, 131:107844, 2024

[42]

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, et al. SmolLM2: When smol goes big – data-centric training of a small language model. arXiv preprint arXiv:2502.02737, 2025

[43]

Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024

[44]

Qwen Team. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024

[45]

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022

[46]

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Advances in Neural Information Processing Systems, volume 36, 2024

[47]

Bowen Zhu, Jing Jiao, and Takashi Hayashi. A survey on efficient training of transformers. arXiv preprint arXiv:2302.01107, 2024

[48]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2019

[49]

Georgios P. Spithourakis and Sebastian Riedel. Numeracy for language models: Evaluating and improving their ability to predict numbers. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 2104–2115, 2018

[50]

Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, and Matt Gardner. Do NLP models know numbers? Probing numeracy in embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, pages 5307–5315, 2019

[51]

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. Proceedings of the 26th International Conference on Machine Learning, pages 41–48, 2009

[52]

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. Don’t stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, 2020

[53]

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020

[55]

Nicholas Baker, Alfredo Alexander-Katz, Sidney Yip, Sauri K. Jammalamadaka, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. arXiv preprint arXiv:2307.08423, 2023

[56]

Michael Schmidt, Hod Lipson, and Max Tegmark. AI for science: Accelerating discovery and prediction. Science, 384:eadm9526, 2024

[57]

Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Vaughan, Johannes Brandstetter, Patrick Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Salinas, Shruti Gupta, Ankur Kumar, Clara Edwards, Freddie Kalaitzis, Daniel Robinson, Ilia Shumailov, Rose Archibald, Matthew Chantry, et al. Aurora: A foundation model of the atmosphere. arXiv prepr...

[58]

Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Ziyan Shysheya, Jonathan Crabbé, Zhilong Yao, Tuan Anh Nguyen, Serina Schulz, Sarah Lewis Edwards, Nicholas Dyer, Carly Fitzsimons, Felix Fischer, Muratahan Aykol, et al. MatterGen: A generative model for inorganic materials design. arXiv preprint arXiv:2312.03687, 2025

[59]

Amil Merchant, Simon Batzner, Samuel S. Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624:80–85, 2024

[60]

Han Fu, Zhiao Lin, et al. MatterSim: A deep learning atomistic model across elements, temperatures and pressures. arXiv preprint arXiv:2405.04967, 2024

[61]

Francesca Grisoni and Gisbert Schneider. Chemical language models for drug discovery. Nature Machine Intelligence, 6:111–120, 2024

[62]

Xiaohui Wang, Xinru Chen, Jinzhe Gao, and Zhiqiang Liu. GPT-based models for molecular property prediction and drug discovery. Briefings in Bioinformatics, 25(2):bbad518, 2024

[63]

Jared D. Willard, Xiaowei Jia, Shaoming Xu, Michael S. Steinbach, and Vipin Kumar. Integrating scientific knowledge with machine learning for engineering and environmental systems. ACM Computing Surveys, 55(4):1–37, 2024

[64]

Kexin Huang, Tianfan Xiao, Huan Li, and Yang Liu. Scientific foundation models. arXiv preprint arXiv:2406.03265, 2024

[65]

Aaron Taylor, Indranil Chakraborty, and Jojo Moolayil. Artificial intelligence for aerospace engineering. Progress in Aerospace Sciences, 142:100966, 2024

[66]

Mohammad Alizadeh, Yisong Wang, and Zhong Liu. Machine learning for control of cyber-physical systems. Annual Reviews in Control, 57:100932, 2024

[67]

Matthew Johnson, Scott Lucas, and Pavel Tsvetkov. Modeling of reactor kinetics and dynamics. Technical report, Idaho National Lab. (INL), Idaho Falls, ID (United States), 2010

[68]

Rishi Bommasani. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021

[69]

Yoon Pyo Lee, Joowon Cha, Yonggyun Yu, and Seung Geun Kim. Large language model agent for nuclear reactor operation assistance. Nuclear Engineering and Technology, page 103842, 2025

[70]

M. Xian, T. Wang, S. Zhang, F. Xu, and Z. Ma. A knowledge-informed large language model framework for US nuclear power plant shutdown initiating event classification for probabilistic risk assessment. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, 2024

[71]

O. H. Kwon et al. Sentiment analysis of the United States public support of nuclear power on social media using large language models. Renewable and Sustainable Energy Reviews, 200:114570, 2024

[72]

Y. Sun, H. Tsuruta, M. Kumagai, and K. Kurosaki. Japanese online discourse on nuclear energy using YouTube-based topic modeling combined with LLM sentiment analysis. Journal of Nuclear Science and Technology, 2025

[73]

Anirudh Chandra and Abinash Chakraborty. Exploring the role of large language models in radiation emergency response. Journal of Radiological Protection, 44(1):011510, 2024

[74]

A. J. Dave, T. N. Nguyen, and R. B. Vilim. Integrating LLMs for explainable fault diagnosis in complex systems. arXiv preprint arXiv:2402.06695, 2024

