Validation-Gated Multi-Agent Governance for Online Adaptation of Thermal-Hydraulic Surrogate Models under Operating-Regime Shift

Doyeong Lim; In Cheol Bang; Seungyoon Lee

arXiv: 2606.03321 · v1 · pith:TXEO7IS5new · submitted 2026-06-02 · cs.LG · cs.MA · cs.SY · eess.SY

Pith reviewed 2026-06-28 11:30 UTC · model grok-4.3

keywords multi-agent systems · continual adaptation · surrogate models · thermal-hydraulic forecasting · online learning · model governance · regime shift · champion-challenger

The pith

A multi-agent council with role-separated agents and deterministic gates reduces thermal-hydraulic surrogate forecasting error by 19 percent under operating-regime shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a guarded continual-adaptation framework in which role-separated agents monitor error signatures, diagnose issues, adapt models, audit safety, and orchestrate reviews for thermal-hydraulic surrogates that must forecast beyond their original training conditions. It screens seven surrogate families on experimental loop data, selects a temporal Fourier neural operator as the initial champion, and compares static deployment against rule-based adaptation, shadow learning, and a full multi-agent mode that reviews every stream step. The full multi-agent mode reaches the lowest channel-averaged MAE, 5.72, with a 35.8 percent warning-exceedance ratio, against 7.06 MAE and 56.8 percent exceedance for the static baseline. Validated promotions to other model families occur under gate control, and deterministic champion-challenger logic retains final authority over replacements. The setup targets the problem that offline-selected models become condition-locked once deployed outside their pretraining envelope.
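The two headline metrics can be sketched in a few lines. This page does not give the paper's exact definitions, so the sketch assumes that the exceedance ratio is the fraction of time steps whose channel-mean absolute error exceeds a warning threshold; the function names, the threshold value, and the toy arrays are all hypothetical.

```python
import numpy as np

def channel_averaged_mae(y_true, y_pred):
    """MAE per channel, then averaged across channels.

    y_true, y_pred: arrays of shape (n_steps, n_channels).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    per_channel = np.mean(np.abs(y_true - y_pred), axis=0)
    return float(np.mean(per_channel))

def exceedance_ratio(y_true, y_pred, warning_threshold):
    """Fraction of steps whose channel-mean absolute error exceeds the threshold.

    ASSUMPTION: this definition is illustrative, not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    step_error = np.mean(np.abs(y_true - y_pred), axis=1)
    return float(np.mean(step_error > warning_threshold))

# Toy data: 4 time steps, 2 channels (not experimental values).
y_true = np.zeros((4, 2))
y_pred = np.array([[1.0, 3.0], [2.0, 2.0], [6.0, 8.0], [0.0, 0.0]])

mae = channel_averaged_mae(y_true, y_pred)
ratio = exceedance_ratio(y_true, y_pred, warning_threshold=2.5)
print(mae, ratio)
```

Under this definition a model can lower its MAE without lowering its exceedance ratio, which is one reason the page reports both numbers.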

Core claim

The MA-Full mode, in which the role-separated multi-agent council reviews every evaluated stream step, achieved the lowest mean error of 5.72 and 35.8 percent exceedance, corresponding to a 19.0 percent improvement over Static deployment. Paired bootstrap intervals against Static excluded zero, although intervals among adaptive modes overlapped and the six paired units limit broad statistical claims. Validated promotions from the neural operator to Transformer and graph neural network indicate that logged, gate-controlled adaptation can support auditable surrogate evolution while deterministic gates retain deployment authority.
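The 19.0 percent figure is plain arithmetic on the reported MAE values, and it can be checked directly. The sketch below uses only numbers quoted on this page; the helper name `relative_improvement` is hypothetical, and the derived percentages for rule-based adaptation and for exceedance are computed here rather than quoted from the paper.

```python
def relative_improvement(baseline, candidate):
    """Fractional reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline

# Values as reported in the abstract.
static_mae, rule_mae, ma_full_mae = 7.06, 6.54, 5.72
static_exc, ma_full_exc = 56.8, 35.8  # warning-exceedance ratio, percent

mae_gain = relative_improvement(static_mae, ma_full_mae)
rule_gain = relative_improvement(static_mae, rule_mae)
exc_points = static_exc - ma_full_exc            # percentage points
exc_rel = relative_improvement(static_exc, ma_full_exc)

print(f"MA-Full vs Static MAE:    {mae_gain:.1%}")   # 19.0%
print(f"Rule-based vs Static MAE: {rule_gain:.1%}")  # 7.4%
print(f"Exceedance drop: {exc_points:.1f} points ({exc_rel:.1%} relative)")
```

The MA-Full gain reproduces the stated 19.0 percent. The exceedance ratio falls by 21.0 percentage points, which is a larger relative change than the MAE reduction.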

What carries the argument

A validation-gated multi-agent council of Monitor, Diagnosis, Adaptation, Safety-Auditor, and Orchestrator agents reviews every stream step, while champion-challenger gates and background shadow learning retain deterministic final authority over model replacement.
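The separation between agent recommendation and gate authority can be illustrated with a minimal sketch. This is not the paper's gate: the promotion margin `min_rel_gain`, the `Candidate` fields, and the validation numbers are hypothetical, chosen only to show that an agent vote can open a review but cannot force a promotion.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    validation_mae: float  # error on a recent validation window (toy values)
    safety_ok: bool        # outcome of deterministic safety checks

def gate_decision(champion, challenger, agent_recommends, min_rel_gain=0.05):
    """Return (serving_model_name, audit_record).

    The agent recommendation is advisory; every deterministic check must
    pass before the challenger replaces the champion.
    """
    checks = {
        "agent_recommends": bool(agent_recommends),
        "safety_ok": challenger.safety_ok,
        "beats_champion": challenger.validation_mae
        <= champion.validation_mae * (1.0 - min_rel_gain),
    }
    promoted = all(checks.values())
    serving = challenger.name if promoted else champion.name
    audit = {"champion": champion.name, "challenger": challenger.name,
             "checks": checks, "promoted": promoted}
    return serving, audit

champion = Candidate("temporal_fno", validation_mae=7.0, safety_ok=True)
strong = Candidate("transformer", validation_mae=6.0, safety_ok=True)
marginal = Candidate("gnn", validation_mae=6.9, safety_ok=True)

serving_a, audit_a = gate_decision(champion, strong, agent_recommends=True)
serving_b, audit_b = gate_decision(champion, marginal, agent_recommends=True)
print(serving_a, serving_b)
```

In the second call the agents recommend promotion, yet the champion keeps serving because the challenger does not clear the margin. The returned audit record is the kind of log entry that would make such a decision reviewable afterwards.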

If this is right

Role-separated agents diagnose error signatures and prioritize candidate model families for each adaptation decision.
Deterministic champion-challenger gates and shadow learning keep final deployment authority separate from agent recommendations.
Validated promotions between surrogate families such as neural operator to Transformer become possible while preserving auditability.
The framework supports second-by-second forecasting on experimental thermal-hydraulic data once models leave their pretraining envelope.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same gated multi-agent structure could be tested on other engineering time-series domains that face regime shifts, such as power-grid load forecasting.
Overlapping performance intervals among adaptive modes suggest that simpler rule-based adaptation may be sufficient when computational budget is limited.
Collecting additional transients from varied loop conditions would allow tighter statistical bounds on whether the 19 percent gain holds more broadly.

Load-bearing premise

The experimental loop transients and the chosen surrogate families are representative enough of real operating-regime shifts that the observed error reductions will generalize beyond the two held-out cases and the specific data collection setup.

What would settle it

A new held-out transient from a different regime shift in which the MA-Full mode produces a mean absolute error equal to or higher than static deployment would count against the generality of the reported 19 percent improvement.

Figures

Figures reproduced from arXiv: 2606.03321 by Doyeong Lim, In Cheol Bang, Seungyoon Lee.

**Figure 2.** Thermal–hydraulic test facility used for data generation [46].

**Figure 3.** Heat-pipe insertion configuration used as the held-out envelope-shift test condi…

**Figure 4.** The seven candidate surrogate architectures screened offline.

**Figure 5.** Offline champion selection by blocked 3-fold cross-validation. MAE is reported…

**Figure 6.** Temporal-FNO predictions on the pretraining-envelope control-rod transient.

**Figure 7.** Multi-seed full-stream comparison across the seven modes: (a) MAE and…

**Figure 8.** Rolling MAE (Static vs. MA-Full) and serving-champion state for the heat-pipe…

**Figure 9.** Rolling MAE and serving-champion state for the second held-out split (layout…

**Figure 10.** MA-Full validated champion replacements on the heat-pipe split: incumbent…

**Figure 11.** Offline-domain back-test after online adaptation. Channel-averaged MAE is…

**Figure 6.** In high-shift heat-pipe intervals, Static predictions gradually depart…

**Figure 12.** Variable-level prediction traces (measured, Static, MA-Full) for the heat-pipe…

**Figure 13.** Variable-level prediction traces (measured, Static, MA-Full) for the second…

**Figure 14.** Challenger funnel across modes: validated promotions…

Original abstract

Artificial-intelligence surrogates can support second-by-second thermal-hydraulic forecasting, but models selected and frozen offline may become condition-locked once deployed outside their pretraining envelope. This study develops a guarded continual-adaptation framework for experimental thermal-hydraulic loop data in which role-separated agents - Monitor, Diagnosis, Adaptation, Safety-Auditor, and Orchestrator - diagnose error signatures, prioritize candidate model families, and review promotions, while deterministic champion-challenger gates and background shadow learning retain final authority over model replacement. Seven surrogate families were screened by blocked three-fold cross-validation, and a temporal Fourier neural operator was selected as the initial champion for 60-s-history-to-10-s-trajectory forecasting on two held-out transients, with three seeds per adaptive mode. Static deployment gave a channel-averaged MAE of 7.06 and a 56.8% warning-exceedance ratio; rule-based adaptation reduced MAE to 6.54, whereas shadow refresh alone remained close to Static. The MA-Full mode, in which the role-separated multi-agent council reviews every evaluated stream step, achieved the lowest mean error, 5.72, and 35.8% exceedance, corresponding to a 19.0% improvement over Static. Paired bootstrap intervals against Static excluded zero, although intervals among adaptive modes overlapped and the six paired units limit broad statistical claims. Validated promotions from the neural operator to Transformer and graph neural network indicate that logged, gate-controlled adaptation can support auditable surrogate evolution while deterministic gates retain deployment authority.
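The 60-s-history-to-10-s-trajectory task amounts to slicing each transient into overlapping input and target windows. The sketch below assumes a 1 Hz sampling rate (suggested by "second-by-second" but not confirmed on this page), so 60 history steps map to 10 forecast steps; the function name and the synthetic series are hypothetical.

```python
import numpy as np

def make_windows(series, history=60, horizon=10):
    """Slice a multichannel series into (history, horizon) training pairs.

    series: array of shape (n_steps, n_channels).
    Returns X of shape (n_windows, history, n_channels) and
            Y of shape (n_windows, horizon, n_channels).
    """
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - history - horizon + 1
    if n_windows <= 0:
        raise ValueError("series is shorter than history + horizon")
    X = np.stack([series[i:i + history] for i in range(n_windows)])
    Y = np.stack([series[i + history:i + history + horizon]
                  for i in range(n_windows)])
    return X, Y

# Synthetic 100-step, 3-channel series (placeholder, not loop data).
series = np.arange(100 * 3, dtype=float).reshape(100, 3)
X, Y = make_windows(series)
print(X.shape, Y.shape)
```

Each target window starts at the step immediately after its history window, so a surrogate trained on these pairs never sees the values it is asked to forecast.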

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Modest 19% error drop from multi-agent adaptation on thermal-hydraulic surrogates, but the test set is only two transients.

The paper's main result is that a full multi-agent council with champion-challenger gates and shadow learning cuts channel-averaged MAE from 7.06 to 5.72 on two held-out transients, a 19% improvement over static deployment. They screen seven surrogate families, settle on a temporal Fourier neural operator, and compare rule-based, shadow-only, and role-separated modes across three seeds.

The work does a few things cleanly. It gives explicit MAE and exceedance numbers, reports bootstrap intervals that exclude zero against static, and states the statistical limits up front. The agent roles (Monitor, Diagnosis, Adaptation, Safety-Auditor, Orchestrator) plus deterministic gates produce auditable promotions to other families like transformers and graph networks. No circular fitting or invented metrics.

The soft spot is the data scale. Only two transients and six paired units, with overlapping intervals among the adaptive modes. The paper itself says this limits broad claims, so it is unclear whether the agent council adds anything reliable beyond simpler refresh rules. No additional regime shifts or loop conditions are shown, which matches the stress-test note.

This is for engineers who need practical, logged adaptation of physics surrogates in safety-critical loops. The empirical comparison is honest and the caveats are visible, so the central claim holds on its own terms even if the gains are narrow.

It deserves peer review to test whether the pattern scales beyond these two held-out transients.

Referee Report

1 major / 2 minor

Summary. The manuscript develops a validation-gated multi-agent governance framework for continual online adaptation of thermal-hydraulic surrogate models under operating-regime shifts. Role-separated agents (Monitor, Diagnosis, Adaptation, Safety-Auditor, Orchestrator) diagnose error signatures and review model promotions, while deterministic champion-challenger gates and shadow learning retain final authority. Seven surrogate families are screened via blocked three-fold cross-validation; a temporal Fourier neural operator is selected as initial champion for 60-s-history-to-10-s-trajectory forecasting. On two held-out transients (three seeds each), static deployment yields channel-averaged MAE 7.06 and 56.8% exceedance; MA-Full mode achieves MAE 5.72 and 35.8% exceedance (19% improvement), with bootstrap intervals excluding zero versus static but overlapping among adaptive modes. The authors note that the six paired units limit broad statistical claims.

Significance. If the observed error reductions hold under broader conditions, the work supplies concrete empirical evidence that role-separated multi-agent councils combined with deterministic gates can support auditable surrogate evolution in a safety-critical domain while retaining deployment authority. Strengths include the explicit reporting of MAE values with bootstrap intervals, the systematic screening of seven model families, and the direct comparison of static, rule-based, shadow-refresh, and full multi-agent modes on held-out transients.

major comments (1)

[Results / Abstract] Evaluation on only two held-out transients with six paired units (three seeds each) produces overlapping bootstrap intervals among adaptive modes and, as the authors themselves state, limits broad statistical claims. This small sample makes the headline 19% improvement (MAE 5.72 vs. 7.06) vulnerable to being an artifact of the particular transients rather than a general property of the role-separated agent council; additional regime-shift scenarios or larger test partitions are required to substantiate the central claim of effective adaptation under operating-regime shift.
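The statistical concern can be made concrete with a percentile bootstrap over paired units. The sketch below is one common way to build such an interval and is not necessarily the procedure used in the manuscript; the six per-unit MAE values are invented placeholders, since the paper's per-unit results are not given on this page.

```python
import numpy as np

def paired_bootstrap_ci(baseline, candidate, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference.

    Positive differences mean the candidate has lower error than the baseline.
    """
    baseline = np.asarray(baseline, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    diff = baseline - candidate
    rng = np.random.default_rng(seed)
    # Resample paired units with replacement, keeping each pair intact.
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(diff.mean()), float(lo), float(hi)

# Placeholder values for six paired units (2 transients x 3 seeds).
# These are NOT the paper's numbers.
static_unit_mae = [10.0, 9.0, 11.0, 10.5, 9.5, 10.0]
adaptive_unit_mae = [8.0, 8.5, 9.0, 9.5, 8.0, 9.0]

mean_diff, lo, hi = paired_bootstrap_ci(static_unit_mae, adaptive_unit_mae)
print(f"mean difference {mean_diff:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

With only six pairs the resampling distribution is coarse, so an interval that excludes zero says little about transients outside the sample. That is the limitation this comment raises.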

minor comments (2)

[Results] The results presentation selects MA-Full as best after observing all modes; a pre-specified primary comparison or adjustment for multiple comparisons would reduce the appearance of post-hoc emphasis.
[Method] Notation for the champion-challenger gates and shadow-learning update rules could be formalized with explicit equations to improve reproducibility.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback and for acknowledging the strengths of our model screening and reporting practices. We address the single major comment below.

Referee: [Results / Abstract] Evaluation on only two held-out transients with six paired units (three seeds each) produces overlapping bootstrap intervals among adaptive modes and, as the authors themselves state, limits broad statistical claims. This small sample makes the headline 19% improvement (MAE 5.72 vs. 7.06) vulnerable to being an artifact of the particular transients rather than a general property of the role-separated agent council; additional regime-shift scenarios or larger test partitions are required to substantiate the central claim of effective adaptation under operating-regime shift.

Authors: We agree that the evaluation is limited to two held-out transients (six paired units total) and that this produces overlapping bootstrap intervals among adaptive modes, as already stated in the manuscript. The reported 19% MAE reduction is therefore specific to these transients and cannot be claimed as a general property of the framework without further data. The bootstrap intervals do exclude zero versus the static baseline, providing evidence of improvement on the available experimental cases, but we accept that this does not substantiate broad claims. We will revise the abstract, results, and discussion sections to more prominently qualify the findings as preliminary and to avoid any implication of general superiority. However, the experimental thermal-hydraulic loop dataset contains only the reported transients; additional regime-shift scenarios cannot be generated without new experimental campaigns outside the scope of this work.

Revision: partial

Standing simulated objections (not resolved)

Additional held-out transients or regime-shift scenarios are unavailable without conducting new experimental campaigns on the thermal-hydraulic loop.

Circularity Check

0 steps flagged

No circularity; results are direct empirical measurements on held-out transients.

The paper reports an empirical study: surrogate families are screened via cross-validation on training data, a champion is selected, and then multiple adaptation modes (including multi-agent governance) are evaluated by measuring MAE and exceedance ratios on two held-out transients. No derivation, first-principles result, or prediction is claimed whose value is forced by the paper's own equations or by re-using fitted parameters as outputs. The 19% improvement figure is a post-hoc arithmetic comparison of independently measured errors. No self-citation chains or ansatzes are invoked to justify the central claims. The chain of evidence is therefore self-contained, as in standard experimental reporting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the multi-agent roles and gates functioning as described without introducing selection bias, plus the assumption that the experimental data distribution matches deployment shifts.

axioms (1)

domain assumption: The blocked three-fold cross-validation and held-out transients adequately represent operating-regime shifts.
Invoked when claiming generalization from the two held-out cases to broader deployment.
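For readers unfamiliar with blocked cross-validation, the idea is to split a time-ordered series into contiguous blocks instead of shuffling samples, so that validation data are not interleaved with training data. The sketch below shows one simple way to generate such folds; the paper's exact blocking scheme is not described on this page, and the function name is hypothetical.

```python
import numpy as np

def blocked_kfold_indices(n_samples, n_folds=3):
    """Yield (train_idx, val_idx) pairs built from contiguous, unshuffled blocks."""
    blocks = np.array_split(np.arange(n_samples), n_folds)
    for k, val_idx in enumerate(blocks):
        train_idx = np.concatenate(
            [block for j, block in enumerate(blocks) if j != k]
        )
        yield train_idx, val_idx

folds = list(blocked_kfold_indices(10, n_folds=3))
for train_idx, val_idx in folds:
    print("train", train_idx.tolist(), "val", val_idx.tolist())
```

When samples are overlapping windows, a window near a block boundary can still share time steps with the neighbouring block. Dropping a few samples on each side of the boundary is a common guard against that leakage.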

invented entities (1)

Role-separated agents (Monitor, Diagnosis, Adaptation, Safety-Auditor, Orchestrator) · no independent evidence
purpose: Diagnose error signatures, prioritize model families, and review promotions in the adaptation loop.
New software constructs introduced by the framework; no independent evidence outside the described experiments.

Reference graph

Works this paper leans on

50 extracted references · 33 canonical work pages · 5 internal anchors

[1] M. I. Radaideh, C. Pigg, T. Kozlowski, Y. Deng, A. Qu, Neural-based time series forecasting of loss of coolant accidents in nuclear power plants, Expert Systems with Applications 160 (2020) 113699.

[2] Y. Lee, S. H. Song, J. Y. Bae, K. Song, M. R. Seo, S. J. Kim, J. I. Lee, Surrogate model for predicting severe accident progression in nuclear power plant using deep learning methods and rolling-window forecast, Annals of Nuclear Energy 208 (2024) 110816. doi:10.1016/j.anucene.2024.110816

[3] J. Song, S. Kim, A machine learning informed prediction of severe accident progressions in nuclear power plants, Nuclear Engineering and Technology 56 (6) (2024) 2266–2273. doi:10.1016/j.net.2024.01.035

[4] F. Antonello, J. Buongiorno, E. Zio, Physics informed neural networks for surrogate modeling of accidental scenarios in nuclear power plants, Nuclear Engineering and Technology 55 (9) (2023) 3409–3416.

[5] Q. Cheng, M. H. Sahadath, H. Yang, S. Pan, W. Ji, Surrogate modeling of heat transfer under flow fluctuation conditions using Fourier basis-deep operator network with uncertainty quantification, Progress in Nuclear Energy 188 (2025) 105895. doi:10.1016/j.pnucene.2025.105895

[6] J. Daniell, K. Kobayashi, A. Alajo, S. B. Alam, Digital twin-centered hybrid data-driven multi-stage deep learning framework for enhanced nuclear reactor power prediction, Energy and AI 19 (2025) 100450. doi:10.1016/j.egyai.2024.100450

[7] K. Kobayashi, S. B. Alam, Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems, Scientific Reports 14 (1) (2024) 2101. doi:10.1038/s41598-024-51984-x

[8] D. Lim, Z. N. Ndum, C. Young, Y. Hassan, Y. Liu, An AI-driven thermal-fluid testbed for advanced small modular reactors: Integration of digital twin and large language models, AI Thermal Fluids (2025) 100023.

[9] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys 46 (4) (2014) 44:1–44:37. doi:10.1145/2523813

[10] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under concept drift: A review, IEEE Transactions on Knowledge and Data Engineering 31 (12) (2019) 2346–2363. doi:10.1109/TKDE.2018.2876857
[11] N. Gunasekara, B. Pfahringer, H. M. Gomes, A. Bifet, Survey on online streaming continual learning, in: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, 2023, pp. 6628–6637. doi:10.24963/ijcai.2023/743

[12] S. A. Bidaki, A. Mohammadkhah, K. Rezaee, F. Hassani, S. Eskandari, M. Salahi, M. M. Ghassemi, Online continual learning: A systematic literature review of approaches, challenges, and benchmarks, arXiv preprint arXiv:2501.04897 (2025). doi:10.48550/arXiv.2501.04897

[13] L. Wang, X. Zhang, H. Su, J. Zhu, A comprehensive survey of continual learning: Theory, method and application, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (8) (2024) 5362–5383. doi:10.1109/TPAMI.2024.3367329

[14] E. S. Page, Continuous inspection schemes, Biometrika 41 (1/2) (1954) 100–115. doi:10.1093/biomet/41.1-2.100

[15] J. Gama, P. Medas, G. Castillo, P. Rodrigues, Learning with drift detection, in: Advances in Artificial Intelligence – SBIA 2004, Springer, 2004, pp. 286–295. doi:10.1007/978-3-540-28645-5_29

[16] A. Bifet, R. Gavaldà, Learning from time-changing data with adaptive windowing, in: Proceedings of the 2007 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2007, pp. 443–448. doi:10.1137/1.9781611972771.42

[17] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, R. Hadsell, Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences 114 (13) (2017) 3521–3526. doi:10.1073/pnas.1611835114

[18] R. Aljundi, L. Caccia, E. Belilovsky, M. Caccia, M. Lin, L. Charlin, T. Tuytelaars, Online continual learning with maximally interfered retrieval, in: Advances in Neural Information Processing Systems, Vol. 32, 2019.

[19] P. Buzzega, M. Boschini, A. Porrello, D. Abati, S. Calderara, Dark experience for general continual learning: A strong, simple baseline, in: Advances in Neural Information Processing Systems, Vol. 33, 2020, pp. 15920–15930.

[20] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, D. Dennison, Hidden technical debt in machine learning systems, in: Advances in Neural Information Processing Systems, Vol. 28, 2015, pp. 2503–2511.
[21] N. Polyzotis, S. Roy, S. E. Whang, M. Zinkevich, Data lifecycle challenges in production machine learning: A survey, SIGMOD Record 47 (2) (2018) 17–28. doi:10.1145/3299887.3299891

[22] T. Zhang, G. Yan, M. Ren, L. Cheng, R. Li, G. Xie, Dynamic transfer soft sensor for concept drift adaptation, Journal of Process Control 123 (2023) 50–63. doi:10.1016/j.jprocont.2023.01.012

[23] H. Song, M. Song, X. Liu, Online autonomous calibration of digital twins using machine learning with application to nuclear power plants, Applied Energy 326 (2022) 119995. doi:10.1016/j.apenergy.2022.119995

[24] G. Zhou, M.-j. Peng, H. Wang, D.-b. Sun, Z.-k. Li, Research on fault diagnosis method and interpretability of nuclear power plant based on hybrid transformer model, Annals of Nuclear Energy 213 (2025) 111157. doi:10.1016/j.anucene.2024.111157

[25] C. Tan, W. Zheng, B. Wang, S. Tan, B. Liang, J. Li, R. Han, Z. Ke, R. Tian, Weights embedding Informer prediction algorithm-based fault diagnosis framework for nuclear power plant, Annals of Nuclear Energy 207 (2024) 110736. doi:10.1016/j.anucene.2024.110736

[26] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, et al., AutoGen: Enabling next-gen LLM applications via multi-agent conversations, in: First Conference on Language Modeling, 2024.

[27] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, ReAct: Synergizing reasoning and acting in language models, in: International Conference on Learning Representations (ICLR), 2023. doi:10.48550/arXiv.2210.03629

[28] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, S. Yao, Reflexion: Language agents with verbal reinforcement learning, in: Advances in Neural Information Processing Systems, Vol. 36, 2023.

[29] J. S. Park, J. C. O'Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: Interactive simulacra of human behavior, in: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023. doi:10.1145/3586183.3606763

[30] S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, J. Wang, C. Zhang, S. Yau, Z. Lin, L. Zhou, et al., MetaGPT: Meta programming for a multi-agent collaborative framework, in: International Conference on Learning Representations, Vol. 2024, 2024, pp. 23247–23275.
[31] J. Gottweis, W.-H. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weissenberger, K. Rong, R. Tanno, et al., Towards an AI co-scientist, arXiv preprint arXiv:2502.18864 (2025). doi:10.48550/arXiv.2502.18864

[32] M. Gridach, J. Nanavati, K. Zine El Abidine, L. Mendes, C. Mack, Agentic AI for scientific discovery: A survey of progress, challenges, and future directions, arXiv preprint arXiv:2503.08979 (2025). doi:10.48550/arXiv.2503.08979

[33] T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, X. Zhang, Large language model based multi-agents: A survey of progress and challenges, arXiv preprint arXiv:2402.01680 (2024). doi:10.48550/arXiv.2402.01680

[34] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, in: Advances in Neural Information Processing Systems, Vol. 35, 2022, pp. 24824–24837.

[35] Y. Liu, Z. Abulawi, A. Garimidi, D. Lim, Automating data-driven modeling and analysis for engineering applications using large language model agents, Knowledge-Based Systems (2026) 115989.

[36] Z. N. Ndum, D. Lim, J. Ford, S. Adu, J. Tao, Y. Hassan, Y. Liu, Large language model-assisted digital twin for remote monitoring and control of advanced reactors, Progress in Nuclear Energy 192 (2026) 106172.

[37] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735

[38] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1724–1734.... doi:10.3115/v1/d14-1179

[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, Vol. 30, 2017.

[40] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordinary differential equations, in: Advances in Neural Information Processing Systems, Vol. 31, 2018.
[41] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, R. Pascanu, Relational inductive biases, ... doi:10.48550/arXiv.1806.01261

[42] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, R. Barzilay, Graph neural networks, Nature Reviews Methods Primers 4 (1) (2024) 17. doi:10.1038/s43586-024-00294-7

[43] L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence 3 (3) (2021) 218–229. doi:10.1038/s42256-021-00302-5

[44] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations, in: International Conference on Learning Representations, 2021. doi:10.48550/arXiv.2010.08895

[45] B. Efron, Bootstrap methods: Another look at the jackknife, The Annals of Statistics 7 (1) (1979) 1–26. doi:10.1214/aos/1176344552

[46] K. M. Kim, I. C. Bang, Design and operation of the transparent integral effect test facility, URI-LO for nuclear innovation platform, Nuclear Engineering and Technology 53 (3) (2021) 776–792. doi:10.1016/j.net.2020.08.006

[47] H. J. Kim, D. Y. Lim, I. C. Bang, Feasibility study of hybrid heat pipe control rod application on nuclear power plant using UNIST reactor innovation loop (URI-LO), in: Transactions of the Korean Nuclear Society Spring Meeting, Korea, 2022.

[48] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nature Reviews Physics 3 (6) (2021) 422–440. doi:10.1038/s42254-021-00314-5

[49] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 11106–11115. doi:10.1609/aaai.v35i12.17325

[50] Y. Nie, N. H. Nguyen, P. Sinthong, J. Kalagnanam, A time series is worth 64 words: Long-term forecasting with transformers, in: International Conference on Learning Representations (ICLR), 2023. doi:10.48550/arXiv.2211.14730

[1] [1]

M. I. Radaideh, C. Pigg, T. Kozlowski, Y. Deng, A. Qu, Neural-based time series forecasting of loss of coolant accidents in nuclear power plants, Expert Systems with Applications 160 (2020) 113699

2020

[2] [2]

Y. Lee, S. H. Song, J. Y. Bae, K. Song, M. R. Seo, S. J. Kim, J. I. Lee, Surrogate model for predicting severe accident progression in nuclear power plant using deep learning methods and rolling-window forecast, Annals of Nuclear Energy 208 (2024) 110816.doi:10.1016/j.anucen e.2024.110816

work page doi:10.1016/j.anucen 2024

[3] [3]

J. Song, S. Kim, A machine learning informed prediction of severe ac- cident progressions in nuclear power plants, Nuclear Engineering and Technology 56 (6) (2024) 2266–2273.doi:10.1016/j.net.2024.01.03 5

work page doi:10.1016/j.net.2024.01.03 2024

[4] F. Antonello, J. Buongiorno, E. Zio, Physics informed neural networks for surrogate modeling of accidental scenarios in nuclear power plants, Nuclear Engineering and Technology 55 (9) (2023) 3409–3416.

[5] Q. Cheng, M. H. Sahadath, H. Yang, S. Pan, W. Ji, Surrogate modeling of heat transfer under flow fluctuation conditions using Fourier basis-deep operator network with uncertainty quantification, Progress in Nuclear Energy 188 (2025) 105895. doi:10.1016/j.pnucene.2025.105895.

[6] J. Daniell, K. Kobayashi, A. Alajo, S. B. Alam, Digital twin-centered hybrid data-driven multi-stage deep learning framework for enhanced nuclear reactor power prediction, Energy and AI 19 (2025) 100450. doi:10.1016/j.egyai.2024.100450.

[7] K. Kobayashi, S. B. Alam, Deep neural operator-driven real-time inference to enable digital twin solutions for nuclear energy systems, Scientific Reports 14 (1) (2024) 2101. doi:10.1038/s41598-024-51984-x.

[8] D. Lim, Z. N. Ndum, C. Young, Y. Hassan, Y. Liu, An AI-driven thermal-fluid testbed for advanced small modular reactors: Integration of digital twin and large language models, AI Thermal Fluids (2025) 100023.

[9] J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Computing Surveys 46 (4) (2014) 44:1–44:37. doi:10.1145/2523813.

[10] J. Lu, A. Liu, F. Dong, F. Gu, J. Gama, G. Zhang, Learning under concept drift: A review, IEEE Transactions on Knowledge and Data Engineering 31 (12) (2019) 2346–2363. doi:10.1109/TKDE.2018.2876857.

[11] N. Gunasekara, B. Pfahringer, H. M. Gomes, A. Bifet, Survey on online streaming continual learning, in: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, 2023, pp. 6628–6637. doi:10.24963/ijcai.2023/743.

[12] S. A. Bidaki, A. Mohammadkhah, K. Rezaee, F. Hassani, S. Eskandari, M. Salahi, M. M. Ghassemi, Online continual learning: A systematic literature review of approaches, challenges, and benchmarks, arXiv preprint arXiv:2501.04897 (2025). doi:10.48550/arXiv.2501.04897.

[13] L. Wang, X. Zhang, H. Su, J. Zhu, A comprehensive survey of continual learning: Theory, method and application, IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (8) (2024) 5362–5383. doi:10.1109/TPAMI.2024.3367329.

[14] E. S. Page, Continuous inspection schemes, Biometrika 41 (1/2) (1954) 100–115. doi:10.1093/biomet/41.1-2.100.

[15] J. Gama, P. Medas, G. Castillo, P. Rodrigues, Learning with drift detection, in: Advances in Artificial Intelligence – SBIA 2004, Springer, 2004, pp. 286–295. doi:10.1007/978-3-540-28645-5_29.

[16] A. Bifet, R. Gavaldà, Learning from time-changing data with adaptive windowing, in: Proceedings of the 2007 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2007, pp. 443–448. doi:10.1137/1.9781611972771.42.

[17] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, R. Hadsell, Overcoming catastrophic forgetting in neural networks, Proceedings of the National Academy of Sciences 114 (13) (2017) 3521–3526. doi:10.1073/pnas.1611835114.

[18] R. Aljundi, L. Caccia, E. Belilovsky, M. Caccia, M. Lin, L. Charlin, T. Tuytelaars, Online continual learning with maximally interfered retrieval, in: Advances in Neural Information Processing Systems, Vol. 32, 2019.

[19] P. Buzzega, M. Boschini, A. Porrello, D. Abati, S. Calderara, Dark experience for general continual learning: A strong, simple baseline, in: Advances in Neural Information Processing Systems, Vol. 33, 2020, pp. 15920–15930.

[20] D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J.-F. Crespo, D. Dennison, Hidden technical debt in machine learning systems, in: Advances in Neural Information Processing Systems, Vol. 28, 2015, pp. 2503–2511.

[21] N. Polyzotis, S. Roy, S. E. Whang, M. Zinkevich, Data lifecycle challenges in production machine learning: A survey, SIGMOD Record 47 (2) (2018) 17–28. doi:10.1145/3299887.3299891.

[22] T. Zhang, G. Yan, M. Ren, L. Cheng, R. Li, G. Xie, Dynamic transfer soft sensor for concept drift adaptation, Journal of Process Control 123 (2023) 50–63. doi:10.1016/j.jprocont.2023.01.012.

[23] H. Song, M. Song, X. Liu, Online autonomous calibration of digital twins using machine learning with application to nuclear power plants, Applied Energy 326 (2022) 119995. doi:10.1016/j.apenergy.2022.119995.

[24] G. Zhou, M.-j. Peng, H. Wang, D.-b. Sun, Z.-k. Li, Research on fault diagnosis method and interpretability of nuclear power plant based on hybrid transformer model, Annals of Nuclear Energy 213 (2025) 111157. doi:10.1016/j.anucene.2024.111157.

[25] C. Tan, W. Zheng, B. Wang, S. Tan, B. Liang, J. Li, R. Han, Z. Ke, R. Tian, Weights embedding Informer prediction algorithm-based fault diagnosis framework for nuclear power plant, Annals of Nuclear Energy 207 (2024) 110736. doi:10.1016/j.anucene.2024.110736.

[26] Q. Wu, G. Bansal, J. Zhang, Y. Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, et al., AutoGen: Enabling next-gen LLM applications via multi-agent conversations, in: First Conference on Language Modeling, 2024.

[27] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, Y. Cao, ReAct: Synergizing reasoning and acting in language models, in: International Conference on Learning Representations (ICLR), 2023. doi:10.48550/arXiv.2210.03629.

[28] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, S. Yao, Reflexion: Language agents with verbal reinforcement learning, in: Advances in Neural Information Processing Systems, Vol. 36, 2023.

[29] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, M. S. Bernstein, Generative agents: Interactive simulacra of human behavior, in: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023. doi:10.1145/3586183.3606763.

[30] S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, J. Wang, C. Zhang, S. Yau, Z. Lin, L. Zhou, et al., MetaGPT: Meta programming for a multi-agent collaborative framework, in: International Conference on Learning Representations, Vol. 2024, 2024, pp. 23247–23275.

[31] J. Gottweis, W.-H. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weissenberger, K. Rong, R. Tanno, et al., Towards an AI co-scientist, arXiv preprint arXiv:2502.18864 (2025). doi:10.48550/arXiv.2502.18864.

[32] M. Gridach, J. Nanavati, K. Zine El Abidine, L. Mendes, C. Mack, Agentic AI for scientific discovery: A survey of progress, challenges, and future directions, arXiv preprint arXiv:2503.08979 (2025). doi:10.48550/arXiv.2503.08979.

[33] T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, X. Zhang, Large language model based multi-agents: A survey of progress and challenges, arXiv preprint arXiv:2402.01680 (2024). doi:10.48550/arXiv.2402.01680.

[34] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, in: Advances in Neural Information Processing Systems, Vol. 35, 2022, pp. 24824–24837.

[35] Y. Liu, Z. Abulawi, A. Garimidi, D. Lim, Automating data-driven modeling and analysis for engineering applications using large language model agents, Knowledge-Based Systems (2026) 115989.

[36] Z. N. Ndum, D. Lim, J. Ford, S. Adu, J. Tao, Y. Hassan, Y. Liu, Large language model-assisted digital twin for remote monitoring and control of advanced reactors, Progress in Nuclear Energy 192 (2026) 106172.

[37] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780. doi:10.1162/neco.1997.9.8.1735.

[38] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Doha, Qatar, 2014, pp. 1724–1734... doi:10.3115/v1/d14-1179.

[39] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems, Vol. 30, 2017.

[40] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, D. K. Duvenaud, Neural ordinary differential equations, in: Advances in Neural Information Processing Systems, Vol. 31, 2018.

[41] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, R. Pascanu, Relational inductive biases, ... (2018). doi:10.48550/arxiv.1806.01261.

[42] G. Corso, H. Stark, S. Jegelka, T. Jaakkola, R. Barzilay, Graph neural networks, Nature Reviews Methods Primers 4 (1) (2024) 17. doi:10.1038/s43586-024-00294-7.

[43] L. Lu, P. Jin, G. Pang, Z. Zhang, G. E. Karniadakis, Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence 3 (3) (2021) 218–229. doi:10.1038/s42256-021-00302-5.

[44] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations, in: International Conference on Learning Representations, 2021. doi:10.48550/arXiv.2010.08895.

[45] B. Efron, Bootstrap methods: Another look at the jackknife, The Annals of Statistics 7 (1) (1979) 1–26. doi:10.1214/aos/1176344552.

[46] K. M. Kim, I. C. Bang, Design and operation of the transparent integral effect test facility, URI-LO for nuclear innovation platform, Nuclear Engineering and Technology 53 (3) (2021) 776–792. doi:10.1016/j.net.2020.08.006.

[47] H. J. Kim, D. Y. Lim, I. C. Bang, Feasibility study of hybrid heat pipe control rod application on nuclear power plant using UNIST reactor innovation loop (URI-LO), in: Transactions of the Korean Nuclear Society Spring Meeting, Korea, 2022.

[48] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, L. Yang, Physics-informed machine learning, Nature Reviews Physics 3 (6) (2021) 422–440. doi:10.1038/s42254-021-00314-5.

[49] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 2021, pp. 11106–11115. doi:10.1609/aaai.v35i12.17325.

[50] Y. Nie, N. H. Nguyen, P. Sinthong, J. Kalagnanam, A time series is worth 64 words: Long-term forecasting with transformers, in: International Conference on Learning Representations (ICLR), 2023. doi:10.48550/arXiv.2211.14730.