Recognition: unknown
Machine learning inference of fission yields from gamma spectroscopy for very low-yield nuclear test verification
Pith reviewed 2026-05-08 16:29 UTC · model grok-4.3
The pith
Machine learning models trained on simulated gamma spectra can infer fission yields from post-test measurements with over 95 percent accuracy, even for yields within plus or minus 100 grams of a 1 kg TNT threshold.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sixty-six million three-dimensional Monte Carlo particle-transport simulations of very low-yield tests in varied containments generate the gamma spectra that would be measured outside the vessel. Eighty-two fission-product-to-plutonium-239 peak ratios are extracted from each spectrum. XGBoost classifiers and regressors trained on these ratios classify whether yield exceeds a chosen threshold with greater than 95 percent accuracy, even for yields within plus or minus 100 grams of a 1 kg TNT threshold, and estimate the actual yield with a mean absolute relative error of 12.4 percent when spectra are recorded between one month and one year after the test.
What carries the argument
XGBoost models trained on 82 fission-product-to-plutonium-239 peak ratios extracted from simulated gamma spectra measured outside containment vessels.
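A minimal sketch of the feature step, assuming a peak-finding stage has already produced photopeak areas (the peak labels and count values below are invented for illustration; the paper's 82 ratios come from full transport simulations): each fission-product photopeak area is divided by a Pu-239 reference peak area.

```python
# Sketch: turn photopeak areas from one simulated spectrum into
# fission-product-to-Pu-239 ratio features (all numbers hypothetical).

PU239_PEAK = "Pu239_414keV"  # hypothetical label for the Pu-239 reference line


def peak_ratios(peak_areas: dict[str, float]) -> dict[str, float]:
    """Divide every fission-product peak area by the Pu-239 peak area."""
    ref = peak_areas[PU239_PEAK]
    return {
        f"{name}/{PU239_PEAK}": area / ref
        for name, area in peak_areas.items()
        if name != PU239_PEAK
    }


spectrum = {
    "Pu239_414keV": 2.0e5,   # Pu-239 reference line
    "Cs137_662keV": 5.0e4,   # fission-product lines (areas invented)
    "Zr95_757keV": 8.0e4,
}
features = peak_ratios(spectrum)
print(features["Cs137_662keV/Pu239_414keV"])  # 0.25
```

In the paper, such ratios form the input vector for the XGBoost models. Ratios rather than absolute areas plausibly reduce sensitivity to overall detection efficiency, though the paper's own feature-selection rationale is exactly what the referee's second minor comment asks to be spelled out.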
If this is right
- On-site gamma measurements can be used to verify compliance with the zero-yield standard of the Comprehensive Nuclear-Test-Ban Treaty.
- Yield estimates with roughly 12 percent relative error become available from data collected between one month and one year after the test.
- A yield-threshold verification regime becomes technically feasible without requiring exhaustive knowledge of every experimental parameter.
- The same simulation-to-measurement pipeline can be retrained for other yield thresholds or detection times.
Where Pith is reading between the lines
- Real debris measurements from an actual low-yield test would provide the decisive test of whether simulation-trained models transfer without retraining.
- Combining gamma-derived yields with seismic or radionuclide data could tighten uncertainty bounds beyond what either method achieves alone.
- If the approach works on real data, it could be adapted to verify subcritical experiments that stay below the fission threshold.
Load-bearing premise
The 66 million Monte Carlo scenarios cover enough real-world variability in containment geometry, detector placement, and fission-product transport that models trained only on simulation will perform similarly on actual post-test debris.
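The premise can be made concrete as a sampling loop over scenario parameters. The parameter set and ranges below are invented for illustration; the paper's actual sampled distributions are what the referee asks to be tabulated.

```python
# Sketch: drawing one simulated scenario from assumed parameter ranges
# (parameter names and ranges are hypothetical, not the paper's).
import random


def sample_scenario(rng: random.Random) -> dict[str, float]:
    return {
        "yield_kg_tnt": 10 ** rng.uniform(-3, 2),      # log-uniform, 1 g to 100 kg TNT
        "time_after_test_days": rng.uniform(30, 365),  # one month to one year
        "steel_thickness_cm": rng.uniform(2.0, 20.0),  # containment wall
        "detector_distance_m": rng.uniform(0.5, 5.0),
    }


rng = random.Random(0)
scenarios = [sample_scenario(rng) for _ in range(1000)]
# Every draw must stay inside its declared range.
assert all(30 <= s["time_after_test_days"] <= 365 for s in scenarios)
```

The coverage question is whether real tests fall inside the sampled ranges and whether the sampled physics spans the right correlations, not merely whether the loop runs 66 million times.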
What would settle it
Measure gamma spectra after a real controlled very low-yield test whose actual fission yield is known independently, apply the trained models, and check whether the predicted yield or threshold classification matches the known value within the reported error bands.
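The check itself reduces to two numbers: the paper's regression metric and the threshold-agreement rate. A minimal sketch with invented yield values (a real validation would use measured spectra and independently known yields):

```python
# Sketch: scoring model predictions against independently known yields
# (yield values below are invented for illustration).

def mare(pred: list[float], true: list[float]) -> float:
    """Mean absolute relative error, the paper's regression metric."""
    return sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


def threshold_accuracy(pred: list[float], true: list[float],
                       threshold_kg: float = 1.0) -> float:
    """Fraction of cases where predicted and true yields fall on the same
    side of the threshold."""
    hits = sum((p > threshold_kg) == (t > threshold_kg)
               for p, t in zip(pred, true))
    return hits / len(true)


true_yields = [0.9, 1.1, 0.95, 1.05]      # kg TNT, known independently
predicted = [0.99, 1.21, 0.855, 1.155]    # model output

print(round(mare(predicted, true_yields), 2))      # 0.1
print(threshold_accuracy(predicted, true_yields))  # 1.0
```

If the measured MARE on real debris stayed near the reported 12.4 percent and the threshold agreement near 95 percent, the simulation-to-measurement transfer claim would be settled in the paper's favor.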
read the original abstract
Very low-yield nuclear tests pose a major verification challenge for the zero-yield standard of the Comprehensive Nuclear-Test-Ban Treaty (CTBT). The zero-yield standard prohibits any explosive experiment that produces a self-sustaining fission chain reaction while allowing subcritical experiments. Previous research shows that on-site gamma spectroscopy of post-test debris provides useful insight into the criticality level, although it remains heavily dependent on knowledge of certain experimental settings. Here, we adopt a new approach whereby machine learning models are trained on simulated gamma spectroscopy data to infer the fission yield of a nuclear very low-yield test. Using high-fidelity 3D Monte Carlo particle transport simulations, we generated gamma spectra measured outside containment vessels after very low-yield tests for 66 million representative scenarios. From these spectra, we extracted 82 fission-product-to-plutonium-239 peak ratios, then trained ML models for two tasks: (1) binary classification of whether a test exceeded a chosen yield threshold, and (2) regression to estimate the actual yield. We find that XGBoost performs best on the classification task across the most policy-relevant yield range. The classifier achieves high accuracy even for yields near the chosen threshold (e.g., >95% for yields +-100 g around a threshold at 1 kg TNT), and the regressor presents a mean absolute relative error of 12.4% for measurements taken a month to a year after the test. These results demonstrate that using machine learning to infer the yield of a past very low-yield nuclear test from gamma spectroscopy data is feasible and accurate. This approach can support efforts to establish a robust verification protocol for the zero-yield standard and could pave the way for a future yield threshold-based verification regime that is both technically feasible and politically viable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a machine learning approach using XGBoost trained on simulated gamma spectroscopy data from 66 million Monte Carlo scenarios to classify and regress the fission yield of very low-yield nuclear tests. It reports >95% accuracy for binary classification around a 1 kg TNT yield threshold and 12.4% mean absolute relative error for regression on data from 1 month to 1 year post-test, based on 82 fission-product peak ratios. The goal is to aid verification of the CTBT zero-yield standard.
Significance. Should the simulation fidelity prove sufficient for real-world application, this could offer a practical tool for inferring yields from on-site gamma measurements, potentially strengthening verification protocols for subcritical and very low-yield tests. The scale of the simulation dataset and the focus on policy-relevant yield ranges are notable strengths.
major comments (2)
- [Results] The high performance metrics (>95% classification accuracy near the 1 kg TNT threshold and 12.4% MARE regression) are obtained exclusively from held-out simulated data. No real gamma spectra from nuclear tests or comparisons to independently measured fission yields are presented to validate transferability to actual post-test debris.
- [Methods] The description of the 66 million Monte Carlo scenarios does not include quantitative assessment or sensitivity analysis of coverage for key real-world variabilities such as containment vessel geometries, detector placements, material compositions, and fission-product transport physics. This assumption is load-bearing for the claim that models trained only on simulation will perform similarly on real measurements.
minor comments (2)
- [Abstract] The abstract could briefly specify the Monte Carlo code and physics models employed (e.g., MCNP or GEANT4) to allow readers to assess the fidelity claim.
- [Methods] Clarify how the 82 peak ratios were selected from the spectra and whether feature importance analysis was performed to justify their use over other possible observables.
Simulated Author's Rebuttal
We thank the referee for their constructive review and positive assessment of the work's significance for CTBT verification. We address each major comment below and have revised the manuscript accordingly where feasible.
read point-by-point responses
- Referee: The high performance metrics (>95% classification accuracy near the 1 kg TNT threshold and 12.4% MARE regression) are obtained exclusively from held-out simulated data. No real gamma spectra from nuclear tests or comparisons to independently measured fission yields are presented to validate transferability to actual post-test debris.
  Authors: We acknowledge that all reported metrics derive from held-out simulated data. Real gamma spectra and independently measured yields from actual nuclear tests are not publicly available due to classification restrictions. Our study is framed as a simulation-based feasibility demonstration for an ML approach that could support verification when real measurements become accessible. We have added a dedicated limitations subsection in the discussion that explicitly states the simulation-only validation and outlines future validation pathways, including surrogate experiments with known low-yield fission sources and potential use of declassified historical data if released. revision: partial
- Referee: The description of the 66 million Monte Carlo scenarios does not include quantitative assessment or sensitivity analysis of coverage for key real-world variabilities such as containment vessel geometries, detector placements, material compositions, and fission-product transport physics. This assumption is load-bearing for the claim that models trained only on simulation will perform similarly on real measurements.
  Authors: We agree that explicit quantitative coverage assessment strengthens the methods. We have revised the simulation description section to include tables summarizing the sampled ranges and distributions for containment geometries, detector placements, material compositions, and fission-product transport parameters, with references to the literature sources used to define those ranges. In addition, we performed a limited sensitivity analysis on a 1% subsample of scenarios and report the resulting variation in peak ratios in the supplementary material. These additions address the coverage concern without requiring a full recomputation of the 66 million scenarios. revision: yes
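A toy version of such a sensitivity check, using simple exponential attenuation in place of full transport physics (the attenuation coefficients and thicknesses below are invented): because attenuation is energy-dependent, a thicker containment wall shifts the ratio of a low-energy peak to a high-energy peak.

```python
# Sketch: how a peak ratio shifts with shielding thickness under simple
# exponential attenuation (mu values hypothetical, per cm of steel).
import math

MU_LOW_E, MU_HIGH_E = 0.60, 0.42   # attenuation coefficients at two gamma energies


def attenuated_ratio(ratio_unshielded: float, thickness_cm: float) -> float:
    """Ratio of a low-energy to a high-energy peak after a steel wall.
    The common geometry factors cancel; only the coefficient difference matters."""
    return ratio_unshielded * math.exp(-(MU_LOW_E - MU_HIGH_E) * thickness_cm)


base = attenuated_ratio(1.0, 5.0)
perturbed = attenuated_ratio(1.0, 5.5)          # +10% wall thickness
rel_change = abs(perturbed - base) / base
print(f"{rel_change:.3f}")                      # prints 0.086
```

A roughly 9 percent ratio shift from a 10 percent thickness change illustrates why the sampled range of wall thicknesses is load-bearing: peak ratios are not invariant to shielding, so the training set must span it.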
- Not addressed in revision: direct validation against real post-test gamma spectra and independently measured fission yields from nuclear tests, as such data remain classified and unavailable for this study.
Circularity Check
No circularity: standard supervised ML on independent simulated data
full rationale
The paper generates 66 million Monte Carlo scenarios with known fission yields as explicit inputs, simulates gamma spectra, extracts 82 peak ratios as features, and trains/evaluates XGBoost for classification and regression on held-out simulation runs. Reported metrics (>95% accuracy near 1 kg threshold, 12.4% MARE) are direct performance on synthetic test data whose labels were never fitted from the spectra. No equations, self-citations, or ansatzes reduce the claimed accuracy to the training labels by construction. The derivation is self-contained within the simulated ensemble; generalization risk to real debris is an external validity issue, not circularity.
Axiom & Free-Parameter Ledger
free parameters (2)
- yield threshold
- XGBoost hyperparameters
axioms (2)
- domain assumption: High-fidelity 3D Monte Carlo particle transport accurately reproduces measured gamma spectra outside containment vessels
- domain assumption: Fission-product gamma yields and branching ratios from nuclear databases are correct