pith. machine review for the scientific record.

arxiv: 2604.07520 · v1 · submitted 2026-04-08 · ✦ hep-ph · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Lecture notes on Machine Learning applications for global fits

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:06 UTC · model grok-4.3

classification ✦ hep-ph cs.LG
keywords machine learning · global fits · axion-like particles · log-likelihood surrogate · boosted decision trees · Belle II · B to K nu nu anomaly

The pith

Boosted decision trees can serve as surrogates for the log-likelihood, making global fits of axion-like particles feasible under Belle II constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

These lecture notes outline a practical workflow for using machine learning to carry out global statistical fits in high-energy physics when direct likelihood evaluations are too slow. The central idea is to train boosted decision trees on strategically sampled points so they approximate the log-likelihood surface accurately enough for minimization and sampling. The notes apply this method to the parameter space of axion-like particles, showing that a two-stage model can scan the space while enforcing experimental limits on decay lengths and flavor-violating couplings from the B to K neutrino antineutrino anomaly. A sympathetic reader would care because traditional fits become prohibitive as models grow more complex, and the notes supply the concrete steps needed to keep them tractable.

Core claim

The notes establish that boosted decision trees, trained via active learning and Gaussian processes, serve as reliable surrogates for the log-likelihood function in global fits. This surrogate enables efficient profile-likelihood minimization and Markov Chain Monte Carlo sampling of posterior distributions. When applied to axion-like particle models explaining the B± to K± νν̄ anomaly at Belle II, the two-stage ML model explores the relevant parameter space while automatically satisfying constraints on decay lengths and flavor-violating couplings.

What carries the argument

Boosted decision trees as surrogates for the log-likelihood function, trained on active-learning samples and combined with MCMC sampling.
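As a concrete illustration of the surrogate idea (a sketch, not the notes' actual pipeline), the snippet below fits a boosted-tree regressor to samples of a toy two-parameter Gaussian log-likelihood. scikit-learn's GradientBoostingRegressor stands in for xgboost, and uniform random sampling stands in for active learning; the ALP likelihood itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Toy "expensive" log-likelihood: a 2D Gaussian in the parameters theta.
def log_likelihood(theta):
    return -0.5 * ((theta[:, 0] - 1.0) ** 2 + (theta[:, 1] + 0.5) ** 2)

# Sample training points (the notes use active learning; here: uniform random).
theta_train = rng.uniform(-3, 3, size=(2000, 2))
y_train = log_likelihood(theta_train)

# Train the boosted-tree surrogate on (theta, log L) pairs.
surrogate = GradientBoostingRegressor(n_estimators=300, max_depth=3)
surrogate.fit(theta_train, y_train)

# Check fidelity on held-out points before trusting it for minimization or MCMC.
theta_test = rng.uniform(-3, 3, size=(500, 2))
score = r2_score(log_likelihood(theta_test), surrogate.predict(theta_test))
print(f"hold-out R^2 = {score:.3f}")
```

In the workflow the notes describe, the training points would instead come from an active-learning loop guided by a Gaussian process, concentrating expensive evaluations where the surrogate is most uncertain.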

If this is right

  • Global fits with many parameters and expensive predictions become computationally practical.
  • Constraints on ALP decay lengths and flavor-violating couplings are enforced automatically during the scan.
  • Posterior distributions for ALP parameters can be obtained via MCMC on the fast surrogate model.
  • SHAP values provide quantitative insight into which parameters and interactions dominate the fit.
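The MCMC step in the third bullet can then run on the cheap surrogate. The notes use the emcee ensemble sampler; the numpy-only random-walk Metropolis below is a minimal stand-in, with a quadratic toy surrogate in place of a trained BDT.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy surrogate for the log-likelihood (stand-in for the trained BDT):
# a Gaussian centred at theta = 1.0 with width 0.5.
def surrogate_log_l(theta):
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

# Random-walk Metropolis: cheap surrogate calls make long chains affordable.
theta, chain = 0.0, []
log_l = surrogate_log_l(theta)
for _ in range(20000):
    prop = theta + rng.normal(scale=0.5)
    log_l_prop = surrogate_log_l(prop)
    if np.log(rng.uniform()) < log_l_prop - log_l:  # Metropolis accept rule
        theta, log_l = prop, log_l_prop
    chain.append(theta)

samples = np.array(chain[2000:])  # discard burn-in
print(f"posterior mean = {samples.mean():.2f}, std = {samples.std():.2f}")
```

An ensemble sampler such as emcee would replace the single walker with many interacting chains, which is what Figure 7 of the paper depicts.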

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate workflow could be adapted to other rare-decay anomalies or to models with similar computational bottlenecks in flavor physics.
  • Active-learning strategies for generating training data might generalize to reduce the simulation cost of fits in cosmology or neutrino oscillation analyses.
  • Combining the boosted-tree surrogate with neural-network emulators could further improve speed or accuracy for even higher-dimensional spaces.

Load-bearing premise

The boosted decision tree surrogate must reproduce the true log-likelihood surface closely enough that the resulting parameter constraints and posteriors remain unbiased.

What would settle it

A direct numerical comparison in a simplified ALP model, where both the full likelihood and the surrogate can be evaluated exhaustively: if the best-fit values or credible intervals differ by more than the reported statistical uncertainty, the surrogate-based constraints are biased; if they agree, the central claim stands.
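A one-dimensional toy version of such a comparison, with all numbers invented for illustration: fit a quadratic "surrogate" to noisy evaluations of an exact Gaussian log-likelihood, scan both exhaustively on a grid, and compare the shift in best-fit points against the statistical uncertainty.

```python
import numpy as np

rng = np.random.default_rng(2)

# Exact toy log-likelihood: best fit at theta = 1.0, uncertainty sigma = 0.2.
sigma = 0.2
def exact_log_l(theta):
    return -0.5 * ((theta - 1.0) / sigma) ** 2

# "Surrogate": a quadratic fitted to noisy evaluations of the exact curve.
theta_train = rng.uniform(0.0, 2.0, size=200)
y_train = exact_log_l(theta_train) + rng.normal(scale=0.05, size=200)
coeffs = np.polyfit(theta_train, y_train, deg=2)

# Exhaustive grid scan of both -- feasible only in a simplified model.
grid = np.linspace(0.0, 2.0, 4001)
best_exact = grid[np.argmax(exact_log_l(grid))]
best_surr = grid[np.argmax(np.polyval(coeffs, grid))]

# The claim survives if the shift is small compared to sigma.
shift = abs(best_surr - best_exact)
print(f"best-fit shift = {shift:.4f} (sigma = {sigma})")
```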

Figures

Figures reproduced from arXiv: 2604.07520 by Jorge Alda.

Figure 1. Example of the prediction of a trained Gaussian process: the line denotes …
Figure 2. Example of regression tree with four leaves. In red, application of the …
Figure 3. Training evolution of two xgboost models for classification tasks. In the model on the left, the loss function on the validation dataset improves at a similar rate as on the training dataset, indicating that the model is able to generalize correctly. On the right, with different hyperparameters and without early stopping, the model just learns the training dataset and is not able to generalize to the validation dataset.
Figure 4. Prediction models that would be necessary to train for three features in …
Figure 5. Local explanations of ML predictions: (left) SHAP values arranged in a …
Figure 6. Global explanations of ML predictions: (left) averages of SHAP values, …
Figure 7. MCMC chains computed with emcee: (left) values of the parameter θ in 50 chains from the same ensemble; the densest regions correspond to larger posterior probability. (Right) detail of two of those chains over 3 autocorrelation times; the parameter remains almost constant during periods of order τ, showing that the points in the chain are correlated.
Figure 8. Corner plot of the posterior distribution of three functions of the parame…
Figure 9. Feynman diagrams for the b → sνν̄ process in the SM: penguin diagram (left) and box diagram (right).

Code snippet 2.13: corner plots

    import corner
    f1_chain = [f1(x) for x in chain]
    f2_chain = [f2(x) for x in chain]
    sampled_values = np.vstack([f1_chain, f2_chain]).T
    _ = corner.corner(
        sampled_values,
        levels=[0.393, 0.864],  # 1 sigma and 2 sigma in 2D
        smooth=1.5,
        show_titles=True,
    )

Figure 10. Feynman diagram for the b → sa process where the effective operator is generated by ALP interactions with t quarks. The measurement found a branching ratio for this process 2.7σ larger than the SM prediction; while not significant enough to claim the discovery of new physics, it has spurred a lot of interest in the phenomenology community.
read the original abstract

These lecture notes provide a comprehensive framework for performing global statistical fits in high-energy physics using modern Machine Learning (ML) surrogates. We begin by reviewing the statistical foundations of model building, including the likelihood function, Wilks' theorem, and profile likelihoods. Recognizing that the computational cost of evaluating model predictions often renders traditional minimization prohibitive, we introduce Boosted Decision Trees to approximate the log-likelihood function. The notes detail a robust ML workflow including efficient generation of training data with active learning and Gaussian processes, hyperparameter optimization, model compilation for speed-up, and interpretability through SHAP values to decode the influence of model parameters and interactions between parameters. We further discuss posterior distribution sampling using Markov Chain Monte Carlo (MCMC). These techniques are finally applied to the $B^\pm \to K^\pm \nu \bar{\nu}$ anomaly at Belle II, demonstrating how a two-stage ML model can efficiently explore the parameter space of Axion-Like Particles (ALPs) while satisfying stringent experimental constraints on decay lengths and flavor-violating couplings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. These lecture notes review the statistical foundations of global fits in high-energy physics and present a workflow for using Boosted Decision Trees (BDTs) as surrogates to approximate the log-likelihood function. The notes cover active learning for training data generation, Gaussian processes, hyperparameter optimization, model compilation, SHAP-based interpretability, and MCMC sampling for posteriors. The framework is applied to the B± → K± ν ν̄ anomaly at Belle II to constrain Axion-Like Particle (ALP) parameters while enforcing experimental constraints on decay lengths and flavor-violating couplings, claiming that a two-stage ML model enables efficient parameter space exploration.

Significance. If the BDT surrogates are shown to accurately reproduce the true log-likelihood surface, including near hard experimental boundaries, the approach would provide a practical method to accelerate computationally intensive global fits in HEP, particularly for models like ALPs with many parameters and constraints. The emphasis on interpretability via SHAP values and active learning for data efficiency strengthens the pedagogical value for practitioners.

major comments (1)
  1. [ALP application section] In the application to the B± → K± ν ν̄ anomaly and ALP constraints: the central claim that the two-stage BDT surrogate enables efficient exploration of parameter space while satisfying stringent decay-length and flavor-violation cuts requires that the surrogate faithfully reproduces the log-likelihood, including discontinuities at boundaries. No quantitative hold-out validation metrics, boundary-specific error tests, or comparisons to exact likelihood evaluations in excluded regions are reported, leaving open the possibility of biases in the MCMC-derived posteriors on flavor-violating couplings.
minor comments (1)
  1. [Abstract and application] The description of the 'two-stage ML model' in the abstract and application could be clarified with an explicit diagram or pseudocode showing how the stages interact with the active learning loop.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the pedagogical value of these lecture notes and for highlighting the importance of rigorous validation for the surrogate model in the ALP application. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [ALP application section] In the application to the B± → K± ν ν̄ anomaly and ALP constraints: the central claim that the two-stage BDT surrogate enables efficient exploration of parameter space while satisfying stringent decay-length and flavor-violation cuts requires that the surrogate faithfully reproduces the log-likelihood, including discontinuities at boundaries. No quantitative hold-out validation metrics, boundary-specific error tests, or comparisons to exact likelihood evaluations in excluded regions are reported, leaving open the possibility of biases in the MCMC-derived posteriors on flavor-violating couplings.

    Authors: We agree that quantitative validation is essential to support the claim that the two-stage BDT surrogate faithfully reproduces the log-likelihood surface, particularly near the hard experimental boundaries. The current lecture notes emphasize the overall workflow, active learning strategy, and interpretability tools, with the ALP example serving primarily as an illustration rather than a fully benchmarked case study. To address this, we will add a dedicated subsection in the revised manuscript that reports hold-out validation metrics (such as RMSE and R² on a test set of exact log-likelihood evaluations) and performs boundary-specific tests by comparing surrogate predictions against exact evaluations in regions excluded by decay-length and flavor-violation constraints. These additions will directly assess whether the MCMC posteriors on flavor-violating couplings could be biased. revision: yes
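The hold-out metrics promised in the response are standard; a minimal numpy sketch of RMSE and R², with a boundary-restricted variant computed on a mask of hypothetical excluded-region points. All arrays here are invented for illustration, not taken from the notes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between exact and surrogate log L values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative hold-out set: exact log L values and surrogate predictions.
rng = np.random.default_rng(3)
exact = rng.normal(size=1000)
surrogate = exact + rng.normal(scale=0.1, size=1000)

# Boundary-specific test: score only points in the (hypothetical) excluded region.
excluded = exact < -1.0
print(f"global   RMSE={rmse(exact, surrogate):.3f}  R2={r2(exact, surrogate):.3f}")
print(f"boundary RMSE={rmse(exact[excluded], surrogate[excluded]):.3f}")
```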

Circularity Check

0 steps flagged

Methodological ML surrogate pipeline for global fits exhibits no circularity

full rationale

The lecture notes outline a standard workflow: statistical foundations (likelihood, Wilks' theorem), BDT approximation of the log-likelihood, active learning with Gaussian processes for training data, hyperparameter tuning, SHAP interpretability, and MCMC sampling. This pipeline is applied to ALP parameter exploration under experimental constraints for the B± → K± νν̄ anomaly. No load-bearing step reduces by construction to its own inputs, no fitted parameters are renamed as predictions, and no self-citation chains appear; the description relies on established ML and statistical techniques without self-definitional loops or ansatz smuggling. The central claim of efficient exploration is a practical methodology, not a derivation that collapses to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The notes rely on standard statistical concepts (likelihood, Wilks' theorem, profile likelihoods) and established ML methods. No new free parameters, axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5471 in / 1120 out tokens · 36152 ms · 2026-05-10T17:06:27.872772+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

4 extracted references · 1 canonical work pages

  1. [1]

    H. A. Bethe, Zur Theorie der Metalle. i. Eigenwerte und Eigenfunktionen der linearen Atomkette , Zeit. f \"u r Phys. 71 , 205 (1931), 10.1007\

  2. [2]

    Ginsparg, It was twenty years ago today

    P. Ginsparg, It was twenty years ago today... , http://arxiv.org/abs/1108.2700

  3. [3]

    , " * write output.state after.block =

    ENTRY address archive author booktitle chapter doi edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type url volume year label INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := #2 'af...

  4. [4]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...