pith. machine review for the scientific record.

arxiv: 2604.07520 · v1 · submitted 2026-04-08 · ✦ hep-ph · cs.LG

Recognition: 2 theorem links

· Lean Theorem

Lecture notes on Machine Learning applications for global fits

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 17:06 UTC · model grok-4.3

classification ✦ hep-ph cs.LG
keywords machine learning · global fits · axion-like particles · log-likelihood surrogate · boosted decision trees · Belle II · B to K nu nu anomaly

The pith

Boosted decision trees can serve as surrogates for the log-likelihood, making global fits of axion-like particles feasible under Belle II constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

These lecture notes outline a practical workflow for using machine learning to carry out global statistical fits in high-energy physics when direct likelihood evaluations are too slow. The central idea is to train boosted decision trees on strategically sampled points so they approximate the log-likelihood surface accurately enough for minimization and sampling. The notes apply this method to the parameter space of axion-like particles, showing that a two-stage model can scan the space while enforcing experimental limits on decay lengths and flavor-violating couplings from the B to K neutrino antineutrino anomaly. A sympathetic reader would care because traditional fits become prohibitive as models grow more complex, and the notes supply the concrete steps needed to keep them tractable.

Core claim

The notes establish that boosted decision trees, trained via active learning and Gaussian processes, serve as reliable surrogates for the log-likelihood function in global fits. This surrogate enables efficient profile-likelihood minimization and Markov Chain Monte Carlo sampling of posterior distributions. When applied to axion-like particle models explaining the B± to K± νν̄ anomaly at Belle II, the two-stage ML model explores the relevant parameter space while automatically satisfying constraints on decay lengths and flavor-violating couplings.

What carries the argument

Boosted decision trees as surrogates for the log-likelihood function, trained on active-learning samples and combined with MCMC sampling.
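As a concrete illustration of the surrogate idea (a sketch, not the notes' actual pipeline), the snippet below fits a boosted-tree regressor to samples of a toy two-parameter Gaussian log-likelihood. scikit-learn's GradientBoostingRegressor stands in for xgboost, and uniform random sampling stands in for active learning; the ALP likelihood itself is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Toy "expensive" log-likelihood: a 2D Gaussian in the parameters theta.
def log_likelihood(theta):
    return -0.5 * ((theta[:, 0] - 1.0) ** 2 + (theta[:, 1] + 0.5) ** 2)

# Sample training points (the notes use active learning; here: uniform random).
theta_train = rng.uniform(-3, 3, size=(2000, 2))
y_train = log_likelihood(theta_train)

# Train the boosted-tree surrogate on (theta, log L) pairs.
surrogate = GradientBoostingRegressor(n_estimators=300, max_depth=3)
surrogate.fit(theta_train, y_train)

# Check fidelity on held-out points before trusting it for minimization or MCMC.
theta_test = rng.uniform(-3, 3, size=(500, 2))
score = r2_score(log_likelihood(theta_test), surrogate.predict(theta_test))
print(f"hold-out R^2 = {score:.3f}")
```

In the workflow the notes describe, the training points would instead come from an active-learning loop guided by a Gaussian process, concentrating expensive evaluations where the surrogate is most uncertain.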

If this is right

  • Global fits with many parameters and expensive predictions become computationally practical.
  • Constraints on ALP decay lengths and flavor-violating couplings are enforced automatically during the scan.
  • Posterior distributions for ALP parameters can be obtained via MCMC on the fast surrogate model.
  • SHAP values provide quantitative insight into which parameters and interactions dominate the fit.
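The MCMC step in the third bullet can then run on the cheap surrogate. The notes use the emcee ensemble sampler; the numpy-only random-walk Metropolis below is a minimal stand-in, with a quadratic toy surrogate in place of a trained BDT.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy surrogate for the log-likelihood (stand-in for the trained BDT):
# a Gaussian centred at theta = 1.0 with width 0.5.
def surrogate_log_l(theta):
    return -0.5 * ((theta - 1.0) / 0.5) ** 2

# Random-walk Metropolis: cheap surrogate calls make long chains affordable.
theta, chain = 0.0, []
log_l = surrogate_log_l(theta)
for _ in range(20000):
    prop = theta + rng.normal(scale=0.5)
    log_l_prop = surrogate_log_l(prop)
    if np.log(rng.uniform()) < log_l_prop - log_l:  # Metropolis accept rule
        theta, log_l = prop, log_l_prop
    chain.append(theta)

samples = np.array(chain[2000:])  # discard burn-in
print(f"posterior mean = {samples.mean():.2f}, std = {samples.std():.2f}")
```

An ensemble sampler such as emcee would replace the single walker with many interacting chains, which is what Figure 7 of the paper depicts.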

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate workflow could be adapted to other rare-decay anomalies or to models with similar computational bottlenecks in flavor physics.
  • Active-learning strategies for generating training data might generalize to reduce the simulation cost of fits in cosmology or neutrino oscillation analyses.
  • Combining the boosted-tree surrogate with neural-network emulators could further improve speed or accuracy for even higher-dimensional spaces.

Load-bearing premise

The boosted decision tree surrogate must reproduce the true log-likelihood surface closely enough that the resulting parameter constraints and posteriors remain unbiased.

What would settle it

A direct numerical comparison in a simplified ALP model, where both the full likelihood and the surrogate can be evaluated exhaustively: if the best-fit values or credible intervals differ by more than the reported statistical uncertainty, the surrogate-based constraints are biased; if they agree, the central claim stands.
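A one-dimensional toy version of such a comparison, with all numbers invented for illustration: fit a quadratic "surrogate" to noisy evaluations of an exact Gaussian log-likelihood, scan both exhaustively on a grid, and compare the shift in best-fit points against the statistical uncertainty.

```python
import numpy as np

rng = np.random.default_rng(2)

# Exact toy log-likelihood: best fit at theta = 1.0, uncertainty sigma = 0.2.
sigma = 0.2
def exact_log_l(theta):
    return -0.5 * ((theta - 1.0) / sigma) ** 2

# "Surrogate": a quadratic fitted to noisy evaluations of the exact curve.
theta_train = rng.uniform(0.0, 2.0, size=200)
y_train = exact_log_l(theta_train) + rng.normal(scale=0.05, size=200)
coeffs = np.polyfit(theta_train, y_train, deg=2)

# Exhaustive grid scan of both -- feasible only in a simplified model.
grid = np.linspace(0.0, 2.0, 4001)
best_exact = grid[np.argmax(exact_log_l(grid))]
best_surr = grid[np.argmax(np.polyval(coeffs, grid))]

# The claim survives if the shift is small compared to sigma.
shift = abs(best_surr - best_exact)
print(f"best-fit shift = {shift:.4f} (sigma = {sigma})")
```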

Figures

Figures reproduced from arXiv: 2604.07520 by Jorge Alda.

Figure 1. Example of the prediction of a trained Gaussian process: the line denotes …
Figure 2. Example of regression tree with four leaves. In red, application of the …
Figure 3. Training evolution of two xgboost models for classification tasks. In the model on the left, the loss function on the validation dataset improves at a similar rate as on the training dataset, indicating that the model is able to generalize correctly. On the right, with different hyperparameters and without early stopping, the model just learns the training dataset and is not able to generalize to the validation dataset.
Figure 4. Prediction models that would be necessary to train for three features in …
Figure 5. Local explanations of ML predictions: (left) SHAP values arranged in a …
Figure 6. Global explanations of ML predictions: (left) averages of SHAP values, …
Figure 7. MCMC chains computed with emcee: (left) values of the parameter θ in 50 chains from the same ensemble; the densest regions correspond to larger posterior probability. (Right) detail of two of those chains over 3 autocorrelation times; the parameter remains almost constant during periods of order τ, showing that the points in the chain are correlated.
Figure 8. Corner plot of the posterior distribution of three functions of the parame…
Figure 9. Feynman diagrams for the b → sνν̄ process in the SM: penguin diagram (left) and box diagram (right).

Code snippet 2.13: corner plots

    import corner
    f1_chain = [f1(x) for x in chain]
    f2_chain = [f2(x) for x in chain]
    sampled_values = np.vstack([f1_chain, f2_chain]).T
    _ = corner.corner(
        sampled_values,
        levels=[0.393, 0.864],  # 1 sigma and 2 sigma in 2D
        smooth=1.5,
        show_titles=True,
    )

Figure 10. Feynman diagram for the b → sa process where the effective operator is generated by ALP interactions with t quarks. The measurement found a branching ratio for this process 2.7σ larger than the SM prediction; while not significant enough to claim the discovery of new physics, it has spurred a lot of interest in the phenomenology community.
read the original abstract

These lecture notes provide a comprehensive framework for performing global statistical fits in high-energy physics using modern Machine Learning (ML) surrogates. We begin by reviewing the statistical foundations of model building, including the likelihood function, Wilks' theorem, and profile likelihoods. Recognizing that the computational cost of evaluating model predictions often renders traditional minimization prohibitive, we introduce Boosted Decision Trees to approximate the log-likelihood function. The notes detail a robust ML workflow including efficient generation of training data with active learning and Gaussian processes, hyperparameter optimization, model compilation for speed-up, and interpretability through SHAP values to decode the influence of model parameters and interactions between parameters. We further discuss posterior distribution sampling using Markov Chain Monte Carlo (MCMC). These techniques are finally applied to the $B^\pm \to K^\pm \nu \bar{\nu}$ anomaly at Belle II, demonstrating how a two-stage ML model can efficiently explore the parameter space of Axion-Like Particles (ALPs) while satisfying stringent experimental constraints on decay lengths and flavor-violating couplings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. These lecture notes review the statistical foundations of global fits in high-energy physics and present a workflow for using Boosted Decision Trees (BDTs) as surrogates to approximate the log-likelihood function. The notes cover active learning for training data generation, Gaussian processes, hyperparameter optimization, model compilation, SHAP-based interpretability, and MCMC sampling for posteriors. The framework is applied to the B± → K± ν ν̄ anomaly at Belle II to constrain Axion-Like Particle (ALP) parameters while enforcing experimental constraints on decay lengths and flavor-violating couplings, claiming that a two-stage ML model enables efficient parameter space exploration.

Significance. If the BDT surrogates are shown to accurately reproduce the true log-likelihood surface, including near hard experimental boundaries, the approach would provide a practical method to accelerate computationally intensive global fits in HEP, particularly for models like ALPs with many parameters and constraints. The emphasis on interpretability via SHAP values and active learning for data efficiency strengthens the pedagogical value for practitioners.

major comments (1)
  1. [ALP application section] In the application to the B± → K± ν ν̄ anomaly and ALP constraints: the central claim that the two-stage BDT surrogate enables efficient exploration of parameter space while satisfying stringent decay-length and flavor-violation cuts requires that the surrogate faithfully reproduces the log-likelihood, including discontinuities at boundaries. No quantitative hold-out validation metrics, boundary-specific error tests, or comparisons to exact likelihood evaluations in excluded regions are reported, leaving open the possibility of biases in the MCMC-derived posteriors on flavor-violating couplings.
minor comments (1)
  1. [Abstract and application] The description of the 'two-stage ML model' in the abstract and application could be clarified with an explicit diagram or pseudocode showing how the stages interact with the active learning loop.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of the pedagogical value of these lecture notes and for highlighting the importance of rigorous validation for the surrogate model in the ALP application. We address the major comment point by point below.

read point-by-point responses
  1. Referee: [ALP application section] In the application to the B± → K± ν ν̄ anomaly and ALP constraints: the central claim that the two-stage BDT surrogate enables efficient exploration of parameter space while satisfying stringent decay-length and flavor-violation cuts requires that the surrogate faithfully reproduces the log-likelihood, including discontinuities at boundaries. No quantitative hold-out validation metrics, boundary-specific error tests, or comparisons to exact likelihood evaluations in excluded regions are reported, leaving open the possibility of biases in the MCMC-derived posteriors on flavor-violating couplings.

    Authors: We agree that quantitative validation is essential to support the claim that the two-stage BDT surrogate faithfully reproduces the log-likelihood surface, particularly near the hard experimental boundaries. The current lecture notes emphasize the overall workflow, active learning strategy, and interpretability tools, with the ALP example serving primarily as an illustration rather than a fully benchmarked case study. To address this, we will add a dedicated subsection in the revised manuscript that reports hold-out validation metrics (such as RMSE and R² on a test set of exact log-likelihood evaluations) and performs boundary-specific tests by comparing surrogate predictions against exact evaluations in regions excluded by decay-length and flavor-violation constraints. These additions will directly assess whether the MCMC posteriors on flavor-violating couplings could be biased. revision: yes
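The hold-out metrics promised in the response are standard; a minimal numpy sketch of RMSE and R², with a boundary-restricted variant computed on a mask of hypothetical excluded-region points. All arrays here are invented for illustration, not taken from the notes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between exact and surrogate log L values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative hold-out set: exact log L values and surrogate predictions.
rng = np.random.default_rng(3)
exact = rng.normal(size=1000)
surrogate = exact + rng.normal(scale=0.1, size=1000)

# Boundary-specific test: score only points in the (hypothetical) excluded region.
excluded = exact < -1.0
print(f"global   RMSE={rmse(exact, surrogate):.3f}  R2={r2(exact, surrogate):.3f}")
print(f"boundary RMSE={rmse(exact[excluded], surrogate[excluded]):.3f}")
```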

Circularity Check

0 steps flagged

Methodological ML surrogate pipeline for global fits exhibits no circularity

full rationale

The lecture notes outline a standard workflow: statistical foundations (likelihood, Wilks' theorem), BDT approximation of the log-likelihood, active learning with Gaussian processes for training data, hyperparameter tuning, SHAP interpretability, and MCMC sampling. This pipeline is applied to ALP parameter exploration under experimental constraints for the B± → K± νν̄ anomaly. No load-bearing step reduces by construction to its own inputs, no fitted parameters are renamed as predictions, and no self-citation chains appear; the description relies on established ML and statistical techniques without self-definitional loops or ansatz smuggling. The central claim of efficient exploration is a practical methodology, not a derivation that collapses to tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The notes rely on standard statistical concepts (likelihood, Wilks' theorem, profile likelihoods) and established ML methods. No new free parameters, axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5471 in / 1120 out tokens · 36152 ms · 2026-05-10T17:06:27.872772+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

4 extracted references · 1 canonical work pages

  1. [1]

    H. A. Bethe, Zur Theorie der Metalle. i. Eigenwerte und Eigenfunktionen der linearen Atomkette , Zeit. f \"u r Phys. 71 , 205 (1931), 10.1007\

  2. [2]

    Ginsparg, It was twenty years ago today

    P. Ginsparg, It was twenty years ago today... , http://arxiv.org/abs/1108.2700

  3. [3]

    , " * write output.state after.block =

    ENTRY address archive author booktitle chapter doi edition editor eid eprint howpublished institution isbn journal key month note number organization pages publisher school series title type url volume year label INTEGERS output.state before.all mid.sentence after.sentence after.block FUNCTION init.state.consts #0 'before.all := #1 'mid.sentence := #2 'af...

  4. [4]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...