pith. machine review for the scientific record.

arXiv: 2605.01363 · v1 · submitted 2026-05-02 · ✦ hep-ex · cs.LG · hep-ph · stat.ME

Recognition: unknown

Data-Driven, Geometry-Aware Optimal-Transport Calibration of Flavor Tagger

Pith reviewed 2026-05-09 13:42 UTC · model grok-4.3

classification ✦ hep-ex · cs.LG · hep-ph · stat.ME
keywords: transport, calibration, component, flavor, Aitchison, analyses, because, closure

The pith

Flavor-tagger calibration is cast as an optimal transport problem on the probability simplex in isometric log-ratio coordinates, with flavor-conditional targets extracted from control data via expectation-maximization using normalizing flows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Particle physics experiments use flavor taggers to identify whether a jet originated from a b quark, a c quark, or a light quark. These taggers output a probability for each flavor, but calibrating those outputs against real data has so far been limited to scale factors at fixed working points or binned corrections to a single discriminant. The new method instead learns a smooth map that moves the entire simulated probability distribution onto the one observed in data. It does this by solving an optimal transport problem, which finds the cheapest way to reshape one distribution into another. To make the geometry work, the probabilities are transformed into isometric log-ratio (ILR) coordinates, where ordinary Euclidean distance coincides with the Aitchison distance, a natural distance on the simplex. The target distribution for each flavor is extracted directly from control-region data with an expectation-maximization loop: normalizing flows model the shape of each flavor component while the mixture fractions of each region are fitted at the same time. A separate linearized analysis tracks how uncertainty in those mixture fractions propagates into the extracted shapes. In a simulation-based closure study, the calibrated outputs show improved closure in the control regions and in independent validation mixtures.
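
The link between ILR coordinates and the Aitchison distance can be checked numerically. The sketch below is an illustration, not the paper's code: the function names (`helmert_basis`, `ilr`, `ilr_inverse`, `aitchison_distance`) are hypothetical, and the Helmert-type basis is only one of many valid orthonormal bases for an ILR transform. The paper may use a different one.

```python
import numpy as np

def helmert_basis(d):
    """Orthonormal basis, shape (d-1, d), of the subspace orthogonal to the ones vector.
    Assumption: a Helmert-type basis; any orthonormal basis of this subspace works."""
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))  # normalize the row to unit length
    return basis

def ilr(p, basis):
    """Map compositions p (..., d) to ILR coordinates (..., d-1)."""
    logp = np.log(p)
    clr = logp - logp.mean(axis=-1, keepdims=True)  # centered log-ratio
    return clr @ basis.T

def ilr_inverse(z, basis):
    """Map ILR coordinates back to the simplex via a softmax of the clr vector."""
    clr = z @ basis
    w = np.exp(clr - clr.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def aitchison_distance(p, q):
    """Aitchison distance computed directly from all pairwise log-ratios."""
    lp, lq = np.log(p), np.log(q)
    d = p.shape[-1]
    diff = (lp[..., :, None] - lp[..., None, :]) - (lq[..., :, None] - lq[..., None, :])
    return np.sqrt((diff ** 2).sum(axis=(-2, -1)) / (2.0 * d))

# Hypothetical three-class tagger outputs (b, c, light) for two jets.
B = helmert_basis(3)
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.2, 0.5, 0.3])
euclid_in_ilr = np.linalg.norm(ilr(p, B) - ilr(q, B))
print(euclid_in_ilr, aitchison_distance(p, q))  # the two numbers should agree
```

Because the two distances agree, a quadratic cost written in ILR coordinates is a squared Aitchison cost on the simplex, which is what lets a transport map be trained in an unconstrained Euclidean space.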

Core claim

The simulation-based closure study demonstrates improved closure in dedicated control regions and in independent validation mixtures.

Load-bearing premise

That the flavor-conditional target distributions extracted from control regions via the joint EM fit accurately represent the true distributions that should be used for calibration in signal regions, and that the learned transport maps generalize without introducing bias.

Original abstract

Flavor-tagging calibrations are often provided either as scale factors measured at a finite set of working points or as binned corrections to a chosen one-dimensional discriminant. However, this approach falls short of providing continuous, event-level calibration across the full multicomponent outputs of modern taggers. This limitation leads to information loss in analyses that demand high-performance flavor tagging, restricting analyses to a limited set of predefined variables. In this work, we propose a geometry-aware framework that formulates flavor-tagger calibration as an optimal transport problem on the probability simplex. The transport maps are parameterized and trained in the isometric log-ratio coordinate system. Because the quadratic Euclidean cost of Brenier transport in this coordinate system is equivalent to the Aitchison distance on the simplex, the learned map induces a minimal deformation under the Aitchison geometry. Furthermore, we extract flavor-conditional target distributions directly from control-region data using an expectation-maximization (EM) technique that simultaneously fits multiple control regions, models each flavor component with a normalizing flow, and estimates the regional mixture fractions. The extracted targets are subsequently used to learn flavor-factorized transport maps. Because the joint estimation of mixture fractions and flexible component densities admits weakly constrained directions, we further introduce a linearized feedback-operator analysis that propagates the fitted composition covariance into the extracted component densities, separating data-constrained modes from those dominated by the composition prior. The simulation-based closure study demonstrates improved closure in dedicated control regions and in independent validation mixtures.
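
For intuition about Brenier transport with quadratic cost, the one-dimensional case has a closed form: the optimal map is the monotone rearrangement that sends each source quantile to the matching target quantile. The sketch below applies this to a single coordinate. It is a deliberately simplified stand-in: the paper's maps act on the full multidimensional ILR vector and their parameterization is not described here, and the name `fit_monotone_map` and the Gaussian toy samples are assumptions made for illustration.

```python
import numpy as np

def fit_monotone_map(source, target, n_quantiles=200):
    """Estimate the 1D quadratic-cost optimal transport map by quantile matching.
    Returns a callable that maps source-like values onto the target distribution."""
    levels = np.linspace(0.0, 1.0, n_quantiles)
    src_q = np.quantile(source, levels)  # empirical source quantiles
    tgt_q = np.quantile(target, levels)  # empirical target quantiles

    def transport(x):
        # Piecewise-linear interpolation between matched quantiles;
        # values outside the fitted range are clamped to the end points.
        return np.interp(x, src_q, tgt_q)

    return transport

# Toy stand-ins for one ILR coordinate in simulation (source) and data (target).
rng = np.random.default_rng(0)
simulated = rng.normal(0.0, 1.0, 20000)
observed = rng.normal(1.0, 2.0, 20000)

calibrate = fit_monotone_map(simulated, observed)
calibrated = calibrate(simulated)
print(calibrated.mean(), calibrated.std())  # should be close to the target's mean and width
```

The map is monotone by construction, so it preserves the ordering of events while reshaping the distribution. In more than one dimension no such closed form exists, which is why a learned, parameterized map is needed.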

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard optimal-transport theory and domain assumptions about control-region representativeness; the main technical additions are the ILR parameterization and the feedback-operator analysis, with no new physical entities postulated.

free parameters (2)
  • normalizing flow parameters
    Parameters of the flows that model each flavor component are fitted to control-region data.
  • transport map parameters
    Parameters of the maps that transport simulated distributions to the extracted targets are trained during the procedure.
axioms (2)
  • standard math: The quadratic Euclidean cost of Brenier transport in isometric log-ratio coordinates equals the squared Aitchison distance on the simplex, because the ILR transform is an isometry.
    Invoked to claim that the learned map induces minimal deformation under the Aitchison geometry.
  • domain assumption: Control regions contain sufficient information to extract accurate flavor-conditional distributions via joint EM fitting of mixture fractions and component densities.
    Central premise enabling the data-driven target extraction step.
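
The EM step behind the domain assumption can be illustrated with a much-reduced toy. The sketch below estimates only the mixture fractions of one region and holds the component densities fixed as known Gaussians, whereas the paper fits normalizing-flow densities jointly across several control regions. All names (`gaussian_pdf`, `sample_region`, `em_mixture_fractions`) and the toy numbers are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, sigma):
    """Density of a 1D Gaussian, used here as a fixed stand-in for a flavor template."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def sample_region(rng, fractions, means, sigma, n):
    """Draw n toy events from a mixture with the given flavor fractions."""
    labels = rng.choice(len(fractions), size=n, p=fractions)
    return rng.normal(np.asarray(means)[labels], sigma)

def em_mixture_fractions(x, component_pdfs, n_iter=200):
    """EM for the mixture fractions of one region, with component densities held fixed."""
    dens = np.stack([pdf(x) for pdf in component_pdfs], axis=1)  # (n_events, n_comp)
    frac = np.full(dens.shape[1], 1.0 / dens.shape[1])           # flat starting point
    for _ in range(n_iter):
        weighted = dens * frac
        resp = weighted / weighted.sum(axis=1, keepdims=True)    # E-step: responsibilities
        frac = resp.mean(axis=0)                                 # M-step: updated fractions
    return frac

# Two toy control regions with different (assumed) flavor compositions.
rng = np.random.default_rng(1)
means = [-3.0, 0.0, 3.0]
pdfs = [lambda x, m=m: gaussian_pdf(x, m, 1.0) for m in means]
for truth in ([0.7, 0.2, 0.1], [0.2, 0.3, 0.5]):
    events = sample_region(rng, truth, means, 1.0, 20000)
    print(truth, em_mixture_fractions(events, pdfs))
```

With fixed shapes the regions decouple and the fit is well constrained. Once the shapes are also free, as in the paper, fractions and densities can trade off against each other, which is the weakly constrained direction the feedback-operator analysis is meant to quantify.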

pith-pipeline@v0.9.0 · 5570 in / 1541 out tokens · 103341 ms · 2026-05-09T13:42:46.991947+00:00 · methodology
