Pointwise Metrics Mislead: An Evaluation Protocol for Multimodal Inverse Problems
Pith reviewed 2026-05-25 05:37 UTC · model grok-4.3
The pith
In multimodal inverse problems, point estimators trained to minimize MSE or MAE produce a marginal spectrum narrower than the true one whenever the posterior has nonzero width.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that pointwise metrics are structurally misleading for inverse problems with multimodal posteriors. By the law of total variance, point estimators trained to minimize MSE or MAE produce a marginal spectrum strictly narrower than the truth. The paper states that this bias is independent of architecture, training, and dataset size. It proposes a three-part protocol: CRPS for per-event accuracy, a spectrum-fidelity diagnostic for population marginals, and coverage calibration for uncertainty. On a synthetic benchmark with an analytic posterior and on a particle-physics inverse problem, model rankings reverse between pointwise and distributional metrics, and calibration further separates architectures that are indistinguishable under CRPS.
What carries the argument
The law of total variance, which implies that point predictions equal to the posterior mean have strictly smaller marginal variance than the true targets whenever the posterior has nonzero width.
If this is right
- Model rankings obtained from pointwise metrics reverse when distributional metrics are used instead.
- Calibration checks can separate models that appear equivalent under CRPS alone.
- The choice of evaluation protocol determines the final scientific conclusion about model performance.
- Downstream analyses depending on spectral features will be biased by the use of point estimators.
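The calibration check mentioned in the list above can be illustrated with a minimal sketch. It assumes each model returns an array of posterior samples per event and measures how often the truth falls inside central credible intervals. The function name `central_coverage` and the use of central (rather than highest-density) intervals are assumptions made for illustration, not the paper's definition.

```python
import numpy as np

def central_coverage(samples, truth, levels):
    """Empirical coverage of central credible intervals.

    samples: array (n_events, n_samples) of posterior draws per event
    truth:   array (n_events,) of true target values
    levels:  iterable of nominal coverage levels in (0, 1)
    """
    samples = np.asarray(samples, dtype=float)
    truth = np.asarray(truth, dtype=float)
    covered = []
    for q in levels:
        lo = np.quantile(samples, 0.5 - q / 2.0, axis=1)
        hi = np.quantile(samples, 0.5 + q / 2.0, axis=1)
        covered.append(np.mean((truth >= lo) & (truth <= hi)))
    return np.array(covered)

# Hypothetical toy data: one calibrated and one overconfident sampler.
rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
calibrated = rng.normal(size=(2000, 200))
overconfident = 0.5 * rng.normal(size=(2000, 200))

levels = [0.5, 0.68, 0.9]
print("nominal      ", levels)
print("calibrated   ", central_coverage(calibrated, truth, levels))
print("overconfident", central_coverage(overconfident, truth, levels))
```

A calibrated model tracks the nominal levels, while an overconfident one falls below them. For strongly multimodal posteriors a highest-density region may be a more natural interval choice than the central interval used here.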
Where Pith is reading between the lines
- Similar compression effects likely appear in other reconstruction tasks with multimodal posteriors such as medical imaging or astronomical parameter estimation.
- The protocol offers a concrete way to test whether full posterior sampling avoids the variance loss shown for point estimators.
- Existing pipelines that rely only on point metrics may need re-examination to check how much spectral information was lost.
Load-bearing premise
Downstream scientific measurements actually depend on the full shape of the posterior including tails and modes rather than just point estimates or low-order moments.
What would settle it
A direct comparison on a problem with a known analytic multimodal posterior in which the marginal variance of the point predictions equals the true marginal variance instead of being smaller.
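A test of this kind can be sketched on a toy problem. The example below assumes a hypothetical two-mode model, x = s·y + σ·ε with a hidden sign s, chosen only because its posterior mean and variance are available in closed form. It is not the paper's benchmark.

```python
import numpy as np

def simulate(n=200_000, p_plus=0.7, sigma=0.3, seed=0):
    """Toy inverse problem with an analytic bimodal posterior.

    Given the observation y, the target x is drawn from
    p_plus * N(+y, sigma^2) + (1 - p_plus) * N(-y, sigma^2).
    """
    rng = np.random.default_rng(seed)
    y = rng.uniform(1.0, 3.0, size=n)                  # observation
    s = np.where(rng.random(n) < p_plus, 1.0, -1.0)    # hidden mode label
    x = s * y + sigma * rng.normal(size=n)             # true target
    m = 2.0 * p_plus - 1.0                             # E[s]
    post_mean = m * y                                  # analytic E[x | y]
    post_var = sigma**2 + (1.0 - m**2) * y**2          # analytic Var(x | y)
    return x, post_mean, post_var

x, post_mean, post_var = simulate()
print("Var of true targets      :", x.var())
print("Var of MSE-optimal points:", post_mean.var())
print("Mean posterior variance  :", post_var.mean())
print("Sum of last two          :", post_mean.var() + post_var.mean())
```

The ideal MSE-optimal predictor here is the analytic posterior mean, so the gap between the first two printed numbers is not caused by limited capacity or training; it equals the mean posterior variance up to sampling noise.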
Original abstract
Evaluation in scientific reconstruction is dominated by pointwise metrics - RMSE, MAE, per-event resolution - under the implicit assumption that lower error means better reconstruction. We show that this assumption fails structurally for inverse problems with multimodal posteriors. By the law of total variance, point estimators trained to minimize MSE or MAE produce a marginal spectrum strictly narrower than the truth whenever the posterior has nonzero width. The resulting bias is independent of architecture, training, and dataset size, and it compresses precisely the spectral features - tails, modes, shapes - that downstream scientific measurements rely on. We propose a three-part evaluation protocol where each step targets a failure mode the others miss: per-event distributional accuracy via CRPS, population-level marginal accuracy via a spectrum-fidelity diagnostic, and uncertainty trustworthiness via coverage-based calibration. On a synthetic benchmark with an analytic posterior and on a realistic many-to-one inverse problem from particle physics, model rankings reverse between pointwise and distributional metrics, and calibration further separates architectures indistinguishable under CRPS. The evaluation protocol, not the model, determines the scientific conclusion.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript argues that pointwise metrics (RMSE, MAE) structurally mislead in multimodal inverse problems: by the law of total variance, point estimators produce marginal spectra narrower than the true posterior whenever posterior width is nonzero, with the bias independent of architecture/training/dataset size and compressing tails/modes/shapes needed for downstream science. It proposes a three-part protocol (CRPS for per-event distributional accuracy, spectrum-fidelity diagnostic for population marginal accuracy, coverage calibration for uncertainty) and demonstrates ranking reversals on a synthetic analytic-posterior benchmark and a particle-physics many-to-one inverse problem.
Significance. If the core claims hold, the work is significant for ML evaluation in scientific inverse problems (e.g., particle physics), where it shows that metric choice can reverse model rankings and alter scientific conclusions. The analytic posterior in the synthetic benchmark is a strength for exact verification. The independence claim, if rigorously established, would be a notable result.
major comments (3)
- [Abstract] Abstract: the claim that 'the resulting bias is independent of architecture, training, and dataset size' does not hold exactly. The law of total variance decomposition applies to the population conditional mean (MSE minimizer) but has no direct analogue for conditional medians (MAE); any finite-sample estimator only approximates the population quantity, so realized narrowing depends on N, capacity, and optimization.
- [Abstract] Abstract: no explicit formula, derivation, or definition is supplied for the 'spectrum-fidelity diagnostic' that forms the second leg of the proposed protocol; this quantity is load-bearing for the claim that the protocol targets failure modes missed by pointwise metrics.
- [Abstract] Abstract: the premise that downstream scientific measurements 'rely on' the full posterior spectrum (tails, modes, shapes) rather than low-order moments or point estimates is asserted without derivation or empirical support; this is central to the significance argument but remains an assumption.
minor comments (1)
- [Abstract] The abstract is information-dense; consider separating the protocol description from the bias argument for readability.
Simulated Author's Rebuttal
We thank the referee for these precise comments on the abstract. They highlight areas where greater rigor and explicitness will strengthen the manuscript. We address each point below and have made targeted revisions.
Point-by-point responses
-
Referee: [Abstract] Abstract: the claim that 'the resulting bias is independent of architecture, training, and dataset size' does not hold exactly. The law of total variance decomposition applies to the population conditional mean (MSE minimizer) but has no direct analogue for conditional medians (MAE); any finite-sample estimator only approximates the population quantity, so realized narrowing depends on N, capacity, and optimization.
Authors: We agree that the law of total variance supplies an exact population-level statement for the conditional mean (MSE minimizer) and that finite-sample estimators only approach this limit. The original wording was intended to convey that the bias is independent of architecture and dataset size once the estimator converges to the population quantity, but the phrasing was imprecise. For MAE the decomposition is not identical, though the qualitative compression of marginal spectra still occurs under multimodality. We have revised the abstract to read 'in the population limit, independent of architecture...' and added a clarifying sentence in Section 2.1 distinguishing the MSE case from the MAE case while preserving the core structural claim. revision: yes
-
Referee: [Abstract] Abstract: no explicit formula, derivation, or definition is supplied for the 'spectrum-fidelity diagnostic' that forms the second leg of the proposed protocol; this quantity is load-bearing for the claim that the protocol targets failure modes missed by pointwise metrics.
Authors: The spectrum-fidelity diagnostic is the integrated absolute difference between the empirical CDF of the reconstructed marginal and the true marginal CDF, evaluated over a fine grid of the observable. We have inserted a concise parenthetical definition and the explicit formula into the abstract and expanded the formal definition, including the discretization used in the experiments, in the revised Section 3.2. revision: yes
-
Referee: [Abstract] Abstract: the premise that downstream scientific measurements 'rely on' the full posterior spectrum (tails, modes, shapes) rather than low-order moments or point estimates is asserted without derivation or empirical support; this is central to the significance argument but remains an assumption.
Authors: The premise reflects standard practice in particle-physics unfolding and resonance extraction, where tail probabilities and spectral shapes directly enter cross-section and parameter fits. We have added a short paragraph in the introduction citing representative HEP references on the necessity of full-spectrum fidelity and included a quantitative illustration from the particle-physics benchmark showing how the compressed marginal produces a statistically significant bias in a downstream observable. While a domain-general derivation is outside the paper's scope, the revision supplies both literature grounding and empirical support. revision: partial
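The spectrum-fidelity diagnostic described in the second response (an integrated absolute difference between the reconstructed and true marginal CDFs on a fine grid) can be sketched as follows. This is an illustration under assumptions: the true marginal CDF is approximated by the empirical CDF of truth samples, and the function names and grid size are hypothetical rather than taken from the paper.

```python
import numpy as np

def empirical_cdf(samples, grid):
    """Right-continuous empirical CDF of `samples` evaluated on `grid`."""
    s = np.sort(np.asarray(samples, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size

def spectrum_fidelity(reco, truth, n_grid=2000):
    """Integrated |CDF_reco - CDF_truth| over a uniform grid of the observable."""
    reco = np.asarray(reco, dtype=float)
    truth = np.asarray(truth, dtype=float)
    lo = min(reco.min(), truth.min())
    hi = max(reco.max(), truth.max())
    grid = np.linspace(lo, hi, n_grid)
    diff = np.abs(empirical_cdf(reco, grid) - empirical_cdf(truth, grid))
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))

rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
print("identical marginal :", spectrum_fidelity(truth, truth))
print("compressed marginal:", spectrum_fidelity(0.4 * truth, truth))
```

A compressed marginal, such as the one a point estimator produces, gives a clearly nonzero value even when per-event errors look small. In one dimension this integral coincides with the Wasserstein-1 distance between the two marginals.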
Circularity Check
No circularity: central claim rests on external law of total variance
The paper derives the narrowing of the marginal spectrum for point estimators from the law of total variance, an independent mathematical identity that holds for the population conditional mean and does not reduce to any fitted parameter, self-citation, or definitional loop within the manuscript. No equations rename a known empirical pattern, smuggle an ansatz via prior work, or treat a fitted input as a prediction. The proposed evaluation protocol is introduced separately and does not depend on the variance claim for its justification. The derivation chain is therefore self-contained against external benchmarks, with any overstatement regarding finite-sample MAE behavior or dataset-size independence constituting a correctness issue rather than circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: the law of total variance decomposes the marginal variance as Var(X) = E[Var(X|Y)] + Var(E[X|Y])
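Written out for the MSE case, with X the target and Y the observation, the axiom above gives the variance deficit directly. The symbol \hat{x} for the population MSE minimizer is notation introduced here for illustration.

```latex
\begin{align}
\operatorname{Var}(X)
  &= \mathbb{E}\!\left[\operatorname{Var}(X \mid Y)\right]
   + \operatorname{Var}\!\left(\mathbb{E}[X \mid Y]\right), \\
\hat{x}(Y) &= \mathbb{E}[X \mid Y]
  \quad\Longrightarrow\quad
  \operatorname{Var}\!\left(\hat{x}(Y)\right)
  = \operatorname{Var}(X) - \mathbb{E}\!\left[\operatorname{Var}(X \mid Y)\right]
  \le \operatorname{Var}(X).
\end{align}
```

The inequality is strict whenever the expected posterior variance is positive. As the referee notes, this identity covers the conditional mean only; it does not transfer directly to the conditional median that minimizes MAE.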
Reference graph
Works this paper leans on
-
[1]
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, January 2006. URL https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
-
[2]
Lukas Ehrke, John Andrew Raine, Knut Zoch, Manuel Guth, and Tobias Golling. Topological reconstruction of particle physics processes using graph neural networks. Phys. Rev. D, 107(11):116019, 2023. doi: 10.1103/PhysRevD.107.116019
-
[3]
Alexander Shmakov, Michael James Fenton, Ta-Wei Ho, Shih-Chieh Hsu, Daniel Whiteson, and Pierre Baldi. SPANet: Generalized permutationless set assignment for particle physics using symmetry preserving attention. SciPost Phys., 12(5):178, 2022. doi: 10.21468/SciPostPhys.12.5.178
-
[4]
Emil Y. Sidky and Xiaochuan Pan. Report on the AAPM deep-learning sparse-view CT grand challenge. Medical Physics, 49(8):4935–4943, 2022. doi: 10.1002/mp.15489. URL https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.15489
-
[5]
Zhihao Wang, Jian Chen, and Steven C. H. Hoi. Deep learning for image super-resolution: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(10):3365–3387, 2021. doi: 10.1109/TPAMI.2020.2982166
-
[7]
Thomas Vandal, Evan Kodra, Sangram Ganguly, Andrew Michaelis, Ramakrishna Nemani, and Auroop R. Ganguly. Generating high resolution climate change projections through single image super-resolution: an abridged version. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI’18, pages 5389–5393. AAAI Press, 2018. ISBN 9780...
-
[8]
Yuji Kim and Nori Nakata. Geophysical inversion versus machine learning in inverse problems. Leading Edge, 37(12):894–901, December 2018. doi: 10.1190/tle37120894.1
-
[9]
Hongyu Shen, E A Huerta, Eamonn O’Shea, Prayush Kumar, and Zhizhen Zhao. Statistically-informed deep learning for gravitational wave parameter estimation. Machine Learning: Science and Technology, 3(1):015007, November 2021. doi: 10.1088/2632-2153/ac3843. URL https://doi.org/10.1088/2632-2153/ac3843
-
[10]
Maximilian Dax, Stephen R. Green, Jonathan Gair, Jakob H. Macke, Alessandra Buonanno, and Bernhard Schölkopf. Real-time gravitational wave science with neural posterior estimation. Phys. Rev. Lett., 127:241103, December 2021. doi: 10.1103/PhysRevLett.127.241103. URL https://link.aps.org/doi/10.1103/PhysRevLett.127.241103
-
[11]
John Andrew Raine, Matthew Leigh, Knut Zoch, and Tobias Golling. Fast and improved neutrino reconstruction in multineutrino final states with conditional normalizing flows. Phys. Rev. D, 109:012005, January 2024. doi: 10.1103/PhysRevD.109.012005. URL https://link.aps.org/doi/10.1103/PhysRevD.109.012005
-
[12]
Kyle Cranmer, Johann Brehmer, and Gilles Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062, 2020. doi: 10.1073/pnas.1912789117. URL https://www.pnas.org/doi/abs/10.1073/pnas.1912789117
-
[13]
George Papamakarios and Iain Murray. Fast ϵ-free inference of simulation models with Bayesian conditional density estimation. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016. URL https://proceedings.neurips.cc/paper_files/paper/2016/file/6aca...
-
[14]
Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007. doi: 10.1198/016214506000001437. URL https://doi.org/10.1198/016214506000001437
-
[15]
Hans Hersbach. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570, 2000. doi: 10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2. URL https://journals.ametsoc.org/view/journals/wefo/15/5/1520-0434_2000_015_0559_dotcrp_2_0_co_2.xml
-
[16]
Alexander Jordan, Fabian Krüger, and Sebastian Lerch. Evaluating probabilistic forecasts with scoringRules. Journal of Statistical Software, 90(12):1–37, 2019. doi: 10.18637/jss.v090.i12. URL https://www.jstatsoft.org/index.php/jss/article/view/v090i12
-
[17]
Joeri Hermans, Arnaud Delaunoy, François Rozet, Antoine Wehenkel, Volodimir Begy, and Gilles Louppe. A trust crisis in simulation-based inference? Your posterior approximations can be unfaithful, 2022. URL https://arxiv.org/abs/2110.06581
-
[18]
Sean Talts, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. Validating Bayesian inference algorithms with simulation-based calibration, 2020. URL https://arxiv.org/abs/1804.06788
-
[19]
Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. CoRR, abs/1706.04599, 2017. URL http://arxiv.org/abs/1706.04599
-
[20]
Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate uncertainties for deep learning using calibrated regression. CoRR, abs/1807.00263, 2018. URL http://arxiv.org/abs/1807.00263
-
[21]
Alex Gammerman, Volodya Vovk, and Vladimir Vapnik. Learning by transduction. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, UAI’98, pages 148–155, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc. ISBN 155860555X.
-
[22]
Zhendong Wang, Ruijiang Gao, Mingzhang Yin, Mingyuan Zhou, and David Blei. Probabilistic conformal prediction using conditional random samples. In Francisco Ruiz, Jennifer Dy, and Jan-Willem van de Meent, editors, Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 of Proceedings of Machine Learning Researc...
-
[23]
Jack Y. Araz and Michael Spannowsky. Another fit bites the dust: Conformal prediction as a calibration standard for machine learning in high-energy physics, 2025. URL https://arxiv.org/abs/2512.17048
-
[24]
Jan-Matthis Lueckmann, Jan Boelts, David Greenberg, Pedro Goncalves, and Jakob Macke. Benchmarking simulation-based inference. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 343–351. PMLR, May 2021. URL...
-
[25]
Pablo Lemos, Adam Coogan, Yashar Hezaveh, and Laurence Perreault-Levasseur. Sampling-based accuracy testing of posterior estimators for general inference. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Pro...
-
[26]
Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. InverseBench: Benchmarking plug-and-play diffusion priors for inverse problems in physical sciences. In The Thirteenth International Conference on Learning Representations, 2025. U...
-
[27]
Jan A. Högbom. Aperture Synthesis with a Non-Regular Distribution of Interferometer Baselines. Astron. Astrophys. Suppl. Ser., 15:417, June 1974
-
[28]
Lars Sonnenschein. Algebraic approach to solve $t\bar{t}$ dilepton equations. Phys. Rev. D, 72:095020, November 2005. doi: 10.1103/PhysRevD.72.095020. URL https://link.aps.org/doi/10.1103/PhysRevD.72.095020
-
[29]
The CMS Collaboration. Enhanced reconstruction of dileptonic top quark-antiquark events using supervised machine learning methods. Technical report, CERN, Geneva, 2025. URL https://cds.cern.ch/record/2944724
-
[30]
Lynton Ardizzone, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Analyzing inverse problems with invertible neural networks. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=rJed6j0cKX
-
[32]
Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. OUP Oxford,
-
[33]
Tilmann Gneiting, Larissa I. Stanberry, Eric P. Grimit, Leonhard Held, and Nicholas A. Johnson. Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds. TEST, 17(2):211–235, August 2008. ISSN 1863-8260. doi: 10.1007/s11749-008-0114-x. URL https://doi.org/10.1007/s11749-008-0114-x
-
[34]
Karl Pearson. On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can be Reasonably Supposed to have Arisen from Random Sampling, pages 11–28. Springer New York, New York, NY, 1992. ISBN 978-1-4612-4380-9. doi: 10.1007/978-1-4612-4380-9_2. URL https://doi.org/10.1007/978-1-4612-4380-9_2
-
[36]
Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), pages 59–66, January 1998. doi: 10.1109/ICCV.1998.710701.
-
[37]
Tilmann Gneiting, Fadoua Balabdaoui, and Adrian E. Raftery. Probabilistic forecasts, calibration and sharpness.Journal of the Royal Statistical Society Series B, 69(2):243–268, 2007. URL https://EconPapers.repec.org/RePEc:bla:jorssb:v:69:y:2007:i:2:p:243-268
-
[38]
Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural spline flows. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_files/paper/2019/file/7ac71d433f282034e...
-
[39]
George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1–64, 2021. URL http://jmlr.org/papers/v22/19-1028.html
-
[40]
Luigi Favaro, Roman Kogler, Alexander Paasch, Sofia Palacios Schweitzer, Tilman Plehn, and Dennis Schwarz. How to unfold top decays. SciPost Phys. Core, 8:053, 2025. doi: 10.21468/SciPostPhysCore.8.3.053. URL https://scipost.org/10.21468/SciPostPhysCore.8.3.053
-
[41]
The CMS Collaboration. Observation of a pseudoscalar excess at the top quark pair production threshold. Reports on Progress in Physics, 88(8):087801, August 2025. doi: 10.1088/1361-6633/adf7d3. URL https://doi.org/10.1088/1361-6633/adf7d3
-
[42]
Jerome de Favereau, Christophe Delaere, Pavel Demin, Andrea Giammanco, Vincent Lemaître, Alexandre Mertens, Michele Selvaggi, and The DELPHES 3 collaboration. Delphes 3: a modular framework for fast simulation of a generic collider experiment. Journal of High Energy Physics, 2014(2):57, 2014. doi: 10.1007/JHEP02(2014)057
-
[43]
John Andrew Raine, Matthew Leigh, Knut Zoch, Lukas Ehrke, Debajyoti Sengupta, and Tobias Golling. Dileptonic ttbar neutrino regression dataset, July 2023. URL https://doi.org/10.5281/zenodo.8113516
-
[44]
Ricky T. Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 2018
-
[45]
Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t
-
[46]
Nathan Huetsch, Javier Mariño Villadamigo, Alexander Shmakov, Sascha Diefenbacher, Vinicius Mikuni, Theo Heimel, Michael Fenton, Kevin Greif, Benjamin Nachman, Daniel Whiteson, Anja Butter, and Tilman Plehn. The landscape of unfolding with machine learning. SciPost Phys., 18:070, 2025. doi: 10.21468/SciPostPhys.18.2.070. URL https://scipost.org/10.21468/Sc...
-
[47]
Marco Bellagente, Anja Butter, Gregor Kasieczka, Tilman Plehn, Armand Rousselot, Ramon Winterhalder, Lynton Ardizzone, and Ullrich Köthe. Invertible networks or partons to detector and back again. SciPost Phys., 9:074, 2020. doi: 10.21468/SciPostPhys.9.5.074. URL https://scipost.org/10.21468/SciPostPhys.9.5.074
-
[48]
Antoine Petitjean, Anja Butter, Kevin Greif, Sofia Palacios Schweitzer, Tilman Plehn, Jonas Spinner, and Daniel Whiteson. Generative unfolding of jets and their substructure, 2025. URL https://arxiv.org/abs/2510.19906
-
[49]
Peng Cui, Wenbo Hu, and Jun Zhu. Calibrated reliable regression using maximum mean discrepancy. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 17164–17175. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/c74c4bf0d...
-
[50]
Christopher A. T. Ferro, David S. Richardson, and Andreas P. Weigel. On the effect of ensemble size on the discrete and continuous ranked probability scores. Meteorological Applications, 15(1):19–24, 2008. doi: 10.1002/met.45. URL https://rmets.onlinelibrary.wiley.com/doi/abs/10.1002/met.45
-
[51]
Steve Baker and Robert D. Cousins. Clarification of the use of chi-square and likelihood functions in fits to histograms. Nuclear Instruments and Methods in Physics Research, 221(2):437–442, 1984. ISSN 0167-5087. doi: 10.1016/0167-5087(84)90016-4. URL https://www.sciencedirect.com/science/article/pii/0167508784900164
-
[52]
Ilya Loshchilov and Frank Hutter. Fixing weight decay regularization in Adam. CoRR, abs/1711.05101, 2017. URL http://arxiv.org/abs/1711.05101
-
[53]
Christopher M. Bishop. Mixture density networks. Working Paper 4288, Aston University, 1994
-
[54]
Vincent Stimper, David Liu, Andrew Campbell, Vincent Berenz, Lukas Ryll, Bernhard Schölkopf, and José Miguel Hernández-Lobato. normflows: A PyTorch package for normalizing flows. Journal of Open Source Software, 8(86):5361, 2023. doi: 10.21105/joss.05361. URL https://doi.org/10.21105/joss.05361
-
[55]
Johann Brehmer, Víctor Bresó, Pim de Haan, Tilman Plehn, Huilin Qu, Jonas Spinner, and Jesse Thaler. A Lorentz-equivariant transformer for all of the LHC. SciPost Phys., 19:108, 2025. doi: 10.21468/SciPostPhys.19.4.108. URL https://scipost.org/10.21468/SciPostPhys.19.4.108
-
[56]
Jonas Spinner, Victor Bresó, Pim De Haan, Tilman Plehn, Jesse Thaler, and Johann Brehmer. Lorentz-equivariant geometric algebra transformers for high-energy physics. In Advances in Neural Information Processing Systems, volume 37, 2024. URL https://arxiv.org/abs/2405.14806
-
[57]
Johann Brehmer, Pim de Haan, Sönke Behrends, and Taco Cohen. Geometric algebra transformer. In Advances in Neural Information Processing Systems, volume 36, 2023. URL https://arxiv.org/abs/2305.18415
-
[58]
Ricky T. Q. Chen. torchdiffeq, 2018. URL https://github.com/rtqichen/torchdiffeq.

A Evaluation Metrics
A.1 Empirical CRPS estimator
For a predictive distribution represented by $N$ posterior samples $\{\hat{z}^{(k)}\}_{k=1}^{N}$, the CRPS of eq. (1) is estimated as
$$\widehat{\mathrm{CRPS}} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{z}^{(k)} - z\right| - \frac{1}{2N^{2}}\sum_{k=1}^{N}\sum_{j=1}^{N}\left|\hat{z}^{(k)} - \hat{z}^{(j)}\right|, \tag{8}$$
computable in $O(N \log N)$ via sorting [15]. T...
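The sample-based estimator of eq. (8) can be sketched as below. The function names are hypothetical. The sorted variant uses the standard identity that the double sum of pairwise absolute differences equals 2 Σ_i (2i − N − 1) z_(i) over the sorted samples, which is one common way to reach O(N log N) and may differ from the implementation the paper uses.

```python
import numpy as np

def crps_naive(samples, z):
    """Direct O(N^2) evaluation of the sample-based CRPS estimator."""
    s = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(s - z))
    term_pairs = np.mean(np.abs(s[:, None] - s[None, :]))  # (1/N^2) * double sum
    return term_obs - 0.5 * term_pairs

def crps_sorted(samples, z):
    """Same estimator in O(N log N), using sorted samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    term_obs = np.mean(np.abs(s - z))
    # sum_{k,j} |s_k - s_j| = 2 * sum_i (2i - n - 1) * s_(i), with i = 1..n
    weights = 2.0 * np.arange(1, n + 1) - n - 1.0
    pair_sum = 2.0 * np.sum(weights * s)
    return term_obs - pair_sum / (2.0 * n**2)

rng = np.random.default_rng(0)
draws = rng.normal(size=300)
print(crps_naive(draws, 0.3), crps_sorted(draws, 0.3))
```

The two functions should agree to floating-point precision. For a degenerate ensemble concentrated on a single value, the estimator reduces to the absolute error, which is why CRPS and MAE are directly comparable.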
... and TARP [24] extend coverage diagnostics to the joint setting. They are the natural choice when the protocol’s univariate CRPS is replaced by the energy score (section 4.1) for joint-posterior applications.
Choosing among them. We recommend using conformal prediction when comparing across model families (its finite-sample guarantee is family-agnostic). Us...
1. Underdetermined Kinematics: While the detector measures the transverse components of the sum of the neutrino momenta, $\vec{E}_T^{\mathrm{miss}}$, the individual longitudinal momenta ($p_z$) and the specific distribution of transverse momentum between the two neutrinos are unknown. This results in a system with six unknown degrees of freedom (the three-momentum components for...
2. Combinatorial Ambiguity: In a standard event, the detector identifies two b-jets, but it is not a priori known which jet originated from the top quark and which from the anti-top quark. For $n$ additional light-flavor jets in the event, the number of possible permutations for the final-state assignment grows factorially, creating a complex assignment problem.
3. Detector Resolution and Noise: The measured momenta of jets and the $\vec{E}_T^{\mathrm{miss}}$ are subject to experimental uncertainties and resolution effects. Traditional analytical “kinematic fitting” methods often fail when the measured values fluctuate such that no physical solution exists for the mass constraints.

D.2 Dataset details
We use the public Delphes [39] Mon...
... architecture combining a transformer condition encoder with a stack of RQS coupling layers for the discrete flow.

Parameter | Regression (MSE & MMD) | Continuous flow (flow matching) | Discrete flow (ν²-flow style)
Condition encoder
Encoder blocks | 8 | 4 | 4
Attention heads | 8 | 8 | 8
Hidden dimension | 128 | 128 | 128
Dropout | 0.1 | 0.1 | 0.1
Positional encoding dim | 8 | 8 | 8
Flow / decoder head
De...