pith. machine review for the scientific record.

arxiv: 2604.13320 · v1 · submitted 2026-04-14 · ✦ hep-ex

Recognition: unknown

Highly boosted dielectron identification in proton-proton collisions at √s = 13 TeV


Pith reviewed 2026-05-10 13:26 UTC · model grok-4.3

classification ✦ hep-ex
keywords dielectron identification · boosted electrons · merged calorimeter clusters · multivariate analysis · CMS detector · proton-proton collisions · electromagnetic calorimeter

The pith

CMS develops multivariate models to tag highly boosted dielectrons that merge into one electromagnetic calorimeter cluster.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique to reconstruct dielectron pairs produced with Lorentz boost above 20, where the two electrons deposit energy in a single merged cluster rather than two separate ones. Separate multivariate classifiers are trained for the cases with both tracks reconstructed and with only one track reconstructed. Efficiencies are measured directly in collision data, reaching 80 percent for the two-track case using boosted J/psi decays and roughly 60 percent for the single-track case using photon conversions in Z to mu mu gamma events. A dedicated energy correction for these merged candidates is derived from B to J/psi K decays. This approach matters because it recovers signal efficiency in high-momentum electron-pair searches that would otherwise lose events to the merged-cluster topology.

Core claim

A new technique is developed to identify dielectrons with Lorentz boost gamma_L greater than 20 that produce one single merged cluster in the electromagnetic calorimeter. The identification uses two multivariate models, one when both electron tracks are reconstructed and one when only a single track is reconstructed. Efficiency is measured in proton-proton data at 13 TeV: boosted J/psi to e+e- gives an overall efficiency of 80 percent for the two-track model, while Z to mu+mu-gamma events with converted photons give about 60 percent for the single-track model. A dedicated energy correction for dielectron candidates is also derived from B to J/psi K data.
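
To make the boost threshold concrete, here is a minimal sketch of the kinematics behind cluster merging. It assumes massless electrons and uses the J/psi only as an illustrative parent; the function names and the mass constant are choices made here, not taken from the paper. For a parent of energy E and mass m, gamma_L = E/m, and the smallest possible e+e- opening angle occurs for the symmetric energy split, where sin(theta/2) = 1/gamma_L.

```python
import math

# Approximate J/psi mass in GeV, used only as an illustrative parent particle.
J_PSI_MASS_GEV = 3.0969


def lorentz_boost(energy_gev: float, mass_gev: float) -> float:
    """Return gamma_L = E / m for a parent of the given energy and mass."""
    return energy_gev / mass_gev


def energy_for_boost(gamma: float, mass_gev: float) -> float:
    """Return the parent energy (GeV) needed to reach a given boost."""
    return gamma * mass_gev


def min_opening_angle(gamma: float) -> float:
    """Minimum opening angle (radians) between two massless daughters.

    From m^2 = 4 E1 E2 sin^2(theta/2) with E1 + E2 = E, the angle is
    smallest for E1 = E2 = E/2, which gives sin(theta/2) = m/E = 1/gamma.
    """
    if gamma < 1.0:
        raise ValueError("gamma must be >= 1")
    return 2.0 * math.asin(1.0 / gamma)


if __name__ == "__main__":
    gamma = 20.0
    print(f"E(J/psi) at gamma_L = 20: {energy_for_boost(gamma, J_PSI_MASS_GEV):.1f} GeV")
    print(f"minimum opening angle: {min_opening_angle(gamma):.3f} rad")
```

The angle shrinks roughly as 2/gamma_L, which is why the two showers overlap more strongly as the boost grows. Whether they actually merge into one cluster depends on detector granularity and clustering, which this sketch does not model.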

What carries the argument

Two multivariate classifiers that combine track and calorimeter information to distinguish merged dielectron clusters from background, with separate training for the two-track and single-track reconstruction cases.
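
The two-model arrangement can be sketched as a simple dispatch on the number of reconstructed tracks. Everything named below (`MergedCandidate`, `select_model`, `passes_id`, the model keys, and the thresholds) is hypothetical and only illustrates the routing logic; the actual classifiers, input variables, and working points are those defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# A scorer maps a feature vector to a classifier score (hypothetical interface).
Scorer = Callable[[Sequence[float]], float]


@dataclass
class MergedCandidate:
    """Hypothetical container for a merged-cluster dielectron candidate."""
    n_tracks: int               # number of reconstructed electron tracks (1 or 2)
    features: Sequence[float]   # track and calorimeter inputs for the classifier


def select_model(n_tracks: int) -> str:
    """Pick the model name matching the track multiplicity."""
    if n_tracks == 2:
        return "two_track"
    if n_tracks == 1:
        return "single_track"
    raise ValueError(f"unsupported track multiplicity: {n_tracks}")


def passes_id(cand: MergedCandidate,
              models: Dict[str, Scorer],
              thresholds: Dict[str, float]) -> bool:
    """Score the candidate with the matching model and apply its threshold."""
    name = select_model(cand.n_tracks)
    return models[name](cand.features) >= thresholds[name]
```

Keeping a separate threshold per model reflects the fact that the two cases are trained and evaluated independently, so their score distributions need not be comparable.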

If this is right

  • The method recovers events in which high-pT resonances decay to electron pairs that would otherwise be lost to merged clusters.
  • The energy correction improves the mass and transverse-momentum resolution for such merged candidates.
  • Separate models for one-track and two-track cases allow the analysis to retain signal in different detector-response regimes.
  • Efficiencies are measured in data, reducing reliance on simulation for this topology.
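
As an illustration of what a data-driven efficiency measurement involves, the sketch below computes a pass/total efficiency with a simple binomial uncertainty and a data-to-simulation scale factor (SF). It assumes that SF means the usual ratio of efficiency in data to efficiency in simulation, and it ignores background subtraction, which a real measurement would handle (for example through fits to a mass distribution). The function names and any counts passed in are placeholders, not numbers from the paper.

```python
import math
from typing import Tuple


def efficiency(n_pass: int, n_total: int) -> Tuple[float, float]:
    """Return (efficiency, binomial uncertainty) for n_pass out of n_total."""
    if n_total <= 0 or not 0 <= n_pass <= n_total:
        raise ValueError("need 0 <= n_pass <= n_total and n_total > 0")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


def scale_factor(eff_data: float, err_data: float,
                 eff_mc: float, err_mc: float) -> Tuple[float, float]:
    """Return (SF, uncertainty) with SF = eff_data / eff_mc.

    Relative uncertainties are added in quadrature, which assumes the
    data and simulation efficiencies are uncorrelated.
    """
    if eff_data <= 0.0 or eff_mc <= 0.0:
        raise ValueError("efficiencies must be positive")
    sf = eff_data / eff_mc
    rel = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return sf, sf * rel
```

The simple binomial error is unreliable for efficiencies very close to 0 or 1; an interval method such as Clopper-Pearson would be a more careful choice there.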

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same merged-cluster logic could be adapted to other lepton-pair or photon-pair signatures that become collinear at high boost.
  • Future runs with higher luminosity will increase the number of events requiring this identification, making the data-driven efficiency measurement more valuable.
  • Cross-checks with additional control samples, such as other resonances decaying to electrons, would further test the transferability of the efficiency.

Load-bearing premise

The identification efficiency measured in boosted J/psi and converted-photon control samples transfers accurately to the signal processes of interest.

What would settle it

Applying the models to an independent control sample or to simulated signal events with known generator-level truth and finding efficiencies that differ by more than the quoted uncertainties from the 80 percent and 60 percent values.
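
A compatibility check of this kind can be sketched as a pull between two efficiency measurements. The uncertainties on the 80 percent and 60 percent values are not quoted on this page, so the numbers in the usage example are placeholders, and the two-sigma default is an arbitrary choice made here for illustration.

```python
import math


def pull(eff_a: float, err_a: float, eff_b: float, err_b: float) -> float:
    """Difference of two measurements in units of their combined uncertainty.

    Assumes the two uncertainties are uncorrelated and roughly Gaussian.
    """
    combined = math.hypot(err_a, err_b)
    if combined <= 0.0:
        raise ValueError("combined uncertainty must be positive")
    return (eff_a - eff_b) / combined


def is_compatible(eff_a: float, err_a: float,
                  eff_b: float, err_b: float,
                  n_sigma: float = 2.0) -> bool:
    """True if the two measurements agree within n_sigma."""
    return abs(pull(eff_a, err_a, eff_b, err_b)) <= n_sigma


if __name__ == "__main__":
    # Placeholder uncertainties: 0.80 +/- 0.03 versus a cross-check of 0.74 +/- 0.04.
    print(pull(0.80, 0.03, 0.74, 0.04))
    print(is_compatible(0.80, 0.03, 0.74, 0.04))
```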

Figures

Figures reproduced from arXiv: 2604.13320 by CMS Collaboration.

Figure 1: Visual representations of the variables αtrack, ∆u and ∆v. Cyan-colored lines depict the incoming tracks of the dielectron. The black dashed line is used to define the u and v directions. The red dashed line represents the U5×5 cluster around the closest crystal from the tracks. The cyan-colored star is the log-weighted CoG of the U5×5 cluster.
Figure 2: The distribution of the two most contributing variables for the (upper) two-track …
Figure 3: The BDT score distributions of the (left) two-track and (right) single-track models.
Figure 4: The signal dielectron selection efficiency as a function of …
Figure 5: The nominal dielectron mass distribution of J/…
Figure 6: The efficiency and SF for the two-track model as a function of (left) …
Figure 7: The nominal Z boson candidate mass distribution in data using …
Figure 8: The efficiency and SF as a function of (left) …
Figure 9: The distribution of the invariant mass between the U…
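
The caption of Figure 1 refers to a log-weighted center of gravity (CoG) of the 5×5 cluster. A minimal sketch of that kind of position estimate follows, using the common form w_i = max(0, w0 + ln(E_i / E_total)). The function name, the (u, v, energy) input layout, and the default cutoff w0 are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Iterable, Tuple


def log_weighted_cog(crystals: Iterable[Tuple[float, float, float]],
                     w0: float = 4.2) -> Tuple[float, float]:
    """Log-weighted center of gravity of a cluster.

    crystals: (u, v, energy) for each crystal, in any consistent units.
    w0: cutoff parameter (illustrative default); crystals whose energy
        fraction is below exp(-w0) receive zero weight.
    """
    hits = [(u, v, e) for u, v, e in crystals if e > 0.0]
    e_tot = sum(e for _, _, e in hits)
    if e_tot <= 0.0:
        raise ValueError("cluster has no positive energy")
    weights = [max(0.0, w0 + math.log(e / e_tot)) for _, _, e in hits]
    w_sum = sum(weights)
    if w_sum <= 0.0:
        raise ValueError("all crystals fall below the log-weight cutoff")
    u_cog = sum(w * u for w, (u, _, _) in zip(weights, hits)) / w_sum
    v_cog = sum(w * v for w, (_, v, _) in zip(weights, hits)) / w_sum
    return u_cog, v_cog
```

Compared with linear energy weighting, the logarithm gives more influence to lower-energy crystals in the shower tails, which pulls the estimate away from the center of the hottest crystal.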
Original abstract

A new technique is developed to identify dielectrons ($\mathrm{e}^+\mathrm{e}^-$) with Lorentz boost $\gamma_\mathrm{L} > 20$ that produce one single merged cluster in the electromagnetic calorimeter of the CMS detector. The identification uses two multivariate models: one for the case where both electron tracks are reconstructed, and another where only one of the tracks is reconstructed. The efficiency is determined using proton-proton collision data collected at a center-of-mass energy of 13 TeV. Boosted $\mathrm{J}/\psi$ mesons decaying into $\mathrm{e}^+\mathrm{e}^-$ pairs are used to estimate the efficiency of the model with two tracks, yielding an overall efficiency of 80%. The $\mathrm{Z} \to \mu^+\mu^-\gamma$ events, where the photon converts into a collimated dielectron, are used for the model with a single track, yielding an efficiency of about 60%. A dedicated energy correction for dielectron candidates is also developed using $\mathrm{B}^\pm \to \mathrm{J}/\psi\,\mathrm{K}^\pm \to \mathrm{e}^+\mathrm{e}^-\mathrm{K}^\pm$ data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a new technique for identifying highly boosted dielectrons (γ_L > 20) that merge into a single cluster in the CMS electromagnetic calorimeter. Two multivariate models are used: one for cases with both electron tracks reconstructed and one for single-track cases. Efficiencies are extracted from proton-proton collision data at √s = 13 TeV using boosted J/ψ → e⁺e⁻ decays (overall 80%) for the two-track model and Z → μ⁺μ⁻γ events with photon conversions (~60%) for the single-track model. A dedicated energy correction for dielectron candidates is developed using B± → J/ψ K± data.

Significance. If the efficiencies transfer reliably, the method could improve reconstruction of highly boosted dielectron pairs in high-p_T searches and measurements at the LHC. The data-driven extraction of efficiencies from real collision control samples is a clear strength, as it minimizes dependence on simulation for the quoted performance figures.

major comments (2)
  1. [Efficiency determination using control samples] The central efficiencies (80% from boosted J/ψ and ~60% from Z→μμγ conversions) are measured in control samples whose kinematics (fixed masses, specific production mechanisms) differ from typical signal dielectrons at high p_T with arbitrary opening angles inside the merged cluster. Track-finding efficiency, shower-shape variables, and single-cluster merging probability are sensitive to these differences, yet no direct comparison of MV input distributions or efficiency in signal-like Monte Carlo after identical selections is provided. This assumption is load-bearing for the quoted performance.
  2. [Multivariate identification models] The two multivariate models are described as trained or tuned on the control samples; without explicit validation that their response remains stable under the broader kinematic range and mass hypotheses of the intended signal processes, the overall identification efficiency claim rests on untested extrapolation.
minor comments (2)
  1. [Abstract] The abstract omits any mention of systematic uncertainties on the efficiencies, background modeling details, or cross-checks against simulation, which would help readers assess the robustness of the 80% and 60% figures.
  2. [Energy correction development] Clarify the impact of the dedicated energy correction on the final dielectron candidate selection and whether it introduces additional uncertainties that propagate into the quoted efficiencies.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments raise important points about the applicability of the efficiencies measured in control samples to the broader range of signal processes. We address each major comment below and have revised the manuscript accordingly to provide additional validation and clarification.

  1. Referee: [Efficiency determination using control samples] The central efficiencies (80% from boosted J/ψ and ~60% from Z→μμγ conversions) are measured in control samples whose kinematics (fixed masses, specific production mechanisms) differ from typical signal dielectrons at high p_T with arbitrary opening angles inside the merged cluster. Track-finding efficiency, shower-shape variables, and single-cluster merging probability are sensitive to these differences, yet no direct comparison of MV input distributions or efficiency in signal-like Monte Carlo after identical selections is provided. This assumption is load-bearing for the quoted performance.

    Authors: We agree that the control samples have specific kinematic features, but they were selected because they provide high-purity, data-driven samples of highly boosted dielectrons that merge into single clusters, which is the exact topology targeted by the method. The multivariate inputs are dominated by local cluster shower shapes and track properties within the merged object, which depend primarily on the small opening angle and boost rather than the parent particle mass or production mechanism. Nevertheless, to strengthen the manuscript, we have added comparisons of the key multivariate input variable distributions between the J/ψ and Z-conversion control samples and a Monte Carlo sample of high-p_T dielectrons from a generic heavy-resonance decay process, after identical selections. We also include the measured efficiency versus p_T in simulation for both control-like and signal-like kinematics. These studies show consistency within uncertainties and support the quoted performance figures for the intended use cases.

    revision: yes

  2. Referee: [Multivariate identification models] The two multivariate models are described as trained or tuned on the control samples; without explicit validation that their response remains stable under the broader kinematic range and mass hypotheses of the intended signal processes, the overall identification efficiency claim rests on untested extrapolation.

    Authors: The models are trained exclusively on the data control samples to incorporate real detector response. To address the concern regarding stability across kinematics and mass hypotheses, the revised manuscript now includes an explicit study of the multivariate discriminator output and the resulting identification efficiency as functions of dielectron p_T, opening angle, and invariant mass. This validation is performed both in the control data and in simulated events covering a wider range of boosts and parent masses relevant to typical high-p_T searches. The response is found to be stable, with efficiency variations well within the systematic uncertainties already assigned.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity: efficiencies measured directly from independent control data


The paper's central results are empirical efficiencies extracted from separate control samples in collision data (boosted J/ψ decays for the two-track multivariate model and Z→μμγ conversions for the single-track model), plus an energy correction derived from B±→J/ψK± data. These are direct measurements on distinct processes rather than quantities fitted or derived from the signal sample itself. No equations, self-citations, or ansatze are presented that reduce the quoted 80% and ~60% efficiencies to the inputs by construction; the derivation chain consists of standard data-driven calibration steps that remain externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work relies on standard CMS detector modeling and data-driven calibration with no new theoretical entities or free parameters introduced beyond typical experimental assumptions.

axioms (1)
  • domain assumption: Standard assumptions about electromagnetic calorimeter response and track reconstruction in CMS for high-boost electrons
    Invoked when transferring efficiencies from control samples to signal.


