pith. machine review for the scientific record.

arxiv: 2603.28835 · v2 · submitted 2026-03-30 · ⚛️ physics.ins-det · hep-ex

Recognition: no theorem link

Machine Learning-Based Cluster Classification to Suppress Background in a Prototype RPC Detector


Pith reviewed 2026-05-14 01:28 UTC · model grok-4.3

classification ⚛️ physics.ins-det hep-ex
keywords: resistive plate chambers · machine learning · background suppression · cluster classification · XGBoost · RPC detectors · signal discrimination · high energy physics

The pith

Machine learning classifiers separate signal from background hits in resistive plate chamber detectors using fifteen cluster descriptors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Resistive plate chambers (RPCs) in high-energy physics often produce secondary hits that appear as long tails or extra peaks in time-correlation spectra. These hits degrade time and spatial resolution and track-reconstruction efficiency, especially in self-triggering operation, where no external trigger is available. This work trains three classifiers on fifteen features extracted from the time and ADC distributions of hit clusters, including statistical measures such as mean and width plus Gaussian-fit parameters, to label clusters as signal or background. Laboratory tests on a low-resistive bakelite prototype with an external three-scintillator trigger show strong discrimination for DNN, 1D-CNN, and XGBoost models, with XGBoost generalizing best and cluster size plus temporal-shape features emerging as the strongest discriminants. If the approach holds, it would allow cleaner hit selection and faster processing in real experiments without relying on external triggers. A reader would care because it turns a common practical nuisance in RPC operation into a solvable classification task.

Core claim

We present a machine-learning-based strategy to separate signal and background hit clusters using fifteen cluster-level descriptors that encode both statistical properties (histogram mean, width, cluster size) and fit-based parameters (Gaussian-fit mean, width, amplitude, χ², NDF) of the time and ADC distributions. Using laboratory data collected from a single-gap low-resistive RPC with a three-scintillator master trigger, we trained and evaluated three classifiers (DNN, 1D-CNN, and XGBoost) on balanced signal/background samples. All models demonstrate strong discrimination capability, with XGBoost showing the most robust generalization performance. Feature-importance analysis indicates that cluster size and temporal-shape descriptors are the dominant discriminants.
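A minimal sketch of the training step on balanced samples, assuming the fifteen descriptors are already stacked into a matrix `X` with labels `y` (1 = signal, 0 = background). Balancing by random undersampling, the 70/30 split, and all hyperparameters are assumptions, not values from the paper. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost here to keep dependencies light; `xgboost.XGBClassifier` exposes the same `fit`/`predict_proba` interface. The last helper (hypothetical) shows how keeping only clusters above a score threshold would cut downstream processing.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def balance_by_undersampling(X, y, seed=0):
    """Randomly drop rows of the larger class so signal and background counts match."""
    rng = np.random.default_rng(seed)
    sig, bkg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n = min(len(sig), len(bkg))
    keep = np.concatenate([rng.choice(sig, n, replace=False),
                           rng.choice(bkg, n, replace=False)])
    return X[keep], y[keep]


def train_cluster_classifier(X, y, seed=0):
    """Fit boosted trees on a balanced sample; return the model and its held-out AUC."""
    Xb, yb = balance_by_undersampling(X, y, seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xb, yb, test_size=0.3, stratify=yb, random_state=seed)
    model = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=seed)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc


def keep_signal_clusters(model, X, threshold=0.5):
    """Retain only clusters whose predicted signal probability exceeds the threshold."""
    return X[model.predict_proba(X)[:, 1] > threshold]


# Synthetic illustration: 300 signal vs 900 background clusters, 15 descriptors each.
rng = np.random.default_rng(1)
X_sig = rng.normal(size=(300, 15))
X_sig[:, 14] += 3.0  # pretend the last descriptor (cluster size) separates the classes
X = np.vstack([X_sig, rng.normal(size=(900, 15))])
y = np.concatenate([np.ones(300, dtype=int), np.zeros(900, dtype=int)])
model, test_auc = train_cluster_classifier(X, y)
kept = keep_signal_clusters(model, X)
```

The same `X`, `y` interface would accept a DNN or 1D-CNN wrapper, which keeps the three-model comparison on identical balanced splits.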

What carries the argument

Fifteen cluster-level descriptors from time and ADC distributions, including statistical moments and Gaussian fit parameters, supplied to supervised classifiers for signal-background separation.
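To illustrate the descriptor set, the sketch below computes fifteen numbers for one cluster. It assumes (the summary does not spell this out) that they split into seven per distribution for time and for ADC (histogram mean and width; Gaussian-fit mean, width, amplitude, χ², and NDF) plus the cluster size. The bin count, the χ² convention, and all function names are hypothetical choices; the fit uses SciPy's `curve_fit`.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amp, mu, sigma):
    """Gaussian model g(x; A, mu, sigma) used for the fit-based descriptors."""
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def distribution_descriptors(values, bins=20):
    """Seven descriptors of one distribution (assumed ordering): histogram mean,
    histogram width, then Gaussian-fit mean, width, amplitude, chi^2 and NDF."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean, width = float(values.mean()), float(values.std())
    p0 = [float(counts.max()), mean, width if width > 0 else 1.0]
    try:
        (amp, mu, sigma), _ = curve_fit(gaussian, centers, counts, p0=p0, maxfev=5000)
    except (RuntimeError, TypeError):
        amp, mu, sigma = p0  # fall back to moment estimates if the fit fails
    filled = counts > 0
    expected = gaussian(centers[filled], amp, mu, sigma)
    # Neyman chi^2: observed bin counts used as the variance (an assumption)
    chi2 = float(np.sum((counts[filled] - expected) ** 2 / counts[filled]))
    ndf = max(int(filled.sum()) - 3, 0)  # non-empty bins minus three fit parameters
    return [mean, width, float(mu), abs(float(sigma)), float(amp), chi2, float(ndf)]


def cluster_features(times, adcs, bins=20):
    """Hypothetical 15-vector: 7 time descriptors, 7 ADC descriptors, cluster size."""
    return np.array(
        distribution_descriptors(times, bins)
        + distribution_descriptors(adcs, bins)
        + [float(len(times))]
    )


# Synthetic illustration: one cluster of 500 hits with Gaussian time and ADC values.
rng = np.random.default_rng(0)
features = cluster_features(rng.normal(10.0, 2.0, 500), rng.normal(100.0, 15.0, 500))
```

For small clusters a Gaussian fit is poorly constrained, which is why the sketch falls back to the moment estimates when the fit fails.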

If this is right

  • Background suppression via cluster classification improves track reconstruction efficiency and spatial-temporal resolution in self-triggering RPC setups.
  • Processing time decreases because only signal clusters are retained for further analysis.
  • Cluster size and temporal-shape descriptors dominate discrimination, suggesting focused feature engineering for similar detectors.
  • XGBoost provides the most robust performance among the tested models on the balanced laboratory samples.
  • The compact feature set enables practical deployment without heavy computational overhead.
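One way to check the claim that cluster size and temporal-shape descriptors dominate is permutation importance. Because it is model-agnostic, it gives comparable rankings for DNN, 1D-CNN, and boosted trees. This is a swapped-in technique and not necessarily the measure the authors used; XGBoost's built-in gain, cover, or frequency scores are alternatives. The descriptor names and the synthetic data below are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance


def rank_descriptors(model, X_test, y_test, names, n_repeats=10, seed=0):
    """Rank descriptors by the mean drop in ROC AUC when each column is shuffled."""
    result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                    n_repeats=n_repeats, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], float(result.importances_mean[i]), float(result.importances_std[i]))
            for i in order]


# Synthetic illustration: only the hypothetical "cluster_size" column is informative.
rng = np.random.default_rng(0)
names = [f"descriptor_{i}" for i in range(14)] + ["cluster_size"]
y = rng.integers(0, 2, 600)
X = rng.normal(size=(600, 15))
X[:, 14] += 3.0 * y
model = GradientBoostingClassifier(random_state=0).fit(X[:400], y[:400])
ranking = rank_descriptors(model, X[400:], y[400:], names)
for name, mean_drop, std_drop in ranking[:3]:
    print(f"{name}: {mean_drop:.3f} +/- {std_drop:.3f}")
```

Computing importance on held-out clusters, rather than on the training set, avoids rewarding features the model merely memorized.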

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same descriptors could be tested on data from other gaseous detectors that exhibit secondary hit tails.
  • Real-time implementation on FPGA or GPU hardware would be a natural next step for online triggering.
  • Performance might change if background rates or beam conditions differ substantially from the lab setup.
  • Combining these cluster labels with downstream tracking algorithms could further boost overall efficiency.

Load-bearing premise

Laboratory data collected with an external three-scintillator master trigger accurately represents the background characteristics encountered in self-triggering high-rate environments without external triggers.

What would settle it

Collecting new data in a self-triggering configuration without the external scintillator trigger and measuring whether the trained models maintain the same discrimination performance on those samples.
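Before labeled self-triggered data exist, a cheap first diagnostic is to compare each descriptor's distribution between the triggered training sample and a self-triggered sample using a two-sample Kolmogorov–Smirnov test (SciPy's `ks_2samp`). This is a sketch under stated assumptions: a flagged descriptor indicates domain shift but does not by itself measure any loss of discrimination. The names, the significance level, and the synthetic shift are illustrative only.

```python
import numpy as np
from scipy.stats import ks_2samp


def flag_shifted_descriptors(X_triggered, X_selftriggered, names, alpha=0.01):
    """Per-descriptor two-sample KS test; return (name, D, p) where p < alpha."""
    flagged = []
    for j, name in enumerate(names):
        res = ks_2samp(X_triggered[:, j], X_selftriggered[:, j])
        if res.pvalue < alpha:
            flagged.append((name, float(res.statistic), float(res.pvalue)))
    return flagged


# Synthetic illustration: only the third descriptor is shifted in "self-triggered" data.
rng = np.random.default_rng(0)
names = ["time_width", "adc_mean", "cluster_size"]
X_trig = rng.normal(size=(1000, 3))
X_self = rng.normal(size=(1000, 3))
X_self[:, 2] += 1.0
shifted = flag_shifted_descriptors(X_trig, X_self, names, alpha=1e-6)
```

With fifteen descriptors tested at once, a stricter `alpha` (or a multiple-testing correction) keeps chance flags rare. The same test can be applied to the classifier's output scores.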

Original abstract

Resistive Plate Chambers (RPCs) are widely used as tracking detectors in many high-energy physics experiments. It has been observed that low-resistive bakelite RPC prototypes frequently exhibit a secondary hit component, appearing as a long tail or an additional peak in the time-correlation spectra relative to the trigger detector. These secondary hits, which affect both the time and spatial resolution, are difficult to distinguish from genuine signals in high-rate environments without an external trigger. As a result, they can significantly degrade track reconstruction efficiency and increase processing time. We present a machine-learning-based strategy to separate signal and background hit clusters using fifteen cluster-level descriptors that encode both statistical properties (histogram mean, width, cluster size) and fit-based parameters (Gaussian-fit mean, width, amplitude, χ², NDF) of the time and ADC distributions. Using laboratory data collected from a single-gap low-resistive RPC with a three-scintillator master trigger, we trained and evaluated three classifiers (DNN, 1D-CNN, and XGBoost) on balanced signal/background samples. All models demonstrate strong discrimination capability, with XGBoost showing the most robust generalization performance. Feature-importance analysis indicates that cluster size and temporal-shape descriptors are the dominant discriminants. These results highlight that compact, interpretable cluster-level features combined with machine-learning classifiers offer a practical and effective approach to suppress background in self-triggering low-resistive RPC detectors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a machine-learning strategy to classify signal versus background hit clusters in low-resistive bakelite RPC prototypes. Fifteen cluster-level descriptors (statistical moments and Gaussian-fit parameters of time and ADC distributions) are extracted from laboratory data collected with a three-scintillator external master trigger. Three classifiers (DNN, 1D-CNN, XGBoost) are trained on balanced samples; the authors report strong discrimination, with XGBoost exhibiting the most robust generalization and feature-importance analysis highlighting cluster size and temporal-shape variables as dominant. The approach is presented as a practical solution for background suppression in self-triggering, high-rate environments where external triggers are unavailable.

Significance. If validated under self-triggered conditions, the work supplies a compact, interpretable feature set and classifier pipeline that could improve time and spatial resolution and reduce processing overhead in RPC-based tracking systems for high-energy physics. The comparison across model families and the explicit feature-importance ranking constitute concrete strengths that aid physical insight and potential deployment.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (data acquisition): labels are defined exclusively by coincidence with the external three-scintillator trigger. The target application is self-triggering operation at higher instantaneous rates, where background cluster statistics (width, tails, size) are expected to differ; no rate-scaling study, untriggered test set, or ablation quantifying performance degradation under this domain shift is provided, leaving the central claim of applicability unsupported by direct evidence.
  2. [§4] §4 (results): the abstract states that “all models demonstrate strong discrimination capability” and that XGBoost shows “the most robust generalization performance,” yet the manuscript excerpt supplies no numerical values (accuracy, AUC, F1, confusion-matrix entries, or cross-validation statistics with uncertainties). Without these load-bearing metrics the quantitative strength of the reported superiority cannot be assessed.
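The metrics requested in major comment 2 can be produced with stratified k-fold cross-validation. The sketch reports the mean ± standard deviation across folds, plus a confusion matrix built from out-of-fold predictions. It uses scikit-learn; the fold count, the seed, the gradient-boosting stand-in, and the synthetic data are assumptions, and any of the three classifiers could be passed in instead.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate


def cv_report(model, X, y, folds=5, seed=0):
    """Fold-averaged accuracy, ROC AUC and F1 (mean, std) plus an out-of-fold confusion matrix."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    metrics = ("accuracy", "roc_auc", "f1")
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    summary = {m: (float(np.mean(scores[f"test_{m}"])), float(np.std(scores[f"test_{m}"])))
               for m in metrics}
    cm = confusion_matrix(y, cross_val_predict(model, X, y, cv=cv))
    return summary, cm


# Synthetic illustration with a balanced sample of 200 signal and 200 background clusters.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 15))
X[:, 0] += 2.0 * y
summary, cm = cv_report(GradientBoostingClassifier(random_state=0), X, y)
for metric, (mean, std) in summary.items():
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")
print(cm)
```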
minor comments (2)
  1. [§2] The precise definitions of the fifteen descriptors (e.g., how the Gaussian-fit amplitude, width, and χ²/NDF are computed from the time/ADC histograms) are described only in prose; explicit equations or a supplementary table would improve reproducibility.
  2. [Figure 5] Figure captions and axis labels for the feature-importance plots should explicitly state the number of trees/estimators and the exact importance metric (gain, cover, or frequency) used for XGBoost.
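For the reproducibility point in minor comment 1, one conventional way to write the descriptors explicitly is given below. Here $x_i$ are the hit times or ADC values in a cluster of size $N$, and $n_k$ is the number of entries in histogram bin $k$ centred at $x_k$. These definitions (population RMS for the width, Neyman χ² over non-empty bins, three fit parameters) are assumptions, not the authors' stated formulas.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\sigma_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}

g(x;\, A, \mu, \sigma) = A \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]

\chi^2 = \sum_{k:\; n_k > 0} \frac{\left[n_k - g(x_k;\, \hat{A}, \hat{\mu}, \hat{\sigma})\right]^2}{n_k},
\qquad
\mathrm{NDF} = \#\{k : n_k > 0\} - 3
```

Evaluating $\bar{x}$, $\sigma_x$, $\hat{\mu}$, $\hat{\sigma}$, $\hat{A}$, $\chi^2$, and NDF for both the time and the ADC distributions gives 7 + 7 = 14 descriptors. Adding the cluster size $N$ brings the total to fifteen.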

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments. We have carefully considered each point and made revisions to strengthen the manuscript, including adding quantitative metrics and a discussion on the applicability to self-triggering conditions.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (data acquisition): labels are defined exclusively by coincidence with the external three-scintillator trigger. The target application is self-triggering operation at higher instantaneous rates, where background cluster statistics (width, tails, size) are expected to differ; no rate-scaling study, untriggered test set, or ablation quantifying performance degradation under this domain shift is provided, leaving the central claim of applicability unsupported by direct evidence.

    Authors: We thank the referee for highlighting this important aspect. The use of an external trigger is necessary for labeling signal and background in the supervised learning setup. The secondary hits identified in the triggered data are the same background component expected in self-triggering operation. While we agree that a dedicated study at higher rates or with untriggered data would be ideal to quantify any domain shift, the current prototype setup did not allow for such measurements. In the revised manuscript, we have expanded §5 to discuss this limitation explicitly and emphasize that the selected features (cluster size, temporal shape) are intrinsic properties likely to generalize. We have also added a sentence in the abstract clarifying the scope. revision: partial

  2. Referee: [§4] §4 (results): the abstract states that “all models demonstrate strong discrimination capability” and that XGBoost shows “the most robust generalization performance,” yet the manuscript excerpt supplies no numerical values (accuracy, AUC, F1, confusion-matrix entries, or cross-validation statistics with uncertainties). Without these load-bearing metrics the quantitative strength of the reported superiority cannot be assessed.

    Authors: We agree that the quantitative metrics should be more explicitly stated. In the revised manuscript, we have added the specific performance metrics (accuracy, AUC, F1, confusion matrices, and cross-validation statistics with uncertainties) to the abstract and §4, including a summary table for clarity. revision: yes

Circularity Check

0 steps flagged

No circularity: standard supervised ML evaluation on externally labeled data

Full rationale

The paper extracts 15 cluster-level features from time/ADC distributions in externally triggered laboratory data, trains DNN/1D-CNN/XGBoost classifiers on balanced signal/background samples defined by the three-scintillator coincidence, and reports empirical discrimination metrics plus feature importance on held-out test data. No equations, self-citations, or parameter fits reduce the reported performance numbers to the input labels by construction; the classifiers are ordinary supervised learners whose outputs are not tautological with the trigger-based labeling. The representativeness of the triggered data for self-triggered high-rate operation is a generalization question, not a circularity in the derivation chain.
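To make the labeling step concrete, the sketch below forms a hypothetical three-fold master trigger: a scintillator-1 hit counts when scintillators 2 and 3 each have a hit within ±`window` of it. An RPC cluster is then labeled signal when a trigger lies within a second window of the cluster time, and background otherwise (for example, a delayed secondary hit in the tail). Window widths, time units, and function names are placeholders; the actual coincidence logic and timing offsets are not given here.

```python
import numpy as np


def has_hit_near(sorted_times, t, window):
    """True if some entry of the sorted array lies within [t - window, t + window]."""
    i = np.searchsorted(sorted_times, t - window)
    return i < len(sorted_times) and sorted_times[i] <= t + window


def master_trigger_times(sc1, sc2, sc3, window=20.0):
    """Hypothetical three-fold coincidence: keep scintillator-1 times for which
    scintillators 2 and 3 each have a hit within +/- window (same time units)."""
    sc2 = np.sort(np.asarray(sc2, dtype=float))
    sc3 = np.sort(np.asarray(sc3, dtype=float))
    return np.array([t for t in np.sort(np.asarray(sc1, dtype=float))
                     if has_hit_near(sc2, t, window) and has_hit_near(sc3, t, window)])


def label_clusters(cluster_times, trigger_times, window=50.0):
    """Label 1 (signal) if a master trigger lies within +/- window of the cluster time, else 0."""
    trig = np.sort(np.asarray(trigger_times, dtype=float))
    return np.array([int(has_hit_near(trig, t, window)) for t in cluster_times])


# Synthetic illustration with times in arbitrary units (e.g. ns).
triggers = master_trigger_times([100, 500, 900], [105, 520, 2000], [98, 480, 905])
labels = label_clusters([110, 300, 530], triggers)
```

Because the labels come only from timing relative to the external trigger, the classifiers never see the trigger information as an input feature. This is the point behind the no-circularity verdict above.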

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the assumption that the chosen cluster descriptors capture all relevant signal-background differences and that lab-triggered data generalizes to self-triggered operation.

free parameters (1)
  • XGBoost and neural-network hyperparameters
    Tuned during training to achieve reported discrimination; exact values not stated in abstract.
axioms (2)
  • domain assumption The fifteen cluster-level descriptors encode the essential statistical and shape differences between signal and background hits.
    Invoked by the feature-engineering step described in the abstract.
  • domain assumption Laboratory data with external trigger is representative of background in self-triggered high-rate running.
    Required for the generalization claim but not directly tested in the described setup.
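Because the hyperparameters are the only free parameter listed, here is a sketch of tuning them by cross-validated grid search on the training portion only, so that the held-out test set plays no part in the choice. The grid values, the scikit-learn gradient-boosting stand-in for XGBoost, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_boosted_trees(X_train, y_train, seed=0):
    """Cross-validated grid search on the training portion only; returns (params, CV AUC)."""
    grid = {"n_estimators": [50, 150], "max_depth": [2, 3], "learning_rate": [0.05, 0.1]}
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    search = GridSearchCV(GradientBoostingClassifier(random_state=seed), grid,
                          scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    return search.best_params_, float(search.best_score_)


# Synthetic illustration: 150 signal and 150 background training clusters.
rng = np.random.default_rng(0)
y_train = np.repeat([0, 1], 150)
X_train = rng.normal(size=(300, 15))
X_train[:, 14] += 2.0 * y_train
best_params, best_cv_auc = tune_boosted_trees(X_train, y_train)
print(best_params, round(best_cv_auc, 3))
```

Reporting the selected `best_params` next to the results would address the ledger's note that the exact values are not stated.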

pith-pipeline@v0.9.0 · 5554 in / 1326 out tokens · 49861 ms · 2026-05-14T01:28:38.811752+00:00 · methodology


Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

  1. [1] R. Santonico, R. Cardarelli, Development of resistive plate counters, Nuclear Instruments and Methods in Physics Research 187 (2) (1981) 377–380. doi:10.1016/0029-554X(81)90363-3. URL https://www.sciencedirect.com/science/article/pii/0029554X81903633

  2. [2] Technical Design Report for the Phase-II Upgrade of the ATLAS Muon Spectrometer, Tech. rep., CERN, Geneva (2017). URL https://cds.cern.ch/record/2285580

  3. [3] X. Y. Xie, H. L. Xu, Q. Y. Li, Y. J. Sun, A data-based machine learning approach for RPC time resolution study based on ToF reconstruction, JINST 16 (12) (2021) P12002. doi:10.1088/1748-0221/16/12/P12002

  4. [4] K. K. et al., Characterization of the sts/much-xyter2, a 128-channel time and amplitude measurement ic for gas and silicon microstrip sensors, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 908 (2018) 225–235. doi:10.1016/j.nima.2018.08.076. URL https://w...

  5. [5] S. Chattopadhay, A. Agarwal, E. Nandy, J. Saini, A. K. Dubey, S. A. Khan, S. Chattopadhyay, Z. Ahammed, Performance of a real-size, low resistivity resistive plate chamber at gif++ using self-trigger electronics for the muon chamber of the cbm experiment, Journal of Instrumentation 20 (03) (2025) P03009. doi:10.1088/1748-0221/20/03/P03009. URL https://doi.o...

  6. [6] R. Ganai, Z. Ahammed, J. Saini, S. A. Khan, A. Bhattacharyya, S. Chattopadhay, S. Chattopadhyay, Development and performance studies of a real size resistive plate chamber tested at gif++, cern for cbm-much at fair, germany, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 1054 ...

  7. [7] M. Mondal, T. Dey, S. Chattopadhyay, J. Saini, Z. Ahammed, Performance of a prototype bakelite rpc at gif++ using self-triggered electronics for the cbm experiment at fair, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 1025 (2022) 166042. doi:10.1016/j.nima.202...

  8. [8] E. Shumka, A. Samalan, M. Tytgat, M. El Sawy, G. Alves, F. Marujo, E. Coelho, E. Da Costa, H. Nogima, A. Santoro, S. F. De Souza, D. De Jesus Damiao, M. Thiel, K. M. Amarilo, M. B. F. Filho, A. Aleksandrov, R. Hadjiiska, P. Iaydjiev, M. Rodozov, M. Shopova, G. Soultanov, A. Dimitrov, L. Litov, B. Pavlov, P. Petkov, A. Petrov, S. Qian, H. Kou, Z.-A. Liu, J. Zhao,...

  9. [9] A. Burazin Mišura, J. Musić, M. Prvan, D. Lelas, Towards real-time machine learning-based signal/background selection in the cms detector using quantized neural networks and input data reduction, Applied Sciences 14 (4) (2024). doi:10.3390/app14041559. URL https://www.mdpi.com/2076-3417/14/4/1559

  10. [10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–44. doi:10.1038/nature14539

  11. [11] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke...

  12. [12] F. Chollet, et al., Keras, https://keras.io (2015)

  13. [13] J. Friedman, Greedy function approximation: A gradient boosting machine, The Annals of Statistics 29 (11 2000). doi:10.1214/aos/1013203451

  14. [14] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, Association for Computing Machinery, New York, NY, USA, 2016, p. 785–794. doi:10.1145/2939672.2939785. URL https://doi.org/10.1145/2939672.2939785