arxiv: 2603.28835 · v2 · submitted 2026-03-30 · ⚛️ physics.ins-det · hep-ex

Recognition: no theorem link

Machine Learning-Based Cluster Classification to Suppress Background in a Prototype RPC Detector

Souvik Chattopadhay, Zubayer Ahammed


Pith reviewed 2026-05-14 01:28 UTC · model grok-4.3

classification ⚛️ physics.ins-det hep-ex

keywords: resistive plate chambers, machine learning, background suppression, cluster classification, XGBoost, RPC detectors, signal discrimination, high energy physics


The pith

Machine learning classifiers separate signal from background hits in resistive plate chamber detectors using fifteen cluster descriptors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Resistive plate chambers (RPCs) in high-energy physics often produce secondary hits that appear as long tails or extra peaks in time spectra. These hits degrade time and spatial resolution and tracking efficiency, especially in self-triggering operation, where no external trigger is available to reject them. This work trains three classifiers on fifteen features extracted from the time and ADC distributions of hit clusters, including statistical measures such as mean and width plus Gaussian-fit parameters, to label clusters as signal or background. Laboratory tests on a low-resistivity bakelite prototype with an external scintillator trigger show strong discrimination for the DNN, 1D-CNN, and XGBoost models, with XGBoost generalizing best and cluster size plus temporal-shape features emerging as the strongest discriminants. If the approach holds, it would allow cleaner hit selection and faster processing in real experiments without relying on external triggers. A reader would care because it turns a common practical nuisance in RPC operation into a tractable classification task.

Core claim

We present a machine-learning-based strategy to separate signal and background hit clusters using fifteen cluster-level descriptors that encode both statistical properties (histogram mean, width, cluster size) and fit-based parameters (Gaussian-fit mean, width, amplitude, χ², NDF) of the time and ADC distributions. Using laboratory data collected from a single-gap low-resistive RPC with a three-scintillator master trigger, we trained and evaluated three classifiers (DNN, 1D-CNN, and XGBoost) on balanced signal/background samples. All models demonstrate strong discrimination capability, with XGBoost showing the most robust generalization performance. Feature-importance analysis indicates that cluster size and temporal-shape descriptors are the dominant discriminants.

What carries the argument

Fifteen cluster-level descriptors from time and ADC distributions, including statistical moments and Gaussian fit parameters, supplied to supervised classifiers for signal-background separation.
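A minimal sketch of how such a fifteen-element descriptor vector could be assembled for one cluster, assuming each cluster carries per-hit times and ADC values. The breakdown used here (histogram mean and width for both time and ADC, one cluster size, and five Gaussian-fit outputs for each of time and ADC, so 2×2 + 1 + 2×5 = 15) is our reading of the feature list, not a layout confirmed by the paper. The feature names and the `cluster_descriptors` helper are hypothetical, and the RMS stands in for "width".

```python
import numpy as np

# Hypothetical 15-descriptor layout (assumed, not taken from the paper):
# 2x2 histogram moments + cluster size + 5 Gaussian-fit outputs for each
# of the time and ADC distributions.
FIT_KEYS = ("fit_mean", "fit_sigma", "fit_amp", "chi2", "ndf")
FEATURE_NAMES = (
    ["time_mean", "time_rms", "adc_mean", "adc_rms", "cluster_size"]
    + [f"time_{k}" for k in FIT_KEYS]
    + [f"adc_{k}" for k in FIT_KEYS]
)


def cluster_descriptors(times, adcs, time_fit, adc_fit):
    """Return the 15-element feature vector for one cluster.

    times, adcs: per-hit time and ADC values of the cluster.
    time_fit, adc_fit: dicts holding the FIT_KEYS results of a Gaussian fit
    to each distribution (the fit itself is not performed here).
    """
    times = np.asarray(times, dtype=float)
    adcs = np.asarray(adcs, dtype=float)
    if times.size == 0 or times.shape != adcs.shape:
        raise ValueError("need non-empty, equal-length hit arrays")
    # Statistical descriptors: mean and RMS width of each distribution, plus size.
    stats = [times.mean(), times.std(), adcs.mean(), adcs.std(), times.size]
    # Fit-based descriptors, time first, then ADC, in FIT_KEYS order.
    fits = [time_fit[k] for k in FIT_KEYS] + [adc_fit[k] for k in FIT_KEYS]
    vector = np.array(stats + fits, dtype=float)
    assert vector.size == len(FEATURE_NAMES)
    return vector
```

The fit dictionaries would come from whatever Gaussian fit the analysis uses; keeping the names in one fixed tuple also makes it straightforward to label feature-importance plots consistently.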

If this is right

Background suppression via cluster classification improves track reconstruction efficiency and spatial-temporal resolution in self-triggering RPC setups.
Processing time decreases because only signal clusters are retained for further analysis.
Cluster size and temporal-shape descriptors dominate discrimination, suggesting focused feature engineering for similar detectors.
XGBoost provides the most robust performance among the tested models on the balanced laboratory samples.
The compact feature set enables practical deployment without heavy computational overhead.
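The processing-time argument presupposes a cut on the classifier output. A minimal sketch of choosing such a working point, assuming each model emits a score in [0, 1] where higher means more signal-like. The `working_point` helper and the 95% target signal efficiency are illustrative choices, not values from the paper.

```python
import numpy as np


def working_point(scores_sig, scores_bkg, target_eff=0.95):
    """Pick the score cut that keeps about `target_eff` of signal clusters.

    Returns (threshold, signal_efficiency, background_rejection), where the
    rejection is the fraction of background clusters scoring below the cut.
    """
    scores_sig = np.asarray(scores_sig, dtype=float)
    scores_bkg = np.asarray(scores_bkg, dtype=float)
    # The (1 - target_eff) quantile of signal scores leaves ~target_eff above it.
    threshold = float(np.quantile(scores_sig, 1.0 - target_eff))
    sig_eff = float(np.mean(scores_sig >= threshold))
    bkg_rej = float(np.mean(scores_bkg < threshold))
    return threshold, sig_eff, bkg_rej
```

Only clusters at or above `threshold` would be passed on to tracking; the background rejection at that cut is what sets the saving in downstream processing.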

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same descriptors could be tested on data from other gaseous detectors that exhibit secondary hit tails.
Real-time implementation on FPGA or GPU hardware would be a natural next step for online triggering.
Performance might change if background rates or beam conditions differ substantially from the lab setup.
Combining these cluster labels with downstream tracking algorithms could further boost overall efficiency.

Load-bearing premise

Laboratory data collected with an external three-scintillator master trigger accurately represents the background characteristics encountered in self-triggering high-rate environments without external triggers.
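To make the dependence concrete, here is a minimal sketch of a trigger-based labeling rule, assuming labels come from a time window around the prompt coincidence peak. The window centre and half-width are hypothetical parameters. The point is that the rule needs a trigger time offset `dt` for every cluster, which is exactly what self-triggering operation does not provide.

```python
import numpy as np


def label_by_coincidence(dt, peak, half_window):
    """Label clusters by their time offset to the external trigger.

    dt: cluster time minus trigger time, one entry per cluster (ns).
    peak, half_window: hypothetical centre and half-width of the prompt
    coincidence window (ns). Returns 1 (signal) inside it, 0 outside.
    """
    dt = np.asarray(dt, dtype=float)
    return (np.abs(dt - peak) <= half_window).astype(int)
```

Clusters in the long tail or in the secondary peak fall outside the window and become background. Without `dt` the rule cannot be evaluated, so in self-triggering mode the trained classifier, not the window, has to carry the selection.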

What would settle it

Collecting new data in a self-triggering configuration without the external scintillator trigger and measuring whether the trained models maintain the same discrimination performance on those samples.
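Before rerunning the models on such a sample, a cheap first check is whether the input descriptors themselves shift. A minimal sketch using the two-sample Kolmogorov-Smirnov statistic, which is our choice of test rather than anything proposed in the paper, applied to one descriptor at a time (for example cluster size from the triggered and the self-triggered samples):

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a - ECDF_b|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The supremum of the ECDF difference is attained at one of the sample points.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A large value for cluster size or the time-width descriptors would flag the domain shift directly. A small value is necessary but not sufficient, since the signal/background mix can change without the marginal descriptor distributions moving much.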

Original abstract

Resistive Plate Chambers (RPCs) are widely used as tracking detectors in many high-energy physics experiments. It has been observed that low-resistive bakelite RPC prototypes frequently exhibit a secondary hit component, appearing as a long tail or an additional peak in the time-correlation spectra relative to the trigger detector. These secondary hits, which affect both the time and spatial resolution, are difficult to distinguish from genuine signals in high-rate environments without an external trigger. As a result, they can significantly degrade track reconstruction efficiency and increase processing time. We present a machine-learning-based strategy to separate signal and background hit clusters using fifteen cluster-level descriptors that encode both statistical properties (histogram mean, width, cluster size) and fit-based parameters (Gaussian-fit mean, width, amplitude, χ², NDF) of the time and ADC distributions. Using laboratory data collected from a single-gap low-resistive RPC with a three-scintillator master trigger, we trained and evaluated three classifiers (DNN, 1D-CNN, and XGBoost) on balanced signal/background samples. All models demonstrate strong discrimination capability, with XGBoost showing the most robust generalization performance. Feature-importance analysis indicates that cluster size and temporal-shape descriptors are the dominant discriminants. These results highlight that compact, interpretable cluster-level features combined with machine-learning classifiers offer a practical and effective approach to suppress background in self-triggering low-resistive RPC detectors.
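The abstract does not say which importance measure underlies the ranking. XGBoost's built-in importances (split frequency, gain, cover; in the xgboost Python package, `Booster.get_score` exposes these as `importance_type="weight"`, `"gain"`, and `"cover"`) are specific to tree models, so they cannot be compared directly with anything from the DNN or 1D-CNN. A minimal sketch of a model-agnostic alternative, permutation importance measured as the drop in ROC AUC, follows. It is not the method reported in the paper, and `predict` stands for any trained model's scoring function.

```python
import numpy as np


def roc_auc(y, s):
    """ROC AUC as P(signal score > background score), ties counted half."""
    d = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean(d > 0) + 0.5 * np.mean(d == 0))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in AUC when each feature column is shuffled on its own.

    predict: a trained model's scoring function, X -> 1D array of scores.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(int)
    rng = np.random.default_rng(seed)
    base = roc_auc(y, predict(X))
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j only
            drops[j] += base - roc_auc(y, predict(X_perm))
    return drops / n_repeats
```

Because the same score (AUC) is used for every model, the resulting rankings for the three classifiers can be plotted on one axis.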

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ML classifiers flag background clusters in RPC lab data using basic time and charge features, but missing metrics and trigger mismatch leave the real-world gain unclear.


The paper shows that three common machine learning models can separate signal from background clusters in a low-resistive RPC using fifteen descriptors based on time and ADC distributions. It is a practical application to a known issue in detector data quality rather than an advance in the algorithms themselves. The authors do a solid job defining the feature set. They combine straightforward statistical measures such as mean, width, and cluster size with parameters from Gaussian fits including amplitude, chi-squared, and degrees of freedom. Training on data from a single-gap prototype with an external three-scintillator trigger, they compare a deep neural net, a one-dimensional convolutional net, and XGBoost. The finding that cluster size and temporal shape features carry the most weight is useful because it ties back to the physics of the secondary hits.

Still, the evaluation leaves some gaps. The abstract claims strong performance across the models with XGBoost as the most robust, but it supplies no accuracy figures, no cross-validation scores, and no error bars. That makes it difficult to gauge the actual gain in efficiency or resolution. The bigger concern is the data collection method. Labels come from coincidence with the external trigger, yet the intended use is in self-triggering high-rate environments where background clusters appear without that constraint and may show different distributions in time and size. No rate-dependent test or untriggered validation set is mentioned to check how much of the reported discrimination holds up.

This kind of work is mainly for instrumentation physicists who are building or upgrading RPC systems in experiments that need clean tracking data at high rates. Someone looking for an off-the-shelf way to reduce background hits would get a clear recipe from the feature list and the model comparison. I would recommend sending it for peer review.
The idea is straightforward and the feature engineering is transparent, but the manuscript needs the quantitative results and a direct test of the self-triggering case to stand on its own.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a machine-learning strategy to classify signal versus background hit clusters in low-resistive bakelite RPC prototypes. Fifteen cluster-level descriptors (statistical moments and Gaussian-fit parameters of time and ADC distributions) are extracted from laboratory data collected with a three-scintillator external master trigger. Three classifiers (DNN, 1D-CNN, XGBoost) are trained on balanced samples; the authors report strong discrimination, with XGBoost exhibiting the most robust generalization and feature-importance analysis highlighting cluster size and temporal-shape variables as dominant. The approach is presented as a practical solution for background suppression in self-triggering, high-rate environments where external triggers are unavailable.

Significance. If validated under self-triggered conditions, the work supplies a compact, interpretable feature set and classifier pipeline that could improve time and spatial resolution and reduce processing overhead in RPC-based tracking systems for high-energy physics. The comparison across model families and the explicit feature-importance ranking constitute concrete strengths that aid physical insight and potential deployment.

major comments (2)

[Abstract and §3, data acquisition] Labels are defined exclusively by coincidence with the external three-scintillator trigger. The target application is self-triggering operation at higher instantaneous rates, where background cluster statistics (width, tails, size) are expected to differ. No rate-scaling study, untriggered test set, or ablation quantifying performance degradation under this domain shift is provided, which leaves the central claim of applicability without direct supporting evidence.
[§4, results] The abstract states that “all models demonstrate strong discrimination capability” and that XGBoost shows “the most robust generalization performance,” yet the manuscript excerpt supplies no numerical values (accuracy, AUC, F1, confusion-matrix entries, or cross-validation statistics with uncertainties). Without these load-bearing metrics, the quantitative strength of the reported superiority cannot be assessed.
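For concreteness, a minimal sketch of the metrics requested above, computed from true labels and classifier scores. The 0.5 decision threshold is an assumption. The AUC uses the rank (Mann-Whitney) formulation via explicit pairwise comparison, which is quadratic in sample size and meant only as a transparent reference implementation.

```python
import numpy as np


def binary_metrics(y_true, score, threshold=0.5):
    """Confusion-matrix entries, accuracy, F1 and ROC AUC for 0/1 labels."""
    y = np.asarray(y_true).astype(int)
    s = np.asarray(score, dtype=float)
    pred = (s >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y == 1)))
    tn = int(np.sum((pred == 0) & (y == 0)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    accuracy = (tp + tn) / y.size
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    pos, neg = s[y == 1], s[y == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs both signal and background entries")
    # AUC = P(signal score > background score) + 0.5 * P(tie), over all pairs.
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "f1": f1, "auc": auc}
```

Reporting these per model, together with their spread over cross-validation folds, would let a reader judge whether the XGBoost advantage exceeds the fold-to-fold variation.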

minor comments (2)

[§2] The precise definitions of the fifteen descriptors (e.g., how the Gaussian-fit amplitude, width, and χ²/NDF are computed from the time/ADC histograms) are described only in prose; explicit equations or a supplementary table would improve reproducibility.
[Figure 5] Figure captions and axis labels for the feature-importance plots should explicitly state the number of trees/estimators and the exact importance metric (gain, cover, or frequency) used for XGBoost.
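As an example of the level of detail that would settle the reproducibility point, a minimal sketch of one way to turn a time or ADC histogram into the five fit descriptors. It swaps in Caruana's log-parabola method (a weighted quadratic fit to the logarithm of the bin counts) for the iterative least-squares fit an analysis framework would normally run, skips empty bins, and computes χ² with the observed counts as variances. The bin count and all names are assumptions.

```python
import numpy as np


def gaussian_fit_descriptors(values, bins=20):
    """Histogram `values`, fit a Gaussian, return mean, sigma, amp, chi2, ndf.

    Caruana's method: fit ln(n) = a + b*x + c*x**2 with weights sqrt(n),
    a stand-in for an iterative least-squares Gaussian fit.
    """
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    ok = counts > 0                      # empty bins are skipped
    x, n = centers[ok], counts[ok].astype(float)
    if x.size < 4:
        raise ValueError("need at least 4 non-empty bins for 3 parameters")
    c, b, a = np.polyfit(x, np.log(n), 2, w=np.sqrt(n))
    if c >= 0:
        raise ValueError("log-counts not concave; no Gaussian peak found")
    sigma = np.sqrt(-1.0 / (2.0 * c))
    mean = -b / (2.0 * c)
    amp = np.exp(a - b * b / (4.0 * c))
    model = amp * np.exp(-0.5 * ((x - mean) / sigma) ** 2)
    chi2 = float(np.sum((n - model) ** 2 / n))  # Neyman chi2: variance_i = n_i
    return {"fit_mean": float(mean), "fit_sigma": float(sigma),
            "fit_amp": float(amp), "chi2": chi2, "ndf": int(x.size - 3)}
```

With this convention NDF is the number of non-empty bins minus the three fitted parameters, which is the kind of definition the comment asks the authors to state explicitly.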

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments. We have carefully considered each point and revised the manuscript accordingly, adding quantitative metrics and a discussion of applicability to self-triggering conditions.


Referee: [Abstract and §3, data acquisition] Labels are defined exclusively by coincidence with the external three-scintillator trigger. The target application is self-triggering operation at higher instantaneous rates, where background cluster statistics (width, tails, size) are expected to differ. No rate-scaling study, untriggered test set, or ablation quantifying performance degradation under this domain shift is provided, which leaves the central claim of applicability without direct supporting evidence.

Authors: We thank the referee for highlighting this important aspect. The use of an external trigger is necessary for labeling signal and background in the supervised learning setup. The secondary hits identified in the triggered data are the same background component expected in self-triggering operation. While we agree that a dedicated study at higher rates or with untriggered data would be ideal to quantify any domain shift, the current prototype setup did not allow for such measurements. In the revised manuscript, we have expanded §5 to discuss this limitation explicitly and emphasize that the selected features (cluster size, temporal shape) are intrinsic properties likely to generalize. We have also added a sentence in the abstract clarifying the scope. revision: partial
Referee: [§4, results] The abstract states that “all models demonstrate strong discrimination capability” and that XGBoost shows “the most robust generalization performance,” yet the manuscript excerpt supplies no numerical values (accuracy, AUC, F1, confusion-matrix entries, or cross-validation statistics with uncertainties). Without these load-bearing metrics, the quantitative strength of the reported superiority cannot be assessed.

Authors: We agree that the quantitative metrics should be more explicitly stated. In the revised manuscript, we have added the specific performance metrics (accuracy, AUC, F1, confusion matrices, and cross-validation statistics with uncertainties) to the abstract and §4, including a summary table for clarity. revision: yes

Circularity Check

0 steps flagged

No circularity: standard supervised ML evaluation on externally labeled data

full rationale

The paper extracts 15 cluster-level features from time/ADC distributions in externally triggered laboratory data, trains DNN/1D-CNN/XGBoost classifiers on balanced signal/background samples defined by the three-scintillator coincidence, and reports empirical discrimination metrics plus feature importance on held-out test data. No equations, self-citations, or parameter fits reduce the reported performance numbers to the input labels by construction; the classifiers are ordinary supervised learners whose outputs are not tautological with the trigger-based labeling. The representativeness of the triggered data for self-triggered high-rate operation is a generalization question, not a circularity in the derivation chain.
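The evaluation structure described here (trigger-derived labels, balanced classes, held-out test data) can be sketched as follows. Random undersampling of the majority class and the 25% test fraction are assumptions; the paper may balance by other means, such as oversampling or event weights.

```python
import numpy as np


def balanced_split(y, test_frac=0.25, seed=0):
    """Undersample to equal class counts, then split each class train/test.

    Returns (train_idx, test_idx) as indices into the original arrays.
    """
    y = np.asarray(y).astype(int)
    rng = np.random.default_rng(seed)
    idx_sig = np.flatnonzero(y == 1)
    idx_bkg = np.flatnonzero(y == 0)
    n = min(idx_sig.size, idx_bkg.size)       # size of each balanced class
    n_test = int(round(test_frac * n))
    train, test = [], []
    for idx in (idx_sig, idx_bkg):
        chosen = rng.permutation(idx)[:n]     # random undersampling
        test.append(chosen[:n_test])          # held out, never used in training
        train.append(chosen[n_test:])
    return np.concatenate(train), np.concatenate(test)
```

The split keeps the two classes equally represented in both the training and the test set, so an accuracy of 0.5 on either set corresponds to chance.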

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that the chosen cluster descriptors capture all relevant signal-background differences and that lab-triggered data generalizes to self-triggered operation.

free parameters (1)

XGBoost and neural-network hyperparameters
Tuned during training to achieve reported discrimination; exact values not stated in abstract.
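Because the tuned values are not reported, the usual safeguard is to compare hyperparameter settings by k-fold cross-validation on the training data only and to quote the spread across folds. A minimal sketch, assuming accuracy as the score and k = 5; the nearest-centroid model is a hypothetical stand-in for the DNN, 1D-CNN, or XGBoost.

```python
import numpy as np


def kfold_scores(X, y, fit_predict, k=5, seed=0):
    """Accuracy on each of k folds, returned as (mean, sample std).

    fit_predict(X_train, y_train, X_test) -> predicted 0/1 labels.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y).astype(int)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        scores.append(np.mean(pred == y[test]))
    scores = np.array(scores)
    return float(scores.mean()), float(scores.std(ddof=1))


def nearest_centroid(X_tr, y_tr, X_te):
    """Hypothetical stand-in model: assign each cluster to the nearer class mean."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = np.linalg.norm(X_te - c0, axis=1)
    d1 = np.linalg.norm(X_te - c1, axis=1)
    return (d1 < d0).astype(int)
```

Calling `kfold_scores` once per candidate setting and comparing mean ± standard deviation gives cross-validation statistics with uncertainties; the held-out test set stays outside this loop.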

axioms (2)

domain assumption: The fifteen cluster-level descriptors encode the essential statistical and shape differences between signal and background hits.
Invoked by the feature-engineering step described in the abstract.
domain assumption: Laboratory data with external trigger is representative of background in self-triggered high-rate running.
Required for the generalization claim but not directly tested in the described setup.

