Phast: Simultaneous reconstruction of photoelectron count and time profiles from PMT waveforms via machine learning

Hongyue Duyang; Siyu Chen; Teng Li; Yaoguang Wang; Yiming Xu; Youwen Fan

arxiv: 2605.29391 · v1 · pith:OITJEWIBnew · submitted 2026-05-28 · ✦ hep-ex


Pith reviewed 2026-06-29 00:11 UTC · model grok-4.3

classification ✦ hep-ex

keywords PMT waveform reconstruction · machine learning · photoelectron counting · timing reconstruction · transformer decoder · pileup · Monte Carlo simulation


The pith

A transformer model with shared encoder and count-conditioned decoder reconstructs photoelectron count and timing from PMT waveforms simultaneously.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Phast, a machine learning method to extract both the total number of photoelectrons and their time profile from photomultiplier tube waveforms. Electronic effects such as pileup, charge fluctuations and noise complicate this task in real detectors. The architecture uses a shared wave-transformer encoder, a dedicated counting branch, and a time branch that applies a count-conditioned query decoder with dynamic query activation. Performance is evaluated on toy Monte Carlo waveform datasets that include uniform and mixed fast-slow double-temporal-component cases. The results show stable accuracy in both counting and timing under the tested conditions.

Core claim

Phast consists of a shared wave-transformer encoder followed by a counting branch for total PE number prediction and a time branch that employs a count-conditioned query decoder with dynamic query activation; this structure reconstructs PE count and time profile simultaneously and maintains high consistency across uniform and mixed fast-slow double-temporal-component toy Monte Carlo PMT waveform datasets.

What carries the argument

Shared wave-transformer encoder followed by a counting branch for total PE number prediction and a time branch employing a count-conditioned query decoder with dynamic query activation.
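The abstract does not spell out how dynamic query activation works. One plausible reading, sketched below in plain Python, is that the decoder holds a fixed pool of queries and switches on only as many as the counting branch predicts. The names `activate_queries`, `read_out_times` and `MAX_QUERIES`, and the choice to round and clamp the count to the pool size, are illustrative assumptions, not the authors' implementation.

```python
MAX_QUERIES = 8  # hypothetical size of the decoder's query pool


def activate_queries(predicted_count, max_queries=MAX_QUERIES):
    """Return a boolean mask that switches on one query per predicted PE."""
    # Round the count-branch output and clamp it to the available pool.
    n_active = max(0, min(max_queries, round(predicted_count)))
    return [i < n_active for i in range(max_queries)]


def read_out_times(query_times, mask):
    """Keep the time predicted by each active query, sorted in time."""
    return sorted(t for t, on in zip(query_times, mask) if on)


# Example: the counting branch predicts about 3 PEs, so 3 queries are read out.
mask = activate_queries(2.7)
times = read_out_times([41.0, 12.5, 30.2, 99.0, 5.0, 7.0, 8.0, 9.0], mask)
print(mask.count(True), times)
```

In this reading the time branch never has to emit a "no object" class for unused queries, because the count decides how many outputs exist; whether Phast works this way is not confirmed by the text available here.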

If this is right

Accurate simultaneous reconstruction of count and time remains stable when pileup and noise levels vary within the simulated waveform sets.
The count-conditioned query decoder enables the time branch to produce consistent profiles once the total PE number is known.
Convolutional feature extraction combined with query-based transformer decoding handles both single-component and mixed fast-slow waveforms without separate processing chains.
High consistency between predicted and true values holds for both the total count and the detailed time distribution in the tested configurations.
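One way to make "consistency" concrete is to score the count and the times separately. The sketch below is an assumption about how such a check could look, not the metric used in the paper: it reports the count residual and, after pairing sorted predicted and true times, the mean absolute time residual. For one-dimensional times with an absolute-difference cost, pairing in sorted order is an optimal one-to-one assignment when both lists have the same length; for unequal lengths the truncation applied by `zip` is a simplification.

```python
def count_residual(pred_times, true_times):
    """Predicted minus true number of photoelectrons."""
    return len(pred_times) - len(true_times)


def mean_abs_time_residual(pred_times, true_times):
    """Mean |t_pred - t_true| over sorted pairs; None if nothing can be paired."""
    pairs = list(zip(sorted(pred_times), sorted(true_times)))
    if not pairs:
        return None
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


pred = [30.5, 12.0, 41.0]  # hypothetical reconstructed PE times (ns)
true = [12.5, 30.0, 40.0]  # hypothetical true PE times (ns)
print(count_residual(pred, true), mean_abs_time_residual(pred, true))
```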

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the same architecture performs well on real detector data, it could reduce systematic uncertainties in downstream event reconstruction for large neutrino or dark-matter experiments.
The query-decoder approach may generalize to waveform data from other photosensor types or to signals with additional temporal components.
Real-time deployment on FPGA or GPU hardware could support high-rate environments where conventional methods become computationally expensive.
A direct comparison of reconstruction variance between Phast and template-fitting methods on the same real waveforms would quantify any practical gain.

Load-bearing premise

The toy Monte Carlo PMT waveform datasets, including uniform and mixed fast-slow double-temporal-component configurations, sufficiently represent the electronic effects present in real detector environments.
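For orientation, a toy waveform generator of the kind this premise refers to can be very small. The sketch below follows the ingredients the paper's waveform-simulation appendix lists (PE-count and arrival-time sampling, an SPE template with charge smearing, baseline, and noise) but omits overshoot. Every number in it (template shape, time constants, noise level, fast fraction) and every function name is a placeholder assumption rather than a value from the paper.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson-distributed PE count (Knuth's multiplication method)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_arrival_time(rng, fast_fraction=0.7, tau_fast=5.0, tau_slow=50.0, t0=20.0):
    """Draw one PE time (ns) from a fast/slow two-exponential mixture."""
    tau = tau_fast if rng.random() < fast_fraction else tau_slow
    return t0 + rng.expovariate(1.0 / tau)


def spe_template(t, rise=2.0, fall=8.0):
    """Toy single-PE pulse shape: zero before arrival, then rise and decay."""
    if t < 0.0:
        return 0.0
    return math.exp(-t / fall) - math.exp(-t / rise)


def toy_waveform(rng, mean_pe=3.0, n_samples=200, dt=1.0, charge_sigma=0.3,
                 baseline=0.0, noise_sigma=0.05):
    """Return (true PE times, sampled waveform) for one toy event."""
    n_pe = sample_poisson(rng, mean_pe)
    times = sorted(sample_arrival_time(rng) for _ in range(n_pe))
    # Charge smearing: each PE gets a Gaussian-smeared, non-negative amplitude.
    charges = [max(0.0, rng.gauss(1.0, charge_sigma)) for _ in times]
    wave = []
    for i in range(n_samples):
        t = i * dt
        signal = sum(q * spe_template(t - t_pe) for q, t_pe in zip(charges, times))
        wave.append(baseline + signal + rng.gauss(0.0, noise_sigma))
    return times, wave


times, wave = toy_waveform(random.Random(1))
print(len(times), len(wave))
```

Whether a generator this simple resembles the paper's datasets is exactly the open question: effects such as afterpulsing or digitizer nonlinearity would each need their own term, and none is modeled here.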

What would settle it

Apply the trained model to real PMT waveforms recorded in a physics experiment and compare the output PE counts and times against results from conventional reconstruction algorithms or against known calibration signals.

Original abstract

Photomultiplier tubes (PMTs) are widely used in particle and nuclear physics experiments. The reconstruction of PMT waveforms is a fundamental task in these experiments, where accurate extraction of photoelectron (PE) multiplicities and time from the waveform is required for downstream event reconstruction and analysis. In realistic detector environments, PMT waveform reconstruction is complicated by electronic effects such as pileup, charge fluctuations, noise etc., which make precise recovery of physical observables challenging. To address these challenges, we present \phast{}, a machine-learning-based method that reconstructs PE count and time profile simultaneously. The model consists of a shared wave-transformer encoder followed by two dedicated branches: a counting branch for the total PE number prediction, and a time branch employing a count-conditioned query decoder with dynamic query activation. To study the reconstruction performance under controlled conditions, we construct several toy Monte Carlo PMT waveform datasets, including both uniform and mixed fast-slow double-temporal-components configurations. The proposed method demonstrates stable and accurate reconstruction performance across various waveform conditions, achieving high consistency in both PE counting and time reconstruction. These results indicate that architectures combining convolutional feature extraction with query-based transformer decoders provide an effective approach for complex PMT waveform reconstruction tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

New transformer architecture for joint PE count and time extraction from PMT waveforms, but results shown only on toy Monte Carlo with no baselines or real-data checks.

The main thing here is a new ML setup that shares a wave-transformer encoder and then uses a count-conditioned query decoder to pull both total photoelectron number and time profile at once. That specific combination looks fresh for this exact task.

What the paper does is train and test the model on several controlled toy Monte Carlo waveform sets, some with uniform single-PE responses and some mixing fast and slow components. The abstract claims stable performance across those conditions. The architecture choice makes sense for handling variable numbers of photoelectrons without fixed output sizes.

The soft spots are straightforward. No numbers appear in the abstract: no accuracy, no resolution, no error bars, and no comparison to template fitting or simpler neural nets. Everything rests on synthetic data whose generation details are not spelled out. The stress-test note is on target: real PMT effects like afterpulsing, baseline wander, charge fluctuations at the single-PE level, and digitizer nonlinearity are not obviously injected at realistic rates. If those are missing, the reported consistency will not carry over to actual detector waveforms.

This is aimed at experimental groups that already run PMT-based detectors and want a drop-in reconstruction tool. A reader who needs a practical method for pileup-heavy data might get an idea from the architecture, but would still have to implement and validate it themselves.

The work is coherent on its own terms and shows clear thinking about the joint reconstruction problem. It deserves a serious referee to check whether the full paper supplies the missing metrics, baselines, and realism tests.

Referee Report

2 major / 2 minor

Summary. The paper presents Phast, a machine-learning method for simultaneous reconstruction of photoelectron (PE) count and time profiles from PMT waveforms. The architecture consists of a shared wave-transformer encoder followed by a counting branch for total PE number and a time branch that uses a count-conditioned query decoder with dynamic query activation. Evaluation is performed exclusively on toy Monte Carlo PMT waveform datasets in both uniform and mixed fast-slow double-temporal-component configurations, with the claim of stable and accurate reconstruction performance under controlled waveform conditions.

Significance. If the performance claims are substantiated with quantitative metrics and extended to realistic conditions, the work could contribute a modern ML approach to PMT waveform analysis in particle and nuclear physics, where simultaneous count and timing extraction is valuable for event reconstruction. The query-based decoder conditioned on count predictions is a technically interesting design choice for handling variable PE multiplicities.

major comments (2)

[Abstract] Abstract: the claim of 'high consistency in both PE counting and time reconstruction' is presented without any numerical metrics, baseline comparisons, error bars, training details, or data-exclusion criteria. This absence prevents verification of the central performance assertions.
[Evaluation on toy datasets] Evaluation section (toy MC datasets): all results are confined to synthetic uniform and mixed fast-slow waveforms. No tests are reported on real detector waveforms or on simulations that inject calibrated electronic effects (pileup at realistic rates, baseline wander, afterpulsing, digitizer nonlinearity, or charge fluctuations). Because the introduction explicitly motivates the method by these realistic complications, the lack of such validation is load-bearing for any claim of applicability beyond controlled toy conditions.

minor comments (2)

[Title and Abstract] Title uses 'Phast' while the abstract uses \phast{}; consistent acronym usage would improve readability.
[Results] The manuscript would benefit from at least one explicit comparison to a conventional reconstruction technique (e.g., leading-edge or constant-fraction timing combined with threshold counting) to place the ML results in context.
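To indicate what such a baseline involves, here is a minimal plain-Python sketch of leading-edge timing combined with threshold counting. It assumes positive-going pulses on a zero baseline, uniform sampling with step `dt`, and hypothetical function names; it is not taken from the paper or from any experiment's reconstruction code.

```python
def leading_edge_time(wave, threshold, dt=1.0):
    """Time of the first upward threshold crossing, linearly interpolated."""
    for i in range(1, len(wave)):
        lo, hi = wave[i - 1], wave[i]
        if lo < threshold <= hi:
            return (i - 1 + (threshold - lo) / (hi - lo)) * dt
    return None  # no crossing found


def count_crossings(wave, threshold):
    """Number of upward threshold crossings, a crude PE-count estimate."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a < threshold <= b)


wave = [0.0, 0.0, 0.4, 1.0, 0.3, 0.0, 0.2, 0.9, 0.1, 0.0]  # two toy pulses
print(leading_edge_time(wave, 0.5), count_crossings(wave, 0.5))
```

Counting upward crossings merges pulses that overlap above threshold, which is the pileup weakness a side-by-side comparison with the ML method would expose.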

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond to each major comment below.

Referee: [Abstract] Abstract: the claim of 'high consistency in both PE counting and time reconstruction' is presented without any numerical metrics, baseline comparisons, error bars, training details, or data-exclusion criteria. This absence prevents verification of the central performance assertions.

Authors: We agree that the abstract would benefit from quantitative support for the performance claims. In the revised manuscript we will update the abstract to include key numerical results from the evaluations, such as PE counting accuracy and timing metrics, while retaining the high-level summary. revision: yes

Referee: [Evaluation on toy datasets] Evaluation section (toy MC datasets): all results are confined to synthetic uniform and mixed fast-slow waveforms. No tests are reported on real detector waveforms or on simulations that inject calibrated electronic effects (pileup at realistic rates, baseline wander, afterpulsing, digitizer nonlinearity, or charge fluctuations). Because the introduction explicitly motivates the method by these realistic complications, the lack of such validation is load-bearing for any claim of applicability beyond controlled toy conditions.

Authors: The manuscript states that evaluation is performed exclusively on toy Monte Carlo datasets under controlled conditions to isolate the method's behavior. While the introduction references realistic complications as motivation, the presented work is limited to these simplified settings. We will add an explicit limitations section clarifying the scope and outlining future extensions to realistic waveforms and effects such as pileup and afterpulsing. revision: partial

Circularity Check

0 steps flagged

No circularity: ML performance evaluated on held-out toy MC splits with no self-referential derivations

The paper describes a transformer-based ML model trained on constructed toy Monte Carlo PMT waveform datasets (uniform and mixed fast-slow configurations) and reports reconstruction metrics on those same simulated conditions. No equations, parameters, or self-citations are presented that reduce the reported accuracy/consistency to quantities fitted on the evaluation data by construction. Standard train/test separation on synthetic data is used; the central claim is empirical performance under controlled simulation, which does not match any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, mathematical axioms, or invented physical entities are stated. The central claim rests on the unexamined assumption that performance on the described toy MC datasets transfers to real data.


Reference graph

Works this paper leans on

10 extracted references · 3 canonical work pages · 1 internal anchor

[1] Z. Yu et al., Waveform reconstruction method for photomultiplier tubes in the JUNO experiment, Nucl. Instrum. Meth. A 988 (2021) 164896.

[2] P. Adamson et al., Reconstruction of overlapping photomultiplier tube signals using maximum likelihood methods, Nucl. Instrum. Meth. A 492 (2002) 325.

[3] B. Xu et al., FSMP: Fast stochastic matching pursuit for PMT waveform reconstruction, Nucl. Instrum. Meth. A 1058 (2024) 168839.

[4] W. Jiang, G. Huang, Z. Liu, W. Luo, L. Wen and J. Luo, Machine-learning based photon counting for PMT waveforms and its application to the improvement of the energy resolution in large liquid scintillator detectors, Eur. Phys. J. C 85 (2025) 69.

[5] Y. Zhang et al., PMT waveform simulation and reconstruction with conditional diffusion networks, Mach. Learn.: Sci. Technol. 6 (2025) 015042.

[6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez et al., Attention is all you need, in Advances in Neural Information Processing Systems 30, 2017 [1706.03762].

[7] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov and S. Zagoruyko, End-to-end object detection with transformers, in Computer Vision – ECCV 2020, 2020 [2005.12872].

[8] B. Cheng, A. Schwing and A. Kirillov, Masked-attention mask transformer for universal image segmentation, in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 1280–1289, 2022 [2112.01527].

[9] H.W. Kuhn, The Hungarian method for the assignment problem, Naval Res. Logist. Q. 2 (1955) 83.

[10] S. Jetter, D. Dwyer, W.-Q. Jiang, D.-W. Liu, Y.-F. Wang, Z.-M. Wang et al., PMT waveform modeling at the Daya Bay experiment, Chin. Phys. C 36 (2012) 733.

Appendix excerpt captured after reference [10]: A Waveform simulation. The waveform simulation mainly consists of four procedures, including PE-count and arrival-time sampling, SPE template and charge smearing, overshoot, baseline, and noise. PE-count and...
