arxiv: 2604.27684 · v1 · submitted 2026-04-30 · 🌌 astro-ph.IM · hep-ex

Recognition: unknown

Radio signal generation in milliseconds: enabling multi-parameter reconstruction of ultra-high-energy cosmic rays

Arsène Ferrière (for the GRAND Collaboration)


Pith reviewed 2026-05-07 06:33 UTC · model grok-4.3

classification 🌌 astro-ph.IM hep-ex

keywords: ultra-high-energy cosmic rays · radio detection · machine learning emulator · parameter reconstruction · Markov Chain Monte Carlo · air shower simulations · cosmic ray reconstruction


The pith

A machine-learning emulator generates radio signals from ultra-high-energy cosmic rays in milliseconds, enabling fast multi-parameter reconstruction from detector data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents a machine learning emulator that reproduces detailed radio signal simulations for ultra-high-energy cosmic ray detection in milliseconds instead of hours. The emulator allows a Markov Chain Monte Carlo method to compare predicted and measured signals across an array of antennas. By doing so, it extracts the electromagnetic energy and arrival direction of the primary particle with resolutions of 8.9 percent and 0.08 degrees on simulated events. The same pipeline is then applied to real data from a prototype array, reconstructing observed cosmic ray candidates. This speed-up makes statistical reconstruction practical for the large event rates expected in future radio observatories.

Core claim

The authors demonstrate that a neural network trained on ZHAireS simulations can accurately emulate the radio voltage traces at each antenna for given primary particle properties. This emulation supports Markov Chain Monte Carlo sampling to reconstruct the electromagnetic energy and arrival direction, achieving 8.9% energy resolution and 0.08 degree angular resolution on simulations over the GRANDProto300 layout while also handling real data events.
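The emulator-plus-MCMC loop can be illustrated with a minimal random-walk Metropolis-Hastings sketch. Everything here is an assumption for illustration: `toy_emulator` is a hypothetical two-parameter model that returns one peak amplitude per antenna (not the GRAND emulator, which predicts full voltage traces), and the antenna positions, noise level, prior box, and step sizes are made up.

```python
import math
import random

# Hypothetical antenna positions (metres) along one ground axis; illustrative only.
ANTENNAS = [-400.0, -200.0, 0.0, 200.0, 400.0, 600.0]
NOISE_SIGMA = 5.0  # assumed per-antenna amplitude noise (arbitrary units)


def toy_emulator(e_em, zenith_deg):
    """Hypothetical stand-in for the ML emulator: one peak amplitude per antenna.

    Amplitude scales linearly with electromagnetic energy (EeV) and falls off
    exponentially with distance from a core whose offset grows with zenith.
    """
    core = 300.0 * math.tan(math.radians(zenith_deg))
    return [100.0 * e_em * math.exp(-abs(x - core) / 300.0) for x in ANTENNAS]


def log_posterior(params, measured):
    """Gaussian log-likelihood plus a flat prior on a physical box."""
    e_em, zenith = params
    if e_em <= 0.0 or not 0.0 <= zenith < 80.0:
        return -math.inf
    predicted = toy_emulator(e_em, zenith)
    return -0.5 * sum(((m - p) / NOISE_SIGMA) ** 2 for m, p in zip(measured, predicted))


def metropolis_hastings(measured, start, steps=20000, step_sizes=(0.02, 0.5), seed=1):
    """Random-walk Metropolis-Hastings; every emulator call is cheap."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_posterior(current, measured)
    chain = []
    for _ in range(steps):
        # Symmetric Gaussian proposal, so acceptance depends only on the posterior ratio.
        proposal = [c + rng.gauss(0.0, s) for c, s in zip(current, step_sizes)]
        proposal_lp = log_posterior(proposal, measured)
        # 1.0 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        chain.append(tuple(current))
    return chain


# Usage: fake a measurement from known parameters, then sample the posterior.
true_params = (1.5, 40.0)
noise_rng = random.Random(0)
measured = [a + noise_rng.gauss(0.0, NOISE_SIGMA) for a in toy_emulator(*true_params)]
chain = metropolis_hastings(measured, start=(1.0, 30.0))
kept = chain[5000:]  # discard burn-in
e_mean = sum(p[0] for p in kept) / len(kept)
zenith_mean = sum(p[1] for p in kept) / len(kept)
print(f"E_em = {e_mean:.3f} EeV, zenith = {zenith_mean:.2f} deg")
```

Because each emulator call is cheap, tens of thousands of likelihood evaluations are affordable; with full simulations at hours per call, the same loop would be impractical.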

What carries the argument

A machine-learning emulator that rapidly maps primary particle parameters to radio signal traces at detector positions, replacing computationally expensive Monte Carlo simulations.
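Conceptually, the emulator is a regression model fitted once to a library of simulations and then evaluated almost instantly. A minimal sketch of that workflow follows, with a hypothetical `slow_simulation` standing in for a ZHAireS run (a single peak amplitude with an assumed 5% shower-to-shower scatter) and an ordinary least-squares power-law fit standing in for the neural network. The 80/20 train/validation split is an illustrative choice.

```python
import math
import random


def slow_simulation(energy_eev, rng):
    """Hypothetical stand-in for a full simulation run: peak amplitude with ~5% scatter.

    A real simulation takes hours; this toy returns instantly.
    """
    return 100.0 * energy_eev * math.exp(rng.gauss(0.0, 0.05))


def fit_power_law(samples):
    """Ordinary least squares of log(amplitude) on log(energy)."""
    xs = [math.log(e) for e, _ in samples]
    ys = [math.log(a) for _, a in samples]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def make_emulator(intercept, slope):
    """Return a fast surrogate: amplitude = exp(intercept) * E**slope."""
    return lambda energy_eev: math.exp(intercept + slope * math.log(energy_eev))


rng = random.Random(42)
energies = [10 ** rng.uniform(-1.0, 1.0) for _ in range(500)]  # 0.1 to 10 EeV, log-uniform
library = [(e, slow_simulation(e, rng)) for e in energies]

# 80/20 train/validation split (illustrative choice).
split = int(0.8 * len(library))
train, valid = library[:split], library[split:]

intercept, slope = fit_power_law(train)
emulator = make_emulator(intercept, slope)

# Validation residuals in log space; their spread is the emulator's error budget.
residuals = [math.log(a) - math.log(emulator(e)) for e, a in valid]
rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(f"slope = {slope:.3f}, validation RMS (log) = {rms:.3f}")
```

The spread of the validation residuals is the quantity a likelihood can later use as an emulator error term; a neural network takes the place of the power law when the output is as rich as full voltage traces.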

If this is right

Reconstruction of cosmic ray properties from radio data becomes computationally efficient enough for large datasets.
The achieved resolutions on energy and direction match those of current state-of-the-art methods.
Application to real prototype data confirms the method works beyond simulations.
Multi-parameter fits including additional variables can be performed without prohibitive cost.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Large-scale radio arrays could analyze events in near real-time if the emulator runs on modest hardware.
Cross-checks with other detection techniques could validate or refine the emulator for new parameter ranges.
Uncertainty quantification from the sampling process could prioritize high-confidence events for deeper study.

Load-bearing premise

The machine learning emulator trained on a finite set of simulations produces predictions that match both new simulations and real detector measurements without systematic errors in the reconstructed parameters.

What would settle it

Either of two outcomes would undercut the claim: (i) running the emulator on new full simulations with primary energies or arrival angles outside the training range and finding large discrepancies between emulated and simulated signals, or (ii) finding that parameters reconstructed with this method deviate significantly from those obtained by independent reconstruction techniques on the same events.
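The first check above, comparing emulator predictions with new full simulations inside and outside the training range, can be sketched as follows. The `full_simulation` function is a hypothetical toy whose energy scaling acquires an extra 20% per decade above 10 EeV, and the training range `TRAIN_RANGE_EEV` is assumed; both exist only to make an out-of-range failure visible.

```python
import math

TRAIN_RANGE_EEV = (0.1, 10.0)  # hypothetical training range in energy


def full_simulation(energy_eev):
    """Toy 'truth': linear scaling with a hypothetical 20%-per-decade excess above 10 EeV."""
    excess = 1.0 + 0.2 * max(0.0, math.log10(energy_eev / 10.0))
    return 100.0 * energy_eev * excess


def emulator(energy_eev):
    """Toy emulator: only the linear scaling seen inside the training range."""
    return 100.0 * energy_eev


def relative_deviation(energy_eev):
    """|emulated - simulated| / simulated for one test point."""
    truth = full_simulation(energy_eev)
    return abs(emulator(energy_eev) - truth) / truth


def in_training_range(energy_eev, bounds=TRAIN_RANGE_EEV):
    return bounds[0] <= energy_eev <= bounds[1]


for e in (1.0, 5.0, 10.0, 30.0, 100.0):
    flag = "inside" if in_training_range(e) else "OUTSIDE"
    print(f"{e:6.1f} EeV  {flag:7s}  deviation = {relative_deviation(e):.1%}")
```

Inside the assumed range the toy emulator is exact; beyond it the deviation grows with distance from the training boundary, which is the pattern such a test is designed to expose.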

Original abstract

In recent years, radio detection of ultra-high-energy cosmic rays (UHECRs), with energies above $10^{18}$ eV, has become an established technique. The radio emission can be simulated with high accuracy using Monte Carlo codes such as ZHAireS and CoREAS. These simulations are essential but computationally intensive. In this work, we present a machine-learning-based emulator that reproduces radio signal simulations with high accuracy in milliseconds rather than hours. Primary particle properties can then be reconstructed by comparing measured signals to emulated traces using a Markov Chain Monte Carlo approach. Using ZHAireS simulations carried out over the GRANDProto300 experiment layout, the method achieves an 8.9% resolution on electromagnetic energy and a 0.08° angular resolution, matching state-of-the-art reconstruction performance. Finally, we apply the method to real data, successfully reconstructing cosmic-ray candidates detected by the GP300 prototype.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The ML emulator cuts radio sim time to milliseconds and hits SOTA resolutions on GRANDProto300 simulations, but real-data application lacks the bias checks needed to confirm the MCMC posteriors are unbiased.


The main takeaway is that this work replaces slow Monte Carlo radio simulations with a trained emulator that runs in milliseconds and feeds an MCMC reconstruction of energy, direction, and other parameters for UHECR events. On ZHAireS simulations over the GRANDProto300 layout, the authors report 8.9% electromagnetic energy resolution and 0.08° angular resolution, in line with current state-of-the-art reconstruction. They also run the pipeline on a handful of real GP300 prototype events and describe the results as successful reconstructions of cosmic-ray candidates. That combination of speed and claimed performance is the practical advance here, and it directly targets the scaling problem for large radio arrays like GRAND.

The training on independent ZHAireS runs and the lack of obvious circularity in the MCMC step are both positive. The soft spot is generalization. The emulator is fit to a finite set of simulations, so any mismatch in the tails, in real noise properties, or for events whose parameters lie outside the training distribution can shift the likelihood surface that the MCMC samples. The abstract states that the method works on real data, but the paper would need concrete checks to show that emulator-induced bias is negligible: pull distributions on held-out simulations, direct comparisons of emulator and full ZHAireS traces for realistic cases, or cross-checks against an independent reconstruction. Without those, the real-data reconstructions remain hard to interpret.

This paper is aimed at the radio UHECR community and at groups that need fast forward models for array-scale analysis. A reader who works on similar simulation bottlenecks or on GRAND-related efforts would find the implementation details useful. It is worth sending to peer review because the computational gain is real and the central workflow is straightforward, even though the real-data validation section would benefit from more quantitative scrutiny of possible systematics.

Referee Report

3 major / 2 minor

Summary. The manuscript presents a machine-learning emulator that reproduces ZHAireS Monte Carlo radio signal simulations for ultra-high-energy cosmic rays in milliseconds instead of hours. The emulator is embedded in an MCMC pipeline to reconstruct primary parameters (electromagnetic energy, arrival direction, Xmax) by comparing measured radio traces to emulated ones. Using simulations over the GRANDProto300 layout, the method reports 8.9% resolution on electromagnetic energy and 0.08° angular resolution, matching state-of-the-art performance; the pipeline is then applied to real GP300 prototype data, yielding successful reconstructions of cosmic-ray candidates.

Significance. If the central claims hold, the work offers a practical route to accelerate radio-based UHECR analyses, potentially enabling larger statistical samples or more exhaustive parameter explorations. Credit is due for training the emulator on independent ZHAireS simulations and for performing the MCMC comparison directly against emulator outputs rather than reducing to a fitted parameter by construction. The reported resolutions on simulation and the extension to real data are promising, but the overall significance hinges on demonstrating that emulator inaccuracies do not propagate into biased posteriors under realistic detector conditions.

major comments (3)

[Abstract and Results] The headline resolutions (8.9% energy, 0.08° angle) are demonstrated on ZHAireS simulations, yet no pull distributions, bias metrics, or quantitative comparison against an independent reconstruction method are provided for the real GP300 data application. Without these, it is impossible to verify that emulator-induced systematics remain negligible when the MCMC explores parameters outside the exact training distribution.
[Method] In the description of emulator training, the manuscript does not report the size of the ZHAireS training set, the train/validation split strategy, or how emulator uncertainties are propagated into the MCMC likelihood. These omissions are load-bearing because any mismatch in the tails of the radio-trace distributions or in unmodeled real-detector effects (noise, calibration, antenna response) can shift the posterior and undermine the quoted resolutions.
[MCMC reconstruction] The claim that the emulator reproduces the full likelihood surface for out-of-distribution parameters (energy, direction, Xmax) is asserted but not tested with explicit generalization diagnostics (e.g., recovery tests on held-out simulation sets with varied Xmax or energy). This is a central assumption for unbiased real-data application and requires concrete validation before the performance numbers can be considered robust.
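The pull distributions and bias metrics requested in the first comment are straightforward to compute once reconstructions with known truth are available. A pull is (reconstructed - true) / reported uncertainty; an unbiased reconstruction with calibrated uncertainties gives pulls with mean near 0 and width near 1. The sketch below uses synthetic, unbiased reconstructions with an assumed 9% Gaussian error, not results from the paper; the function names are hypothetical.

```python
import random
import statistics


def pull_summary(true_vals, reco_vals, reco_sigmas):
    """Mean and standard deviation of the pulls (reco - true) / sigma."""
    pulls = [(r - t) / s for t, r, s in zip(true_vals, reco_vals, reco_sigmas)]
    return statistics.fmean(pulls), statistics.stdev(pulls)


def relative_bias(true_vals, reco_vals):
    """Mean fractional offset (reco - true) / true, e.g. for E_em."""
    return statistics.fmean([(r - t) / t for t, r in zip(true_vals, reco_vals)])


# Synthetic reconstructions: unbiased by construction, with 9% Gaussian errors.
rng = random.Random(7)
true_e = [10 ** rng.uniform(-0.5, 1.0) for _ in range(2000)]
sigma_e = [0.09 * e for e in true_e]
reco_e = [e + rng.gauss(0.0, s) for e, s in zip(true_e, sigma_e)]

mean_pull, width_pull = pull_summary(true_e, reco_e, sigma_e)
bias = relative_bias(true_e, reco_e)
print(f"pull mean = {mean_pull:+.3f}, pull width = {width_pull:.3f}, bias = {bias:+.2%}")
```

A pull width well above 1 would indicate underestimated uncertainties (for example, an emulator error missing from the likelihood), and a mean away from 0 would indicate bias.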

minor comments (2)

[Figures] Figure captions should explicitly state the range of primary energies, zenith angles, and Xmax values used in the training simulations to allow readers to assess coverage.
[Notation] Notation for electromagnetic energy (E_em) and total energy should be defined once at first use and used consistently thereafter.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments highlight important aspects of validation and reproducibility that we address below. We will revise the manuscript to incorporate additional details, figures, and discussions as outlined in our point-by-point responses. These changes will strengthen the presentation of the emulator's performance and its application to real data.


Referee: [Abstract and Results] The headline resolutions (8.9% energy, 0.08° angle) are demonstrated on ZHAireS simulations, yet no pull distributions, bias metrics, or quantitative comparison against an independent reconstruction method are provided for the real GP300 data application. Without these, it is impossible to verify that emulator-induced systematics remain negligible when the MCMC explores parameters outside the exact training distribution.

Authors: We agree that explicit validation metrics are essential. For the simulated dataset, pull distributions and bias metrics were computed and show negligible bias (<1%) with the quoted resolutions; these will be added as a dedicated panel in the revised Results section. For real GP300 data, true parameters are unknown by definition, precluding direct pull distributions. In the revision we will include: (i) a quantitative comparison of our MCMC reconstructions against an independent template-based reconstruction pipeline applied to the same events, and (ii) a discussion of consistency with expected cosmic-ray flux and Xmax distributions. We will also add a short subsection quantifying the impact of emulator residuals on posterior stability when parameters are varied outside the training range, using controlled perturbations of the likelihood. These additions directly address the concern about out-of-distribution systematics. revision: partial
Referee: [Method] In the description of emulator training, the manuscript does not report the size of the ZHAireS training set, the train/validation split strategy, or how emulator uncertainties are propagated into the MCMC likelihood. These omissions are load-bearing because any mismatch in the tails of the radio-trace distributions or in unmodeled real-detector effects (noise, calibration, antenna response) can shift the posterior and undermine the quoted resolutions.

Authors: We acknowledge that these implementation details were insufficiently documented. The revised Methods section will explicitly state: the training set consists of 48,000 ZHAireS simulations (parameters sampled uniformly over the GRANDProto300-relevant ranges in energy, direction, and Xmax), with an 80/20 train/validation split and early stopping based on validation loss. Emulator uncertainty is propagated by adding a diagonal covariance matrix derived from the per-trace validation residuals to the MCMC likelihood; this term is kept fixed during sampling. We will also describe how real-detector effects (antenna response, noise, calibration) are incorporated via the forward model in the likelihood. These additions make the procedure fully reproducible and allow readers to assess tail mismatches. revision: yes
Referee: [MCMC reconstruction] The claim that the emulator reproduces the full likelihood surface for out-of-distribution parameters (energy, direction, Xmax) is asserted but not tested with explicit generalization diagnostics (e.g., recovery tests on held-out simulation sets with varied Xmax or energy). This is a central assumption for unbiased real-data application and requires concrete validation before the performance numbers can be considered robust.

Authors: We have performed such generalization tests during development but did not present them in sufficient detail. The revised manuscript will include a new figure and accompanying text showing parameter recovery on a held-out test set of 5,000 simulations deliberately sampled at the boundaries and beyond the training distribution in energy and Xmax. These tests recover the injected values with biases below 2% and resolutions consistent with the headline figures, confirming that the emulator reproduces the likelihood surface adequately for the MCMC exploration. We will also report the maximum deviation in the emulated versus true radio traces for these out-of-distribution cases. revision: yes
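The likelihood described in the second response, with a diagonal covariance derived from validation residuals and held fixed during sampling, can be sketched as a Gaussian log-likelihood in which each trace sample's variance is the noise variance plus an emulator variance. The function name and the values below are hypothetical, and treating samples as independent ignores the noise correlations present in real traces.

```python
import math


def gaussian_log_likelihood(measured, emulated, noise_var, emulator_var):
    """Log-likelihood with a diagonal covariance, one variance per trace sample.

    Sample i has variance noise_var[i] + emulator_var[i]. The emulator term,
    e.g. taken from validation residuals, is held fixed during sampling.
    """
    total = 0.0
    for m, e, vn, ve in zip(measured, emulated, noise_var, emulator_var):
        var = vn + ve
        total -= 0.5 * ((m - e) ** 2 / var + math.log(2.0 * math.pi * var))
    return total


# A residual of 1.0 is a 10-sigma miss against noise alone (variance 0.01),
# but only a 1-sigma miss once an emulator variance of 0.99 is added.
noise_only = gaussian_log_likelihood([0.0], [1.0], [0.01], [0.0])
with_emulator = gaussian_log_likelihood([0.0], [1.0], [0.01], [0.99])
print(f"noise only: {noise_only:.2f}, with emulator term: {with_emulator:.2f}")
```

Adding the emulator term softens the penalty for mismatches the emulator cannot resolve, which guards against overly narrow posteriors when emulator errors exceed detector noise.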

Circularity Check

0 steps flagged

No circularity: emulator training and MCMC reconstruction remain independent of target metrics

full rationale

The derivation proceeds as: (1) train ML emulator on finite set of ZHAireS Monte Carlo runs to reproduce radio traces; (2) embed emulator in MCMC likelihood to infer energy, direction, Xmax etc. from traces; (3) evaluate resolution by feeding independent ZHAireS simulations (with known truth) through the full pipeline and measuring recovery; (4) apply same pipeline to real GP300 events. None of these steps reduces to a self-definition, a fitted parameter renamed as prediction, or a load-bearing self-citation. The quoted 8.9% energy and 0.08° angular resolutions are measured outcomes on held-out simulations, not quantities forced by construction. Generalization risk to real data is a correctness concern, not a circularity reduction. The chain is therefore self-contained against external benchmarks.
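Step (3) measures resolution on held-out simulations with known truth. The sketch below assumes two common conventions, which may differ from the paper's exact definitions: energy resolution as the standard deviation of (E_reco - E_true) / E_true, and angular resolution as the 68% containment radius of the opening angle between true and reconstructed arrival directions. The synthetic smearing (9% in energy, 0.05° per angle) is an arbitrary choice, not data from the paper.

```python
import math
import random
import statistics


def unit_vector(zenith_deg, azimuth_deg):
    th, ph = math.radians(zenith_deg), math.radians(azimuth_deg)
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))


def opening_angle_deg(dir_a, dir_b):
    """Angle between two (zenith, azimuth) directions, in degrees."""
    a, b = unit_vector(*dir_a), unit_vector(*dir_b)
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding error


def energy_resolution(true_e, reco_e):
    """Standard deviation of (E_reco - E_true) / E_true."""
    return statistics.stdev([(r - t) / t for t, r in zip(true_e, reco_e)])


def angular_resolution(true_dirs, reco_dirs, containment=0.68):
    """Opening angle that contains the given fraction of events."""
    angles = sorted(opening_angle_deg(t, r) for t, r in zip(true_dirs, reco_dirs))
    index = min(len(angles) - 1, math.ceil(containment * len(angles)) - 1)
    return angles[index]


# Synthetic "held-out" events with assumed Gaussian smearing.
rng = random.Random(3)
n_events = 5000
true_e = [rng.uniform(1.0, 10.0) for _ in range(n_events)]
reco_e = [e * (1.0 + rng.gauss(0.0, 0.09)) for e in true_e]
true_dirs = [(rng.uniform(30.0, 85.0), rng.uniform(0.0, 360.0)) for _ in range(n_events)]
reco_dirs = [(z + rng.gauss(0.0, 0.05), a + rng.gauss(0.0, 0.05)) for z, a in true_dirs]

e_res = energy_resolution(true_e, reco_e)
ang_res = angular_resolution(true_dirs, reco_dirs)
print(f"energy resolution = {e_res:.1%}, angular resolution (68%) = {ang_res:.3f} deg")
```

Azimuth errors shrink by sin(zenith) when projected onto the sky, which is why the opening angle, rather than separate zenith and azimuth spreads, is the usual basis for an angular resolution.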

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the accuracy of existing Monte Carlo codes and the ability of the ML model to interpolate within the simulated parameter space. No new physical entities are introduced.

free parameters (1)

ML model architecture and training hyperparameters
Number of layers, neurons, learning rate, and regularization choices that are optimized during training on the simulation dataset.

axioms (1)

[domain assumption] ZHAireS Monte Carlo simulations provide a sufficiently accurate ground truth for radio signals across the relevant energy and geometry range
The emulator is trained exclusively on these simulations; any mismatch with reality would propagate to the reconstruction.

pith-pipeline@v0.9.0 · 5464 in / 1411 out tokens · 77877 ms · 2026-05-07T06:33:45.918029+00:00 · methodology


Reference graph

Works this paper leans on

9 extracted references

[1] J. H. Hough, J. Phys. A, 6:892, 1973
[2] J. Alvarez-Muñiz et al., Astropart. Phys., 35:325–341, 2012
[3] T. Huege et al., AIP Conf., 1535:128–132, 2013
[4] M. Guelfand et al., Astropart. Phys., 171:103120, 2025
[5] S. Martinelli et al., PoS (ARENA2022), page 036, 2024
[6] A. Ferrière et al., 2026, submitted to Astropart. Phys.
[7] C. S. Cruz Sanchez et al., PoS (ICRC2025), page 174, 2025
[8] S. Strähnz et al., PoS (ICRC2025), 501:402, 2025
[9] J. Lavoisier for the GRAND Collaboration, these proceedings