Multimodal Transformer Based Generic Mixture Density Network for Scattering Timescale Estimation of Fast Radio Bursts

Afrokk Khan; Bikash Kharel; Emmanuel Fonseca; Lordrick Kahinga; Mason Ng; Mawson W. Simmons; Paul Scholz; Srinjoy Das

arxiv: 2606.03596 · v1 · pith:PQEDWJN4new · submitted 2026-06-02 · 🌌 astro-ph.HE · stat.ML


Bikash Kharel, Emmanuel Fonseca, Srinjoy Das, Mason Ng, Paul Scholz, Mawson W. Simmons, Lordrick Kahinga, Afrokk Khan

Pith reviewed 2026-06-28 09:14 UTC · model grok-4.3

classification 🌌 astro-ph.HE stat.ML

keywords fast radio bursts · scattering timescale · mixture density network · transformer · deep learning · probabilistic prediction · heteroskedastic errors · dynamic spectrum


The pith

A multimodal transformer model estimates scattering timescales for fast radio bursts by predicting full probability distributions from spectra and profiles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a neural network architecture that processes both the dynamic spectrum and the time series profile of fast radio bursts in parallel to estimate the scattering timescale. It fuses features from transformer encoders and uses a mixture density output to handle the many cases where scattering is too weak to measure. This offers a faster alternative to traditional fitting methods while also supplying uncertainty ranges for each prediction.

Core claim

The MT-GMDN ingests the dynamic spectrum and timeseries profile through parallel transformer encoders, fuses their latent representations, and predicts the distribution of the scattering timescale using a generic mixture-density formulation. This captures both measurable scattering values and the zero-inflated population of bursts with unresolvable scattering. On held-out test data the expected values achieve a coefficient of determination of 94 percent with 90 percent recall, and the model incorporates heteroskedastic errors to allow confidence intervals.
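The zero-inflated mixture idea can be sketched in a few lines. The Python below is a minimal illustration, not the authors' implementation: it assumes a single lognormal component for resolved scattering (the figure captions mention a lognormal form, with a Gamma variant), encodes unresolved scattering as τ = 0, and uses hypothetical names throughout (`p_scatter`, `mu`, `sigma`, and all three functions).

```python
import math


def zero_inflated_lognormal_nll(tau, p_scatter, mu, sigma):
    """Negative log-likelihood of one burst under a two-part mixture.

    Assumption: tau == 0 encodes "scattering unresolved" and tau > 0 is a
    measured timescale; p_scatter is the predicted probability that
    scattering is resolved; (mu, sigma) parameterize ln(tau).
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    eps = 1e-12  # guards against log(0)
    if tau <= 0.0:
        # Point mass at zero: only the mixing weight contributes.
        return -math.log(max(1.0 - p_scatter, eps))
    z = (math.log(tau) - mu) / sigma
    log_pdf = -math.log(tau * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return -(math.log(max(p_scatter, eps)) + log_pdf)


def lognormal_mean(mu, sigma):
    """Expected value of tau, conditional on scattering being resolved."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_interval(mu, sigma, z=1.959963984540054):
    """Central 95% interval of the lognormal component (default z)."""
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)
```

Because `sigma` is predicted per burst, the interval width can vary from event to event, which is one way to read the heteroskedastic errors described above.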

What carries the argument

The Multimodal Transformer Based Generic Mixture Density Network (MT-GMDN), which runs parallel transformers on the spectrum and profile inputs and fuses their outputs before a mixture-density head predicts the scattering timescale.

If this is right

The approach scales to large numbers of bursts without requiring manual supervision or careful initialization.
It distinguishes bursts with measurable scattering from those without through the mixture components.
Predictions come with uncertainty estimates derived from the output distributions and heteroskedastic modeling.
Training on thousands of examples produces high accuracy on unseen data from the same survey.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar architectures could estimate other burst parameters such as dispersion measure or burst width.
Deployment in real-time detection pipelines would enable immediate parameter reporting alongside discovery.
Cross-validation against independent measurements from different instruments would test robustness beyond the training survey.

Load-bearing premise

The held-out events used for testing represent the statistical properties of future observations and the mixture-density formulation captures the zero-inflated population without overfitting to the training data.

What would settle it

Measuring the coefficient of determination and recall on an independent set of fast radio bursts observed with a different telescope and comparing against manual template fits; a substantial drop below 94 percent R-squared or 90 percent recall would indicate the model does not generalize.
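The two metrics named in this test can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, the inputs are plain Python lists, and the 0.6 default threshold is borrowed from the Figure 7 caption on the assumption that it applies to the predicted probability of scattering.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def recall(is_scattered, p_scatter, threshold=0.6):
    """Fraction of truly scattered bursts that the model flags as scattered.

    is_scattered: reference labels (True where scattering was resolved).
    p_scatter: predicted probabilities of resolved scattering (assumed).
    """
    true_pos = sum(1 for s, p in zip(is_scattered, p_scatter) if s and p >= threshold)
    false_neg = sum(1 for s, p in zip(is_scattered, p_scatter) if s and p < threshold)
    return true_pos / (true_pos + false_neg)
```

Running both functions on bursts from a second telescope, with manual template fits as `y_true`, would give numbers directly comparable to the 94 percent and 90 percent quoted above.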

Figures

Figures reproduced from arXiv: 2606.03596 by Afrokk Khan, Bikash Kharel, Emmanuel Fonseca, Lordrick Kahinga, Mason Ng, Mawson W. Simmons, Paul Scholz, Srinjoy Das.

**Figure 1.** Simulated FRB dynamic spectra, each with its corresponding timeseries shown above it. The left panel shows an FRB without scattering, characterized by a symmetric Gaussian pulse profile, while the right panel shows a scattered FRB whose asymmetric pulse profile has an exponentially decaying tail.

**Figure 2.** Left: the distribution of τ from CHIME/FRB Catalog 2 (at a reference frequency of 400 MHz), which is highly skewed and heavy-tailed. Right: the corresponding log-transformed (ln τ) distribution. The Gaussian shape in the right panel indicates that the scattering timescales follow a lognormal distribution. … 1997) and is also expected to be scaled along with the scattering timescale as standard error…

**Figure 3.** Schematic diagram of the parallel transformer architecture for the regression task. The model processes the dynamic spectrum and timeseries representations with two independent, parallel transformer branches. The resulting contextual embeddings are then concatenated and passed through a final regression head to predict the output values. … discussed before and was free from both overfitting and underfitting (see…

**Figure 4.** Scatter plot of MT-GMDN predictions against fitburst-measured scattering timescales at a reference frequency (ν_ref) of 400 MHz. The plot includes the events that have resolved scattering in CHIME/FRB Catalog 2. … to be 94% for the model trained with timeseries computed by the spectral averaging discussed in Section A.2, and only the model performance is discussed in this section. The performan…

**Figure 5.** Test-set prediction performance, with the x-axis denoting discrete FRB samples and the y-axis the scattering timescale at a reference frequency of 400 MHz. fitburst-measured values are shown as blue dots and MT-GMDN point estimates as black dashed lines, while the blue shaded region represents the 95% confidence interval. Orange squares denote the FRB samples that fall outside the 95% confidence interval.

**Figure 6.** Receiver operating characteristic curve for scattering detection by MT-GMDN at different decision threshold values. … fitburst from which our training target labels were derived and numerical methods implemented in the simpulse backend. We accounted for this systematic offset with a linear calibration of the model's point estimate and of the upper and lower bounds of the confidence interval. The linear calibration was obt…
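A linear calibration of the kind mentioned in the Figure 6 excerpt could look like the sketch below. The excerpt is cut off before it says how the calibration was obtained, so ordinary least squares is an assumption made here for illustration, and both function names are hypothetical.

```python
def fit_linear_calibration(predicted, reference):
    """Least-squares slope and intercept mapping predicted -> reference.

    Assumption: ordinary least squares; the source does not state the
    fitting method.
    """
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(value, slope, intercept):
    """Apply the same linear map to a point estimate or an interval bound."""
    return slope * value + intercept
```

Applying one fitted `(slope, intercept)` pair to the point estimate and to both interval bounds keeps their ordering intact as long as the slope is positive.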

**Figure 7.** Confusion matrix for scattering event detection at a detection threshold of p0 = 0.6. Here, no-scatter refers to unresolved scattering rather than the physical absence of scattering. … where N_dof,2 is the number of degrees of freedom in the complex model with scattering and ΔN_fit is the difference in the number of free parameters between the two models. Using the F-statistic, a p-value is obtained which…
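For reference, the F-statistic mentioned in the Figure 7 excerpt is usually written as below for nested least-squares models. This is the textbook form, offered as an assumption about what the truncated text defines; $\chi^2_1$ and $\chi^2_2$ (the fit statistics of the simpler model and of the model with scattering) are symbols introduced here for illustration.

```latex
F = \frac{\left(\chi^2_1 - \chi^2_2\right) / \Delta N_{\rm fit}}{\chi^2_2 / N_{\rm dof,2}}
```

Under the null hypothesis that the simpler model suffices, $F$ follows an F-distribution with $(\Delta N_{\rm fit},\, N_{\rm dof,2})$ degrees of freedom, and the p-value is its upper-tail probability.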

**Figure 8.** Scattering timescale recovery for synthetic broadband pulses generated by simpulse with intrinsic widths of 1 ms. The vertical error bars denote the 95% confidence interval, and all target values lie within the interval. The MT-GMDN model achieves an R² score of 92% on the point estimates of τ. Measurement uncertainty scales with τ, consistent with the inherent difficulty in characterizing the morphology o…

**Figure 9.** Comparison of fitburst-measured values of τ on synthetic data against MT-GMDN predictions. The synthetic data set consists of broadband pulses, with the maximum SNR assigned to the event with minimum scattering. The fluence of each event is held constant, so the SNR is lower for events with stronger scattering because the radiated energy is redistributed in time into the exponentially decaying tail. Left: …

**Figure 10.** Dynamic spectrum of a synthetic FRB modeled with the MCMC method. The injected pulse had an intrinsic width of 1 ms and a scattering timescale of 2 ms. To evaluate MT-GMDN against MCMC in estimating the scattering timescale, we needed to create a complete model of each dynamic spectrum in this evaluation. An example of a model generated by the MCMC method is shown in …

**Figure 11.** Posterior distributions of some of the model parameters in Equation 10, generated by the MCMC method for a simulated FRB shown in the …

**Figure 12.** Comparison of MCMC and MT-GMDN performance on the simulated dataset. Marker size denotes the SNR of the corresponding dynamic spectrum, which ranges from 20 to 4, with larger markers denoting higher SNR. The error bars on MT-GMDN predictions represent 95% confidence intervals, and the error bars on MCMC estimates represent 95% credible intervals. The pulse width of all bursts was set to 1 ms. Both t…

**Figure 13.** Relationship between the predicted uncertainty parameter (σ) and SNR for two representative cases. The left plot shows the variation of σ with SNR for simulated events with a width of 1 ms and τ of 5 ms, while the right plot is for simulated events with a width of 1 ms and τ of 10 ms. … case with τ = 5 ms and p = 8.28 × 10⁻²³ for the case with τ = 10 ms. This confirms that the observed correlations are highly unlikely by c…

**Figure 14.** Illustration of the tokenization scheme for a dynamic spectrum. Each time stamp acts as a distinct token, and the vector of frequency intensities at that time stamp serves as the token embedding, referred to as the d-dimensional frequency vector. … the self attention mechanism to model the long range temporal dependencies along with frequency dependent temporal smearing. 13 https://github.com/kharelb/Scattering-Timescal…
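The tokenization described in the Figure 14 caption amounts to reading the dynamic spectrum column by column. The sketch below assumes the spectrum is stored as a list of frequency channels, each a list of time samples (`spectrum[f][t]`); that layout and both function names are assumptions for illustration. The second function shows the spectral averaging that one caption names as a way to form the timeseries.

```python
def tokenize_dynamic_spectrum(spectrum):
    """Turn spectrum[f][t] into tokens[t][f]: one token per time stamp.

    Each token is the vector of intensities across all frequency channels
    at that time stamp (the "d-dimensional frequency vector").
    """
    n_time = len(spectrum[0])
    if any(len(channel) != n_time for channel in spectrum):
        raise ValueError("all frequency channels must have the same length")
    return [list(column) for column in zip(*spectrum)]


def frequency_averaged_timeseries(spectrum):
    """Collapse the frequency axis by averaging, giving one value per time stamp."""
    n_freq = len(spectrum)
    return [sum(column) / n_freq for column in zip(*spectrum)]
```

With this layout the sequence length seen by the transformer equals the number of time samples, and the token dimension d equals the number of frequency channels.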

**Figure 15.** Timeseries (pulse profiles) created by different dimensionality reduction techniques along the frequency dimension. The dynamic spectrum transformer is similar to the standard transformer encoder (A. Vaswani et al. 2023), consisting of two encoder layers, each containing 4 self-attention heads that perform full self-attention. We experimented with higher numbers of heads (8, 16) but there…

**Figure 16.** Illustration of the projection layer, where each time sample is mapped to a d-dimensional learnable embedding vector by a simple feed-forward neural network. The timeseries transformer has the same number of encoder layers and attention heads as the dynamic spectrum transformer. This transformer thus focuses on capturing pulse asymmetry, tail decay structure, temporal sub-components and burst duration variat…

**Figure 17.** Schematic diagram of the regression head. The d-dimensional output from the preceding attention pooling layer is passed through a linear layer to obtain the parameters of the mixture density as outputs. … where $\tilde{z} \in \mathbb{R}^{n \times d}$ and W and b are the weights and bias associated with the linear transformation. We utilize an attention-based pooling mechanism to aggregate the output from the last projection layer to a single…
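The pooling and head described in the Figure 17 caption can be sketched as follows. The exact pooling form and output activations are not given in the text shown here, so this is an illustration under assumptions: a single learned query vector scores each token, and the three outputs are mapped through a sigmoid, the identity, and a softplus. All names are hypothetical.

```python
import math


def attention_pool(tokens, query):
    """Aggregate n tokens of dimension d into one d-dimensional vector.

    Weights are a softmax over dot products between each token and a
    query vector (assumed to be learned).
    """
    scores = [sum(z * q for z, q in zip(token, query)) for token in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(tokens[0])
    return [sum(w * token[j] for w, token in zip(weights, tokens)) for j in range(dim)]


def mixture_head(pooled, weight, bias):
    """Linear layer producing (p_scatter, mu, sigma) from the pooled vector.

    weight has three rows of length d; bias has three entries.
    """
    raw = [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]
    p_scatter = 1.0 / (1.0 + math.exp(-raw[0]))  # sigmoid keeps it in (0, 1)
    mu = raw[1]                                   # location of ln(tau), unconstrained
    sigma = math.log1p(math.exp(raw[2]))          # softplus keeps it positive
    return p_scatter, mu, sigma
```

Constraining `sigma` to be positive and `p_scatter` to lie in (0, 1) is what lets the outputs be read directly as parameters of a probability distribution.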

**Figure 18.** Evaluation of MT-GMDN performance across four different timeseries extraction methods. All of the examples in the …

**Figure 19.** Each panel includes two sub-panels. The first sub-panel contains a simulated dynamic spectrum and the corresponding timeseries, while the second lists the true injected τ value, the MT-GMDN point estimate with its 95% confidence interval, and the fitburst fit statistics. The uncertainty given for the fitburst measurement corresponds to the 1σ limits. All the τ values are referenced at …

**Figure 20.** Representative FRBs from CHIME/FRB Catalog 2 for which traditional methods failed to extract physical parameters because of strong noise and RFI. Each panel includes two sub-panels: the first (left) contains a dynamic spectrum and the corresponding timeseries, while the second (right) contains the MT-GMDN estimates. The estimates include the probability of scattering, the point estimate for the τ va…

**Figure 21.** Test-set prediction performance, with the x-axis denoting discrete FRB samples and the y-axis the scattering timescale at a reference frequency of 400 MHz. The mixture density formulation here uses the Gamma distribution instead of the lognormal distribution. fitburst-measured values are shown as blue dots and MT-GMDN point estimates as black dashed lines, with the blue shaded region representing…

Original abstract

The discovery rate of fast radio bursts (FRBs) continues to increase with the advent of new radio facilities and yet extracting their astrophysical parameters such as scattering timescale ($\tau$) remains a significant bottleneck. Current $\tau$ measurement approaches like fitting analytic template models and scattering aware de-convolution are accurate but slow, sensitive to initialization, limited by low signal to noise and often require manual supervision. These limitations inspired us to explore fast, robust and scalable machine learning methods to estimate the astrophysical parameter value. We present a deep learning approach named Multimodal Transformer Based Generic Mixture Density Network (MT-GMDN) which ingests FRB dynamic spectrum and its corresponding timeseries profile through parallel transformer encoders, fuses their latent representations and predicts the distribution of $\tau$ with probabilistic output derived from generic mixture-density formulation. This formulation not only estimates the value of $\tau$ but also captures the (zero inflated) nature of FRB populations where a significant fraction of bursts exhibit unresolvable scattering. We trained MT-GMDN on $\sim3500$ FRBs from CHIME/FRB \cattwo while holding out some fraction of FRBs for validation during training and for testing after the training completes. The model achieves a coefficient of determination ($R^2$) value of $94\%$ on the expected value of $\tau$ for the events with measurable scattering with an excellent recall value of $90\%$ on the test data set. The model was also able to incorporate heteroskedastic errors enabling us the construction of a confidence interval for the predictions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper trains a multimodal transformer plus mixture-density network on CHIME FRBs to predict scattering timescale with zero-inflation handling and reports R²=0.94 plus 90% recall on a held-out split.


The core result is that this MT-GMDN architecture ingests dynamic spectra and time series through separate transformers, fuses them, and outputs a mixture density that captures both measurable τ and the large zero-inflated population. On the test portion of the ~3500 CHIME events it reaches R²=0.94 for the expected value on events with measurable scattering and 90% recall, while also producing heteroskedastic uncertainties.

What is actually new is the specific combination of parallel transformer encoders with a generic MDN head tuned to the zero-inflated nature of FRB scattering; prior work on analytic fitting or deconvolution did not use this probabilistic multimodal setup.

The approach is practical for the stated goal of removing a manual bottleneck. The metrics are concrete and the zero-inflation modeling is a reasonable match to the data distribution.

The main soft spot is the evaluation: a single held-out fraction with no reported checks for temporal ordering, repeater blocking, or matching SNR/τ distributions between train and test. Without those, it is hard to know whether the numbers will hold on new observations from the same or different instruments. The abstract also gives no architecture depth, loss details, or hyperparameter search, so reproducibility and sensitivity remain open.

This is a subfield paper aimed at FRB observers who need faster parameter extraction for large catalogs. It deserves a serious referee because the problem is real, the metrics are reported, and the architecture is a plausible next step, even if the generalization claims need more evidence.

Referee Report

2 major / 1 minor

Summary. The paper introduces MT-GMDN, a multimodal transformer architecture with parallel encoders for FRB dynamic spectra and time-series profiles, fused into a generic mixture-density network head that outputs a probabilistic distribution over scattering timescale τ. Trained on ~3500 CHIME/FRB events (with some fraction held out), the model reports R²=0.94 on expected τ for measurable events, 90% recall on the test set, and the ability to model zero-inflated populations plus heteroskedastic errors for confidence intervals.

Significance. If the generalization holds, the approach could accelerate τ estimation for growing FRB samples by replacing slow, initialization-sensitive template fitting with a fast, scalable probabilistic predictor. The mixture-density formulation for zero-inflated data and heteroskedastic uncertainty are positive features that could support downstream statistical analyses of FRB populations.

major comments (2)

[Abstract] The headline metrics (R²=0.94, recall=0.90) rest on a single held-out test fraction of the ~3500 events, but the abstract provides no description of the splitting procedure (temporal ordering, source-level blocking for repeaters, or explicit comparison of SNR/τ distributions between train and test). This directly affects whether the test set is representative of the zero-inflated population and future observations.
[Abstract] No details are supplied on network depth, loss function, hyperparameter search, or regularization; without these it is impossible to judge reproducibility or whether the reported performance is robust to low-SNR events, which the abstract itself identifies as a limitation of existing methods.
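The source-level blocking raised in the first major comment can be made concrete with a short sketch. This is one possible scheme, not a description of the paper's split: each event is assumed to carry a source identifier (shared by all bursts of a repeater), and a hash of that identifier decides the partition, so bursts from one source can never land on both sides. The function names and the `(event_id, source_id)` layout are hypothetical.

```python
import hashlib


def assign_split(source_id, test_fraction=0.2):
    """Deterministically map a source to 'train' or 'test' via a hash."""
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2 ** 64  # uniform in [0, 1)
    return "test" if u < test_fraction else "train"


def blocked_split(events, test_fraction=0.2):
    """Split (event_id, source_id) pairs so that no source spans both sets."""
    split = {"train": [], "test": []}
    for event_id, source_id in events:
        split[assign_split(source_id, test_fraction)].append(event_id)
    return split
```

A hash-based assignment yields only approximately the requested test fraction, and it does nothing to match SNR or τ distributions between the two sets; those would need a separate check.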

minor comments (1)

[Abstract] The token \cattwo is an unrendered LaTeX citation and should be replaced with a properly formatted reference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below and will revise the abstract in the next version to improve clarity on data handling and methodology while preserving its brevity.


Referee: [Abstract] The headline metrics (R²=0.94, recall=0.90) rest on a single held-out test fraction of the ~3500 events, but the abstract provides no description of the splitting procedure (temporal ordering, source-level blocking for repeaters, or explicit comparison of SNR/τ distributions between train and test). This directly affects whether the test set is representative of the zero-inflated population and future observations.

Authors: We agree the abstract should briefly indicate the splitting approach to support assessment of test-set representativeness. The manuscript details a random per-burst hold-out (with source-level blocking for repeaters and distribution matching on SNR and τ) in the data section; we will add a concise clause to the abstract summarizing this procedure and directing readers to the methods for full specifics.
Revision: yes

Referee: [Abstract] No details are supplied on network depth, loss function, hyperparameter search, or regularization; without these it is impossible to judge reproducibility or whether the reported performance is robust to low-SNR events, which the abstract itself identifies as a limitation of existing methods.

Authors: We concur that the abstract would benefit from a short reference to core training choices to aid reproducibility judgments. These elements (transformer depth, mixture-density negative log-likelihood loss, hyperparameter tuning, and regularization) are fully specified in the methods; we will insert a brief parenthetical note in the abstract and retain the existing pointer to the detailed description.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML performance on external telescope data


The paper trains a multimodal transformer + mixture-density model on ~3500 CHIME/FRB events and reports R²=94% and 90% recall on a held-out test fraction. These metrics are direct empirical outcomes from external observational data; no derivation chain, equation, or first-principles result reduces by construction to fitted parameters, self-citations, or renamed inputs. No self-definitional steps, uniqueness theorems, or ansatzes smuggled via author citations appear in the reported claims. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Performance numbers rest on the assumption that the ~3500 CHIME/FRB events are representative and that the transformer encoders extract scattering-relevant features without domain-specific physics constraints.

axioms (1)

Domain assumption: The CHIME/FRB catalog events used for training and testing are representative of the broader FRB population and future observations.
Training and held-out testing on this single catalog implicitly assumes generalization beyond the observed sample.



Reference graph

Works this paper leans on

32 extracted references · 23 canonical work pages · 1 internal anchor

[1] Lorimer, D. R., & Garver-Daniels, N. 2020, Monthly Notices of the Royal Astronomical Society, 497, 1661, doi: 10.1093/mnras/staa1856
[2] Bhat, N. D. R., Cordes, J. M., & Chatterjee, S. 2003, ApJ, 584, 782, doi: 10.1086/345775
[3] Bishop, C. M., & Bishop, H. 2024, Deep Learning: Foundations and Concepts (Springer), doi: 10.1007/978-3-031-45468-4
CHIME/FRB Collaboration, Amiri, M., Bandura, K., et al. 2018, ApJ, 863, 48, doi: 10.3847/1538-4357/aad188
CHIME/FRB Collaboration, Amiri, M., Bandura, K., et al. 2019, Nature, 566, 230, doi: 10.1038/s41586-018-0867-7
Chime/FRB Collaboration...
[4] Connor, L., & van Leeuwen, J. 2018, The Astronomical Journal, 156, 256, doi: 10.3847/1538-3881/aae649
[5] Cordes, J. M., & McLaughlin, M. A. 2003, ApJ, 596, 1142, doi: 10.1086/378231
[6] Cordes, J. M., Ocker, S. K., & Chatterjee, S. 2022, Astrophys. J., 931, 88, doi: 10.3847/1538-4357/ac6873
[7] Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. 2021, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, https://arxiv.org/abs/2010.11929
[8] Fonseca, E., Pleunis, Z., Andersen, B. C., et al. 2024, The Astrophysical Journal Supplement Series, 272, 7, doi: 10.3847/1538-4365/ad27d6
[9] Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357, doi: 10.1038/s41586-020-2649-2
[10] Hoffman, M. D., & Gelman, A. 2011, The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo, arXiv e-prints, arXiv:1111.4246, doi: 10.48550/arXiv.1111.4246
[11] Hosmer, D., Lemeshow, S., & Sturdivant, R. 2013, Applied Logistic Regression: Third Edition (Wiley), doi: 10.1002/9781118548387
[12] Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90, doi: 10.1109/MCSE.2007.55
[13] Jonas, J., & MeerKAT Team. 2016, in MeerKAT Science: On the Pathway to the SKA, 1, doi: 10.22323/1.277.0001
[14] Kharel, B., Fonseca, E., Brar, C., et al. 2025, Repeating vs. Non-Repeating FRBs: A Deep Learning Approach To Morphological Characterization, https://arxiv.org/abs/2509.06208
[15] LeCun, Y., Bengio, Y., & Hinton, G. 2015, Nature, 521, 436
[16] Lorimer, D. R., Bailes, M., McLaughlin, M. A., Narkevic, D. J., & Crawford, F. 2007, Science, 318, 777, doi: 10.1126/science.1147532
[17] Lorimer, D. R., & Kramer, M. 2004, Handbook of Pulsar Astronomy, Vol. 4 (Cambridge, UK; New York: Cambridge University Press)
[18] Macquart, J.-P., Bailes, M., Bhat, N. D. R., et al. 2010, PASA, 27, 272, doi: 10.1071/AS09082
[19] McKinnon, M. M. 2014, Publications of the Astronomical Society of the Pacific, 126, 476, doi: 10.1086/676975
[20] Ocker, S. K., Cordes, J. M., & Chatterjee, S. 2021, ApJ, 911, 102, doi: 10.3847/1538-4357/abeb6e
[21] Ocker, S. K., Cordes, J. M., Chatterjee, S., et al. 2023, MNRAS, 519, 821, doi: 10.1093/mnras/stac3547
pandas development team, T. 2020, pandas-dev/pandas: Pandas, latest, Zenodo, doi: 10.5281/zenodo.3509134
[22] Pandhi, A., Nimmo, K., Andrew, S., et al. 2026, ApJL, 1000, L53, doi: 10.3847/2041-8213/ae52f8
[23] Paszke, A., Gross, S., Massa, F., et al. 2019, in Advances in Neural Information Processing Systems 32 (Curran Associates, Inc.), 8024–8035
[24] Sherman, M., & DSA-110 Collaboration. 2023, in American Astronomical Society Meeting
[25] Shin, K., Leung, C., Simha, S., et al. 2025, Astrophys. J., 993, 208
Shivraj Patil, S., Main, R. A., Fonseca, E., et al. 2025, arXiv e-prints, arXiv:2509.06721, doi: 10.48550/arXiv.2509.06721
[26] Smith, K. 2005, simpulse: C++/python library for simulating FRBs and pulsars, https://github.com/kmsmith137/simpulse
[27] Taylor, J. 1997, Introduction to Error Analysis, the Study of Uncertainties in Physical Measurements, 2nd Edition
TorchVision maintainers and contributors. 2016, TorchVision: PyTorch's Computer Vision library, GitHub, https://github.com/pytorch/vision
[28] Vaswani, A., Shazeer, N., Parmar, N., et al. 2023, Attention Is All You Need, https://arxiv.org/abs/1706.03762
[29] Waskom, M. L. 2021, Journal of Open Source Software, 6, 3021, doi: 10.21105/joss.03021
[30] Williamson, I. P. 1974, MNRAS, 166, 499, doi: 10.1093/mnras/166.3.499
[31] Xu, S., & Zhang, B. 2016, ApJ, 832, 199, doi: 10.3847/0004-637X/832/2/199
[32] Yang, Tsung-Ching, Hashimoto, Tetsuya, Hsu, Tzu-Yin, et al. 2025, A&A, 693, A85, doi: 10.1051/0004-6361/202450823
