Detecting Gravitational-Wave Anisotropies with Simulation-Based Inference
Pith reviewed 2026-06-30 14:32 UTC · model grok-4.3
The pith
A neural network classifier trained on simulations replaces the Gaussian assumption in frequentist searches and raises 3σ detection rates for gravitational-wave anisotropies by up to 200%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a Simulation-Based Inference framework that replaces the analytic Gaussian likelihood used in frequentist searches with a neural network classifier trained on synthetic data. This approach captures the non-Gaussian structure of the data and significantly improves performance. Specifically, the probability of 3σ detection increases by approximately 90% for single-hotspot scenarios and by 200% for double-hotspot scenarios compared to standard frequentist methods.
What carries the argument
Neural network classifier trained on synthetic PTA timing-residual datasets to discriminate isotropic Hellings-Downs correlations from anisotropic deviations, used in place of an analytic Gaussian likelihood.
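Why a classifier can stand in for a likelihood: with balanced training classes, a classifier whose output d(x) approximates P(anisotropic | x) satisfies d/(1 − d) ≈ p(x | H1)/p(x | H0). That is the likelihood ratio the Neyman–Pearson lemma [26] identifies as the most powerful test statistic. The numpy sketch below shows this "likelihood-ratio trick" on an invented one-dimensional toy (H0: x ~ N(0, 1); H1: x ~ N(0, 2²)). It uses logistic regression on the features (1, x, x²) in place of the paper's neural network, so everything here is an illustrative assumption. The learned logit should approach the exact log-likelihood ratio −log 2 + 3x²/8.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, anisotropic):
    # Toy stand-in for a summary statistic: the alternative only widens the distribution
    scale = 2.0 if anisotropic else 1.0
    return rng.normal(0.0, scale, size=n)

def features(x):
    # Quadratic features let a linear model represent the exact log-likelihood ratio
    return np.stack([np.ones_like(x), x, x**2], axis=1)

def train_logistic(X, y, lr=0.1, steps=2000):
    # Plain gradient descent on the mean binary cross-entropy
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

n = 20_000
x = np.concatenate([simulate(n, False), simulate(n, True)])
y = np.concatenate([np.zeros(n), np.ones(n)])   # balanced classes: prior odds = 1
w = train_logistic(features(x), y)

# The logit w . (1, x, x^2) estimates log p(x|H1)/p(x|H0) = -log 2 + 3x^2/8
print(w)  # should approach [-0.69, 0.0, 0.375]
```

The same recipe carries over when x is a full set of cross-correlation estimators; only the capacity of the classifier has to grow.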
If this is right
- Anisotropy searches no longer need to rely on the Gaussian approximation for correlation estimators.
- Detection power for both single- and double-hotspot configurations rises measurably at fixed significance threshold.
- The method supplies a computationally lighter alternative to full Bayesian modeling while retaining the ability to capture non-Gaussian features.
- The same simulation-trained classifier can be retrained for other anisotropy morphologies without changing the underlying search pipeline.
Where Pith is reading between the lines
- The framework could be extended to include joint constraints from multiple PTA datasets or from future space-based interferometers by simply augmenting the training simulations.
- If the classifier outputs are calibrated as proper probabilities, they could be combined with existing Bayesian pipelines to produce hybrid posterior distributions.
- A mismatch between simulated and real noise would appear as a drop in classification accuracy on hold-out real-data-like injections, providing a built-in diagnostic for model misspecification.
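The diagnostic in the last bullet can be sketched by measuring ROC AUC on held-out injections twice: once with noise drawn exactly as in training, and once with an extra noise component the simulator never modeled. The toy below is an assumption throughout: x² stands in for a trained classifier's score, and the unmodeled noise is white with unit variance. For this toy, the AUC should fall from (2/π)·arctan 2 ≈ 0.705 to (2/π)·arctan √2.5 ≈ 0.641.

```python
import numpy as np

rng = np.random.default_rng(2)

def auc(null_scores, alt_scores):
    # Mann-Whitney estimate of P(score under H1 > score under H0); assumes no ties
    s = np.concatenate([null_scores, alt_scores])
    ranks = np.empty(len(s))
    ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
    n0, n1 = len(null_scores), len(alt_scores)
    return (ranks[n0:].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def score(x):
    # Toy classifier score: optimal when H1 only widens the distribution
    return x**2

n = 100_000
extra_sd = 1.0  # hypothetical unmodeled noise, absent from the training simulations
null_0, alt_0 = rng.normal(0.0, 1.0, n), rng.normal(0.0, 2.0, n)
null_1 = null_0 + rng.normal(0.0, extra_sd, n)
alt_1 = alt_0 + rng.normal(0.0, extra_sd, n)

auc_matched = auc(score(null_0), score(alt_0))   # injections match the training distribution
auc_shifted = auc(score(null_1), score(alt_1))   # injections carry unmodeled noise
```

A drop of this kind on realistic injections, with labels known by construction, flags that the simulator and the data disagree.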
Load-bearing premise
The neural network classifier trained on synthetic data will generalize to real PTA observations without substantial mismatch between simulated and observed noise properties or correlation statistics.
What would settle it
Apply the trained classifier to an independent set of simulations that include the actual measured noise spectra and red-noise properties of current PTA pulsars; if the resulting 3σ detection fractions fall to or below those of the standard frequentist pipeline, the claimed sensitivity gain is falsified.
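One way to compute the 3σ detection fractions that this test compares is sketched below. It assumes that each pipeline reduces a dataset to one scalar test statistic (larger means more anisotropic) and that the 3σ threshold is set empirically from isotropic null simulations. The Gaussian toy statistics are placeholders, and `detection_fraction` is a hypothetical helper name.

```python
import math
import numpy as np

# One-sided 3-sigma false-alarm probability, about 1.35e-3
P_3SIGMA = 0.5 * math.erfc(3 / math.sqrt(2))

def detection_fraction(null_stats, signal_stats, p=P_3SIGMA):
    # Calibrate the threshold on null (isotropic) simulations,
    # then count the signal realizations that exceed it
    threshold = np.quantile(null_stats, 1.0 - p)
    return float(np.mean(signal_stats > threshold)), float(threshold)

rng = np.random.default_rng(4)
frac, thr = detection_fraction(rng.standard_normal(1_000_000),
                               rng.normal(3.0, 1.0, 100_000))
```

Running the helper once on the SBI classifier's statistic and once on the Gaussian-likelihood statistic, over the same injections, gives the two fractions to compare.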
Original abstract
Over the last five years, multiple Pulsar Timing Array (PTA) collaborations have reported mounting evidence for a gravitational-wave background (GWB) at nanohertz frequencies. Measuring anisotropies in the sky distribution of the GWB power is one of the most promising ways to identify and characterize its source. These anisotropies are expected to manifest as deviations from the Hellings-Downs (HD) correlations between the timing residuals of different pulsars. Current search strategies include Bayesian methods, which model anisotropies in the timing-residual likelihood, and faster frequentist approaches, which construct correlation estimators from timing residuals and use these to test the isotropic assumption. However, frequentist methods rely on the assumption that correlation estimators are Gaussian-distributed, an assumption that is not justified and that, as we will show, severely limits detection sensitivity. In this work, we present a Simulation-Based Inference (SBI) framework that replaces the analytic Gaussian likelihood used in frequentist searches with a neural network classifier trained on synthetic data. This approach captures the non-Gaussian structure of the data and significantly improves performance. Specifically, we find that the probability of 3σ detection increases by approximately 90% for single-hotspot scenarios and by 200% for double-hotspot scenarios compared to standard frequentist methods.
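A toy illustration of why the Gaussian assumption can fail: even the simplest cross-correlation estimator, the product of two independent unit-variance residuals, is heavy-tailed. This toy assumes one pulsar pair, one frequency bin and no signal, and is made up for this sketch only. A cut at three standard deviations, read as 3σ under a Gaussian assumption, then has a false-alarm rate about seven times the nominal 1.35 × 10⁻³.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Toy null-hypothesis estimator: product of two independent unit Gaussians
n = 2_000_000
rho = rng.standard_normal(n) * rng.standard_normal(n)

sigma = rho.std()                                  # close to 1
nominal = 0.5 * math.erfc(3 / math.sqrt(2))        # Gaussian one-sided 3-sigma tail, ~1.35e-3
empirical = float(np.mean(rho > 3 * sigma))        # actual false-alarm rate of the same cut
print(nominal, empirical)  # empirical is close to 1e-2, about seven times the Gaussian value
```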
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a simulation-based inference (SBI) framework that trains a neural network classifier on synthetic PTA timing-residual realizations to detect anisotropies in the nanohertz gravitational-wave background. By replacing the Gaussian likelihood assumption of standard frequentist correlation estimators with a learned classifier, the method is reported to increase the probability of a 3σ detection by ~90% in single-hotspot scenarios and ~200% in double-hotspot scenarios relative to the Gaussian baseline.
Significance. If the performance gains are shown to be robust under realistic validation and to generalize beyond the training distribution, the SBI classifier would offer a concrete sensitivity improvement for anisotropy searches that is directly usable with existing PTA datasets.
major comments (2)
- [Abstract] The 90% and 200% improvements in 3σ detection probability are stated without any information on network architecture, training procedure, held-out validation performance, or calibration of the classifier output, preventing assessment of whether the gains are statistically stable or merely an artifact of the synthetic test set.
- The central performance claim is measured exclusively on synthetic realizations whose noise and correlation statistics match the training distribution exactly. No test is reported that injects additional unmodeled effects (chromatic noise, DM variations, timing-model errors, non-stationary red noise) and verifies that the decision boundary remains superior to the Gaussian frequentist estimator under such distribution shift.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and presentation of our results. We respond to each major comment below and indicate the corresponding revisions to the manuscript.
Point-by-point responses
- Referee: [Abstract] The 90% and 200% improvements in 3σ detection probability are stated without any information on network architecture, training procedure, held-out validation performance, or calibration of the classifier output, preventing assessment of whether the gains are statistically stable or merely an artifact of the synthetic test set.
Authors: We agree that the abstract is concise and omits key methodological details. While the network architecture (a graph neural network classifier built from GINE convolutional layers), training procedure (SBI with 10⁵ synthetic realizations), held-out validation (5% test set with reported AUC), and output calibration (via isotonic regression) are fully described in Sections 3.2–3.4 and validated in Section 4, these elements should be referenced in the abstract for immediate assessment. We have revised the abstract to include a single sentence summarizing the classifier training and validation protocol. revision: yes
- Referee: The central performance claim is measured exclusively on synthetic realizations whose noise and correlation statistics match the training distribution exactly. No test is reported that injects additional unmodeled effects (chromatic noise, DM variations, timing-model errors, non-stationary red noise) and verifies that the decision boundary remains superior to the Gaussian frequentist estimator under such distribution shift.
Authors: The reported gains are intentionally evaluated on data drawn from the identical distribution used for training, which isolates the benefit arising from the classifier’s ability to exploit non-Gaussian structure in the timing residuals. We acknowledge that robustness under realistic distribution shifts is essential for deployment on actual PTA data. The manuscript does not contain such tests; we have therefore added an explicit limitations paragraph in Section 5 stating the matched-distribution scope of the present study and identifying robustness to unmodeled effects as an important direction for follow-up work. revision: yes
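The rebuttal cites isotonic regression for calibrating the classifier output, which is also what turning classifier outputs into "proper probabilities" would require. A minimal sketch of that technique with the pool-adjacent-violators algorithm follows. It assumes a held-out set of raw scores with known labels; the over-confident toy scores (true probability sigmoid(s/3) for raw logit s) are invented for illustration, and `fit_isotonic` and `calibrate` are hypothetical names, not the authors' code.

```python
import numpy as np

def fit_isotonic(scores, labels):
    # Pool-adjacent-violators: least-squares non-decreasing fit of labels vs. score
    order = np.argsort(scores)
    x = np.asarray(scores, dtype=float)[order]
    y = np.asarray(labels, dtype=float)[order]
    sums, counts = [], []
    for yi in y:
        sums.append(yi)
        counts.append(1)
        # Merge blocks while the previous block's mean exceeds the current one's
        while len(sums) > 1 and sums[-2] * counts[-1] > sums[-1] * counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fitted = np.repeat(np.array(sums) / np.array(counts), counts)
    return x, fitted

def calibrate(x_fit, y_fit, scores):
    # Piecewise-linear interpolation between fitted points, clipped at the ends
    return np.interp(scores, x_fit, y_fit)

rng = np.random.default_rng(5)
n = 200_000
raw = rng.uniform(-9.0, 9.0, n)                  # hypothetical raw classifier logits
p_true = 1.0 / (1.0 + np.exp(-raw / 3.0))        # toy truth: raw logits are 3x over-confident
labels = (rng.uniform(size=n) < p_true).astype(float)

x_fit, y_fit = fit_isotonic(raw, labels)
probs = calibrate(x_fit, y_fit, np.array([0.0, 6.0]))  # targets: 0.5 and sigmoid(2) ~ 0.88
```

Isotonic calibration only assumes that a higher raw score never means a lower true probability, so it preserves the ranking and therefore the detection threshold ordering of the classifier.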
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper trains a neural-network classifier on independent synthetic PTA realizations and reports empirical detection-probability gains versus frequentist methods on held-out test realizations drawn from the identical distribution. This is a standard simulation benchmark, not a self-referential fit, self-definition, or load-bearing self-citation. No equations or claims reduce by construction to their inputs; the central performance numbers are direct measurements against an external baseline on the same simulated data.
Reference graph
Works this paper leans on
- [1] Timing residuals generation: We start by generating mock timing residuals for each of the 120 pulsars observed by the regional PTAs comprising the International Pulsar Timing Array (IPTA). In doing so, we adopt the sky positions, observation times, and measured noise properties from the most recent data releases from each of these collaborations [9, 35–38...
- [2] Noise-marginalized optimal statistic. Noise run: The weights $w_{ab,k}$ entering the definition of the cross-correlation estimators in Eq. (9) depend on the noise and GWB parameters. In real PTA analyses, posterior distributions for these quantities would be obtained through single-pulsar runs and a preliminary Bayesian analysis modeling the GWB as common uncorrelated red noise (CURN). To margin...
- [3] Cross-correlation estimation: Finally, for each set of mock timing residuals, we use the software package DEFIANT to construct the cross-correlation estimators given in Eq. (9), where the weights, $w_{ab,k}$, are given by [44, 45]: $w_{ab,k} = \frac{P_a^{-1}\,\tilde{S}_{ab,k}\,P_b^{-1}}{\mathrm{tr}\left[P_a^{-1}\,\tilde{S}_{ab,k}\,P_b^{-1}\,\tilde{S}'_{ab,k}\right]}$ (24). The cross-covariance matrix is defined as $\tilde{S}'_{ab,k} \equiv$ ...
- [4] Input graph: Each observation is represented as a fully connected graph with one node for each pulsar in the array. To each node of the initial graph, we associate a node attribute, $h_a^{(0)}$, given by the pulsar position in the sky, $h_a^{(0)} = \hat{p}_a$. Similarly, for each edge, we associate an edge attribute, $e_{ab}^{(0)}$, given by the whitened cross-correlation ...
- [5] ... = 14,280 directed edges, where the factor of 2 accounts for representing each undirected edge as two directed edges with opposite orientations. Both the edge and node attributes are standardized to zero mean and unit variance across the entire training data set.
- [6] Edge and node encoders: A linear projection maps each one-dimensional node feature of the input graph into a higher-dimensional latent space with dimension $d_h = 16$. Edge features are also projected into a $d_h$-dimensional latent space using a two-layer multilayer perceptron (MLP): the first layer maps the one-dimensional edge feature to dimension d...
- [7] Convolutional layers: Two GINE convolutional layers [49] update node features by aggregating neighboring node and edge features, using the update rule $h_a^{(n+1)} = \mathrm{MLP}\left[(1+\epsilon)\,h_a^{(n)} + \sum_{b\in\mathcal{N}(a)} \mathrm{ReLU}\left(h_b^{(n)} + e_{ab}\right)\right]$ (30), where $\epsilon$ is a (learnable) scalar parameter that controls the relative weighting of the node’s own embedding versus the aggre...
- [8] Readout: After the convolutional layers, the final node embeddings from all $N_p$ pulsars are concatenated into a single vector of dimension $N_p \times d_h$ = 7,680 and passed through a three-layer MLP readout: Linear(7,680→1,024)→GELU→Dropout(0.2)→Linear(1,024→256)→GELU→Dropout(0.2)→Linear(256→1), producing a single logit (unnormalized log-probability) for ...
- [9] G. B. Hobbs, R. T. Edwards, and R. N. Manchester, “tempo2, a new pulsar-timing package - I. An overview,” Monthly Notices of the Royal Astronomical Society 369 no. 2, (2006) 655–672. http://mnras.oxfordjournals.org/cgi/doi/10.1111/j.1365-2966.2006.10302.x
- [10] D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, vol. 4. 2004.
- [11] J. H. Taylor, “Pulsar timing and relativistic gravity,” Class. Quant. Grav. 10 (1993) S167–S174.
- [12] R. W. Hellings and G. S. Downs, “Upper Limits on the Isotropic Gravitational Radiation Background from Pulsar Timing Analysis,” Astrophys. J. Lett. 265 (1983) L39–L42. [5] NANOGrav Collaboration, G. Agazie et al., “The NANOGrav 15-year Data Set: Evidence for a Gravitational-Wave Background,” Astrophys. J. Lett. 951 no. 1, (June, 2023) L8, arXiv:2306.16213 [astro-p...
- [13] S. R. Taylor and J. R. Gair, “Searching For Anisotropic Gravitational-wave Backgrounds Using Pulsar Timing Arrays,” Physical Review D 88 no. 8, (Oct., 2013) 084001, arXiv:1306.5395 [gr-qc]
- [14] E. C. Gardiner, L. Z. Kelley, A.-M. Lemke, and A. Mitridate, “Beyond the Background: Gravitational-wave Anisotropy and Continuous Waves from Supermassive Black Hole Binaries,” Astrophys. J. 965 no. 2, (2024) 164, arXiv:2309.07227 [astro-ph.HE]
- [15] A.-M. Lemke, A. Mitridate, and K. A. Gersbach, “Detecting gravitational wave anisotropies from supermassive black hole binaries,” Phys. Rev. D 111 no. 6, (2025) 063068, arXiv:2407.08705 [astro-ph.HE]
- [16] N. Pol, S. R. Taylor, and J. D. Romano, “Forecasting pulsar timing array sensitivity to anisotropy in the stochastic gravitational wave background,” The Astrophysical Journal 940 no. 2, (Dec., 2022) 173, arXiv:2206.09936 [astro-ph.HE]
- [17] P. F. Depta, V. Domcke, G. Franciolini, and M. Pieroni, “Pulsar timing array sensitivity to anisotropies in the gravitational wave background,” Phys. Rev. D 111 no. 8, (2025) 083039, arXiv:2407.14460 [astro-ph.CO]
- [18] C. Caprini and D. G. Figueroa, “Cosmological Backgrounds of Gravitational Waves,” Classical and Quantum Gravity 35 no. 16, (July, 2018) 163001. http://arxiv.org/abs/1801.04268
- [19] N. Bartolo, D. Bertacca, et al., “Probing Anisotropies of the Stochastic Gravitational Wave Background with LISA,” JCAP 11 no. 11, (Nov., 2022) 009. http://arxiv.org/abs/2201.08782
- [20] C. M. F. Mingarelli, T. Sidery, I. Mandel, and A. Vecchio, “Characterising gravitational wave stochastic background anisotropy with Pulsar Timing Arrays,” Physical Review D 88 no. 6, (Sept., 2013) 062005. http://arxiv.org/abs/1306.5394
- [21] S. R. Taylor et al., “Limits on anisotropy in the nanohertz stochastic gravitational-wave background,” Physical Review Letters 115 no. 4, (July, 2015) 041101, arXiv:1506.08817 [astro-ph.HE]. [22] NANOGrav Collaboration, G. Agazie et al., “The NANOGrav 15 yr Data Set: Search for Anisotropy in the Gravitational-wave Background,” Astrophys. J. Lett. 956 no. 1, (2023)...
- [22] T. Konstandin, A.-M. Lemke, A. Mitridate, and E. Perboni, “Prospects and limitations of PTAs anisotropy searches — the frequentist case,” JCAP 02 (2026) 084, arXiv:2509.07074 [astro-ph.CO]
- [23] J. S. Hazboun, P. M. Meyers, J. D. Romano, X. Siemens, and A. M. Archibald, “Analytic distribution of the optimal cross-correlation statistic for stochastic gravitational-wave-background searches using pulsar timing arrays,” Phys. Rev. D 108 no. 10, (2023) 104050, arXiv:2305.01116 [gr-qc]
- [24] M. Vallisneri, P. M. Meyers, D. Wright, A. D. Johnson, J. G. Baier, and R. van Haasteren, “nanograv/discovery.” Nov., 2025. https://github.com/nanograv/discovery
- [25] M. Vallisneri et al., “Discovery: the next-generation pulsar-timing-array data-analysis package,” in preparation.
- [26] J. Neyman and E. S. Pearson, “On the Problem of the Most Efficient Tests of Statistical Hypotheses,” Phil. Trans. Roy. Soc. Lond. A 231 no. 694-706, (1933) 289–337
- [27] K. Cranmer, J. Brehmer, and G. Louppe, “The frontier of simulation-based inference,” Proceedings of the National Academy of Sciences 117 no. 48, (Dec., 2020) 30055–30062. http://arxiv.org/abs/1911.01429, arXiv:1911.01429 [stat]
- [28] D. Shih, M. Freytsis, S. R. Taylor, J. A. Dror, and N. Smyth, “Fast Parameter Inference on Pulsar Timing Arrays with Normalizing Flows,” Phys. Rev. Lett. 133 no. 1, (2024) 011402, arXiv:2310.12209 [astro-ph.IM]
- [29] M. Vallisneri, M. Crisostomi, A. D. Johnson, and P. M. Meyers, “Rapid Parameter Estimation for Pulsar-Timing-Array Datasets with Variational Inference and Normalizing Flows,” Phys. Rev. Lett. 135 no. 7, (2025) 071401, arXiv:2405.08857 [gr-qc]
- [30] J. Lai and C. Li, “Accelerated Bayesian inference for pulsar timing arrays: Normalizing flows for rapid model comparison across stochastic gravitational-wave background sources,” Phys. Rev. D 112 no. 2, (2025) 023533, arXiv:2504.04211 [astro-ph.CO]
- [31] N. Laal et al., “Deep Neural Emulation of the Supermassive Black Hole Binary Population,” Astrophys. J. 982 no. 1, (2025) 55, arXiv:2411.10519 [astro-ph.IM]
- [32] N. Jeffrey and B. D. Wandelt, “Evidence Networks: simple losses for fast, amortized, neural Bayesian model comparison,” Mach. Learn. Sci. Tech. 5 no. 1, (2024) 015008, arXiv:2305.11241 [cs.LG]
- [33] N. Anau Montel, J. Alvey, and C. Weniger, “Tests for model misspecification in simulation-based inference: From local distortions to global model checks,” Phys. Rev. D 111 no. 8, (2025) 083013, arXiv:2412.15100 [astro-ph.IM]. [35] EPTA, InPTA Collaboration, J. Antoniadis et al., “The second data release from the European Pulsar Timing Array - II. Customised puls...
- [34] J. Ellis and R. van Haasteren, “jellis18/pal2: Pal2.” Jan., 2017. https://doi.org/10.5281/zenodo.251456. [40] NANOGrav Collaboration, G. Agazie, A. Anumarlapudi, et al., “The NANOGrav 15-Year Data Set: Detector Characterization and Noise Budget,” The Astrophysical Journal Letters 951 no. 1, (June, 2023) L10. http://arxiv.org/abs/2306.16218
- [35] B. Bécsy, N. J. Cornish, and M. C. Digman, “Fast Bayesian analysis of individual binaries in pulsar timing array data,” Phys. Rev. D 105 no. 12, (2022) 122003, arXiv:2204.07160 [gr-qc]
- [36] B. Bécsy, J. Hazboun, and A. Johnson, “pta_replicator,” GitHub repository, 2025. https://github.com/bencebecsy/pta_replicator
- [37] S. J. Vigeland, K. Islo, S. R. Taylor, and J. A. Ellis, “Noise-marginalized optimal statistic: A robust hybrid frequentist-Bayesian statistic for the stochastic gravitational-wave background in pulsar timing arrays,” Phys. Rev. D 98 (2018) 044003, arXiv:1805.12188 [astro-ph.IM]
- [38] K. A. Gersbach, S. R. Taylor, P. M. Meyers, and J. D. Romano, “Spatial and spectral characterization of the gravitational-wave background with the PTA optimal statistic,” Phys. Rev. D 111 no. 2, (2025) 023027, arXiv:2406.11954 [astro-ph.IM]
- [39] K. A. Gersbach, S. R. Taylor, B. Bécsy, A.-M. Lemke, A. Mitridate, and N. Pol, “Mapping the gravitational-wave background across the spectrum with a next-generation anisotropic per-frequency optimal statistic,” Phys. Rev. D 113 no. 10, (2026) 103031, arXiv:2509.07090 [astro-ph.IM]
- [40] T. Konstandin, A.-M. Lemke, A. Mitridate, and E. Perboni, “The impact of cosmic variance on PTAs anisotropy searches,” JCAP 04 (2025) 059, arXiv:2408.07741 [astro-ph.CO]
- [41] V. Domcke, G. Franciolini, and M. Pieroni, “Cosmic Variance in Anisotropy Searches at Pulsar Timing Arrays,” arXiv:2508.21131 [astro-ph.CO]
- [42] D. Hendrycks and K. Gimpel, “Gaussian Error Linear Units (GELUs),” arXiv e-prints (June, 2016), arXiv:1606.08415 [cs.LG]
- [43] W. Hu, B. Liu, J. Gomes, M. Zitnik, P. Liang, V. Pande, and J. Leskovec, “Strategies for Pre-training Graph Neural Networks,” arXiv e-prints (May, 2019), arXiv:1905.12265 [cs.LG]
- [44] A. D. Johnson, P. M. Meyers, et al., “The NANOGrav 15-year Gravitational-Wave Background Methods,” Physical Review D 109 no. 10, (May, 2024) 103012. http://arxiv.org/abs/2306.16223. [51] MeerKAT Collaboration, K. Grunthal, R. S. Nathan, et al., “The MeerKAT Pulsar Timing Array: Maps of the gravitational-wave sky with the 4.5 year data release,” Monthly Notices ...
- [45] B. E. Moreschi, S. Valtolina, A. Sesana, G. Shaifullah, M. Falxa, L. Speri, D. Izquierdo-Villalba, and A. Chalumeau, “Dissecting the nanoHz gravitational wave sky: frequency-correlated anisotropy induced by eccentric supermassive black hole binaries,” arXiv:2506.14882 [astro-ph.GA]
- [46] G. Cusin, C. Pitrou, M. Pijnenburg, and A. Sesana, “Measuring anisotropies in the PTA band with cross-correlations.” Feb., 2025. http://arxiv.org/abs/2502.17401
- [47] D. Agarwal, J. D. Romano, Y. Ali-Haïmoud, and T. L. Smith, “Addressing leakage and mode suppression in angular power spectrum estimation for gravitational-wave backgrounds using pulsar timing arrays.” Feb., 2026. http://arxiv.org/abs/2602.20075
- [48] M. Curyło, E. Thrane, P. D. Lasky, and D. S. Gaynor, “A comprehensive framework for phase-coherent mapping of the gravitational-wave sky with pulsar timing arrays.” Apr., 2026. http://arxiv.org/abs/2604.19073, arXiv:2604.19073 [astro-ph]
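Entries [3] to [8] in the reference graph describe the estimator weights and the graph classifier in prose. The numpy sketch below wires those pieces together under explicit assumptions, and it is not the authors' implementation (entry [3] names DEFIANT for the estimator step). Eq. (24) is implemented literally with dense matrices. The edge-encoder hidden width (taken as $d_h$), the two-layer ReLU MLPs inside each GINE layer, and the fixed ε = 0 are guesses where the excerpts are truncated. Node features are the 3-component unit sky vectors of entry [4]. Weights are random rather than trained, and dropout is omitted because it is inactive at inference. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def optimal_weights(P_a, P_b, S, S_prime):
    # Eq. (24): w_ab,k = P_a^-1 S_ab,k P_b^-1 / tr[P_a^-1 S_ab,k P_b^-1 S'_ab,k]
    num = np.linalg.solve(P_a, S) @ np.linalg.inv(P_b)
    return num / np.trace(num @ S_prime)

def relu(x):
    return np.maximum(x, 0.0)

def gelu(x):
    # tanh approximation of GELU [42]
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def dense(d_in, d_out):
    # Hypothetical random initialisation; training would replace these values
    return rng.normal(0.0, d_in ** -0.5, (d_in, d_out)), np.zeros(d_out)

def apply(layer, x):
    w, b = layer
    return x @ w + b

def standardize(train, x):
    # Zero mean and unit variance using statistics of the training set
    return (x - train.mean(axis=0)) / train.std(axis=0)

def directed_edges(n_p):
    # Fully connected graph without self-loops: each pulsar pair gives two directed edges
    return np.nonzero(~np.eye(n_p, dtype=bool))

def gine_layer(h, e, src, dst, eps, mlp):
    # Eq. (30): h_a <- MLP[(1 + eps) h_a + sum_{b in N(a)} ReLU(h_b + e_ab)]
    agg = np.zeros_like(h)
    np.add.at(agg, dst, relu(h[src] + e))   # sum the messages arriving at each node
    return apply(mlp[1], relu(apply(mlp[0], (1.0 + eps) * h + agg)))

def init_params(d_node, n_p, d_h=16, hidden=(1024, 256)):
    return {
        "node_enc": dense(d_node, d_h),
        "edge_enc": [dense(1, d_h), dense(d_h, d_h)],
        "convs": [[dense(d_h, d_h), dense(d_h, d_h)] for _ in range(2)],
        "eps": [0.0, 0.0],  # learnable in GINE; held fixed in this sketch
        "readout": [dense(n_p * d_h, hidden[0]), dense(hidden[0], hidden[1]), dense(hidden[1], 1)],
    }

def forward(params, node_x, edge_x):
    src, dst = directed_edges(len(node_x))
    h = apply(params["node_enc"], node_x)
    e = apply(params["edge_enc"][1], relu(apply(params["edge_enc"][0], edge_x)))
    for mlp, eps in zip(params["convs"], params["eps"]):
        h = gine_layer(h, e, src, dst, eps, mlp)
    z = h.reshape(-1)  # concatenate the N_p node embeddings into one vector
    r1, r2, r3 = params["readout"]
    # Dropout(0.2) is inactive at inference time, so it is omitted
    return float(apply(r3, gelu(apply(r2, gelu(apply(r1, z)))))[0])

# Toy usage on a small hypothetical array (the excerpts use N_p = 120)
n_p = 20
pos = rng.normal(size=(n_p, 3))
pos /= np.linalg.norm(pos, axis=1, keepdims=True)     # unit sky vectors p_a
src, dst = directed_edges(n_p)
train_edges = rng.normal(size=(1000, 1))               # stand-in for training-set edge values
edge_x = standardize(train_edges, rng.normal(size=(len(src), 1)))
logit = forward(init_params(d_node=3, n_p=n_p), standardize(pos, pos), edge_x)
print(len(directed_edges(120)[0]))  # 14280 = 120 * 119
```

By construction, `optimal_weights` satisfies tr[w S'] = 1. In training, the returned logit would typically feed a binary cross-entropy loss against isotropic and anisotropic labels.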