arxiv: 2605.11274 · v1 · submitted 2026-05-11 · 🌀 gr-qc · astro-ph.CO · hep-ph


End-to-End Population Inference from Gravitational-Wave Strain using Transformers

Cecilia Maria Fabbri, Jonathan Gair, Konstantin Leyde, Matthew Mould, Maximilian Dax, Stephen R. Green

Pith reviewed 2026-05-13 01:57 UTC · model grok-4.3

classification 🌀 gr-qc · astro-ph.CO · hep-ph

keywords: gravitational waves · population inference · transformers · compact binaries · simulation-based inference · hierarchical Bayesian inference · Hubble constant · selection effects


The pith

Dingo-Pop uses a transformer to infer compact-binary population properties directly from gravitational-wave strain data in about one second.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Dingo-Pop, a simulation-based method that infers population-level properties of merging compact binaries straight from catalogs of raw gravitational-wave strain without first extracting per-event parameters. Traditional hierarchical Bayesian analyses combine individual-event posteriors but grow expensive and noisy with larger catalogs due to Monte Carlo sampling. Dingo-Pop instead embeds each event's strain into low-dimensional tokens, feeds the variable-length sequence to a transformer trained on simulated catalogs that include selection effects, and outputs the population posterior directly. This yields amortized inference that runs in roughly one second for catalogs of 25 to 1000 events and produces posteriors that are well-calibrated and consistent with standard methods. A reader would care because the approach removes the computational bottleneck that currently limits population studies and enables rapid, large-scale injection campaigns that test how cosmological inferences improve with catalog size.

Core claim

Dingo-Pop is a simulation-based inference framework that directly maps gravitational-wave strain data from catalogs of 25 to 1000 events to population posteriors using a transformer architecture. Data from each event are embedded into low-dimensional tokens and combined via the transformer trained on simulated catalogs subject to selection effects. This produces well-calibrated posteriors in about one second without per-event Monte Carlo sampling noise and matches results from traditional hierarchical Bayesian methods while supporting new classes of large-scale studies, such as examining how spectral-siren Hubble constant uncertainties scale with catalog size.

What carries the argument

A transformer network that ingests sequences of embedded gravitational-wave strain tokens from catalogs of variable size and, after training on simulations that incorporate selection effects, directly outputs population posterior distributions.
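The aggregation step described above can be illustrated with a toy, pure-Python self-attention layer followed by mean pooling. This is a minimal sketch under stated assumptions, not the paper's architecture: the token size `DIM`, the random weight matrices, the single attention layer, and the mean pooling are all hypothetical choices, and random vectors stand in for embedded strain. It shows only two properties: one set of weights accepts catalogs of different sizes, and the pooled summary does not depend on event order.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_summary(tokens, Wq, Wk, Wv):
    """One self-attention layer over event tokens, then mean pooling."""
    d = len(tokens[0])
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    outputs = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        outputs.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    n = len(outputs)
    # No positional encoding is used, so the pooled vector ignores event order.
    return [sum(o[c] for o in outputs) / n for c in range(d)]

rng = random.Random(0)
DIM = 4  # hypothetical token size, chosen only to keep the toy fast

def random_matrix():
    return [[rng.gauss(0.0, 0.5) for _ in range(DIM)] for _ in range(DIM)]

def make_catalog(n_events):
    """Random vectors standing in for embedded events (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n_events)]

Wq, Wk, Wv = random_matrix(), random_matrix(), random_matrix()

small = make_catalog(25)
large = make_catalog(100)
summary_small = attention_summary(small, Wq, Wk, Wv)
summary_large = attention_summary(large, Wq, Wk, Wv)

shuffled = small[:]
rng.shuffle(shuffled)
summary_shuffled = attention_summary(shuffled, Wq, Wk, Wv)

print(len(summary_small), len(summary_large))
print(max(abs(a - b) for a, b in zip(summary_small, summary_shuffled)))
```

In a full pipeline the fixed-size summary would condition a density estimator over the population parameters; that part is omitted here because this page does not specify it.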

If this is right

Population inference proceeds without Monte Carlo sampling noise from individual-event analyses.
A single trained network handles catalogs ranging from 25 to 1000 events without retraining.
End-to-end inference completes in approximately one second per catalog.
Posteriors remain well-calibrated and agree with those from traditional hierarchical methods.
New large-scale injection studies become feasible, including direct tests of how spectral-siren Hubble-constant uncertainties decrease with growing catalog size.
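The catalog-size study in the last item relies on a summary statistic that the Figure 4 caption defines as the 2-σ width divided by the median. A minimal sketch of that statistic follows, assuming the 2-σ width means the width of the central 95.45% interval. The Gaussian stand-in posterior, the value `H0_TRUE`, the scale `SIGMA_PER_EVENT`, and the 1/sqrt(N) narrowing are hypothetical choices for illustration and are not results from the paper.

```python
import random

def quantile(sorted_vals, p):
    """Linear-interpolation quantile of pre-sorted values, with 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def relative_uncertainty(samples):
    """Width of the central 95.45% interval divided by the median."""
    s = sorted(samples)
    tail = (1 - 0.9545) / 2
    width = quantile(s, 1 - tail) - quantile(s, tail)
    return width / quantile(s, 0.5)

rng = random.Random(1)
H0_TRUE = 70.0          # hypothetical value, km/s/Mpc
SIGMA_PER_EVENT = 40.0  # hypothetical single-event width

def toy_posterior_samples(n_events, n_samples=20000):
    """Gaussian stand-in whose width shrinks as 1/sqrt(N) (toy assumption)."""
    sigma = SIGMA_PER_EVENT / n_events ** 0.5
    return [rng.gauss(H0_TRUE, sigma) for _ in range(n_samples)]

sizes = [25, 100, 400]
rel = {n: relative_uncertainty(toy_posterior_samples(n)) for n in sizes}
for n in sizes:
    print(n, round(rel[n], 3))
```

With an amortized network, the samples for each catalog size would come from the trained model rather than from this Gaussian stand-in, which is what makes repeating the calculation over many populations cheap.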

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real-time population updates could become routine once new events are detected in future observing runs.
The method might eventually reduce reliance on detailed per-event parameter estimation when the primary goal is population inference.
Similar token-embedding and transformer pipelines could be tested on other hierarchical inference tasks that combine noisy individual measurements into global parameters.

Load-bearing premise

A network trained only on simulated catalogs with modeled noise and selection effects will still produce accurate posteriors when applied to real gravitational-wave data whose noise properties and selection function may differ from the training distribution.

What would settle it

Apply Dingo-Pop to a set of real LIGO-Virgo strain events and compare the resulting population posterior against the posterior obtained from standard hierarchical inference on the same events; a statistically significant discrepancy in any population parameter would falsify the claim of consistency.
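One simple way to quantify such a comparison is the shift between the two posterior medians in units of the pooled posterior standard deviation, computed per parameter. The sketch below is only one possible diagnostic and is not taken from the paper: the parameter names `H0` and `alpha`, the synthetic samples, and the cutoff `THRESHOLD` are all hypothetical.

```python
import random
import statistics

def standardized_shift(samples_a, samples_b):
    """Absolute difference of medians in units of the pooled posterior std."""
    pooled_var = (statistics.pvariance(samples_a) + statistics.pvariance(samples_b)) / 2
    return abs(statistics.median(samples_a) - statistics.median(samples_b)) / pooled_var ** 0.5

rng = random.Random(2)

def draw(mu, sigma, n=5000):
    """Synthetic posterior samples (stand-in for real pipeline output)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Hypothetical parameter names and values, for illustration only.
sbi_posterior = {"H0": draw(70.0, 8.0), "alpha": draw(3.0, 0.3)}
hierarchical_posterior = {"H0": draw(70.5, 8.0), "alpha": draw(3.9, 0.3)}

THRESHOLD = 1.0  # hypothetical cutoff for flagging a parameter

shifts = {
    name: standardized_shift(sbi_posterior[name], hierarchical_posterior[name])
    for name in sbi_posterior
}
flagged = [name for name, shift in shifts.items() if shift > THRESHOLD]
print(shifts, flagged)
```

A median shift ignores differences in posterior shape, so a real test would likely pair it with a distributional distance; the choice of threshold is a judgment call.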

Figures

Figures reproduced from arXiv: 2605.11274 by Cecilia Maria Fabbri, Jonathan Gair, Konstantin Leyde, Matthew Mould, Maximilian Dax, Stephen R. Green.

**Figure 1.** Dingo-Pop framework, showing inference (left) and training (right). Blue boxes indicate sampling from the population forward model, orange boxes neural networks, cyan boxes network outputs, and purple observed data. To accelerate training, we use auxiliary neural networks that estimate p_det(θ) and emulate detected embeddings.

**Figure 2.** P–P plot for 2500 simulated catalogs with random …

**Figure 3.** Hyperparameter posterior (top) and inferred mass …

**Figure 4.** Relative H0 uncertainty (2-σ width divided by median) versus catalog size for 128 simulated populations. Gray: individual populations as events are added; black: median; blue band: 1-σ scatter across populations. Plots for the other hyperparameters are in the Supplemental Material.

**Figure 5.** Log detection probability as a function of detector-frame mass and luminosity distance, for secondary mass of …

**Figure 6.** Distribution of embeddings in five out of the 32 em…

**Figure 7.** (Left) P–P plot for posterior calibration for catalog size N = 1000, with 2500 catalog realizations. For each hyperparameter and simulated catalog, we compute the percentile rank of the true value within the marginal posterior. The cumulative distribution function (CDF) of these ranks is plotted; for well-calibrated posteriors, this follows the diagonal. Gray bands indicate 1-σ, 2-σ, and 3-σ intervals expe…

**Figure 8.** Comparison between the conventional likelihood-based (orange) and likelihood-free SBI (blue) methods for Population 1

**Figure 9.** As in Fig. 8 but for Population 2

**Figure 10.** Conventional analysis for Population 1, comparing the hyperparameter posterior for different numbers of single-event …

**Figure 11.** Conventional analysis for Population 2, comparing the hyperparameter posterior for different numbers of injections …

**Figure 12.** Comparison for the 100-event out-of-distribution catalog (cf. Tab. V) between the conventional likelihood-based …

**Figure 13.** Repetition of Fig. 4 but for all hyperparameters. To reduce fluctuations, for …
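The calibration check described in the Figure 7 caption (percentile rank of the true value within the marginal posterior, then the CDF of those ranks against the diagonal) can be sketched on a toy problem with a known posterior. The conjugate Gaussian model, the sizes `N_CATALOGS`, `N_OBS`, `N_POST`, and the `width_scale` knob for producing a deliberately overconfident posterior are assumptions made for illustration; they are unrelated to the paper's population model.

```python
import random

rng = random.Random(3)
N_CATALOGS = 500  # simulated datasets (fewer than the 2500 quoted above, for speed)
N_OBS = 10        # observations per dataset
N_POST = 200      # posterior samples per dataset

def simulate_rank(width_scale=1.0):
    """Percentile rank of the true value within (possibly mis-scaled) posterior samples."""
    truth = rng.gauss(0.0, 1.0)                       # draw truth from the N(0, 1) prior
    data = [rng.gauss(truth, 1.0) for _ in range(N_OBS)]
    post_var = 1.0 / (1.0 + N_OBS)                    # exact conjugate posterior variance
    post_mean = sum(data) * post_var                  # exact conjugate posterior mean
    post_sd = width_scale * post_var ** 0.5           # width_scale < 1 is overconfident
    samples = [rng.gauss(post_mean, post_sd) for _ in range(N_POST)]
    return sum(s < truth for s in samples) / N_POST

def max_deviation(ranks):
    """Largest gap between the empirical CDF of the ranks and the diagonal."""
    ordered = sorted(ranks)
    n = len(ordered)
    return max(
        max(abs((i + 1) / n - r), abs(i / n - r)) for i, r in enumerate(ordered)
    )

ks_calibrated = max_deviation([simulate_rank(1.0) for _ in range(N_CATALOGS)])
ks_overconfident = max_deviation([simulate_rank(0.5) for _ in range(N_CATALOGS)])
print(ks_calibrated, ks_overconfident)
```

The exact posterior keeps the rank CDF close to the diagonal, while halving the posterior width pushes ranks toward 0 and 1 and increases the deviation.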

Original abstract

The population of compact binaries encodes information about their astrophysical origins and the expansion of the universe. Hierarchical Bayesian methods infer these properties by combining single-event posteriors. As catalogs grow, however, this approach becomes computationally expensive and is subject to increasing Monte Carlo uncertainty. We introduce Dingo-Pop, a simulation-based framework that infers population posteriors directly from gravitational-wave strain data. The data for each event are embedded into low-dimensional tokens and combined using a transformer trained on simulated catalogs subject to selection effects. This enables (i) population inference without per-event Monte Carlo sampling noise, (ii) amortization across variable catalog sizes using a single network, and (iii) end-to-end inference in about one second. We train a network for catalog sizes of 25 to 1000 events, and obtain well-calibrated posteriors consistent with traditional methods. By avoiding per-event analyses that can take hours to days, Dingo-Pop enables new classes of large-scale injection studies; as an application, we examine how spectral-siren Hubble constant uncertainties change with catalog size.
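The Monte Carlo uncertainty mentioned in the abstract can be illustrated with a toy version of the conventional approach, in which each event's contribution to the population likelihood is estimated by averaging the population density over that event's posterior samples. This is a minimal sketch under strong assumptions: a one-dimensional Gaussian population, Gaussian measurement noise, a flat per-event prior, and no selection effects. The values `SIGMA_POP`, `SIGMA_OBS`, and `MU_TRUE` are hypothetical. Under these assumptions the exact per-event likelihood is a Gaussian with the two variances added, which gives a reference to compare against.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

rng = random.Random(4)
SIGMA_POP = 1.0  # hypothetical population width
SIGMA_OBS = 0.5  # hypothetical measurement noise
MU_TRUE = 2.0    # hypothetical population mean

def mc_event_likelihood(datum, mu, n_samples):
    """Average the population density over per-event posterior samples.

    With a flat per-event prior, the single-event posterior is N(datum, SIGMA_OBS).
    """
    samples = [rng.gauss(datum, SIGMA_OBS) for _ in range(n_samples)]
    return sum(normal_pdf(t, mu, SIGMA_POP) for t in samples) / n_samples

def exact_event_likelihood(datum, mu):
    """Closed-form marginal likelihood for the Gaussian toy model."""
    return normal_pdf(datum, mu, math.hypot(SIGMA_POP, SIGMA_OBS))

# Simulated catalog: draw a true value from the population, then add noise.
catalog = [rng.gauss(rng.gauss(MU_TRUE, SIGMA_POP), SIGMA_OBS) for _ in range(50)]

def log_likelihood_error(n_samples):
    """Absolute error of the Monte Carlo log-likelihood summed over the catalog."""
    mc = sum(math.log(mc_event_likelihood(d, MU_TRUE, n_samples)) for d in catalog)
    exact = sum(math.log(exact_event_likelihood(d, MU_TRUE)) for d in catalog)
    return abs(mc - exact)

error_coarse = log_likelihood_error(20)
error_fine = log_likelihood_error(5000)
print(error_coarse, error_fine)
```

Because the per-event errors add up in the catalog log-likelihood, holding the total error fixed requires more samples per event as the catalog grows. Real analyses also estimate the selection term by Monte Carlo, which this toy leaves out.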

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Dingo-Pop shows a workable transformer route from raw strain segments straight to population posteriors, but the sim-to-real gap is the part that still needs checking.


The main point is that they have trained one network to take tokenized strain data from a variable number of events and output population-level posteriors in about a second, without running separate parameter estimation on each event. That removes the Monte Carlo noise that grows with catalog size and lets them amortize the work across catalogs from 25 to 1000 events while folding in selection effects during training. On the simulated catalogs they used, the posteriors line up with the usual hierarchical pipeline and they demonstrate the speed-up on a spectral-siren Hubble-constant exercise. Those are the concrete advances: direct strain-to-population mapping and a single model that scales with catalog size. The technical choice to embed strain into tokens and let the transformer aggregate them is new in this setting and avoids the usual two-stage workflow. The paper is therefore useful for anyone who wants to run large injection campaigns or explore how population constraints tighten with more events.

The soft spot is exactly the one the stress-test flags. Everything is trained and validated inside the simulation distribution that includes modeled noise and selection. Real strain contains glitches, non-stationary segments, and calibration mismatches that are not guaranteed to stay inside that distribution. Because the architecture produces no per-event posteriors, there is no internal diagnostic when the domain shift appears. The abstract says the posteriors are well-calibrated and consistent, but without the quantitative tables or real-event cross-checks in front of me I cannot judge how large the mismatch might be. For a methods paper this is a normal limitation rather than a fatal one, yet it does mean the central claim is still provisional until someone tests it on actual LIGO/Virgo data.

I would bring this to a reading group to see the architecture details and the calibration plots. It is worth sending to peer review because the computational claim is sharp and the idea is worth stress-testing by the community, even if the final referee reports will probably ask for more real-data validation.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces Dingo-Pop, a simulation-based inference framework that uses a transformer to infer population-level posteriors for compact binary mergers directly from gravitational-wave strain data. Events are embedded as tokens and aggregated by the network, which is trained exclusively on simulated catalogs that include selection effects. The method is claimed to eliminate per-event Monte Carlo sampling noise, amortize inference across variable catalog sizes (25–1000 events) with a single network, deliver end-to-end results in ~1 s, and produce well-calibrated posteriors consistent with traditional hierarchical Bayesian analyses. An application examining spectral-siren Hubble-constant uncertainties as a function of catalog size is presented.

Significance. If the central claims hold under real-data conditions, the approach would substantially lower the computational barrier to population inference for the large catalogs expected from future observing runs, enabling previously intractable injection campaigns and rapid re-analyses. The amortization property and removal of per-event sampling noise are particularly valuable strengths.

major comments (3)

[Abstract] Abstract: the statement that the posteriors are 'well-calibrated' and 'consistent with traditional methods' is presented without any quantitative metrics (coverage probabilities, bias or variance comparisons, KL divergences, or calibration plots). This is load-bearing for the central reliability claim.
[Abstract] Abstract: selection effects are stated to be incorporated during training, yet no description is given of the specific selection function, how it is sampled, or how the network is shown to recover unbiased population parameters when selection is present. This directly affects the validity of end-to-end inference from strain data.
[Abstract] Abstract / method description: the network is trained exclusively on simulated catalogs; no tests for robustness to mismatches between simulated and real LIGO/Virgo noise (non-stationary glitches, calibration errors) or selection-function deviations are reported. Because the architecture produces population posteriors without intermediate per-event diagnostics, any domain shift propagates directly to the final result.

minor comments (1)

[Abstract] The abstract would be clearer if it briefly indicated the range of population parameters (e.g., mass, spin, redshift distributions) and the transformer architecture (number of layers, attention heads, embedding dimension).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed report. We address each major comment point by point below, indicating where revisions will be made to strengthen the manuscript.


Referee: [Abstract] Abstract: the statement that the posteriors are 'well-calibrated' and 'consistent with traditional methods' is presented without any quantitative metrics (coverage probabilities, bias or variance comparisons, KL divergences, or calibration plots). This is load-bearing for the central reliability claim.

Authors: We agree that the abstract would benefit from explicit quantitative support for the calibration claim. The main text already contains calibration plots (showing empirical coverage probabilities for 68% and 95% intervals) and direct comparisons of posterior summaries to traditional hierarchical Bayesian results on the same simulated catalogs. To address the concern, we will revise the abstract to include a concise statement such as 'yielding well-calibrated posteriors with coverage probabilities consistent with nominal levels and in agreement with traditional methods.'

Revision: yes

Referee: [Abstract] Abstract: selection effects are stated to be incorporated during training, yet no description is given of the specific selection function, how it is sampled, or how the network is shown to recover unbiased population parameters when selection is present. This directly affects the validity of end-to-end inference from strain data.

Authors: The Methods section specifies the selection function as an SNR-threshold-based detection probability drawn from standard LIGO/Virgo sensitivity curves, with catalogs generated by sampling the population model and retaining only detectable events. Recovery of unbiased parameters under selection is shown via direct comparison of inferred hyperparameters to injected values and to conventional hierarchical analyses on identical selected catalogs. We will add a brief clause to the abstract: 'trained on simulated catalogs that incorporate selection effects via an SNR-based detection threshold.'

Revision: yes

Referee: [Abstract] Abstract / method description: the network is trained exclusively on simulated catalogs; no tests for robustness to mismatches between simulated and real LIGO/Virgo noise (non-stationary glitches, calibration errors) or selection-function deviations are reported. Because the architecture produces population posteriors without intermediate per-event diagnostics, any domain shift propagates directly to the final result.

Authors: The work is deliberately scoped to controlled simulation-based validation with realistic but stationary noise models. No explicit robustness tests against non-stationary glitches or calibration errors are included. We will add a dedicated paragraph in the Discussion section acknowledging this as a current limitation and outlining future directions such as domain-adversarial training or fine-tuning on real-data injections.

Revision: partial
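The catalog-generation procedure described in the second response (sample the population model, keep only detectable events) can be sketched as rejection sampling against an SNR threshold. Note that the rebuttal on this page is simulated, so the details may not match the paper. Everything numerical below is a toy assumption: the uniform mass range, the Euclidean distance distribution, `SNR_THRESHOLD`, `REFERENCE_SNR`, `D_MAX`, and the SNR scaling as mass^(5/6) / distance with unit Gaussian noise added.

```python
import random

rng = random.Random(5)
SNR_THRESHOLD = 8.0  # hypothetical detection threshold
REFERENCE_SNR = 8.0  # hypothetical optimal SNR of a mass-30 source at distance 1
D_MAX = 3.0          # hypothetical maximum distance (arbitrary units)

def draw_source():
    """Draw one source from a toy population (not the paper's model)."""
    mass = rng.uniform(5.0, 80.0)
    distance = D_MAX * rng.random() ** (1.0 / 3.0)  # uniform in Euclidean volume
    return mass, distance

def observed_snr(mass, distance):
    """Toy optimal SNR plus a unit-variance noise fluctuation."""
    optimal = REFERENCE_SNR * (mass / 30.0) ** (5.0 / 6.0) / distance
    return optimal + rng.gauss(0.0, 1.0)

def simulate_catalog(n_detected):
    """Keep drawing sources until n_detected of them pass the threshold."""
    detected, n_trials = [], 0
    while len(detected) < n_detected:
        n_trials += 1
        mass, distance = draw_source()
        if observed_snr(mass, distance) > SNR_THRESHOLD:
            detected.append((mass, distance))
    return detected, n_trials

catalog, n_trials = simulate_catalog(200)
detection_fraction = len(catalog) / n_trials
mean_detected_mass = sum(m for m, _ in catalog) / len(catalog)
print(detection_fraction, mean_detected_mass)
```

In this toy the detected sample is skewed toward heavier sources than the underlying uniform population, whose mean mass is 42.5. That skew is the bias a population analysis has to account for, whether through an explicit selection term or by training on catalogs that have already been through the same cut.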

Circularity Check

0 steps flagged

No circularity: training on independent simulations yields independent population inference


The paper trains a transformer on simulated catalogs that embed selection effects and then applies the network to produce population posteriors from strain data. This is a standard simulation-based inference setup where the network learns a mapping from data to parameters; the output on new inputs is not forced by construction to match any fitted quantity from the target data. Validation consists of calibration checks and consistency with traditional per-event PE plus hierarchical inference, both performed on held-out simulations whose ground-truth population parameters are known independently of the network weights. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the derivation chain. The method therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on abstract; training details, hyperparameters, and exact simulation assumptions unavailable. Transformer architecture implies many free parameters whose values are not reported.



Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 15 internal anchors

[1]

A. G. Abacet al.(LIGO Scientific, VIRGO, KA- GRA), GWTC-4.0: An Introduction to Version 4.0 of the Gravitational-Wave Transient Catalog, (2025), arXiv:2508.18080 [gr-qc]

work page arXiv 2025
[2]

A. G. Abacet al.(LIGO Scientific, VIRGO, KAGRA), GWTC-4.0: Updating the Gravitational-Wave Transient Catalog with Observations from the First Part of the Fourth LIGO-Virgo-KAGRA Observing Run, (2025), arXiv:2508.18082 [gr-qc]. 6

work page internal anchor Pith review Pith/arXiv arXiv 2025
[3]

Advanced LIGO

J. Aasiet al.(LIGO Scientific), Advanced LIGO, Class. Quant. Grav.32, 074001 (2015), arXiv:1411.4547 [gr-qc]

work page internal anchor Pith review arXiv 2015
[4]

Advanced Virgo: a 2nd generation interferometric gravitational wave detector

F. Acerneseet al.(VIRGO), Advanced Virgo: a second- generation interferometric gravitational wave detector, Class. Quant. Grav.32, 024001 (2015), arXiv:1408.3978 [gr-qc]

work page internal anchor Pith review arXiv 2015
[5]

Akutsuet al.(KAGRA), Overview of KAGRA: Detector design and construction history, PTEP2021, 05A101 (2021), arXiv:2005.05574 [physics.ins-det]

T. Akutsuet al.(KAGRA), Overview of KAGRA: Detec- tor design and construction history, PTEP2021, 05A101 (2021), arXiv:2005.05574 [physics.ins-det]

work page arXiv 2021
[6]

A. G. Abacet al.(LIGO Scientific, VIRGO, KAGRA), GWTC-4.0: Population Properties of Merging Compact Binaries, (2025), arXiv:2508.18083 [astro-ph.HE]

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

Tests of General Relativity with GWTC-3

R. Abbottet al.(LIGO Scientific, VIRGO, KAGRA), Tests of General Relativity with GWTC-3, (2021), arXiv:2112.06861 [gr-qc]

work page internal anchor Pith review Pith/arXiv arXiv 2021
[8]

A. G. Abacet al.(LIGO Scientific, VIRGO, KAGRA), GWTC-4.0: Constraints on the Cosmic Expansion Rate and Modified Gravitational-wave Propagation, (2025), arXiv:2509.04348 [astro-ph.CO]

work page arXiv 2025
[9]

W. M. Farr, Accuracy Requirements for Empirically- Measured Selection Functions, Research Notes of the AAS 3, 66 (2019), arXiv:1904.10879 [astro-ph.IM]

work page arXiv 2019
[10]

Essick and W

R. Essick and W. Farr, Precision Requirements for Monte Carlo Sums within Hierarchical Bayesian Infer- ence, (2022), arXiv:2204.00461 [astro-ph.IM]

work page arXiv 2022
[11]

Talbot and J

C. Talbot and J. Golomb, Growing pains: understand- ing the impact of likelihood uncertainty on hierarchical Bayesian inference for gravitational-wave astronomy, Mon. Not. Roy. Astron. Soc.526, 3495 (2023), arXiv:2304.06138 [astro-ph.IM]

work page arXiv 2023
[12]

Heinzel and S

J. Heinzel and S. Vitale, When (not) to trust Monte Carlo approximations for hierarchical Bayesian inference, (2025), arXiv:2509.07221 [astro-ph.HE]

work page arXiv 2025
[13]

Branchesiet al., Science with the Einstein Tele- scope: a comparison of different designs, JCAP07, 068, arXiv:2303.15923 [gr-qc]

M. Branchesiet al., Science with the Einstein Tele- scope: a comparison of different designs, JCAP07, 068, arXiv:2303.15923 [gr-qc]

work page arXiv
[14]

Cosmic Explorer: The U.S. Contribution to Gravitational-Wave Astronomy beyond LIGO

D. Reitzeet al., Cosmic Explorer: The U.S. Contribution to Gravitational-Wave Astronomy beyond LIGO, Bull. Am. Astron. Soc.51, 035 (2019), arXiv:1907.04833 [astro- ph.IM]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[15]

Mandel, W

I. Mandel, W. M. Farr, and J. R. Gair, Extracting distri- bution parameters from multiple uncertain observations with selection biases, Mon. Not. Roy. Astron. Soc.486, 1086 (2019), arXiv:1809.02063 [physics.data-an]

work page arXiv 2019
[16]

Fishbach, D

M. Fishbach, D. E. Holz, and W. M. Farr, Does the Black Hole Merger Rate Evolve with Redshift?, Astrophys. J. Lett.863, L41 (2018), arXiv:1805.10270 [astro-ph.HE]

work page arXiv 2018
[17]

Inferring the properties of a population of compact binaries in presence of selection effects

S. Vitale, D. Gerosa, W. M. Farr, and S. R. Taylor, Infer- ring the properties of a population of compact binaries in presence of selection effects 10.1007/978-981-15-4702- 7 45-1 (2020), arXiv:2007.05579 [astro-ph.IM]

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/978-981-15-4702- 2020
[18]

Talbotet al., Inference with finite time series: II

C. Talbotet al., Inference with finite time series: II. The window strikes back, Class. Quant. Grav.42, 235023 (2025), arXiv:2508.11091 [gr-qc]

work page arXiv 2025
[19]

Essicket al., Compact binary coalescence sensitivity es- timates with injection campaigns during the LIGO-Virgo- KAGRA Collaborations’ fourth observing run, Phys

R. Essicket al., Compact binary coalescence sensitivity estimates with injection campaigns during the LIGO- Virgo-KAGRA Collaborations’ fourth observing run, Phys. Rev. D112, 102001 (2025), arXiv:2508.10638 [gr-qc]

work page arXiv 2025
[20]

Tiwari, Estimation of the Sensitive Volume for Gravitational-wave Source Populations Using Weighted Monte Carlo Integration, Class

V. Tiwari, Estimation of the Sensitive Volume for Gravitational-wave Source Populations Using Weighted Monte Carlo Integration, Class. Quant. Grav.35, 145009 (2018), arXiv:1712.00482 [astro-ph.HE]

work page arXiv 2018
[21]

J. W. Barrett, I. Mandel, C. J. Neijssel, S. Stevenson, and A. Vigna-Gomez, Exploring the Parameter Space of Compact Binary Population Synthesis, IAU Symp.325, 46 (2016), arXiv:1704.03781 [astro-ph.HE]

work page arXiv 2016
[22]

S. R. Taylor and D. Gerosa, Mining Gravitational-wave Catalogs To Understand Binary Stellar Evolution: A New Hierarchical Bayesian Framework, Phys. Rev. D98, 083017 (2018), arXiv:1806.08365 [astro-ph.HE]

work page arXiv 2018
[23]

Zevin, S

M. Zevin, S. S. Bavera, C. P. L. Berry, V. Kalogera, T. Fragos, P. Marchant, C. L. Rodriguez, F. Antonini, D. E. Holz, and C. Pankow, One Channel to Rule Them All? Constraining the Origins of Binary Black Holes Using Multiple Formation Pathways, Astrophys. J.910, 152 (2021), arXiv:2011.10057 [astro-ph.HE]

work page arXiv 2021
[24]

K. W. K. Wong, K. Breivik, K. Kremer, and T. Callis- ter, Joint constraints on the field-cluster mixing fraction, common envelope efficiency, and globular cluster radii from a population of binary hole mergers via deep learn- ing, Phys. Rev. D103, 083021 (2021), arXiv:2011.03564 [astro-ph.HE]

work page arXiv 2021
[25]

Mould, D

M. Mould, D. Gerosa, and S. R. Taylor, Deep learning and Bayesian inference of gravitational-wave populations: Hierarchical black-hole mergers, Phys. Rev. D106, 103013 (2022), arXiv:2203.03651 [astro-ph.HE]

work page arXiv 2022
[26]

Colloms, C

S. Colloms, C. P. L. Berry, J. Veitch, and M. Zevin, Exploring the Evolution of Gravitational-wave Emitters with Efficient Emulation: Constraining the Origins of Binary Black Holes Using Normalizing Flows, Astrophys. J.988, 189 (2025), arXiv:2503.03819 [astro-ph.HE]

work page arXiv 2025
[27]

Plunkett, M

C. Plunkett, M. Mould, and S. Vitale, Constraining Pop- ulation III stellar demographics with next-generation gravitational-wave observatories, Phys. Rev. D112, 023039 (2025), arXiv:2504.18615 [gr-qc]

work page arXiv 2025
[28]

Leyde, S

K. Leyde, S. R. Green, A. Toubiana, and J. Gair, Grav- itational wave populations and cosmology with neural posterior estimation, Phys. Rev. D109, 064056 (2024), arXiv:2311.12093 [gr-qc]

work page arXiv 2024
[29]

Papamakarios, E

G. Papamakarios, E. Nalisnick, D. J. Rezende, S. Mo- hamed, and B. Lakshminarayanan, Normalizing Flows for Probabilistic Modeling and Inference, J. Machine Learn- ing Res.22, 2617 (2021), arXiv:1912.02762 [stat.ML]

work page arXiv 2021
[30]

Lueckmann, P

J.-M. Lueckmann, P. J. Gon¸ calves, G. Bassetto, K.¨Ocal, M. Nonnenmacher, and J. H. Macke, Flexible statistical inference for mechanistic models of neural dynamics, in Proceedings of the 31st International Conference on Neu- ral Information Processing Systems(2017) pp. 1289–1299

work page 2017
[31]

Greenberg, M

D. Greenberg, M. Nonnenmacher, and J. Macke, Auto- matic posterior transformation for likelihood-free infer- ence, inInternational Conference on Machine Learning (PMLR, 2019) pp. 2404–2414

work page 2019
[32]

Cranmer, J

K. Cranmer, J. Brehmer, and G. Louppe, The frontier of simulation-based inference, Proc. Nat. Acad. Sci.117, 30055 (2020), arXiv:1911.01429 [stat.ML]

work page arXiv 2020
[33]

M. Dax, S. R. Green, J. Gair, J. H. Macke, A. Buonanno, and B. Sch¨ olkopf, Real-Time Gravitational Wave Science with Neural Posterior Estimation, Phys. Rev. Lett.127, 241103 (2021), arXiv:2106.12594 [gr-qc]

work page arXiv 2021
[34]

Attention Is All You Need

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention Is All You Need, arXiv e-prints , arXiv:1706.03762 (2017), arXiv:1706.03762 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[35]

PyTorch: An Imperative Style, High-Performance Deep Learning Library

A. Paszkeet al., PyTorch: An Imperative Style, High-Performance Deep Learning Library, (2019), arXiv:1912.01703 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[36]

Gloeckler, M

M. Gloeckler, M. Deistler, C. Weilbach, F. Wood, and 7 J. H. Macke, All-in-one simulation-based inference, arXiv preprint arXiv:2404.09636 (2024)

work page arXiv 2024
[37]

Jiang, H.-L

J.-Q. Jiang, H.-L. Huang, J. He, Y.-T. Wang, and Y.-S. Piao, A fast deep-learning approach to probing primor- dial black hole populations in gravitational wave events, (2025), arXiv:2505.15530 [gr-qc]

work page arXiv 2025
[38]

Talbot and E

C. Talbot and E. Thrane, Measuring the binary black hole mass spectrum with an astrophysically moti- vated parameterization, Astrophys. J.856, 173 (2018), arXiv:1801.02699 [astro-ph.HE]

work page arXiv 2018
[39]

S. R. Taylor, J. R. Gair, and I. Mandel, Hubble without the Hubble: Cosmology using advanced gravitational- wave detectors alone, Phys. Rev. D85, 023535 (2012), arXiv:1108.5161 [gr-qc]

work page arXiv 2012
[40]

W. M. Farr, M. Fishbach, J. Ye, and D. Holz, A Future Percent-Level Measurement of the Hubble Expansion at Redshift 0.8 With Advanced LIGO, Astrophys. J. Lett. 883, L42 (2019), arXiv:1908.09084 [astro-ph.CO]

work page arXiv 2019
[41]

Mastrogiovanni, K

S. Mastrogiovanni, K. Leyde, C. Karathanasis, E. Chassande-Mottin, D. A. Steer, J. Gair, A. Ghosh, R. Gray, S. Mukherjee, and S. Rinaldi, On the importance of source population models for gravitational-wave cosmol- ogy, Phys. Rev. D104, 062009 (2021), arXiv:2103.14663 [gr-qc]

work page arXiv 2021
[42]

Zaheer, S

M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, Deep sets, Advances in neural information processing systems30(2017)

work page 2017
[43]

J. Lee, Y. Lee, J. Kim, A. Kosiorek, S. Choi, and Y. W. Teh, Set transformer: A framework for attention-based permutation-invariant neural networks, inInternational conference on machine learning(PMLR, 2019) pp. 3744– 3753

work page 2019
[44]

Devlin, M.-W

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, inProceedings of the 2019 Conference of the North American Chapter of the Asso- ciation for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), edited by J. Burstein, C. Doran, and T. Sol...

work page 2019
[45]

Darcet, M

T. Darcet, M. Oquab, J. Mairal, and P. Bojanowski, Vision transformers need registers, inThe Twelfth Inter- national Conference on Learning Representations(2024)

work page 2024
[46]

Katharopoulos, A

A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, Transformers are rnns: Fast autoregressive transform- ers with linear attention, inInternational conference on machine learning(PMLR, 2020) pp. 5156–5165

work page 2020
[47]

Kitaev, L

N. Kitaev, L. Kaiser, and A. Levskaya, Reformer: The ef- ficient transformer, inInternational Conference on Learn- ing Representations(2020)

work page 2020
[48]

Generating Long Sequences with Sparse Transformers

R. Child, S. Gray, A. Radford, and I. Sutskever, Gen- erating long sequences with sparse transformers (2019), arXiv:1904.10509 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2019
[49]

Longformer: The Long-Document Transformer

I. Beltagy, M. E. Peters, and A. Cohan, Long- former: The long-document transformer, arXiv preprint arXiv:2004.05150 (2020)

work page internal anchor Pith review Pith/arXiv arXiv 2004
[50]

It Just Takes Two: Scaling Amortized Inference to Large Sets

A. Wehenkel, M. Kagan, L. Heinrich, and C. Pollard, It just takes two: Scaling amortized inference to large sets (2026), arXiv:2605.07972 [cs.LG]

work page internal anchor Pith review Pith/arXiv arXiv 2026
[51]

Abbottet al.(LIGO Scientific, Virgo,, KAGRA, VIRGO), Constraints on the Cosmic Expansion His- tory from GWTC–3, Astrophys

R. Abbottet al.(LIGO Scientific, Virgo,, KAGRA, VIRGO), Constraints on the Cosmic Expansion His- tory from GWTC–3, Astrophys. J.949, 76 (2023), arXiv:2111.03604 [astro-ph.CO]

[52] G. Pratten et al., Computationally efficient models for the dominant and subdominant harmonic modes of precessing binary black holes, Phys. Rev. D 103, 104056 (2021), arXiv:2004.06503 [gr-qc]

[53] A. Ramos-Buades, A. Buonanno, H. Estellés, M. Khalil, D. P. Mihaylov, S. Ossokine, L. Pompili, and M. Shiferaw, Next generation of accurate and efficient multipolar precessing-spin effective-one-body waveforms for binary black holes, Phys. Rev. D 108, 124037 (2023), arXiv:2303.18046 [gr-qc]

[54] A. Buikema et al. (aLIGO), Sensitivity and performance of the Advanced LIGO detectors in the third observing run, Phys. Rev. D 102, 062003 (2020), arXiv:2008.01301 [astro-ph.IM]

[55] M. Tse et al., Quantum-Enhanced Advanced LIGO Detectors in the Era of Gravitational-Wave Astronomy, Phys. Rev. Lett. 123, 231107 (2019)

[56] R. Abbott et al. (KAGRA, VIRGO, LIGO Scientific), GWTC-3: Compact Binary Coalescences Observed by LIGO and Virgo during the Second Part of the Third Observing Run, Phys. Rev. X 13, 041039 (2023), arXiv:2111.03606 [gr-qc]

[57] C. Durkan, A. Bekasov, I. Murray, and G. Papamakarios, Neural spline flows, Advances in Neural Information Processing Systems 32 (2019), arXiv:1906.04032 [stat.ML]

[58] G. Ashton, M. Hübner, P. D. Lasky, C. Talbot, K. Ackley, S. Biscoveanu, Q. Chu, A. Divakarla, P. J. Easter, B. Goncharov, et al., BILBY: A User-friendly Bayesian Inference Library for Gravitational-wave Astronomy, Astrophys. J. Suppl. 241, 27 (2019), arXiv:1811.02042 [astro-ph.IM]

[59] S. Mastrogiovanni, D. Laghi, R. Gray, G. C. Santoro, A. Ghosh, C. Karathanasis, K. Leyde, D. A. Steer, S. Perries, and G. Pierra, Joint population and cosmological properties inference with gravitational waves standard sirens and galaxy surveys, Phys. Rev. D 108, 042002 (2023), arXiv:2305.10488 [astro-ph.CO]

[60] R. Essick and M. Fishbach, Ensuring Consistency between Noise and Detection in Hierarchical Bayesian Inference, Astrophys. J. 962, 169 (2024), arXiv:2310.02017 [gr-qc]

[61] C. Talbot and E. Thrane, Flexible and Accurate Evaluation of Gravitational-wave Malmquist Bias with Machine Learning, Astrophys. J. 927, 76 (2022), arXiv:2012.01317 [gr-qc]

[62] D. Gerosa, G. Pratten, and A. Vecchio, Gravitational-wave selection effects using neural-network classifiers, Phys. Rev. D 102, 103020 (2020), arXiv:2007.06585 [astro-ph.HE]

[63] T. A. Callister, R. Essick, and D. E. Holz, Neural network emulator of the Advanced LIGO and Advanced Virgo selection function, Phys. Rev. D 110, 123041 (2024), arXiv:2408.16828 [astro-ph.HE]

[64] A. Lorenzo-Medina and T. Dent, A physically modelled selection function for compact binary mergers in the LIGO-Virgo O3 run and beyond, Class. Quant. Grav. 42, 045008 (2025), arXiv:2408.13383 [gr-qc]

[65] A. Kofler, M. Dax, S. R. Green, J. Wildberger, N. Gupte, J. H. Macke, J. Gair, A. Buonanno, and B. Schölkopf, Flexible Gravitational-Wave Parameter Estimation with Transformers (2025), arXiv:2512.02968 [gr-qc]

[66] P. Cannon, D. Ward, and S. M. Schmon, Investigating the impact of model misspecification in neural simulation-based inference, arXiv preprint arXiv:2209.01845 (2022)

[67] M. Schmitt, P.-C. Bürkner, U. Köthe, and S. T. Radev, Detecting model misspecification in amortized Bayesian inference with neural networks, in DAGM German Conference on Pattern Recognition (Springer, 2023) pp. 541–557

[68] A. Wehenkel, J. L. Gamella, O. Sener, J. Behrmann, G. Sapiro, J.-H. Jacobsen, and M. Cuturi, Addressing misspecification in simulation-based inference through data-driven calibration, arXiv preprint arXiv:2405.08719 (2024)

[69] T. Geffner, G. Papamakarios, and A. Mnih, Compositional score modeling for simulation-based inference, in International Conference on Machine Learning (PMLR,

[70] N. E. Wolfe, M. Mould, J. Veitch, and S. Vitale, Neural Bayesian updates to populations with growing gravitational-wave catalogs, arXiv:2602.20277 [astro-ph.IM] (2026).

Supplemental Material

PRIOR DISTRIBUTIONS

The Dingo-Pop framework involves two levels of prior distributions: (1) the single-event priors used to train the underlying Dingo embedding netw...


for population NPE. The model has four components: a tokenizer, a transformer encoder, a final feedforward network, and a normalizing flow that estimates the hyperparameter posterior. The architectures of Dingo-Pop and its two auxiliary networks are detailed in Tab. IV. The neural networks are implemented in PyTorch [35], with layer normalization (rather ...
