Rapid and robust simulation-based inference for kilonovae

Daniel Mortlock; Gurjeet Jagwani; Hiranya V. Peiris; Mattia Bulla; Nikhil Sarin; Samaya Nissanke; Stephanie M. Brown; Stephan Rosswog; Stephen Thorp

arxiv: 2605.13983 · v2 · pith:KOXLGCBVnew · submitted 2026-05-13 · 🌌 astro-ph.IM · astro-ph.HE · hep-ph

Stephanie M. Brown, Mattia Bulla, Hiranya V. Peiris, Nikhil Sarin, Daniel Mortlock, Stephen Thorp, Gurjeet Jagwani, Stephan Rosswog, Samaya Nissanke

Pith reviewed 2026-06-30 21:12 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.HE · hep-ph

keywords simulation-based inference · kilonovae · parameter estimation · likelihood-free inference · Gaussian process emulator · ejecta mass · AT2017gfo · neutron star mergers

The pith

Simulation-based inference recovers kilonova parameters accurately by learning emulator uncertainty structure directly from simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a simulation-based inference framework that uses density estimation on forward simulations from a Gaussian process emulator to estimate kilonova parameters. It shows that this approach avoids the Gaussian likelihood assumption of MCMC methods, which fails to capture the actual non-Gaussian and correlated emulator errors and produces biased results. Simulation recovery tests confirm that SBI retrieves injected parameters correctly while MCMC exhibits systematic offsets, and the same pattern appears in the analysis of AT2017gfo. The method runs in seconds and yields a total ejecta mass of roughly 0.087 solar masses dominated by lanthanide-poor material, while excluding toroidal and peanut ejecta geometries at the 99th percentile.

Core claim

The central claim is that simulation-based inference learns the non-Gaussian, correlated structure of emulator uncertainty directly from forward simulations of kilonovae, providing accurate posterior samples in seconds while traditional MCMC methods suffer systematic bias from likelihood misspecification. This is demonstrated in recovery tests on injected parameters and in analysis of AT2017gfo, where SBI infers a total ejecta mass of approximately 0.087 solar masses dominated by lanthanide-poor ejecta and excludes toroidal and peanut geometries at the 99th percentile.

What carries the argument

Density-estimation likelihood-free inference framework trained on Gaussian process emulators of POSSIS kilonova simulations. It directly models the mapping from simulated observables to parameters without an explicit likelihood function.
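As a rough illustration of the idea (not the authors' pipeline), the sketch below draws parameters from a prior, pushes them through a forward model, and fits a conditional density q(θ | x) to the resulting pairs. Everything in it is a hypothetical stand-in: `simulate` is a toy five-band model with skewed noise shared across bands, and a linear-Gaussian fit takes the place of the neural density estimator. The paper itself uses a Gaussian process emulator of POSSIS; only NumPy is assumed here.

```python
import numpy as np


def simulate(theta, rng):
    """Hypothetical toy forward model: five 'fluxes' that respond linearly to
    theta, plus skewed noise shared by all bands (non-Gaussian, correlated)."""
    theta = np.atleast_1d(theta)
    response = np.linspace(0.5, 1.5, 5)
    shared = rng.exponential(0.3, size=(theta.size, 1))   # common to all bands
    indep = 0.05 * rng.standard_normal((theta.size, 5))   # per-band scatter
    return theta[:, None] * response + shared + indep


def fit_posterior_estimator(theta, x):
    """Fit q(theta | x) = N(w.x + b, scatter^2) by least squares.
    A linear-Gaussian stand-in for a neural conditional density estimator."""
    design = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    scatter = np.std(theta - design @ coef)
    return coef, scatter


def sample_posterior(x_obs, coef, scatter, n_samples, rng):
    """Amortized inference: no simulator calls are needed at this stage."""
    mean = np.append(x_obs, 1.0) @ coef
    return mean + scatter * rng.standard_normal(n_samples)


rng = np.random.default_rng(0)

# 1. Simulate (theta, x) pairs from the prior and the forward model.
theta_train = rng.uniform(0.0, 1.0, size=20_000)
x_train = simulate(theta_train, rng)

# 2. Train the conditional density estimator once.
coef, scatter = fit_posterior_estimator(theta_train, x_train)

# 3. Condition on one "observation" and draw posterior samples.
theta_true = 0.4
x_obs = simulate(theta_true, rng)[0]
samples = sample_posterior(x_obs, coef, scatter, 20_000, rng)

print(f"truth={theta_true:.2f}  posterior mean={samples.mean():.3f}  sd={samples.std():.3f}")
```

The noise model never appears as an explicit likelihood: the estimator only sees simulated pairs, so whatever structure the simulator's errors have is absorbed during training. Once fitted, drawing samples for a new observation is a single matrix product, which is the sense in which amortized inference is fast.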

If this is right

The SBI framework produces approximately 20,000 posterior samples in seconds per event.
A subset of the MCMC posteriors for AT2017gfo pile up at prior boundaries, while the SBI posteriors do not.
The inferred total ejecta mass for AT2017gfo is about 0.087 solar masses and is dominated by lanthanide-poor material.
Toroidal and peanut ejecta geometries are excluded at the 99th percentile for both components.
Simulation studies show SBI recovers injected parameters without the systematic bias seen in MCMC.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach could support real-time parameter estimation during future multi-messenger events that produce multiple kilonovae.
The framework may apply to other transient sources where simulation costs are high and emulator errors are complex.
Testing on additional observed kilonovae beyond AT2017gfo would check whether the learned uncertainty structure generalizes.

Load-bearing premise

The Gaussian process emulator trained on the simulations captures the true non-Gaussian and correlated structure of emulator uncertainty accurately enough that the inference remains reliable on real data.

What would settle it

Apply both SBI and MCMC to a new set of simulated kilonova light curves generated with known injected parameters plus realistic non-Gaussian emulator errors; if SBI credible intervals contain the true values while MCMC shows consistent offsets, the claim holds.
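A recovery test of this kind reduces to a coverage count: across many synthetic datasets, how often does the credible interval contain the injected value? The sketch below shows that bookkeeping on a toy conjugate Gaussian model where the exact posterior is known. The "misspecified" posterior, with an arbitrary offset of 0.3 and halved width, is a hypothetical stand-in for a biased, overconfident analysis; it is not a model of the paper's MCMC results.

```python
import numpy as np


def interval_coverage(posterior_samples, truths, level=0.9):
    """Fraction of datasets whose central credible interval contains the truth.

    posterior_samples has shape (n_datasets, n_samples); truths has shape (n_datasets,).
    """
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(posterior_samples, [tail, 1.0 - tail], axis=1)
    return float(np.mean((truths >= lo) & (truths <= hi)))


rng = np.random.default_rng(1)
n_datasets, n_samples, noise_sd = 1000, 2000, 0.5

# Inject parameters from a N(0, 1) prior and add Gaussian noise to form the data.
truths = rng.standard_normal(n_datasets)
data = truths + noise_sd * rng.standard_normal(n_datasets)

# Exact posterior for this conjugate toy model.
post_mean = data / (1.0 + noise_sd**2)
post_sd = np.sqrt(noise_sd**2 / (1.0 + noise_sd**2))

draws = rng.standard_normal((n_datasets, n_samples))
calibrated = post_mean[:, None] + post_sd * draws
# Hypothetical biased and overconfident posterior (offset and width are arbitrary).
misspecified = (post_mean[:, None] + 0.3) + 0.5 * post_sd * draws

cov_calibrated = interval_coverage(calibrated, truths)
cov_misspecified = interval_coverage(misspecified, truths)
print(f"90% interval coverage: calibrated={cov_calibrated:.2f}, misspecified={cov_misspecified:.2f}")
```

A well-calibrated method should land near the nominal level; a consistent shortfall signals bias or underestimated uncertainty.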

Figures

Figures reproduced from arXiv: 2605.13983 by Daniel Mortlock, Gurjeet Jagwani, Hiranya V. Peiris, Mattia Bulla, Nikhil Sarin, Samaya Nissanke, Stephanie M. Brown, Stephan Rosswog, Stephen Thorp.

**Figure 1.** Velocity densities in the $v_y$-$v_z$ plane for single-component ejecta models 1 day after the merger, with densities described by Cassini ovals as in Eq. 1. Models are shown for increasing values of the shape parameter $q$ (from left to right) and share the same total ejecta mass, $m_{\rm ej} = 0.1\,M_{\odot}$, and mass-weighted averaged ejecta velocity, $v_{\rm ej} = 0.2\,c$. The models are symmetric about the z axis and the merger plane …

**Figure 2.** Physical parameters (mass, $m_{\rm ej}$; electron fraction, $Y_e$; velocity, $v_{\rm ej}$; and shape, $q$) of points added to the training set as a function of optimization iteration. Wind ejecta points ($m_{\rm ej} \geq 0.02\,M_{\odot}$) are shown as dots, and dynamical ejecta points ($m_{\rm ej} < 0.02\,M_{\odot}$) are shown as crosses. The original training grid is shown as lines.

**Figure 3.** Logarithm of the ratio of emulator-to-data uncertainty in flux, sorted by wavelength (left to right) and binned time (bottom to top). Values > 0 indicate that emulator error dominates the likelihood, while values < 0 indicate that observational uncertainty dominates. Predicted emulator error for BEO models with $m_{\rm ej} > 0.02\,M_{\odot}$ for grid-only model (grey) and grid+BEO (purple). Light vertical dashed lines indi…

**Figure 4.** Distribution of the ratio of empirical-to-predicted emulator error, $(f_{\rm pred} - f_{\rm true})/\sigma_{\rm model}$, for the points in …

**Figure 5.** Ratio of empirical error, defined as the difference between predicted flux $f_{\rm pred}$ and true flux $f_{\rm true}$, to predicted uncertainty, $\sigma_{\rm model} = \sqrt{\mathrm{Var}(f_{\rm pred})}$, for all bands in the emulator. The red curve shows a unit normal distribution for comparison, and the shaded bands show the 1, 2, and 3σ intervals.

**Figure 6.** Sampling distribution (blue histogram) around $t_{\rm obs} \simeq 2.5$ d in the u, g, and H bands. The model uncertainty (red) and the total uncertainty (blue) are shown as Gaussian distributions with mean equal to the empirical mean of the sampling distribution and variance equal to $\sigma_m^2$ and $\sigma_{\rm tot}^2 = \sigma_m^2 + \sigma_d^2$, respectively.

**Figure 7.** Posterior bias assessment for the kilonova total mass. Each row shows results from 1,000 independent synthetic datasets with known injected parameters. Top row: Parameters for synthetic datasets are randomly drawn from the SBI posterior for AT2017gfo. Bottom row: Parameters for synthetic datasets are fixed to the median posterior values from AT2017gfo. Left: kernel density estimations of posterior predicti…

**Figure 8.** Posterior distributions from a simulation recovery using an MCMC sampler (purple) and the ANPE (grey). The true parameter values are shown in red. Titles report the median and the 16th and 84th percentiles of each posterior. Contours indicate the 1σ and 2σ credible regions.

**Figure 9.** Posterior predictive distributions for a simulated light curve, comparing inference using MCMC (left) and ANPE (right). Predicted fluxes are generated from posterior samples, with the median light curve shown as a solid line and the 90th percentile range shown as a shaded band. The simulated data used for inference are shown in black. The bottom two panels show the distribution of the flux-to-uncertaint…

**Figure 10.** Posterior distributions from AT2017gfo using an MCMC sampler (purple) and the ANPE (grey). Titles report the median and the 16th and 84th percentiles of each posterior. Contours indicate the 1σ and 2σ credible regions.

**Figure 11.** Posterior predictive distributions for AT2017gfo, comparing inference using an MCMC sampler (left) and the ANPE (right). Predicted fluxes are generated from posterior samples, with the median light curve shown as a solid line and the 90th percentile credible interval shown as a shaded band. AT2017gfo is shown in black. The lower panels show the distribution of the flux-to-uncertainty ratio for the reddes…
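The quantities in the captions of Figures 4 to 6 are simple to compute. The sketch below, a minimal illustration under stated assumptions, evaluates the standardized emulator error $(f_{\rm pred} - f_{\rm true})/\sigma_{\rm model}$ and a Gaussian log-likelihood whose variance is the quadrature sum $\sigma_{\rm tot}^2 = \sigma_m^2 + \sigma_d^2$. The function names and the synthetic error draws are hypothetical; the heavy-tailed case uses a Student-t distribution purely to show how a non-Gaussian error population departs from the unit normal, and does not reproduce the paper's emulator errors.

```python
import numpy as np


def pull(f_pred, f_true, sigma_model):
    """Empirical emulator error in units of the predicted uncertainty."""
    return (np.asarray(f_pred) - np.asarray(f_true)) / np.asarray(sigma_model)


def gaussian_loglike(f_obs, f_pred, sigma_model, sigma_data):
    """Independent Gaussian log-likelihood with model and data variances added."""
    var_tot = np.asarray(sigma_model) ** 2 + np.asarray(sigma_data) ** 2
    resid = np.asarray(f_obs) - np.asarray(f_pred)
    return float(-0.5 * np.sum(resid**2 / var_tot + np.log(2.0 * np.pi * var_tot)))


def fraction_within(z, k):
    """Fraction of pulls with |z| < k; a unit normal gives about 0.683 for k=1."""
    return float(np.mean(np.abs(z) < k))


rng = np.random.default_rng(2)
n = 50_000
f_true = np.ones(n)
sigma_model = np.full(n, 0.1)

# Case A: errors really are Gaussian with the predicted width.
gauss_err = sigma_model * rng.standard_normal(n)
# Case B (hypothetical): heavy-tailed errors, Student-t with 3 dof scaled to unit variance.
heavy_err = sigma_model * rng.standard_t(3, size=n) / np.sqrt(3.0)

z_gauss = pull(f_true + gauss_err, f_true, sigma_model)
z_heavy = pull(f_true + heavy_err, f_true, sigma_model)

for name, z in [("gaussian", z_gauss), ("heavy-tailed", z_heavy)]:
    print(f"{name:12s} |z|<1: {fraction_within(z, 1):.3f}   |z|>3: {1 - fraction_within(z, 3):.4f}")
```

Both error populations have the same variance, yet the heavy-tailed one puts more mass near zero and far more beyond 3σ, which is the kind of mismatch a Gaussian likelihood cannot represent.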

Original abstract

With the next generation of both electromagnetic and gravitational wave observatories beginning to come online, rapid analysis methods for kilonova data are becoming increasingly important in astronomy. Traditional Bayesian parameter estimation using Markov chain Monte Carlo (MCMC) is time-consuming and relies on explicit likelihood approximations that can break down when modeling uncertainties are significant. We develop a simulation-based inference (SBI) framework for kilonova parameter estimation using density-estimation likelihood-free inference. The framework uses a Gaussian process emulator trained on $\sim 1300$ POSSIS simulations. We demonstrate that SBI provides a rapid alternative to MCMC that is robust to likelihood misspecification. The standard Gaussian likelihood approximation fails to capture the non-Gaussian, correlated structure of emulator uncertainty; SBI learns this structure directly from forward simulations. Simulation studies show that the SBI method accurately recovers injected parameters, while the MCMC suffers from systematic bias caused by likelihood misspecification. This problem persists when analyzing AT2017gfo, where a subset of the MCMC posteriors pile up at prior boundaries and the SBI posteriors do not. The SBI framework infers a total ejecta mass of $\sim 0.087 M_{\odot}$ dominated by lanthanide-poor ejecta and excludes toroidal and peanut ejecta geometries at the 99th percentile for both components. The SBI framework generates $\sim 2 \times 10^{4}$ posterior samples in seconds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

SBI with a GP emulator on ~1300 simulations recovers kilonova parameters without the MCMC biases from Gaussian likelihood misspecification, though emulator fidelity on limited training data needs checking.

The main result is that simulation-based inference recovers injected kilonova parameters accurately while MCMC with a Gaussian likelihood shows systematic bias from not capturing the emulator's non-Gaussian correlated uncertainties.

The paper trains a Gaussian process on roughly 1300 POSSIS runs and applies density-estimation likelihood-free inference. This setup is new for kilonova parameter estimation. Their simulation studies show clean recovery with SBI and systematic bias with MCMC; on AT2017gfo, a subset of the MCMC posteriors pile up at prior boundaries while the SBI posteriors do not. The SBI posteriors for AT2017gfo give a total ejecta mass of ~0.087 solar masses dominated by lanthanide-poor material and exclude toroidal and peanut geometries at the 99th percentile. The speed, with roughly 20,000 samples in seconds, is a clear practical gain.

The work does a solid job demonstrating the misspecification problem through direct comparison and showing how SBI sidesteps it by learning the structure from forward simulations.

The soft spot is the emulator. Trained on only ~1300 points, the GP could miss parts of the true uncertainty structure in sparsely sampled regions. Because the injection tests draw from the same emulator, they do not expose whether emulator errors propagate into the posteriors. The abstract does not describe held-out validation, so that detail matters for trusting the real-data results.

This is for astronomers doing rapid kilonova analysis tied to gravitational-wave triggers. Anyone building pipelines for next-generation observatories would find the method and the comparison useful.

It deserves peer review. The simulation evidence is concrete and the application to AT2017gfo is specific enough to evaluate.

Referee Report

1 major / 1 minor

Summary. The manuscript develops a simulation-based inference (SBI) framework for kilonova parameter estimation that trains a Gaussian process emulator on ~1300 POSSIS simulations and employs density-estimation likelihood-free inference. It claims that SBI provides a rapid alternative to MCMC, is robust to likelihood misspecification because it learns the non-Gaussian and correlated structure of emulator uncertainty directly from forward simulations, accurately recovers injected parameters in simulation studies (while MCMC exhibits systematic bias), and when applied to AT2017gfo yields a total ejecta mass of ~0.087 M⊙ dominated by lanthanide-poor material while excluding toroidal and peanut geometries at the 99th percentile.

Significance. If the results hold, the work would represent a useful contribution to rapid kilonova analysis methods needed for upcoming electromagnetic and gravitational-wave facilities. The explicit contrast between SBI and Gaussian-likelihood MCMC on both simulated and real data (AT2017gfo) is a concrete strength, as is the use of forward simulations to capture emulator uncertainty structure. The reported inference on ejecta mass and geometry exclusion provides a falsifiable prediction that can be tested with future observations.

major comments (1)

[Abstract (simulation studies and emulator description)] The simulation studies inject parameters drawn from the same GP emulator that supplies the training data for the SBI density estimator. Because the studies therefore cannot expose biases arising from inaccuracies in the emulator's predictive mean or miscalibrated uncertainties (especially in sparsely sampled regions of parameter space), they do not independently validate the claim that SBI recovers parameters accurately when applied to real data. The abstract states that the emulator is trained on ~1300 simulations but does not report held-out validation metrics; this validation is load-bearing for the central robustness claim.

minor comments (1)

The abstract could specify the precise form of the density estimator (e.g., normalizing flow architecture or neural posterior estimation variant) used within the SBI framework.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thoughtful and constructive review. The major comment raises a valid point about the scope of our simulation studies and the need for explicit emulator validation metrics. We address this below and will revise the manuscript accordingly.

Referee: [Abstract (simulation studies and emulator description)] The simulation studies inject parameters drawn from the same GP emulator that supplies the training data for the SBI density estimator. Because the studies therefore cannot expose biases arising from inaccuracies in the emulator's predictive mean or miscalibrated uncertainties (especially in sparsely sampled regions of parameter space), they do not independently validate the claim that SBI recovers parameters accurately when applied to real data. The abstract states that the emulator is trained on ~1300 simulations but does not report held-out validation metrics; this validation is load-bearing for the central robustness claim.

Authors: We agree that the simulation studies, by design, treat the GP emulator as the forward model and therefore primarily demonstrate SBI's robustness to likelihood misspecification relative to Gaussian-likelihood MCMC under that model; they do not independently test for biases due to emulator inaccuracies on real observations. This was an intentional choice to isolate the impact of the Gaussian approximation. For the application to AT2017gfo, the differing posterior behavior between SBI and MCMC provides supporting evidence of robustness, but we acknowledge that held-out emulator validation metrics are necessary to strengthen the claim. In revision we will (i) add held-out validation metrics (e.g., predictive mean squared error and coverage on a withheld set of simulations) to the abstract and emulator section, and (ii) clarify the intended scope of the simulation studies in the text. revision: yes
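The two held-out metrics named in the response, predictive mean squared error and interval coverage on withheld simulations, can be sketched as follows. This is a minimal illustration on a one-dimensional toy function with a hand-written NumPy Gaussian process; the kernel, its fixed hyperparameters, the `sin(3x)` target, and the 40/200 train/held-out split are all assumptions made for the example and say nothing about the paper's emulator.

```python
import numpy as np

AMPLITUDE, LENGTH = 1.0, 0.5   # assumed, fixed RBF hyperparameters


def rbf_kernel(a, b):
    """Squared-exponential kernel between two 1-D input arrays."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return AMPLITUDE**2 * np.exp(-0.5 * d2 / LENGTH**2)


def gp_predict(x_train, y_train, x_test, noise_sd):
    """Zero-mean GP regression: predictive mean and sd for noisy test targets."""
    K = rbf_kernel(x_train, x_train) + noise_sd**2 * np.eye(x_train.size)
    K_s = rbf_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    latent_var = np.maximum(AMPLITUDE**2 - np.sum(v**2, axis=0), 0.0)
    return mean, np.sqrt(latent_var + noise_sd**2)


rng = np.random.default_rng(3)
noise_sd = 0.05

# Toy "simulations": 240 noisy evaluations of a smooth function.
x_all = rng.uniform(0.0, 2.0, size=240)
y_all = np.sin(3.0 * x_all) + noise_sd * rng.standard_normal(x_all.size)

# Withhold 200 points; train on the remaining 40.
x_train, y_train = x_all[:40], y_all[:40]
x_test, y_test = x_all[40:], y_all[40:]

mean, sd = gp_predict(x_train, y_train, x_test, noise_sd)

mse = float(np.mean((mean - y_test) ** 2))
z = (y_test - mean) / sd
coverage95 = float(np.mean(np.abs(z) < 1.96))   # nominal value is 0.95

print(f"held-out MSE = {mse:.4f},  95% interval coverage = {coverage95:.3f}")
```

MSE checks the predictive mean, while coverage checks whether the reported uncertainties are the right size; an emulator can pass one and fail the other, which is why both are worth reporting.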

Circularity Check

0 steps flagged

No significant circularity

The paper's derivation relies on forward simulations from the external POSSIS code, a GP emulator trained on ~1300 runs, and density-estimation SBI that learns the emulator's uncertainty structure directly from those simulations. Simulation studies inject known parameters into the same forward model and compare recovery between SBI and MCMC; this is an independent consistency check rather than a reduction by construction. The application to AT2017gfo is a direct inference step with no self-definitional equations, fitted inputs renamed as predictions, or load-bearing self-citations that force the central claims. The method remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework relies on the unstated assumption that the POSSIS simulations and emulator training set are representative of real kilonova physics.

pith-pipeline@v0.9.1-grok · 5821 in / 1090 out tokens · 20541 ms · 2026-06-30T21:12:43.306331+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Probabilistic Data-Driven Modelling of Astrophysical Transients: The Neural Process Family for Ultrafast and Class-Agnostic Light Curve Reconstruction with NightLANP
astro-ph.IM 2026-05 unverdicted novelty 6.0

Attentive Neural Processes outperform Gaussian Processes and neural networks on light curve interpolation quality, feature recovery, calibration, and speed for 15 transient classes under realistic Rubin cadences.

Reference graph

Works this paper leans on

3 extracted references · 3 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

2026, JCAP, 2026(03), 081, doi: 10.1088/1475-7516/2026/03/081 Abbott, B

Abac, A., Abramo, R., Albanesi, S., et al. 2026, JCAP, 2026(03), 081, doi: 10.1088/1475-7516/2026/03/081 Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2017a, PhRvL, 119, doi: 10.1103/PhysRevLett.119.161101 Abbott, B. P., Abbott, R., Abbott, T. D., et al. 2017b, ApJL, 848, L12, doi: 10.3847/2041-8213/aa91c9 Abbott, B. P., Abbott, R., Abbott, T. D., et a...

work page doi:10.1088/1475-7516/2026/03/081 2026
[2]

2002, Machine Learning, 47, 235, doi: 10.1023/A:1013689704352 Banerjee, S., Tanaka, M., Kato, D., et al

https://www.jmlr.org/papers/v3/auer02a.html Auer, P., Cesa-Bianchi, N., & Fischer, P. 2002, Machine Learning, 47, 235, doi: 10.1023/A:1013689704352 Banerjee, S., Tanaka, M., Kato, D., et al. 2022, ApJ, 934, 117, doi: 10.3847/1538-4357/ac7565 Banerjee, S., Tanaka, M., Kawaguchi, K., Kato, D., & Gaigalas, G. 2020, ApJ, 901, 29, doi: 10.3847/1538-4357/abae61...

work page doi:10.1023/a:1013689704352 2002
[3]

Cosmic Explorer: The U.S. Contribution to Gravitational-Wave Astronomy beyond LIGO

http://jmlr.org/papers/v22/19-1028.html Pedersen, C., Font-Ribera, A., Rogers, K. K., et al. 2021, JCAP, 2021(05), 033, doi: 10.1088/1475-7516/2021/05/033 Peng, Y., Ristić, M., Kedia, A., et al. 2024, PhRvR, 6, 033078, doi: 10.1103/PhysRevResearch.6.033078 Pian, E., D’Avanzo, P., Benetti, S., et al. 2017, Nature, 551, 67, doi: 10.1038/nature24298 Pognan...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.1088/1475-7516/2021/05/033 2021
