pith. machine review for the scientific record.

arxiv: 2511.05652 · v1 · submitted 2025-11-07 · 🌌 astro-ph.EP · astro-ph.IM

Knobs and dials of retrieving JWST transmission spectra. II. Impacts of pipeline-level differences on retrieval posteriors

Pith reviewed 2026-05-17 23:33 UTC · model grok-4.3

classification 🌌 astro-ph.EP astro-ph.IM
keywords JWST · transmission spectra · atmospheric retrieval · WASP-39 b · data reduction pipelines · exoplanet atmospheres · NIRSpec

The pith

Different pipelines applied to the same JWST raw data produce transmission spectra that yield differing atmospheric retrieval posteriors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether atmospheric retrieval results depend on the specific data reduction steps used to create a transmission spectrum from JWST observations. It runs retrievals with the TauREx code on three independently reduced versions of the NIRSpec PRISM spectrum for the hot Jupiter WASP-39 b and compares them to retrievals on randomly perturbed copies of one baseline spectrum. Some parameters produce stable, Gaussian posteriors while others show large shifts, upper bounds, or heavy tails that change with the input spectrum. A reader cares because these variations mean that reported abundances and other properties can depend on processing choices that are not yet standardized across the community.

Core claim

Retrievals performed on independently reduced transmission spectra from the same JWST NIRSpec PRISM transit observation of WASP-39 b produce different posterior distributions, especially for species whose constraints come from minor spectral features, whereas parameters such as planetary radius and the pressure-temperature profile remain consistent.

What carries the argument

Comparison of posterior distributions from TauREx retrievals on three pipeline-reduced spectra versus randomly scattered versions of a single baseline spectrum.
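To put a number on how much two marginal posteriors differ, one simple option is the two-sample Kolmogorov-Smirnov (KS) statistic between their samples. This is not the paper's method, only a minimal sketch. It assumes equally weighted samples (for example, nested-sampling output that has already been resampled), and the helper name `ks_statistic` is hypothetical.

```python
from bisect import bisect_right


def ks_statistic(samples_a, samples_b):
    """Two-sample Kolmogorov-Smirnov statistic between two sets of
    equally weighted posterior samples (hypothetical helper).

    Returns the largest vertical gap between the two empirical CDFs:
    0 means identical marginals, 1 means no overlap at all.
    """
    a = sorted(samples_a)
    b = sorted(samples_b)
    n_a, n_b = len(a), len(b)
    d_max = 0.0
    # The empirical CDFs only jump at observed values, so evaluating the
    # gap at every pooled sample value finds the maximum.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n_a
        cdf_b = bisect_right(b, x) / n_b
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


# Illustrative only: made-up log10 VMR samples from two retrievals.
run_1 = [-5.1, -5.0, -4.9, -4.8, -4.7]
run_2 = [-6.5, -5.9, -5.0, -4.9, -4.8]
print(ks_statistic(run_1, run_2))
```

A value near 0 indicates that two retrievals agree on a parameter. A value near 1 indicates marginals that barely overlap, the kind of behaviour one would look for in heavy-tailed posteriors such as SO2.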

If this is right

  • Species constrained across the full spectrum, such as H2O and CO2, produce stable Gaussian posteriors.
  • Weakly constrained species such as CO and CH4 yield uniform posteriors with upper bounds.
  • Species constrained by minor features, such as SO2 and C2H2, produce unstable heavy-tailed posteriors.
  • Planetary radius and p-T profile parameters stay stable under spectral changes.
  • Credible intervals must be chosen carefully to reflect pipeline-induced variations.
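The credible intervals quoted in the figures (CCI95) are central intervals. Below is a minimal sketch of computing one from weighted posterior samples. It assumes the weights are nested-sampling importance weights like those behind the "weighted counts" axes in Figure 4. The helper names are hypothetical, and this is not TauREx's own routine.

```python
def weighted_quantile(values, weights, q):
    """Quantile q (0 <= q <= 1) of a weighted sample: the smallest value whose
    cumulative weight reaches the fraction q of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guards against floating-point round-off when q == 1


def central_credible_interval(values, weights, level=0.95):
    """Central credible interval (e.g. a CCI95 for level=0.95): removes
    (1 - level) / 2 of the posterior weight from each tail."""
    tail = (1.0 - level) / 2.0
    return (weighted_quantile(values, weights, tail),
            weighted_quantile(values, weights, 1.0 - tail))


# Illustrative only: a made-up posterior with a long low-abundance tail.
samples = [-9.0, -8.0, -7.0, -6.2, -6.0, -5.9, -5.8]
weights = [0.5, 0.5, 1.0, 2.0, 4.0, 2.0, 1.0]
print(central_credible_interval(samples, weights, level=0.68))
print(central_credible_interval(samples, weights, level=0.95))
```

Comparing the 68% and 95% intervals of the same posterior gives a quick check on its shape. For a Gaussian, the ratio of their widths stays near 2, whereas a heavy tail stretches the 95% interval much further.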

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If pipeline variations persist for other targets, population-level studies of exoplanet atmospheres may accumulate unrecognized systematic offsets.
  • Testing retrievals on spectra reduced by additional independent pipelines could quantify how much of the posterior spread is currently underestimated.
  • Adopting community standards for reduction choices might narrow the range of reported atmospheric parameters across different analyses.

Load-bearing premise

The differences observed between the three independently reduced spectra and the random perturbations capture the main uncertainties that affect real JWST retrievals.

What would settle it

Running retrievals on additional independent reductions of the same raw JWST data and checking whether the posterior differences remain larger than those produced by random perturbations of a single spectrum.
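The test proposed above can be expressed as a ratio of spreads. This sketch assumes posterior samples for one parameter are already available from each independent reduction and from each scattered copy of the baseline spectrum. `spread_ratio` is a hypothetical helper, and using the median as the summary statistic is an illustrative choice.

```python
from statistics import median, stdev


def spread_ratio(pipeline_runs, scatter_runs):
    """Hypothetical helper: spread of posterior medians across independent
    reductions divided by the spread across scattered copies of one spectrum.

    Each argument is a list of posterior-sample lists for the same parameter,
    one list per retrieval. A ratio well above 1 would suggest that reduction
    choices shift the posterior by more than random scatter alone.
    """
    pipeline_medians = [median(samples) for samples in pipeline_runs]
    scatter_medians = [median(samples) for samples in scatter_runs]
    # stdev needs at least two retrievals in each group.
    return stdev(pipeline_medians) / stdev(scatter_medians)
```

With only three reductions, the numerator is itself poorly determined. This is one reason additional independent reductions would sharpen the test.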

Figures

Figures reproduced from arXiv: 2511.05652 by Aiko Voigt, Ingo Waldmann, Manuel Güdel, Quentin Changeat, Simon Schleich, Sudeshna Boro Saikia.

Figure 1: Comparison of transmission spectra used in this work. Data associated with the spectrum produced in this work (SP-TW) are shown in black, while data associated with Rustamkulov et al. (2023) (RU-23) and Carter & May et al. (2024) (CA-24) are shown in red and blue, respectively. (a) Transmission spectra, showing wavelength (in µm) on the x-axis and transit depth (in %) on the y-axis. (b) Residual distributi…
Figure 2: Retrieval results of the fiducial model applied to SP-TW (the spectrum produced in our work), showing the posterior distributions of the molecular mixing ratios. Marginalised posterior distributions (main diagonal) show the parameter estimate median (points) and CCI95 (error bar). The inset plot on the top right shows the median retrieved p-T profile (dashed line) and CCI95 (shaded region). We find strong …
Figure 3: Transmission spectrum with model fit solution from model tuning process. Both panels show wavelength (in µm) on the x-axis against transit depth (in %) on the y-axis, as well as the data points and error bars (grey) from the spectrum produced in this work (SP-TW). (Top) Median model solution (solid black line) and corresponding 95% CCI (shaded area). (Bottom) Contributions of individual molecular opacity s…
Figure 4: Posterior distributions of select forward model parameters for atmospheric retrievals on scattered instances of SP-TW, showing inferred parameter values (x-axis) against weighted counts (y-axis) in all panels. The parameter posterior distributions are the VMRs of CO2 (top left), CO (top right), and SO2 (bottom right). Marginalised posteriors and CCIs from the initial instance of SP-TW are shown in black,…
Figure 5: Results of atmospheric retrieval performed on three transmission spectra derived from the same observation. Results achieved with SP-TW (the spectrum produced in our work), as well as with RU-23 and CA-24 are shown in black, red, and blue, respectively. (Left) The grid of smaller panels shows the marginalised posterior distributions of the molecular mixing ratios and cloud-top pressure. (Right) Retrieved 4-…
Original abstract

Since the launch of JWST, observations of exoplanetary atmospheres have seen a revolution in data quality. Given that atmospheric parameter inferences depend heavily on the underlying data, a re-evaluation of current methodologies is warranted to assess the reliability of these results. We investigate the impact of variations in input spectra on atmospheric retrievals for the hot Jupiter WASP-39 b using JWST transit data. Specifically, we analyse the reliability of parameter estimations from random perturbations of the underlying spectrum and their sensitivity to three transmission spectra derived from the same observational data. Using the NIRSpec PRISM observation from a single transit of WASP-39 b, we perform retrievals with the TauREx framework. As a baseline, we use a spectrum derived with the Eureka! data reduction pipeline. To evaluate retrieval reliability, we analyse posterior distributions under deviations from this spectrum. We simulate random noise by performing retrievals on scattered instances of this spectrum and compare them with retrievals based on existing spectra reduced from the same raw observation. Our analysis identifies three types of posterior distributions: (1) Stable, Gaussian distributions for species constrained across the entire spectrum (e.g., H2O, CO2); (2) Uniform posteriors with upper bounds for weakly constrained species (e.g., CO, CH4); and (3) Unstable, heavy-tailed posteriors for species constrained by minor spectrum features (e.g., SO2, C2H2). We find that other parameters, such as the planetary radius and p-T profile, are stable under spectral perturbations. Posterior distributions differ for retrievals on independently reduced transmission spectra from the same raw data, complicating interpretation, particularly for skewed distributions. Based on this, we advocate for careful assessment and selection of credible interval sizes to reflect this.
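The abstract does not state how the scattered instances are generated. Below is a minimal sketch under one common assumption: each spectral bin is perturbed independently by Gaussian noise with its reported 1σ error bar. The helper names are hypothetical.

```python
import random


def scatter_spectrum(depths, errors, rng):
    """One scattered instance: shift every transit-depth bin by an independent
    Gaussian draw whose standard deviation is that bin's 1-sigma error bar."""
    return [depth + rng.gauss(0.0, sigma) for depth, sigma in zip(depths, errors)]


def scattered_instances(depths, errors, n_instances, seed=0):
    """Hypothetical helper: n_instances reproducible scattered copies of a spectrum."""
    rng = random.Random(seed)
    return [scatter_spectrum(depths, errors, rng) for _ in range(n_instances)]


# Illustrative only: three bins of transit depth (in %) with their error bars.
baseline = [2.10, 2.14, 2.25]
uncertainty = [0.01, 0.01, 0.02]
for instance in scattered_instances(baseline, uncertainty, n_instances=3, seed=42):
    print([round(x, 3) for x in instance])
```

Each instance would then be retrieved exactly like the baseline spectrum. The spread of the resulting posteriors reflects sensitivity to noise alone, with the reduction itself held fixed.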

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper examines the impact of data reduction pipeline variations on atmospheric retrieval posteriors for the hot Jupiter WASP-39 b using a single JWST NIRSpec PRISM transit observation. Retrievals are performed with the TauREx framework on a baseline Eureka! reduced spectrum, two other independently reduced spectra from the same raw data, and randomly perturbed realizations of the baseline spectrum. The resulting posteriors are classified into three types: stable Gaussian distributions for well-constrained species (e.g., H2O, CO2), uniform posteriors with upper bounds for weakly constrained species (e.g., CO, CH4), and unstable heavy-tailed posteriors for species constrained by minor features (e.g., SO2, C2H2). The central claim is that posteriors differ across the independently reduced spectra, complicating interpretation especially for skewed distributions, with a recommendation for careful assessment of credible interval sizes.

Significance. If the pipeline-induced posterior differences are shown to be representative of dominant uncertainties, the work would usefully illustrate how reduction choices propagate into retrieval results for JWST data, using real observations and standard code. It provides concrete examples of posterior stability across parameter classes and highlights the value of comparing multiple reductions, which could inform best practices for the field.

major comments (2)
  1. The central claim that posterior distributions differ across independently reduced spectra and thereby complicate interpretation (particularly for skewed cases) is load-bearing, yet the analysis does not test whether these differences dominate over other sources such as variations in forward-model assumptions (e.g., different opacity sources or P-T profile parametrizations within TauREx). If the latter produce comparable or larger posterior variations, pipeline-level differences would not be the primary complicating factor. This concern is not addressed by the random-perturbation tests alone.
  2. Abstract: no quantitative metrics are provided on the magnitude of differences between the three reduced spectra (e.g., RMS or bin-by-bin offsets) or on the resulting changes in credible-interval widths or posterior shapes. Without such numbers it is difficult to judge whether the observed posterior variations are practically significant for typical JWST analyses.
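The bin-by-bin offsets and RMS difference the referee asks for are straightforward to compute once two spectra share a wavelength grid. This sketch assumes that common binning, so the three spectra may first need rebinning. The function name is hypothetical.

```python
from math import sqrt


def compare_spectra(depth_a, err_a, depth_b, err_b):
    """Bin-by-bin comparison of two reductions on a shared wavelength grid.

    Returns the raw offsets (a - b), their root-mean-square, and each offset
    expressed in units of the two error bars added in quadrature.
    """
    offsets = [a - b for a, b in zip(depth_a, depth_b)]
    rms = sqrt(sum(o * o for o in offsets) / len(offsets))
    significance = [o / sqrt(ea ** 2 + eb ** 2)
                    for o, ea, eb in zip(offsets, err_a, err_b)]
    return offsets, rms, significance
```

Normalised offsets of order 1 or larger flag bins where two reductions disagree by more than their quoted uncertainties.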
minor comments (2)
  1. The description of the three posterior types would benefit from a summary table listing example species, typical posterior shapes, and which spectral features drive each category.
  2. Clarify whether the random perturbations preserve the wavelength-dependent noise properties of the actual reduced spectra or assume white noise; this affects how representative the stability tests are.
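For comparison with white-noise scattering, the following sketch draws wavelength-correlated perturbations. The squared-exponential kernel, the correlation length `length_scale`, and the jitter term are all illustrative assumptions, not choices made in the paper. A small hand-written Cholesky factorisation keeps the example dependency-free.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def correlated_scatter(wavelengths, depths, errors, length_scale, rng, jitter=1e-6):
    """Perturb a spectrum with noise correlated across neighbouring wavelengths.

    Hypothetical kernel: squared exponential in wavelength with correlation
    length `length_scale` (same units as `wavelengths`), scaled so each bin
    keeps its own 1-sigma error bar on the diagonal.
    """
    n = len(depths)
    cov = [[errors[i] * errors[j]
            * math.exp(-0.5 * ((wavelengths[i] - wavelengths[j]) / length_scale) ** 2)
            for j in range(n)] for i in range(n)]
    for i in range(n):
        # Small relative diagonal term keeps the factorisation numerically stable;
        # long spectra with large length_scale may need a larger value.
        cov[i][i] += jitter * errors[i] ** 2
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # depths + L z has covariance L L^T = cov.
    return [depths[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
```

As `length_scale` shrinks towards zero, the covariance becomes diagonal and this reduces to independent Gaussian scattering. As it grows, neighbouring bins move together, mimicking a wavelength-dependent offset of the kind that might separate two reductions.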

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.

Point-by-point responses
  1. Referee: The central claim that posterior distributions differ across independently reduced spectra and thereby complicate interpretation (particularly for skewed cases) is load-bearing, yet the analysis does not test whether these differences dominate over other sources such as variations in forward-model assumptions (e.g., different opacity sources or P-T profile parametrizations within TauREx). If the latter produce comparable or larger posterior variations, pipeline-level differences would not be the primary complicating factor. This concern is not addressed by the random-perturbation tests alone.

    Authors: We appreciate this insightful comment. Our study is deliberately scoped to isolate the effects of data reduction pipeline variations while holding the forward model fixed in TauREx. This design choice enables a direct assessment of how differences in independently reduced transmission spectra from identical raw data propagate into posteriors. We agree that variations in forward-model assumptions (e.g., opacity sources or P-T parametrizations) represent another important uncertainty source and could be explored in future work. The random-perturbation experiments quantify sensitivity to spectral noise, while the multi-reduction comparison illustrates practical pipeline effects. In the revised manuscript we have added a dedicated paragraph in the discussion clarifying the study's scope and recommending that observers consider multiple reductions alongside other modeling choices. revision: partial

  2. Referee: Abstract: no quantitative metrics are provided on the magnitude of differences between the three reduced spectra (e.g., RMS or bin-by-bin offsets) or on the resulting changes in credible-interval widths or posterior shapes. Without such numbers it is difficult to judge whether the observed posterior variations are practically significant for typical JWST analyses.

    Authors: We agree that quantitative metrics strengthen the abstract and help readers evaluate practical significance. In the revised manuscript we have updated the abstract to report RMS differences between the three reduced spectra and to give concrete examples of changes in credible-interval widths for species such as SO2 and C2H2. These metrics are also expanded with additional detail in the results section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; results from direct empirical retrieval comparisons

full rationale

The paper conducts atmospheric retrievals using TauREx on three independently reduced transmission spectra from the same JWST NIRSpec PRISM observation of WASP-39 b, plus retrievals on randomly perturbed versions of one spectrum. The central claim—that posterior distributions differ across these inputs, complicating interpretation for certain species—is presented as the direct outcome of these explicit runs, with no derivations, predictions, or first-principles results that reduce by construction to fitted parameters, self-citations, or ansatzes. No load-bearing self-citation chains, uniqueness theorems, or renamings of known results appear in the provided text. The analysis is self-contained against the performed retrievals and simulated perturbations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis relies on the standard assumptions of the TauREx retrieval framework and on the premise that random perturbations and pipeline differences are representative of real data-reduction uncertainty. No new free parameters, axioms, or invented entities are introduced beyond those already present in existing atmospheric retrieval codes.

axioms (1)
  • domain assumption: Atmospheric retrieval models correctly map transmission spectra to molecular abundances and p-T profiles under the chosen opacity and chemistry assumptions.
    Invoked when interpreting all posterior distributions as physically meaningful.

pith-pipeline@v0.9.0 · 5655 in / 1307 out tokens · 50049 ms · 2026-05-17T23:33:15.678508+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HAT-P-70b through the Eyes of MAROON-X: Constraining Elemental Abundances of Metals and Insights on Atmosphere Dynamics

    astro-ph.EP 2026-05 conditional novelty 6.0

    New MAROON-X observations of HAT-P-70b detect multiple neutral and ionized metals with day-to-night wind signatures and demonstrate that ionization-aware retrievals yield abundance ratios closer to solar values except...

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages · cited by 1 Pith paper

  1. [1]

    Abel, M., Frommhold, L., Li, X., & Hunt, K. L. C. 2011, J. Phys. Chem. A, 115, 6805; Abel, M., Frommhold, L., Li, X., & Hunt, K. L. C. 2012, The Journal of Chemical Physics, 136, 044319; Adam, A. Y., Yachmenev, A., Yurchenko, S. N., & Jensen, P. 2019, J. Phys. Chem. A, 123, 4755; Ahrer, E.-M., Stevenson, K. B., Mansfield, M., et al. 2023, Nature, 614, 653...

  2. [2]

    We evaluate the preference of individual models using multiple metrics

    This model is extended through an investigation into model preference under variations of the considered molecular species. We evaluate the preference of individual models using multiple metrics. Firstly, we consider the Bayes factor, $B_{m0}$, between forward models including new molecules, and the baseline model, $B_{m0} = E_m / E_0$ (B.1). In this equation, $E_0$...

  3. [3]

    If an extended forward model shows $\ln B_{m0} > 3$ (corresponding to a posterior odds ratio of more than 20:1), we consider it as significant in our selection process and therefore include it in our finalised model setup. Secondly, we compare the corrected Akaike Information Criterion (cAIC, henceforth referred to as $\Psi$) values for all models, $\Psi = -2\log(\hat{L}) + 2$...

  4. [4]

    A&A proofs, Appendix C: Comparison of marginalised posteriors from scattered instances of SP-TW (figure; panels show Rp, log10(CO2), log10(CO), log10(H2O), log10(CH4), log10(H2S), log10(SO2), log10(HCN), log10(C2H2), log10(pclouds [Pa]), Tp0, ...)

  5. [5]

    For the background, we fit a second-order polynomial, with an outlier rejection threshold of 5σ

    The background region for this step is defined as being outside of y ∈ [5, 22]. For the background, we fit a second-order polynomial, with an outlier rejection threshold of 5σ. We skip the refpix step, as there are no reference pixels in this sub-array of the detector (Birkmann et al. 2022), as well as the gain_scale step, as the relative flux-measurement...

  6. [6]

    In stage 3, we constrain the spectral data in the dispersion direction within the range x ∈ [160, 512]

    We also skip the photom (count-rate to flux-density conversion) and extract_1d steps (1D signal extraction), which are not necessary for our purposes. In stage 3, we constrain the spectral data in the dispersion direction within the range x ∈ [160, 512]. We do this to exclude the saturated lower end of the spectrum. Saturation in the wavelength region was in...

  7. [7]

    We use the stellar parameters given in Table 1, and the Stagger grid of stellar models (Magic et al

    incorporated in Eureka!. We use the stellar parameters given in Table 1, and the Stagger grid of stellar models (Magic et al. 2015). Appendix D.2: Light-curve fitting. We use a combined astrophysical and systematics model to fit both the pixel-resolution spectroscopic light-curves and the integrated white light-curve resulting from stage 4 of Eureka!. Th...

  8. [8]

    corresponding to a 4-parameter non-linear limb-darkening law (Claret 2000). We choose the 4-parameter limb-darkening prescription over the commonly employed quadratic limb-darkening law, which has been shown to introduce biases in the retrieved transit depth (e.g. Morello et al. 2017; Keers et al. 2024). Limb-darkening parameters are calculated for ...

  9. [9]

    For Gaussian priors, we list the mean and standard deviation, N(µ, σ²)

    Table D.1. Light-curve fitting parameters
    Parameter | Unit | Prior/Value | Application
    Astrophysical model
    Rp | RJ | N(0.148, 0.015²) | all
    t0 | d | U(59770.81, 59770.86) | white
    i | deg | N(87.83, 0.25²) | white
    a | R⋆ | N(11.4, 1²) | white
    P | d | 4.0552765 | fixed
    e | – | 0 | fixed
    ω | – | 90 | fixed
    Systematics model
    c0 | – | N(1, 0.05²) | all
    c1 | – | N(0, 0.01²) | all
    c2 | – | N(0, 0.01²) | all
    Notes. For the application of our com...

  10. [10]

    We fit for a total of seven free parameters in the case of the integrated white-light curve

    in time to the median-normalised spectroscopic and integrated white-light curves to produce a combined model. We fit for a total of seven free parameters in the case of the integrated white-light curve. For the spectroscopic light-curves, we fit four free parameters, assuming that the inclination, i, time of inferior conjunction, t0, and scaled semi-ma...
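The model-selection criteria quoted in reference entries [2] and [3] can be checked numerically. The sketch below works in log space and assumes equal prior odds when converting ln B_m0 to posterior odds. The cAIC function uses the standard small-sample corrected AIC; that form is an assumption, because the excerpt cuts off mid-equation. All function names are hypothetical.

```python
import math


def log_bayes_factor(log_evidence_model, log_evidence_baseline):
    """ln B_m0 = ln E_m - ln E_0, i.e. Eq. (B.1) taken in log space."""
    return log_evidence_model - log_evidence_baseline


def posterior_odds(ln_b, prior_odds=1.0):
    """Posterior odds of model m over the baseline, given the assumed prior odds."""
    return prior_odds * math.exp(ln_b)


def corrected_aic(max_log_likelihood, n_params, n_data):
    """Standard small-sample corrected AIC (AICc).

    Assumed form: the source excerpt is truncated, so this may differ in
    detail from the paper's Psi.
    """
    aic = -2.0 * max_log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_data - n_params - 1)


# Illustrative numbers only: an extended model 3.5 nats above the baseline.
ln_b = log_bayes_factor(-100.0, -103.5)
print(ln_b, posterior_odds(ln_b))
```

With equal prior odds, ln B_m0 = 3 gives odds of e^3 ≈ 20.1, which is consistent with the quoted "more than 20:1" threshold.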