Knobs and dials of retrieving JWST transmission spectra. II. Impacts of pipeline-level differences on retrieval posteriors
Pith reviewed 2026-05-17 23:33 UTC · model grok-4.3
The pith
Different pipelines applied to the same JWST raw data produce transmission spectra that yield differing atmospheric retrieval posteriors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Retrievals performed on independently reduced transmission spectra from the same JWST NIRSpec PRISM transit observation of WASP-39 b produce different posterior distributions, especially for species whose constraints come from minor spectral features, whereas parameters such as planetary radius and the pressure-temperature profile remain consistent.
What carries the argument
Comparison of posterior distributions from TauREx retrievals on three pipeline-reduced spectra versus randomly scattered versions of a single baseline spectrum.
If this is right
- Species constrained across the full spectrum, such as H2O and CO2, produce stable Gaussian posteriors.
- Weakly constrained species such as CO and CH4 yield uniform posteriors with upper bounds.
- Species constrained by minor features, such as SO2 and C2H2, produce unstable heavy-tailed posteriors.
- Planetary radius and p-T profile parameters stay stable under spectral changes.
- Credible intervals must be chosen carefully to reflect pipeline-induced variations.
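The last point can be made concrete with a small sketch. It compares two common ways of quoting a 68% credible interval from posterior samples: an equal-tailed interval and a highest-density interval. The sample array, the function names, and the exponential tail shape are illustrative assumptions, not values or code from the paper.

```python
import numpy as np


def equal_tailed_interval(samples, mass=0.68):
    """Interval that leaves (1 - mass) / 2 of the samples in each tail."""
    tail = (1.0 - mass) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return float(lower), float(upper)


def highest_density_interval(samples, mass=0.68):
    """Narrowest interval holding a fraction `mass` of the samples.

    Assumes a unimodal posterior; a multimodal one needs a different method.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))           # samples that must fall inside
    widths = x[k - 1:] - x[: n - k + 1]  # width of every window of k samples
    start = int(np.argmin(widths))
    return float(x[start]), float(x[start + k - 1])


# Hypothetical skewed posterior: a sharp upper edge with a long lower tail.
rng = np.random.default_rng(0)
log_abundance = -5.0 - rng.exponential(scale=1.5, size=20_000)

print("equal-tailed:", equal_tailed_interval(log_abundance))
print("highest-density:", highest_density_interval(log_abundance))
```

For a symmetric Gaussian posterior the two intervals nearly coincide. For a skewed or heavy-tailed one they can differ noticeably, which is one reason the choice of interval deserves to be stated explicitly.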
Where Pith is reading between the lines
- If pipeline variations persist for other targets, population-level studies of exoplanet atmospheres may accumulate unrecognized systematic offsets.
- Testing retrievals on spectra reduced by additional independent pipelines could quantify how much of the posterior spread is currently underestimated.
- Adopting community standards for reduction choices might narrow the range of reported atmospheric parameters across different analyses.
Load-bearing premise
The differences observed between the three independently reduced spectra and the random perturbations capture the main uncertainties that affect real JWST retrievals.
What would settle it
Running retrievals on additional independent reductions of the same raw JWST data and checking whether the posterior differences remain larger than those produced by random perturbations of a single spectrum.
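One way such a check could be set up is sketched below. It asks how often the posterior median from a scattered-spectrum retrieval moves at least as far from the baseline as the median from an independent reduction does. The function name and the choice of the median as summary statistic are assumptions made for illustration; the paper may compare posteriors differently, and for heavy-tailed posteriors a quantile or a distribution distance may be a better summary.

```python
import numpy as np


def median_shift_vs_scatter(baseline_samples, pipeline_samples, scattered_runs):
    """Compare a pipeline-induced median shift with noise-induced shifts.

    Returns the absolute shift of the pipeline posterior median from the
    baseline median, and the fraction of scattered-spectrum retrievals whose
    median shifts at least that far.
    """
    base_median = np.median(baseline_samples)
    pipeline_shift = abs(np.median(pipeline_samples) - base_median)
    scatter_shifts = np.array(
        [abs(np.median(run) - base_median) for run in scattered_runs]
    )
    fraction = float(np.mean(scatter_shifts >= pipeline_shift))
    return float(pipeline_shift), fraction


# Hypothetical posterior samples for one parameter (e.g. a log10 abundance).
rng = np.random.default_rng(0)
baseline = rng.normal(-5.0, 0.3, size=5_000)
other_pipeline = rng.normal(-4.2, 0.5, size=5_000)
scattered = [rng.normal(-5.0 + rng.normal(0.0, 0.2), 0.3, size=5_000)
             for _ in range(50)]

shift, fraction = median_shift_vs_scatter(baseline, other_pipeline, scattered)
print(f"pipeline shift = {shift:.2f} dex, matched by {fraction:.0%} of scattered runs")
```

A small fraction would suggest the pipeline difference exceeds what random scatter alone produces for that parameter; a large fraction would suggest the two are comparable.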
Original abstract
Since the launch of JWST, observations of exoplanetary atmospheres have seen a revolution in data quality. Given that atmospheric parameter inferences depend heavily on the underlying data, a re-evaluation of current methodologies is warranted to assess the reliability of these results. We investigate the impact of variations in input spectra on atmospheric retrievals for the hot Jupiter WASP-39 b using JWST transit data. Specifically, we analyse the reliability of parameter estimations from random perturbations of the underlying spectrum and their sensitivity to three transmission spectra derived from the same observational data. Using the NIRSpec PRISM observation from a single transit of WASP-39 b, we perform retrievals with the TauREx framework. As a baseline, we use a spectrum derived with the Eureka! data reduction pipeline. To evaluate retrieval reliability, we analyse posterior distributions under deviations from this spectrum. We simulate random noise by performing retrievals on scattered instances of this spectrum and compare them with retrievals based on existing spectra reduced from the same raw observation. Our analysis identifies three types of posterior distributions: (1) Stable, Gaussian distributions for species constrained across the entire spectrum (e.g., H2O, CO2); (2) Uniform posteriors with upper bounds for weakly constrained species (e.g., CO, CH4); and (3) Unstable, heavy-tailed posteriors for species constrained by minor spectrum features (e.g., SO2, C2H2). We find that other parameters, such as the planetary radius and p-T profile, are stable under spectral perturbations. Posterior distributions differ for retrievals on independently reduced transmission spectra from the same raw data, complicating interpretation, particularly for skewed distributions. Based on this, we advocate for careful assessment and selection of credible interval sizes to reflect this.
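The "scattered instances" described in the abstract can be illustrated with a minimal sketch. It assumes each spectral bin is perturbed independently by Gaussian noise with that bin's reported 1-sigma uncertainty; this page does not state the paper's exact scattering scheme, so that choice, the function name, and the toy spectrum values are all assumptions.

```python
import numpy as np


def scatter_spectrum(depth, depth_err, n_instances, seed=None):
    """Return an (n_instances, n_bins) array of noisy copies of a spectrum."""
    rng = np.random.default_rng(seed)
    depth = np.asarray(depth, dtype=float)
    depth_err = np.asarray(depth_err, dtype=float)
    # One standard-normal draw per bin and instance, scaled by that bin's error.
    noise = rng.standard_normal((n_instances, depth.size)) * depth_err
    return depth + noise


# Hypothetical five-bin spectrum (transit depths and 1-sigma errors in percent).
depth = np.array([2.10, 2.12, 2.25, 2.18, 2.30])
depth_err = np.array([0.02, 0.02, 0.03, 0.03, 0.05])

instances = scatter_spectrum(depth, depth_err, n_instances=10, seed=42)
print(instances.shape)
```

Each row would then be passed to a separate retrieval. Scaling the noise by the per-bin error keeps it wavelength dependent; replacing `depth_err` with a single constant would give white noise instead.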
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines the impact of data reduction pipeline variations on atmospheric retrieval posteriors for the hot Jupiter WASP-39 b using a single JWST NIRSpec PRISM transit observation. Retrievals are performed with the TauREx framework on a baseline spectrum reduced with Eureka!, two other independently reduced spectra from the same raw data, and randomly perturbed realizations of the baseline spectrum. The resulting posteriors are classified into three types: stable Gaussian distributions for well-constrained species (e.g., H2O, CO2), uniform posteriors with upper bounds for weakly constrained species (e.g., CO, CH4), and unstable heavy-tailed posteriors for species constrained by minor features (e.g., SO2, C2H2). The central claim is that posteriors differ across the independently reduced spectra, complicating interpretation especially for skewed distributions, with a recommendation for careful assessment of credible interval sizes.
Significance. If the pipeline-induced posterior differences are shown to be representative of dominant uncertainties, the work would usefully illustrate how reduction choices propagate into retrieval results for JWST data, using real observations and standard code. It provides concrete examples of posterior stability across parameter classes and highlights the value of comparing multiple reductions, which could inform best practices for the field.
major comments (2)
- The central claim that posterior distributions differ across independently reduced spectra and thereby complicate interpretation (particularly for skewed cases) is load-bearing, yet the analysis does not test whether these differences dominate over other sources such as variations in forward-model assumptions (e.g., different opacity sources or P-T profile parametrizations within TauREx). If the latter produce comparable or larger posterior variations, pipeline-level differences would not be the primary complicating factor. This concern is not addressed by the random-perturbation tests alone.
- Abstract: no quantitative metrics are provided on the magnitude of differences between the three reduced spectra (e.g., RMS or bin-by-bin offsets) or on the resulting changes in credible-interval widths or posterior shapes. Without such numbers it is difficult to judge whether the observed posterior variations are practically significant for typical JWST analyses.
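The metrics requested in the second comment could be computed along the following lines. This is a sketch that assumes the two spectra share a wavelength grid and treats their error bars as independent; the function and key names are hypothetical.

```python
import numpy as np


def spectrum_offsets(depth_a, err_a, depth_b, err_b):
    """Summarise the difference between two spectra on a shared wavelength grid."""
    depth_a, err_a = np.asarray(depth_a, float), np.asarray(err_a, float)
    depth_b, err_b = np.asarray(depth_b, float), np.asarray(err_b, float)
    diff = depth_a - depth_b
    # Quadrature sum treats the two error bars as independent (an assumption).
    sigma = np.sqrt(err_a**2 + err_b**2)
    return {
        "mean_offset": float(np.mean(diff)),
        "rms": float(np.sqrt(np.mean(diff**2))),
        "rms_in_sigma": float(np.sqrt(np.mean((diff / sigma) ** 2))),
        "per_bin_sigma": diff / sigma,
    }


# Hypothetical four-bin example (transit depths in percent).
metrics = spectrum_offsets(
    [2.10, 2.12, 2.25, 2.18], [0.02, 0.02, 0.03, 0.03],
    [2.08, 2.15, 2.21, 2.20], [0.02, 0.03, 0.03, 0.04],
)
print(metrics["mean_offset"], metrics["rms"], metrics["rms_in_sigma"])
```

Because the reductions start from the same raw data, their noise is partly shared, so the quadrature sum probably overstates the expected scatter of the difference. Offsets expressed in these sigma units should therefore be read as conservative.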
minor comments (2)
- The description of the three posterior types would benefit from a summary table listing example species, typical posterior shapes, and which spectral features drive each category.
- Clarify whether the random perturbations preserve the wavelength-dependent noise properties of the actual reduced spectra or assume white noise; this affects how representative the stability tests are.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we have revised the manuscript to address the concerns raised.
- Referee: The central claim that posterior distributions differ across independently reduced spectra and thereby complicate interpretation (particularly for skewed cases) is load-bearing, yet the analysis does not test whether these differences dominate over other sources such as variations in forward-model assumptions (e.g., different opacity sources or P-T profile parametrizations within TauREx). If the latter produce comparable or larger posterior variations, pipeline-level differences would not be the primary complicating factor. This concern is not addressed by the random-perturbation tests alone.
  Authors: We appreciate this insightful comment. Our study is deliberately scoped to isolate the effects of data reduction pipeline variations while holding the forward model fixed in TauREx. This design choice enables a direct assessment of how differences in independently reduced transmission spectra from identical raw data propagate into posteriors. We agree that variations in forward-model assumptions (e.g., opacity sources or P-T parametrizations) represent another important uncertainty source and could be explored in future work. The random-perturbation experiments quantify sensitivity to spectral noise, while the multi-reduction comparison illustrates practical pipeline effects. In the revised manuscript we have added a dedicated paragraph in the discussion clarifying the study's scope and recommending that observers consider multiple reductions alongside other modeling choices.
  Revision: partial
- Referee: Abstract: no quantitative metrics are provided on the magnitude of differences between the three reduced spectra (e.g., RMS or bin-by-bin offsets) or on the resulting changes in credible-interval widths or posterior shapes. Without such numbers it is difficult to judge whether the observed posterior variations are practically significant for typical JWST analyses.
  Authors: We agree that quantitative metrics strengthen the abstract and help readers evaluate practical significance. In the revised manuscript we have updated the abstract to report RMS differences between the three reduced spectra and to give concrete examples of changes in credible-interval widths for species such as SO2 and C2H2. These metrics are also expanded with additional detail in the results section.
  Revision: yes
Circularity Check
No significant circularity; results from direct empirical retrieval comparisons
The paper conducts atmospheric retrievals using TauREx on three independently reduced transmission spectra from the same JWST NIRSpec PRISM observation of WASP-39 b, plus retrievals on randomly perturbed versions of one spectrum. The central claim—that posterior distributions differ across these inputs, complicating interpretation for certain species—is presented as the direct outcome of these explicit runs, with no derivations, predictions, or first-principles results that reduce by construction to fitted parameters, self-citations, or ansatzes. No load-bearing self-citation chains, uniqueness theorems, or renamings of known results appear in the provided text. The analysis is self-contained against the performed retrievals and simulated perturbations.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Atmospheric retrieval models correctly map transmission spectra to molecular abundances and p-T profiles under the chosen opacity and chemistry assumptions.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We perform retrievals with the TauREx framework... analyse posterior distributions under deviations... three types of posterior distributions: (1) Stable, Gaussian... (2) Uniform... (3) Unstable, heavy-tailed"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "We judge model preference on the Bayes factor... corrected Akaike information criterion... reduced χ-square metric"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- HAT-P-70b through the Eyes of MAROON-X: Constraining Elemental Abundances of Metals and Insights on Atmosphere Dynamics
  New MAROON-X observations of HAT-P-70b detect multiple neutral and ionized metals with day-to-night wind signatures and demonstrate that ionization-aware retrievals yield abundance ratios closer to solar values except...