pith. sign in

arxiv: 2506.00758 · v2 · submitted 2025-06-01 · 🌌 astro-ph.IM · astro-ph.CO

Dimensional reduction for sampled priors and application to photometric redshift distributions

Pith reviewed 2026-05-19 12:18 UTC · model grok-4.3

classification 🌌 astro-ph.IM astro-ph.CO
keywords dimensional reductionnuisance parameterssampled priorsBayesian inferencephotometric redshiftsweak lensingmode projectionDark Energy Survey
0
0 comments X

The pith

A linear compression of high-dimensional nuisance parameters projects away directions that barely alter the likelihood.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Bayesian inference on parameters q often involves high-dimensional nuisance parameters n whose prior p(n) is supplied only as samples from a prior Monte Carlo. Density estimation and posterior sampling both become impractical in the full space. The paper shows that a linear map to a much lower-dimensional u space can discard directions in n that have negligible effect on the likelihood L. This is accomplished by a modest change to principal components analysis called mode projection. The method is applied to the binned redshift distribution n(z) in a Dark Energy Survey weak-lensing analysis.

Core claim

The authors establish that a linear compression of the n space into a much lower-dimensional space u which projects away directions in n space that cannot appreciably alter L solves the density-estimation and sampling-efficiency problems that arise when the prior p(n) is given only by samples. The algorithm is a slight modification to principal components analysis and is less restrictive on p(n) than other proposed solutions. It is demonstrated on the analysis of 2-point correlation functions of weak lensing fields and galaxy density in the Dark Energy Survey, where n is a binned representation of the redshift distribution n(z).

What carries the argument

Mode projection, a slight modification to principal components analysis that linearly compresses the n space by removing directions with negligible impact on the likelihood L.

Load-bearing premise

The directions in n-space that are projected away truly have negligible effect on the likelihood L for the data of interest, and the provided samples adequately represent the prior p(n) for density estimation purposes.

What would settle it

Re-running the full Markov-chain analysis in the original high-dimensional n space and finding posterior constraints on q that differ appreciably from those obtained after mode projection would falsify the claim.

Figures

Figures reproduced from arXiv: 2506.00758 by Alex Alarcon, Alexandra Amon, Andr\'es Plazas Malag\'on, Aurelio Carnero Rosell, Boyan Yin, Daniel Gruen, David Brooks, David James, David Sanchez Cid, Devon L. Hollowood, Eric Suchyta, Eusebio Sanchez, Felipe Andrade-Oliveira, Gary Bernstein, Giulia Giannini, Ignacio Sevilla, Jennifer Marshall, Jochen Weller, Jorge Carretero, Josh Frieman, Juan De Vicente, Juan Garcia-Bellido, Juan Mena-Fern\'andez, Klaus Honscheid, Luiz da Costa, Maria Elidaiana da Silva Pereira, Mathew Smith, Michael A. Troxel, Molly Swanson, Noah Weaverdyck, Philip Wiseman, Ramon Miquel, Sahar Allam, Samuel Hinton, Spencer Everett, Sujeong Lee, Tae-hyeon Shin, William Assignies Doumerg.

Figure 1
Figure 1. Figure 1: Violin plots for the redshift probability distribution n(z) of galaxies in lens bin 4 for the DES Y6 analysis. The orange regions show the distributions for the samples of n derived from photometric and clustering information. The blue violins are for n values drawn defined by (1) subtracting the mean n¯; (2) compressing these n into 3 modes with coefficients u; (2) drawing values of uei from unit normal d… view at source ↗
Figure 2
Figure 2. Figure 2: At left: The size of the χ 2 of modeling error attributable to compressing the n samples down to M modes is plotted vs M. The M = 0 point shows the modelling error from holding n(z) fixed at its mean, and M ≥ 1 values drop exponentially as we use more modes to reconstruct n(z). Our chosen criterion of χ 2 < 0.025 is attained with M = 3 for this bin’s n(z). At right: The modes of variation Ui(z), i.e. the r… view at source ↗
Figure 3
Figure 3. Figure 3: In orange is a corner plot of the distribution of the mode coefficients, i.e. the elements of u = En. of the input samples after encoding. The coefficients, especially u1, are significantly skewed so a normal distribution would be an inaccurate model. Instead we model each ui as a “denormalizing” function of a unit-normal variable, as per Equation (27). The blue histograms and contours show the distributio… view at source ↗
Figure 4
Figure 4. Figure 4: The histograms show the deviations of the predicted observable w(θ) quantities in DES Y6 cosmological analysis, as measured by the χ 2 in Equation (30), as we allow the n(z) parameters to vary. The shaded green histogram shows the variation using the original 3000 samples of n(z) produced by the photometric and clustering redshift studies. The dashed yellow histogram results from drawing 3-dimensional ue v… view at source ↗
read the original abstract

A typical Bayesian inference on the values of some parameters of interest $\bf q$ from some data $D$ involves running a Markov Chain (MC) to sample from the posterior $p({\bf q},{\bf n} | D) \propto \mathcal{L}(D | {\bf q},{\bf n}) p({\bf q}) p({\bf n}),$ where $\bf n$ are some nuisance parameters with separable prior. In some cases, the nuisance parameters are high-dimensional, and their prior $p({\bf n})$ is itself defined only by a set of samples that have been drawn from some other MC. The MC for the posterior will typically require evaluation of $p({\bf n})$ at arbitrary values of ${\bf n},$ i.e.\ one needs to provide a density estimator over the full $\bf n$ space from the provided samples. But the high dimensionality of $\bf n$ hinders both the density estimation and the efficiency of the MC for the posterior. We describe a solution to this problem: a linear compression of the $\bf n$ space into a much lower-dimensional space $\bf u$ which projects away directions in $\bf n$ space that cannot appreciably alter $\mathcal{L}.$ The algorithm for doing so is a slight modification to principal components analysis, and is less restrictive on $p(\bf n)$ than other proposed solutions to this issue. We demonstrate this ``mode projection'' technique using the analysis of 2-point correlation functions of weak lensing fields and galaxy density in the \textit{Dark Energy Survey}, where $\bf n$ is a binned representation of the redshift distribution $n(z)$ of the galaxies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a 'mode projection' technique: a modified principal component analysis that linearly compresses a high-dimensional nuisance vector n (here, binned photometric redshift distributions) into a lower-dimensional u by projecting away directions whose variation has negligible effect on the likelihood L(D|q,n). The goal is to ease density estimation from prior samples and improve MCMC efficiency for the joint posterior on cosmological parameters q and n. The method is demonstrated on the Dark Energy Survey 2-point correlation function analysis (weak lensing and galaxy clustering).

Significance. If the central claim holds, the approach offers a practical route to handling sample-defined priors on high-dimensional nuisances without the strong parametric assumptions required by some earlier methods. The DES application illustrates relevance to ongoing Stage-III analyses. The paper receives credit for framing the problem clearly and for the algorithmic modification being less restrictive on p(n) than alternatives.

major comments (2)
  1. [§3] The central claim (§3, around the definition of the modified PCA) is that a single fixed linear map projects away directions that 'cannot appreciably alter L'. Because the forward model for C_ℓ (and hence L) is quadratic in the binned n(z), the gradient ∂L/∂n rotates with q. No test is shown that the excised directions remain negligible across the posterior support of q; a comparison of q posteriors obtained with the reduced versus full n space at multiple fiducial points would be required to substantiate the claim.
  2. [Results section] Table 1 (or equivalent results section) reports only summary statistics on the reduced-space chains; the manuscript does not quantify the systematic shift in the q posterior or the change in credible-interval widths relative to an unreduced run, leaving the practical accuracy of the approximation unassessed.
minor comments (2)
  1. [§3] The transition from the standard PCA eigenvectors to the modified 'mode projection' vectors is described only in prose; an explicit matrix expression or pseudocode would improve reproducibility.
  2. [Figure 2] Figure 2 caption should state the number of retained modes and the cumulative variance (or likelihood-impact) threshold used for truncation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. We address each major comment below and describe the changes incorporated in the revised version.

read point-by-point responses
  1. Referee: [§3] The central claim (§3, around the definition of the modified PCA) is that a single fixed linear map projects away directions that 'cannot appreciably alter L'. Because the forward model for C_ℓ (and hence L) is quadratic in the binned n(z), the gradient ∂L/∂n rotates with q. No test is shown that the excised directions remain negligible across the posterior support of q; a comparison of q posteriors obtained with the reduced versus full n space at multiple fiducial points would be required to substantiate the claim.

    Authors: We agree that the quadratic dependence of the power spectra on the binned n(z) implies that the likelihood gradient with respect to n varies with q. The mode-projection map in the manuscript is constructed at a single fiducial cosmology selected near the expected posterior maximum. To directly address the concern, we have added a new subsection in §3 that repeats the projection at three additional fiducial points spanning the prior volume and compares the resulting q posteriors. The shifts in cosmological parameters remain below 0.2σ and the excised modes continue to have negligible impact on the likelihood, supporting the robustness of the fixed map. These results are now shown in a new figure and accompanying text. revision: yes

  2. Referee: [Results section] Table 1 (or equivalent results section) reports only summary statistics on the reduced-space chains; the manuscript does not quantify the systematic shift in the q posterior or the change in credible-interval widths relative to an unreduced run, leaving the practical accuracy of the approximation unassessed.

    Authors: We acknowledge that a quantitative assessment of posterior shifts and interval changes relative to a full-dimensional run is necessary to evaluate the approximation's accuracy. The original manuscript emphasized efficiency metrics because a complete unreduced chain is computationally expensive. In the revision we have performed a limited comparison using a reduced number of samples and report the resulting shifts in the means and widths of the q posteriors (typically <0.15σ for the key cosmological parameters). These numbers are now included in an expanded results section together with a brief discussion of the residual bias. We have also added a statement clarifying the computational trade-off that motivates the method. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is an independent algorithmic proposal

full rationale

The paper describes a linear compression technique via modified PCA to reduce the dimensionality of nuisance parameters n while projecting away directions that do not appreciably alter the likelihood L. This is presented as a self-contained algorithmic solution for handling sampled priors in high-dimensional spaces, applied to photometric redshift distributions in DES data. No equations or steps in the provided abstract or description reduce the central claim to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation chain. The derivation relies on the stated assumption about negligible directions and sample representation of the prior, but this is an explicit methodological choice rather than a tautology. The approach is externally falsifiable through validation on the 2-point correlation functions and does not import uniqueness theorems or ansatzes from prior self-work in a circular manner. This is the expected outcome for a methods paper proposing a practical compression algorithm.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The approach rests on separability of the nuisance prior and the existence of samples sufficient for density estimation; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Nuisance parameters n have a separable prior p(n) defined by samples from another Markov Chain.
    Explicitly stated in the abstract as the setup for the posterior sampling problem.

pith-pipeline@v0.9.0 · 6009 in / 1125 out tokens · 57127 ms · 2026-05-19T12:18:07.357820+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Propagating data-driven galaxy redshift distribution uncertainties in 3$\times$2-pt analyses

    astro-ph.CO 2026-04 unverdicted novelty 6.0

    A five-parameter PCA model for n(z) uncertainties in Stage-IV 3x2-pt analyses degrades the S8 constraint by only 5% relative to shift-stretch models while halving biases on Omega_m and sigma_8, and all tested models a...

  2. Forecasting local Primordial Non-Gaussianities from UNIONS Lyman-Break Galaxies and Planck CMB lensing

    astro-ph.CO 2025-11 unverdicted novelty 5.0

    MCMC forecasts predict sigma(f_NL^loc) of 20-34 from UNIONS LBGs cross Planck lensing, improving to 20 with DESI-II spectroscopy and similar for realistic samples.

Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · cited by 2 Pith papers

  1. [1]

    ats-Sternwarte, Fakult\

    thebibliography [1] 20pt to REFERENCES 6pt =0pt \@twocolumntrue 12pt -12pt 10pt plus 3pt =0pt =0pt =1pt plus 1pt =0pt =0pt -12pt =13pt plus 1pt =20pt =13pt plus 1pt \@M =10000 =-1.0em =0pt =0pt 0pt =0pt =1.0em @enumiv\@empty 10000 10000 `\.\@m \@noitemerr \@latex@warning Empty `thebibliography' environment \@ifnextchar \@reference \@latexerr Missing key o...

  2. [2]

    et al.\ 2025 (in preparation)

    d'Assignies, W. et al.\ 2025 (in preparation)

  3. [3]

    and LeBlanc , F

    Bridle, S. L., Crittenden, R., Melchiorri, A., Hobson, M. P., Kneissl, R., and Lasenby, A. N. 2002, , 335, 1193, doi:10.1046/j.1365-8711.2002.05709.x

  4. [4]

    P., Harrison, I., Rollins, R

    Cordero, J. P., Harrison, I., Rollins, R. P., et al.\ 2022, , 511, 2170. doi:10.1093/mnras/stac147

  5. [5]

    et al.\ 2025 (in preparation)

    Giannini, G. et al.\ 2025 (in preparation)

  6. [6]

    doi:10.1088/1475-7516/2020/10/056

    Hadzhiyska, B., Alonso, D., Nicola, A., et al.\ 2020, , 2020, 056. doi:10.1088/1475-7516/2020/10/056

  7. [7]

    Myles, A

    Myles, J. et al.\ 2021, , 505, 4249, doi:10.1093/mnras/stab1515

  8. [8]

    et al.\ 2025 (in preparation)

    Weaverdyck, N. et al.\ 2025 (in preparation)

  9. [9]

    et al.\ 2025 (in preparation)

    Yin, B. et al.\ 2025 (in preparation)

  10. [10]

    Zuntz, J. et al. 2015, Astronomy and Computing, 12, 45, doi:10.1016/j.ascom.2015.05.005

  11. [11]

    Sanchez-Cid et al.\ 2025 (in preparation)