arxiv: 2606.00847 · v1 · pith:MUTZJFNPnew · submitted 2026-05-30 · 📊 stat.ME

Partial Identification under High-Dimensional Potential Outcomes and Confounders via Optimal Transport

Pith reviewed 2026-06-28 18:09 UTC · model grok-4.3

classification 📊 stat.ME
keywords partial identification · optimal transport · high-dimensional data · causal bounds · sliced Wasserstein distance · potential outcomes · confounders

The pith

Splitting the optimal transport problem into a low-dimensional signal subspace and recovering residual energy with sliced Wasserstein distance produces tighter causal bounds in high dimensions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to overcome the curse of dimensionality that makes optimal transport methods for partial identification of causal effects intractable when both potential outcomes and confounders are high-dimensional. Classical approaches either become computationally infeasible or rely on projections that discard residual information and lose transport energy. The proposed estimator decomposes the transport problem so that a data-driven rule isolates the signal subspace while the sliced Wasserstein distance recovers energy in the residual subspace. This yields more informative bounds than projection-only baselines and stays tractable. Interpretable conditions on residual structure control the resulting approximation gap.

Core claim

The estimator decomposes the transport problem into a low-dimensional signal subspace and a high-dimensional residual subspace. Unlike projection-based methods that discard residual information, it recovers the residual transport energy using the Sliced Wasserstein distance. The approach establishes interpretable conditions controlling the approximation gap based on residual structure and supplies a data-driven rule for signal dimension selection, yielding more informative causal bounds while remaining computationally tractable in high dimensions.
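
One way to see why recovering residual energy could tighten a transport-based bound is the sketch below. It is a standard consequence of the Pythagorean identity and is not taken from the paper: the symbols $P$, $Q$, $k$, and $d$, and the $(d-k)$ scaling of the sliced term, are assumptions made for illustration, and the paper's estimator and normalization may differ.

```latex
% Assumed notation: P is the orthogonal projector onto a k-dimensional signal
% subspace of R^d, Q = I - P, and SW_2 slices over the unit sphere of range(Q).
\begin{aligned}
\|x - y\|^2 &= \|P(x - y)\|^2 + \|Q(x - y)\|^2, \\
W_2^2(\mu, \nu) &\ge W_2^2(P_\#\mu, P_\#\nu) + W_2^2(Q_\#\mu, Q_\#\nu), \\
W_2^2(Q_\#\mu, Q_\#\nu) &\ge (d - k)\, \mathrm{SW}_2^2(Q_\#\mu, Q_\#\nu), \\
W_2^2(\mu, \nu) &\ge W_2^2(P_\#\mu, P_\#\nu) + (d - k)\, \mathrm{SW}_2^2(Q_\#\mu, Q_\#\nu)
  \;\ge\; W_2^2(P_\#\mu, P_\#\nu).
\end{aligned}
```

The second line holds because the infimum over couplings of a sum is at least the sum of the infima. The third uses $\mathbb{E}_\theta[(\theta^\top v)^2] = \|v\|^2/(d-k)$ for $\theta$ uniform on the unit sphere of the residual subspace. Under these assumptions, the projection-only value is the loosest of the lower bounds in the last line.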

What carries the argument

Decomposition of the transport problem into low-dimensional signal subspace and high-dimensional residual subspace, with sliced Wasserstein distance recovering residual transport energy.
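
For readers unfamiliar with the sliced Wasserstein distance, here is a minimal Monte Carlo sketch of its squared 2-Wasserstein version. It is a generic textbook estimator, not the paper's implementation. The function name `sliced_w2_squared`, the default of 200 projections, and the equal-sample-size restriction are assumptions made for illustration.

```python
import numpy as np


def sliced_w2_squared(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance.

    x, y: arrays of shape (n, d). This sketch assumes both samples have the
    same number of rows so that sorted values can be matched one-to-one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes and dimensions")
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw directions uniformly on the unit sphere by normalizing Gaussians.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project onto each direction, then sort: in one dimension the optimal
    # coupling between equal-size samples matches order statistics.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Average the squared gaps over sample points and over directions.
    return float(np.mean((px - py) ** 2))
```

Each projection costs one matrix-vector product and one sort, so the work grows roughly linearly in the dimension. This is the usual reason the sliced distance is described as tractable in high dimensions.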

If this is right

  • The estimator yields more informative causal bounds by recovering lost transport energy.
  • The method remains computationally tractable when both potential outcomes and confounders are high-dimensional.
  • Conditions on residual structure keep the approximation gap under explicit control.
  • A data-driven rule selects signal dimension without invalidating the causal bounds.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition might be applied to other optimal-transport-based estimators that currently rely on full-dimensional computation.
  • Empirical checks on real datasets with varying residual correlation patterns would show the range of settings where the recovered energy meaningfully tightens bounds.
  • Hybrid use with existing projection techniques could further reduce computation while preserving the residual recovery step.

Load-bearing premise

The residual structure admits interpretable conditions that control the approximation gap, and a data-driven rule can select the signal dimension without invalidating the resulting causal bounds.

What would settle it

Run the estimator on simulated high-dimensional data where the true partial identification interval is known and check whether the recovered bounds are strictly narrower than those from projection baselines while preserving valid coverage.
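
A simulation of this kind needs a setting where the transport cost is known exactly. The sketch below uses the standard closed form for the squared 2-Wasserstein distance between two Gaussians, and the same quantity after projecting onto a subspace. How this cost maps to the paper's causal bounds is not specified here. The function names and the orthonormal-columns convention for `basis` are assumptions made for illustration.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2_squared(m1, s1, m2, s2):
    """Closed-form W2^2 between N(m1, s1) and N(m2, s2)."""
    root1 = psd_sqrt(s1)
    cross = psd_sqrt(root1 @ s2 @ root1)
    mean_term = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2)
    return float(mean_term + np.trace(s1 + s2 - 2.0 * cross))


def projected_gaussian_w2_squared(m1, s1, m2, s2, basis):
    """W2^2 after pushing both Gaussians through x -> basis.T @ x.

    basis: (d, k) matrix assumed to have orthonormal columns.
    """
    return gaussian_w2_squared(
        basis.T @ m1, basis.T @ s1 @ basis,
        basis.T @ m2, basis.T @ s2 @ basis,
    )
```

Because an orthogonal projection never increases distances, the projected value cannot exceed the full one. The gap between the two is the transport energy that a projection-only method leaves unaccounted for in such a simulation.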

Figures

Figures reproduced from arXiv: 2606.00847 by Yunfeng Wang, Zhiheng Zhang, Zijun Gao.

Figure 1
Figure 1: Estimator comparison across subspace dimensions. Synthetic Gaussian pushforward with $d = 30$, signal rank $r = 5$, and anisotropy $\alpha \in \{0, 1.0, 1.5\}$. Curves show WPP, CSS, the full SW baseline, and the analytic $W_2^2$ ground truth across candidate dimensions $k^\star$. The selected $k^\star$ is chosen by the WPP plateau rule. CSS stays closer to the true $W_2^2$ than projection-only WPP by adding a sliced residual correction …
Figure 2
Figure 2: Finite-sample subspace recovery at a fixed dimension. We fix $k^\star = 5$ and report the projection error $\|\widehat{\Omega} - \Omega^\star\|_F$ as a function of the sample size $n$ for residual decay profiles $\alpha \in \{0, 1.0, 1.5\}$, with residual energy fraction $\rho = 0.1$. Curves show the median across trials, and shaded bands show quantile ranges. Optimization uses the RBCD method of Huang et al. (2021).
Figure 3
Figure 3: Real-data RHC comparison of WPP, CSS, and the full scaled sliced-Wasserstein baseline across subspace dimensions.
Original abstract

Partial identification provides informative causal guarantees when point identification is impossible, but existing approaches based on optimal transport (OT) become computationally and statistically intractable in high-dimensional settings. This limitation is particularly severe when both potential outcomes and confounders are high-dimensional, where classical OT-based bounds suffer from the curse of dimensionality and unfavorable convergence rates. To address this challenge, we propose a novel estimator that decomposes the transport problem into a low-dimensional signal subspace and a high-dimensional residual subspace. Unlike existing projection-based methods that discard residual information, we recover the residual transport energy using the Sliced Wasserstein distance, which is computationally efficient and robust to high dimensions. We establish interpretable conditions controlling the approximation gap based on residual structure and provide a data-driven rule for signal dimension selection. Empirical results show that our estimator consistently outperforms projection-only baselines by recovering lost transport energy, yielding more informative causal bounds while remaining computationally tractable in high dimensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that existing OT-based partial identification methods suffer from the curse of dimensionality when both potential outcomes and confounders are high-dimensional. It proposes an estimator that decomposes the transport problem into a low-dimensional signal subspace (chosen via a data-driven rule) and a residual subspace whose transport energy is recovered via the Sliced Wasserstein distance. Interpretable conditions on residual structure are said to control the approximation gap, yielding tighter causal bounds than projection-only baselines while remaining computationally tractable.

Significance. If the post-selection validity of the resulting bounds can be established, the approach would offer a practical route to informative partial identification in high-dimensional settings where classical OT methods fail. The use of sliced Wasserstein to recover residual information is a concrete technical contribution that could be useful beyond this specific application.

major comments (1)
  1. [Abstract / estimator construction] Abstract and the description of the estimator: the data-driven rule for selecting the signal dimension introduces a dependence between the chosen subspace and the observed sample. The manuscript must establish that the partial-identification property holds uniformly over the (random) choice of dimension; without such a uniformity argument the validity guarantees for the final bounds are at risk of being invalidated by the adaptive selection step.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the importance of post-selection validity. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract / estimator construction] Abstract and the description of the estimator: the data-driven rule for selecting the signal dimension introduces a dependence between the chosen subspace and the observed sample. The manuscript must establish that the partial-identification property holds uniformly over the (random) choice of dimension; without such a uniformity argument the validity guarantees for the final bounds are at risk of being invalidated by the adaptive selection step.

    Authors: We agree that the data-driven selection of the signal dimension creates a dependence on the sample and that a uniformity argument is required to guarantee that the partial-identification property continues to hold. The manuscript currently states the selection rule and the conditions controlling the approximation gap but does not supply an explicit post-selection guarantee. In the revision we will add a theorem establishing uniform validity of the bounds over the random choice of dimension. The argument will bound the additional error induced by the selection rule (e.g., a threshold on the leading singular values) under the same residual-structure conditions already used to control the sliced-Wasserstein approximation gap, thereby preserving the validity of the final causal bounds. revision: yes
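
The rebuttal mentions a threshold on the leading singular values only as an example, so the sketch below is a hypothetical instance of such a rule and not the paper's selection procedure. The function name `select_signal_dimension`, the relative threshold, and the choice of input matrix are all assumptions made for illustration.

```python
import numpy as np


def select_signal_dimension(signal_matrix, rel_threshold=0.1):
    """Hypothetical rule: count singular values at or above a relative cutoff.

    signal_matrix: any matrix whose leading singular directions are taken to
    span the signal subspace (which matrix the paper uses is not stated here).
    rel_threshold: cutoff expressed as a fraction of the largest singular value.
    """
    s = np.linalg.svd(np.asarray(signal_matrix, dtype=float), compute_uv=False)
    if s.size == 0 or s[0] == 0.0:
        return 0  # no signal directions detected
    return int(np.sum(s >= rel_threshold * s[0]))
```

Because the returned dimension is a function of the sample, any bound computed afterwards inherits that randomness. This is the dependence the referee asks the authors to control uniformly.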

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper introduces a decomposition of the OT problem into signal subspace plus sliced-Wasserstein residual, together with a data-driven dimension rule and residual-structure conditions. No quoted equations or steps reduce any claimed bound, estimator, or prediction to a fitted input by construction, a self-definition, or a load-bearing self-citation chain. The construction is presented as an independent augmentation of projection baselines, and the provided text supplies no self-citation that is invoked to justify uniqueness or to smuggle an ansatz. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No specific free parameters, axioms, or invented entities are described in the abstract; the ledger is therefore empty pending full text.

Reference graph

Works this paper leans on

46 extracted references · 9 canonical work pages · 1 internal anchor

  1. [1] Plugin estimation of smooth optimal transport maps. Annals of Statistics, 2024.

  2. [2] Projection robust Wasserstein distance and Riemannian optimization. Advances in Neural Information Processing Systems.

  3. [3] Estimation of Wasserstein distances in the spiked transport model. Bernoulli, 2022.

  4. [4] Model-agnostic covariate-assisted inference on partially identified causal effects. arXiv preprint arXiv:2310.08115.

  5. [5] Bridging multiple worlds: multi-marginal optimal transport for causal partial-identification problem. arXiv preprint arXiv:2406.07868.

  6. [6] Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics.

  7. [7] Kenza Nadjahi and Alain Durmus and Gabriel Peyr… Asymptotic Guarantees for Learning with the Sliced-Wasserstein Distance.

  8. [8] Kenza Nadjahi and Alain Durmus and R… Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections.

  9. [9] Sebastian Nietert and Rajarshi Sadhu and Jonathan Niles-Weed and Kristjan Greenewald and Ziv Goldfeld. arXiv preprint arXiv:2210.09160.

  10. [10] R… Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates.

  11. [11] Gianluca Auricchio. Aequationes Mathematicae.

  12. [12] Soheil Kolouri and Kimia Nadjahi and Ulysse Simsekli and Gustavo Rohde. Advances in Neural Information Processing Systems (NeurIPS).

  13. [13] Ishaan Deshpande and Zizhao Zhang and Alexander Schwing and Raquel Urtasun. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

  14. [14] Matthew T. Boedihardjo. arXiv preprint arXiv:2403.00666.

  15. [15] Khai Nguyen and Nhat Ho. International Conference on Learning Representations (ICLR).

  16. [16] Niles-Weed, Jonathan and Rigollet, Philippe. Estimation of … 2022.

  17. [17] Bonneel, Nicolas and Rabin, Julien and Peyr… Sliced and … Journal of Mathematical Imaging and Vision.

  18. [18] Computational Optimal Transport: With Applications to Data Science. Foundations and Trends in Machine Learning.

  19. [19] Partial Identification of Probability Distributions. 2003.

  20. [20] Confidence intervals for partially identified parameters. Econometrica, 2004.

  21. [21] Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. 2024.

  22. [22] Tightening Causal Bounds via Covariate-Aware Optimal Transport. 2025.

  23. [23] Estimation of Optimal Causal Bounds via Covariate-Assisted Optimal Transport. 2025.

  24. [24] Fournier, Nicolas and Guillin, Arnaud. On the rate of convergence in … 2015.

  25. [25] Weed, Jonathan and Bach, Francis. Sharp asymptotic and finite-sample rates of convergence of empirical measures in … 2019.

  26. [26] Paty, Fran… Subspace Robust … Proceedings of the 36th International Conference on Machine Learning, 2019.

  27. [27] Lin, Tianyi and Fan, Chenyou and Ho, Nhat and Cuturi, Marco and Jordan, Michael I. Projection Robust …

  28. [28] A Riemannian Block Coordinate Descent Method for Computing the Projection Robust Wasserstein Distance. Proceedings of the 38th International Conference on Machine Learning.

  29. [29] Wasserstein Barycenter and Its Application to Texture Mixing. Scale Space and Variational Methods in Computer Vision, 2011.

  30. [30] Anonymous Authors. <Real Paper Title>

  31. [31] Dynamic Window-Level Granger Causality of Multi-channel Time Series. arXiv preprint arXiv:2006.07788.

  32. [32] Partial Identification with Proxy of Latent Confoundings via Sum-of-Ratios Fractional Programming. UAI.

  33. [33] Bounding the Largest Intersection Number of Regular Simplicial Partitions. arXiv preprint arXiv:2210.11450.

  34. [34] Robust Causal Inference for Recommender System to Overcome Noisy Confounders. SIGIR.

  35. [35] Tight Partial Identification of Causal Effects with Marginal Distribution of Unmeasured Confounders. ICML.

  36. [36] Unveiling Environmental Sensitivity of Individual Gains in Influence Maximization. NeurIPS.

  37. [37] Design-Based Bandits Under Network Interference: Trade-Off Between Regret and Statistical Inference. NeurIPS.

  38. [38] Active Treatment Effect Estimation via Limited Samples. ICML.

  39. [39] Online Experimental Design with Estimation-Regret Trade-Off Under Network Interference. NeurIPS.

  40. [40] Group Permutation Testing for Linear Models: Sharp Validity, Power Improvement, and Extension Beyond Exchangeability. https://arxiv.org/pdf/2601.17734

  41. [41] Causal-Inspired Influence Maximization in Hypergraphs Under Temporal Constraints. International Conference on Neural Information Processing, 2023.

  42. [42] Dynamic Group Effects Analysis for Online Influence Maximization in Hypergraphs. International Conference on Neural Information Processing, 2024.

  43. [43] Orthogonal Uplift Learning with Permutation-Invariant Representations for Combinatorial Treatments. arXiv preprint arXiv:2602.19851.

  44. [44] Individualized Causal Effects under Network Interference with Combinatorial Treatments. arXiv preprint arXiv:2602.19738.

  45. [45] Partial Identification with Proxy of Latent Confoundings via Sum-of-ratios Fractional Programming. INFORMS Journal on Optimization.

  46. [46] Causal inference on distribution functions. Journal of the Royal Statistical Society Series B: Statistical Methodology.