Distributional Causal Mediation via Conditional Generative Modeling

Chunquan Ou; Haoneng Huang; Jinlun Zhang; Zishu Zhan

arxiv: 2605.01765 · v1 · submitted 2026-05-03 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-09 17:11 UTC · model grok-4.3

keywords: distributional causal mediation · conditional generative models · Monte Carlo simulation · interventional distributions · mediation analysis · causal inference · Wasserstein distance

The pith

Distributional Causal Mediation Analysis learns conditional generative models to reconstruct full interventional outcome distributions transmitted through multiple mediators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes DCMA, a framework that learns conditional generative models for mediators and the outcome from observational data. It uses these models to simulate interventional distributions via Monte Carlo noise resampling, following identification formulas. This approach captures not only average treatment effects but also full distributional changes measured by distances such as energy or Wasserstein. Analytical bounds track how errors in the learned models affect the simulated outcomes. The method is tested in numerical experiments and real data applications.

Core claim

DCMA learns conditional generative models for the mediators and the outcome, recovering the relevant conditional distributions from observational data. Leveraging the identification formulas, it reconstructs interventional outcome distributions via Monte Carlo forward simulation by noise resampling, enabling the capture of both classical summary effects and rich distributional contrasts such as energy distance and the Wasserstein distance.

What carries the argument

Conditional generative models for mediators and outcome combined with Monte Carlo forward simulation by noise resampling to reconstruct interventional distributions.
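
A minimal sketch of the forward-simulation step may help fix ideas. It assumes a single mediator and uses hand-written stand-in generators `g_M` and `g_Y` in place of learned networks; the function names, coefficients, and toy covariate distribution are illustrative assumptions, not taken from the paper. Fresh noise is drawn on every call, and the mediator is generated under one treatment level while the outcome is generated under another.

```python
import numpy as np

def g_M(a, x, eps):
    # Stand-in for a learned mediator generator: (treatment, covariate, noise) -> M.
    return 0.5 * a + 0.3 * x + eps

def g_Y(a, m, x, eps):
    # Stand-in for a learned outcome generator. The noise scale depends on the
    # treatment, so treatment changes the outcome's spread as well as its mean.
    return 1.0 * a + 2.0 * m + 0.5 * x + (1.0 + 0.5 * a) * eps

def simulate_outcome(a, a_prime, x, rng):
    """Draw outcomes with treatment a and the mediator generated under a_prime."""
    eps_m = rng.standard_normal(x.shape[0])  # resampled mediator noise
    eps_y = rng.standard_normal(x.shape[0])  # resampled outcome noise
    m = g_M(a_prime, x, eps_m)
    return g_Y(a, m, x, eps_y)

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # toy covariates, assumed N(0, 1)

y00 = simulate_outcome(0, 0, x, rng)
y10 = simulate_outcome(1, 0, x, rng)
y11 = simulate_outcome(1, 1, x, rng)

direct_mean = y10.mean() - y00.mean()    # treatment shifted, mediator law held at a = 0
indirect_mean = y11.mean() - y10.mean()  # mediator law shifted from a = 0 to a = 1
```

In this toy setup both mean contrasts are close to 1 by construction, while `y10` also has a larger spread than `y00`, the kind of change a mean-only summary would miss.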

If this is right

Enables measurement of treatment effects on entire outcome distributions rather than just means.
Supports computation of distributional contrasts including energy distance and Wasserstein distance.
Provides analytical error bounds that decompose propagation from model estimation errors to final distributional estimates.
Handles multiple mediators simultaneously in the generative simulation.
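
The two distributional contrasts named above can be estimated directly from simulated samples. The sketch below is one plain NumPy way to do so for one-dimensional outcomes; the helper names are hypothetical, the energy distance is returned without a square root (some references take one), and the Wasserstein helper assumes equal sample sizes.

```python
import numpy as np

def energy_distance(x, y):
    """V-statistic estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'| for 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xy = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * xy - xx - yy

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D samples via sorted matching."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return np.abs(x - y).mean()

rng = np.random.default_rng(1)
baseline = rng.standard_normal(500)
shifted = baseline + 1.0  # a pure location shift of one unit

w_shift = wasserstein_1d(baseline, shifted)   # equals the shift size
e_shift = energy_distance(baseline, shifted)  # strictly positive
e_same = energy_distance(baseline, baseline)  # zero for identical samples
```

The pairwise matrices make the energy-distance helper quadratic in the sample size, so it suits samples of a few thousand points rather than very large Monte Carlo runs.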

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The simulation approach could be adapted to settings with time-varying mediators by extending the generative models to sequential conditioning.
Integration with sensitivity analysis tools might quantify robustness to the no-unmeasured-confounding assumption in practice.
Policy applications could shift from optimizing average outcomes to minimizing tail risks or inequality in full distributions.

Load-bearing premise

The learned conditional generative models must accurately recover the true conditional distributions from observational data without unmeasured confounding or model misspecification.

What would settle it

A controlled simulation or randomized experiment where the true interventional outcome distribution is known independently; if DCMA's Monte Carlo reconstructions deviate substantially from this known distribution, the method fails.
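
A check of this kind can be sketched on a toy mechanism where the truth is known. The example below is an illustration under assumptions of our own: the skewed `true_mechanism`, the sample size, and the Gaussian stand-in are hypothetical and are not the paper's experiments. It compares a candidate that matches the mechanism against a Gaussian candidate with matched mean and standard deviation, using the one-dimensional Wasserstein-1 distance to the true sample.

```python
import numpy as np

def wasserstein_1d(x, y):
    # Wasserstein-1 between equal-size 1-D samples: mean gap between sorted values.
    return np.abs(np.sort(x) - np.sort(y)).mean()

def true_mechanism(n, rng):
    # Toy ground truth for one interventional arm: a skewed, nonlinear outcome.
    m = rng.standard_normal(n)
    return np.exp(0.5 * m) + 0.1 * rng.standard_normal(n)

n = 20_000
rng = np.random.default_rng(2)
y_true = true_mechanism(n, rng)

# Candidate 1: same functional form as the truth, fresh noise.
y_matched = true_mechanism(n, rng)

# Candidate 2: Gaussian stand-in with the same mean and standard deviation as the truth.
y_gaussian = y_true.mean() + y_true.std() * rng.standard_normal(n)

d_matched = wasserstein_1d(y_true, y_matched)
d_gaussian = wasserstein_1d(y_true, y_gaussian)
```

Because the Gaussian candidate reproduces the first two moments but not the skew, its distance to the truth stays well above the sampling-noise level seen for the matched candidate.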

Figures

Figures reproduced from arXiv: 2605.01765 by Chunquan Ou, Haoneng Huang, Jinlun Zhang, Zishu Zhan.

**Figure 1.** The interventional mediation framework with two mediators, which extends naturally to S mediators. The green edge represents the interventional direct effect (IDE), while the blue and orange edges correspond to the interventional path-specific effects (IPSE) operating through M1 and M2, respectively, excluding any pathways through their descendants in the graph. Graph nodes: A, M1, M2, Y, Z.

**Figure 2.** Overview of DCMA: data preprocessing, conditional generative modeling, interventional distribution generation, and distributional causal effects estimation. The accompanying Section 4.2 (Conditional Generators for Mediators and Outcome) models each conditional distribution using a noise-driven generator that maps exogenous randomness to samples following the desired conditional distribution.

**Figure 3.** True and DCMA estimated interventional outcome distributions. Panels (A)–(C) show the true (black) and DCMA–ES estimated (red) densities for Y0M0, Y1M0, and Y1M1, respectively. The DCMA estimate is obtained by pointwise averaging over 100 replications, and the shaded region denotes the 95% inter-replication interval defined by the 2.5th and 97.5th percentiles across replications.

**Figure 4.** Quantile-based mediation effect estimates. Panels (A) and (B) correspond to Scenarios S1 and S2, respectively. Solid lines denote the true quantile effects, dashed lines denote the DCMA estimates, and shaded regions denote the 95% inter-replication interval, defined by the 2.5th and 97.5th percentiles across Monte Carlo replications.

**Figure 5.** Empirical distributions of BMI, total cholesterol, and SBP. BMI exhibits a right-skewed distribution (skewness = 1.10), whereas both total cholesterol and SBP are approximately normally distributed.

**Figure 6.** Estimated interventional outcome distributions of systolic blood pressure (SBP) under different exposure–mediator contrasts. The dashed vertical line indicates the hypertension threshold at SBP = 140 mmHg.

**Figure 7.** True and DCMA-estimated interventional outcome distributions. Panels (A)–(C) display the true (black) and DCMA-estimated (red) densities for Y0M0, Y1M0, and Y1M1, respectively. The DCMA estimate is obtained by pointwise averaging over 100 Monte Carlo replications, and the shaded region denotes the 95% inter-replication interval constructed from the 2.5th and 97.5th percentiles.

**Figure 8.** Energy distances (ED) between the estimated and true interventional outcome distributions across 100 repeats. The proposed DCMA method is shown in red, while the ablated Linear–Gaussian outcome model is shown in blue.

Original abstract

Mediation analysis has traditionally focused on outcome-level summary contrasts, such as mean effects, which may obscure substantial distributional changes induced by complex and nonlinear causal mechanisms. We propose Distributional Causal Mediation Analysis (DCMA), a generative learning framework for identifying and estimating treatment effects on entire outcome distributions transmitted through multiple mediators. DCMA learns conditional generative models for the mediators and the outcome, recovering the relevant conditional distributions from observational data. Leveraging the identification formulas, it reconstructs interventional outcome distributions via Monte Carlo forward simulation by noise resampling, enabling the capture of both classical summary effects and rich distributional contrasts such as energy distance and the Wasserstein distance. Analytical error bounds are derived to decompose how estimation errors in the learned conditional models propagate to the reconstructed interventional outcome distributions. The empirical effectiveness of DCMA is demonstrated through numerical experiments and real-world data applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

DCMA uses generative models to simulate full interventional outcome distributions for mediation, which is a practical extension but one that inherits the usual limits on recovering exact conditionals.

The paper's main move is to replace summary-mean mediation with a generative setup that learns conditional models for mediators and outcome, then uses Monte Carlo noise resampling to build the full interventional distribution under different treatment levels. This lets them compute contrasts like energy distance or Wasserstein distance instead of just average effects, and they derive bounds on how errors in the fitted models propagate to those contrasts. The abstract and setup show they handle multiple mediators without forcing parametric forms on the distributions themselves. That part is straightforward and fills a gap for settings where heterogeneity or tail behavior matters more than the mean.

The numerical experiments and real-data examples are presented as evidence that the pipeline runs and produces plausible results. The error bounds are a concrete addition that decomposes the propagation under an oracle approximation, which is better than nothing.

The soft spots sit where the stress-test note points: the Monte Carlo step only recovers the right interventional distribution if the learned conditionals are close to the true observational ones, and standard generative training does not guarantee that in finite samples or with high-dimensional covariates. Any systematic mismatch feeds straight into the reported distances. The usual sequential ignorability assumptions are still required and not relaxed here. If the full paper only validates under well-behaved simulations, the bounds may overstate robustness once model misspecification enters.

This is the kind of work that belongs in a causal-inference reading group or methods journal. Readers who already use generative models for causal tasks will see a direct way to extend their toolkit to distributional mediation. It is worth sending to peer review because the framework is clearly stated, the bounds are derived, and the experiments exist; referees can press on the finite-sample behavior and the strength of the identification checks.

Referee Report

2 major / 2 minor

Summary. The paper proposes Distributional Causal Mediation Analysis (DCMA), a generative learning framework that learns conditional generative models for multiple mediators and the outcome from observational data. It reconstructs interventional outcome distributions under treatment via Monte Carlo forward simulation by resampling noise through the fitted conditionals, enabling estimation of both classical summary effects and distributional contrasts such as energy distance and Wasserstein distance. Analytical error bounds are derived to quantify propagation of estimation errors from the learned models to the reconstructed distributions, with validation via numerical experiments and real-world applications.

Significance. If the identification, reconstruction, and error bounds hold under the stated assumptions, the work would extend mediation analysis from mean effects to full distributional contrasts using modern conditional generative models, which could be valuable in applications where heterogeneity or tail behavior matters. The Monte Carlo approach and explicit error decomposition are strengths when the generative models are reliable; however, significance is tempered by the dependence on unverified recovery of the true observational conditionals.

major comments (2)

[Abstract and error bounds section] The analytical error bounds (described in the abstract and presumably §4) decompose propagation under an oracle approximation error but do not address model misspecification or finite-sample bias in the generative training step itself. Since the Monte Carlo reconstruction of interventional distributions relies on the fitted models converging to the true P(M|A,X) and P(Y|M,A,X), any systematic mismatch directly biases all reported contrasts; this is load-bearing for the central claim of valid distributional estimation.
[Identification and method sections] The identification step invokes standard formulas but the manuscript must explicitly state and justify the sequential ignorability / no unmeasured confounding assumptions for the full mediator chain (including how they interact with the generative modeling of multiple mediators); without this, the claim that the method 'identifies' the interventional distributions cannot be evaluated.
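
For orientation, the single-mediator version of the kind of identification formula at issue can be written as follows. This is the standard form for interventional effects under no unmeasured confounding of the treatment-outcome, treatment-mediator, and mediator-outcome relations given covariates X; the notation G_{a'} (a random draw from the mediator's conditional distribution under treatment a') is an assumption of this sketch, and the paper's multi-mediator formulas may differ.

```latex
P\bigl(Y_{a,\,G_{a'}} \le y\bigr)
  = \int\!\!\int P\bigl(Y \le y \mid A = a,\ M = m,\ X = x\bigr)\,
      \mathrm{d}P\bigl(m \mid A = a',\ X = x\bigr)\,
      \mathrm{d}P(x)
```

The two conditional laws on the right are exactly the objects P(Y|M,A,X) and P(M|A,X) that the generative models are asked to recover, which is why any mismatch in either one carries through to the left-hand side.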

minor comments (2)

[Notation and definitions] Clarify notation distinguishing the true conditional distributions from the learned generative approximations throughout the text.
[Experiments section] The description of numerical experiments lacks detail on model architectures, training objectives, hyperparameter selection, and sensitivity analyses; add these to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We address the two major comments point by point below and describe the revisions we will make to strengthen the manuscript.

Referee: [Abstract and error bounds section] The analytical error bounds (described in the abstract and presumably §4) decompose propagation under an oracle approximation error but do not address model misspecification or finite-sample bias in the generative training step itself. Since the Monte Carlo reconstruction of interventional distributions relies on the fitted models converging to the true P(M|A,X) and P(Y|M,A,X), any systematic mismatch directly biases all reported contrasts; this is load-bearing for the central claim of valid distributional estimation.

Authors: Our error bounds in §4 are derived to quantify propagation of approximation errors from the fitted conditional generative models to the Monte Carlo-reconstructed interventional distributions, with an explicit decomposition separating generative-model error from simulation error. They are stated under the assumption that the learned models approximate the true conditionals P(M|A,X) and P(Y|M,A,X). We acknowledge that the bounds do not themselves bound finite-sample training error or address model misspecification. In revision we will add a clarifying paragraph in §4 that states these scope limitations, notes the dependence on accurate generative-model training, and recommends practical diagnostics (e.g., held-out likelihood or posterior predictive checks) to assess model adequacy before applying the bounds. revision: partial

Referee: [Identification and method sections] The identification step invokes standard formulas but the manuscript must explicitly state and justify the sequential ignorability / no unmeasured confounding assumptions for the full mediator chain (including how they interact with the generative modeling of multiple mediators); without this, the claim that the method 'identifies' the interventional distributions cannot be evaluated.

Authors: We agree that the identifying assumptions require explicit statement. The manuscript currently invokes the standard identification formulas for distributional mediation but does not spell them out for the multi-mediator case. In the revised version we will insert a dedicated subsection (new §3.1) that states the sequential ignorability assumptions for treatment, the ordered mediators, and the outcome, together with the no-unmeasured-confounding conditions. We will justify how these assumptions license the use of the learned conditional generative models to identify the interventional distributions and how they interact with the Monte Carlo forward-simulation procedure. revision: yes
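
The first response above mentions practical diagnostics for model adequacy. One simple option in that spirit, swapped in here for illustration and not described in the paper, is a held-out rank (probability integral transform) check: for each held-out pair, draw from the generator at that covariate value and record the fraction of draws below the observed outcome. The samplers and toy data below are hypothetical.

```python
import numpy as np

def pit_ranks(sampler, x_holdout, y_holdout, n_draws, rng):
    """For each held-out (x, y), return the fraction of generator draws at x below y."""
    ranks = np.empty(len(y_holdout))
    for i, (x, y) in enumerate(zip(x_holdout, y_holdout)):
        draws = sampler(np.full(n_draws, x), rng)
        ranks[i] = np.mean(draws < y)
    return ranks

def good_sampler(x, rng):
    # Hypothetical generator that matches the toy truth Y | X ~ N(x, 1).
    return x + rng.standard_normal(x.shape[0])

def overconfident_sampler(x, rng):
    # Hypothetical generator with the right mean but too little spread.
    return x + 0.5 * rng.standard_normal(x.shape[0])

rng = np.random.default_rng(3)
x_holdout = rng.standard_normal(1_000)
y_holdout = x_holdout + rng.standard_normal(1_000)  # toy held-out data

ranks_good = pit_ranks(good_sampler, x_holdout, y_holdout, 200, rng)
ranks_bad = pit_ranks(overconfident_sampler, x_holdout, y_holdout, 200, rng)

# A calibrated generator gives roughly uniform ranks on [0, 1], variance near 1/12.
var_good = ranks_good.var()
var_bad = ranks_bad.var()
```

Ranks that pile up near 0 and 1, as they do for the overconfident sampler, indicate a generator whose conditional spread is too narrow; a histogram of the ranks makes the same point visually.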

Circularity Check

0 steps flagged

No significant circularity; derivation relies on standard identification and simulation

The paper applies established causal mediation identification formulas to learn conditional generative models P(M|A,X) and P(Y|M,A,X) from observational data, then reconstructs interventional distributions P(Y|do(A=a)) via Monte Carlo noise resampling. This simulation step produces the target distributional contrasts (energy distance, Wasserstein) as downstream quantities rather than redefining them as the fitted parameters themselves. Analytical error bounds decompose propagation of model estimation error under oracle assumptions but do not equate the interventional estimates to the training loss or fitted conditionals by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing premises. The method remains falsifiable against external benchmarks of causal identification and generative model recovery.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; limited information prevents detailed ledger.

pith-pipeline@v0.9.0 · 5440 in / 1059 out tokens · 46040 ms · 2026-05-09T17:11:50.781121+00:00 · methodology
