Wasserstein Policy Learning for Distributional Outcomes

Cheuk Hang Leung; Qi Wu; Yiyan Huang; Zhiheng Zhang

arXiv: 2606.19117 · v1 · pith:AK4R3COXnew · submitted 2026-06-17 · 📊 stat.ME · cs.LG · econ.EM · stat.ML

Pith reviewed 2026-06-26 19:40 UTC · model grok-4.3

classification 📊 stat.ME · cs.LG · econ.EM · stat.ML

keywords offline policy learning · distributional outcomes · Wasserstein barycenter · inverse probability weighting · doubly robust estimators · regret bounds · minimax lower bound · causal inference

The pith

Offline policy learning extends to distribution-valued outcomes by optimizing utilities on Wasserstein barycenters, with regret bounds driven by policy class complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops methods for learning treatment policies when each potential outcome is a full probability distribution on the real line rather than a scalar. Welfare is measured by applying a utility functional to the Wasserstein barycenter of the distributions induced by a given policy. Statistical guarantees are provided for inverse probability weighting and doubly robust estimators by controlling uniform deviations over the product of a combinatorial policy class and the infinite-dimensional quantile domain. The resulting finite-sample regret scales as the square root of policy complexity over sample size, and a minimax lower bound shows this dependence is sharp.
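To make the central object concrete, here is a minimal sketch of a barycenter on the real line. It assumes the 2-Wasserstein barycenter, whose quantile function in one dimension is the weighted average of the input quantile functions. The helper names `quantile_function` and `barycenter_quantiles` and the grid are illustrative choices, not taken from the paper.

```python
import numpy as np

def quantile_function(samples, grid):
    """Empirical quantile function of a 1-D sample, evaluated on a grid in (0, 1)."""
    return np.quantile(np.asarray(samples, dtype=float), grid)

def barycenter_quantiles(sample_list, weights, grid):
    """Quantile function of the 2-Wasserstein barycenter of 1-D distributions.

    On the real line the barycenter's quantile function is the weighted
    average of the input quantile functions (weights normalized to sum to 1).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    Q = np.stack([quantile_function(s, grid) for s in sample_list])  # (n_dists, n_grid)
    return weights @ Q

# Synthetic illustration: two unit-variance Gaussians centered at 0 and 4.
grid = np.linspace(0.01, 0.99, 99)
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=5000)
b = rng.normal(4.0, 1.0, size=5000)
bary = barycenter_quantiles([a, b], [0.5, 0.5], grid)
```

With equal weights the barycenter here is centered near 2 and keeps the common Gaussian shape, whereas a plain mixture of the two distributions would be bimodal.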

Core claim

In the one-dimensional Wasserstein setting and under the stated regularity conditions, the finite-sample regret for the policy learning framework based on both IPW and DR estimators has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. The leading regret rate remains governed by the policy-class complexity even though the quantile domain is infinite-dimensional. A minimax lower bound establishes the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
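For orientation, the standard route from a uniform deviation bound to a regret bound in empirical welfare maximization can be written as follows. The notation is assumed here for illustration and is not taken from the paper: $V(\pi)$ is the population welfare of policy $\pi$, $\widehat{V}(\pi)$ its IPW or DR estimate, $\hat\pi$ a maximizer of $\widehat{V}$ over $\Pi$, and $\pi^\star$ a maximizer of $V$ over $\Pi$.

```latex
\begin{align*}
R(\hat\pi) &= V(\pi^\star) - V(\hat\pi) \\
 &= \bigl[V(\pi^\star) - \widehat{V}(\pi^\star)\bigr]
  + \bigl[\widehat{V}(\pi^\star) - \widehat{V}(\hat\pi)\bigr]
  + \bigl[\widehat{V}(\hat\pi) - V(\hat\pi)\bigr] \\
 &\le 2 \sup_{\pi \in \Pi} \bigl|\widehat{V}(\pi) - V(\pi)\bigr| .
\end{align*}
```

The middle bracket is non-positive because $\hat\pi$ maximizes $\widehat{V}$. The claimed rate therefore hinges on bounding the supremum on the right, which in the distributional setting also ranges over quantile levels.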

What carries the argument

Utility functional applied to the Wasserstein barycenter of the outcome distributions induced by a policy, estimated via IPW and DR estimators.
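A minimal sketch of how such an estimate might be computed, assuming the IPW estimator reweights observed quantile functions pointwise on a grid. That functional form, the mean used as the example utility, and all names below are assumptions made for illustration and may differ from the paper's definitions.

```python
import numpy as np

def ipw_barycenter_quantiles(Q, actions, propensities, policy_actions):
    """IPW estimate of the quantile function of the policy-induced barycenter.

    Q:              (N, G) array, row i = observed outcome quantile function on a grid
    actions:        (N,) logged treatments
    propensities:   (N,) probability of the logged treatment given covariates
    policy_actions: (N,) treatment the candidate policy would assign
    """
    w = (actions == policy_actions) / propensities   # zero weight where the policy disagrees
    return (w[:, None] * Q).mean(axis=0)

def mean_utility(quantiles):
    """Example utility: the distribution's mean, approximated by averaging Q(u) over the grid."""
    return float(np.mean(quantiles))

# Synthetic data (hypothetical): arm a yields the uniform distribution on [a - 0.5, a + 0.5].
rng = np.random.default_rng(0)
N, grid = 20000, np.linspace(0.05, 0.95, 19)
actions = rng.integers(0, 2, size=N)
propensities = np.full(N, 0.5)                       # randomized logging policy
Q = actions[:, None] + (grid[None, :] - 0.5)         # (N, G) observed quantile functions
treat_all = np.ones(N, dtype=int)                    # candidate policy: treat everyone

ipw_estimate = ipw_barycenter_quantiles(Q, actions, propensities, treat_all)
welfare = mean_utility(ipw_estimate)
```

In this toy design the target quantile function under "treat everyone" is 1 + (u - 0.5), so the estimated welfare should sit close to 1.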

If this is right

The regret rate depends only on the complexity of the policy class and not on the infinite dimensionality of the outcome distributions.
Both IPW and DR estimators achieve the same leading regret rate.
The minimax lower bound confirms that no estimator can improve on the square-root dependence on policy complexity and sample size.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same uniform-deviation technique could be applied to other distances between distributions provided analogous regularity conditions can be verified.
The framework suggests that individualized treatment rules could be learned directly from histogram or density data without first reducing each outcome to a scalar summary.
Empirical tests on real distributional data would reveal how large N must be before the square-root rate becomes visible.
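One hypothetical way to run the rate check in the last item is to fit the slope of log regret against log sample size; a slope near -1/2 is what a $\sqrt{1/N}$ rate predicts. The helper `loglog_slope` is an illustrative name, and the regret values below are synthetic placeholders generated from an exact $c/\sqrt{N}$ curve, not results from the paper.

```python
import numpy as np

def loglog_slope(sample_sizes, regrets):
    """Least-squares slope of log(regret) against log(N)."""
    x = np.log(np.asarray(sample_sizes, dtype=float))
    y = np.log(np.asarray(regrets, dtype=float))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Synthetic placeholder: regrets that follow c / sqrt(N) exactly.
ns = np.array([500, 1000, 2000, 4000, 8000])
synthetic_regret = 3.0 / np.sqrt(ns)
slope = loglog_slope(ns, synthetic_regret)
```

With measured regrets in place of the synthetic curve, slopes noticeably shallower than -1/2 at small N would indicate that lower-order terms still dominate.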

Load-bearing premise

Unspecified regularity conditions hold that allow uniform deviation to be controlled over the product of the combinatorial policy class and the infinite-dimensional quantile domain.

What would settle it

A concrete data-generating process and policy class satisfying the regularity conditions for which the observed regret exceeds $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ by more than logarithmic factors.

read the original abstract

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
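As a companion illustration, a doubly robust estimate can be sketched by applying the usual augmented-IPW correction pointwise to quantile functions. This form is an assumption for illustration: the abstract does not state the estimator, and `dr_barycenter_quantiles`, `m_hat`, and the synthetic design are hypothetical.

```python
import numpy as np

def dr_barycenter_quantiles(Q, actions, propensities, policy_actions, m_hat):
    """Doubly robust estimate of the policy-induced barycenter quantile function.

    Q:              (N, G) observed outcome quantile functions on a common grid
    actions:        (N,) logged treatments, coded 0..K-1
    propensities:   (N,) probability of the logged treatment given covariates
    policy_actions: (N,) treatments the candidate policy would assign
    m_hat:          (N, K, G) fitted outcome-model quantile functions per unit and arm
    """
    idx = np.arange(Q.shape[0])
    direct = m_hat[idx, policy_actions]        # model prediction under the policy
    residual = Q - m_hat[idx, actions]         # observed minus predicted for the logged arm
    w = (actions == policy_actions) / propensities
    return (direct + w[:, None] * residual).mean(axis=0)

# Synthetic check: arm a yields the uniform distribution on [a - 0.5, a + 0.5].
rng = np.random.default_rng(1)
N, grid = 2000, np.linspace(0.05, 0.95, 19)
actions = rng.integers(0, 2, size=N)
propensities = np.full(N, 0.5)
arm_quantiles = np.stack([a + (grid - 0.5) for a in (0, 1)])   # (2, G)
Q = arm_quantiles[actions]
m_hat = np.broadcast_to(arm_quantiles, (N, 2, grid.size))      # correctly specified model
policy_actions = np.ones(N, dtype=int)                         # candidate policy: treat everyone

dr_estimate = dr_barycenter_quantiles(Q, actions, propensities, policy_actions, m_hat)
```

Because the outcome model is exactly right in this toy design, the residual term vanishes and the estimate reproduces the target quantile function; with a misspecified model, the weighted residuals would supply the correction.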

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Extends offline policy learning to distributional outcomes via Wasserstein barycenters and gets matching regret and minimax bounds, but the key uniform deviation step rests on regularity conditions that the abstract leaves unspecified.

read the letter

The paper takes the standard offline policy learning setup and replaces scalar outcomes with probability measures on the real line. Welfare is defined through a utility on the Wasserstein barycenter of the induced distributions under a policy. The authors give IPW and DR estimators and prove finite-sample regret bounds whose leading term is the usual $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ rate, plus a matching minimax lower bound that shows the dependence on sample size and policy-class complexity is sharp.

What stands out is that the rate does not degrade when moving from means to full distributions; the extra work is in controlling the supremum of the IPW/DR process over the product of the policy class and the quantile domain. If that chaining argument closes under the stated conditions, the result is a clean extension rather than a completely new rate.

The soft spot is exactly the one the stress-test flags. The abstract repeatedly invokes “stated regularity conditions” to handle the uniform deviation, yet gives no explicit list of them: entropy numbers, Lipschitz constants on the utility, or moment restrictions on the outcome measures. Without those details it is impossible to judge whether the conditions are mild or whether they rule out the very applications (heavy tails, multimodal outcomes) where distributional policy learning would be most useful. The circularity burden looks low, since the leading term still comes from policy-class complexity rather than any estimated quantity.

This is aimed at researchers already working on offline policy learning who want to move beyond mean outcomes. A reader who already knows the scalar IPW/DR literature will see the technical increment immediately. The work is coherent on its own terms and shows clear engagement with the existing regret analysis, so it deserves a serious referee even if the conditions turn out to need tightening or clarification in revision.

Referee Report

2 major / 1 minor

Summary. The manuscript develops an offline policy learning framework for distribution-valued outcomes, where the objective is to maximize a utility functional of the Wasserstein barycenter of induced distributions. It introduces IPW and DR estimators for this setting and claims finite-sample regret bounds of leading order $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ together with a matching minimax lower bound, with the rate governed by policy-class complexity under regularity conditions that control uniform deviations over the product of the policy class and the quantile domain.

Significance. If the claimed bounds hold under explicitly verifiable conditions, the work meaningfully extends scalar-outcome policy learning to distributional outcomes while preserving sharp dependence on policy complexity rather than outcome dimension. The matching upper and lower bounds constitute a clear strength.

major comments (2)

[Abstract] Abstract and the paragraph on statistical guarantees: the regret bound $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ and its minimax sharpness are asserted to follow from controlling the uniform deviation over the product of the combinatorial policy class $\Pi$ and the infinite-dimensional quantile domain, yet the required regularity conditions (entropy integrability, Lipschitz constants on the utility, moment bounds on outcome measures, etc.) are referenced but never explicitly enumerated or shown to be sufficient for the chaining argument to close.
[Statistical guarantees section] Section deriving the finite-sample regret (IPW/DR estimators): the leading term is obtained by applying standard IPW/DR theory to the new functional, but without explicit error-bar details, the precise statement of the regularity conditions, or verification that they hold uniformly over the product space, the claimed rate cannot be confirmed.

minor comments (1)

[Notation] Clarify the precise definition of N-dim(\Pi) (e.g., whether it is the Natarajan dimension) and ensure consistent notation between the abstract and the body.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which correctly identify areas where greater explicitness will strengthen the manuscript. We will revise to enumerate the regularity conditions and provide the missing derivation details while preserving the core results on the regret bounds.

read point-by-point responses

Referee: [Abstract] Abstract and the paragraph on statistical guarantees: the regret bound $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ and its minimax sharpness are asserted to follow from controlling the uniform deviation over the product of the combinatorial policy class $\Pi$ and the infinite-dimensional quantile domain, yet the required regularity conditions (entropy integrability, Lipschitz constants on the utility, moment bounds on outcome measures, etc.) are referenced but never explicitly enumerated or shown to be sufficient for the chaining argument to close.

Authors: We agree that the conditions are referenced rather than enumerated. In the revision we will add a dedicated subsection listing them explicitly: (i) finite entropy integral of $\Pi$ with respect to the covering metric on the quantile domain, (ii) Lipschitz continuity of the utility functional w.r.t. the 1-Wasserstein distance with constant independent of the policy, (iii) uniform fourth-moment bounds on the outcome measures, and (iv) overlap and boundedness conditions on the propensity scores. We will then include a short chaining argument showing that these conditions suffice to control the supremum over $\Pi \times [0,1]$ and thereby close the proof of the stated regret rate.
revision: yes
Referee: [Statistical guarantees section] Section deriving the finite-sample regret (IPW/DR estimators): the leading term is obtained by applying standard IPW/DR theory to the new functional, but without explicit error-bar details, the precise statement of the regularity conditions, or verification that they hold uniformly over the product space, the claimed rate cannot be confirmed.

Authors: The observation is accurate. The revised section will contain (a) an explicit error decomposition separating the IPW/DR bias term from the stochastic term with explicit constants, (b) a uniform deviation lemma that states the bound under the enumerated conditions, and (c) a verification paragraph confirming that the moment and Lipschitz assumptions propagate uniformly over the product space because the quantile functions remain controlled. These additions will make the derivation of the leading $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ term directly verifiable without changing the stated results.
revision: yes
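Condition (ii) in the first response is easier to read in one dimension, where the 1-Wasserstein distance has a closed form in terms of quantile functions. The display below is a sketch under assumed notation: $U$ for the utility functional, $L$ for its Lipschitz constant, and $F_\mu^{-1}$ for the quantile function of $\mu$ are symbols introduced here, and the measures are assumed to have finite first moments.

```latex
\begin{align*}
W_1(\mu, \nu) &= \int_0^1 \bigl| F_\mu^{-1}(u) - F_\nu^{-1}(u) \bigr| \, du , \\
\bigl| U(\mu) - U(\nu) \bigr| &\le L \, W_1(\mu, \nu)
 \le L \sup_{u \in (0,1)} \bigl| F_\mu^{-1}(u) - F_\nu^{-1}(u) \bigr| .
\end{align*}
```

Under such a condition, a welfare deviation at a fixed policy is dominated by a quantile-level deviation, which is one way the supremum over $\Pi \times [0,1]$ mentioned in the responses would enter the argument.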

Circularity Check

0 steps flagged

No significant circularity; standard IPW/DR theory applied to new functional

full rationale

The paper derives finite-sample regret bounds of order $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$ and a matching minimax lower bound by applying existing IPW and DR estimation theory to the Wasserstein-barycenter utility functional. The leading term is governed by the combinatorial complexity of the policy class $\Pi$, not by any fitted parameter, self-referential normalization, or self-citation chain. The abstract explicitly invokes 'stated regularity conditions' for the uniform deviation argument over the policy-quantile product space, but these are external to the derivation itself and do not reduce the claimed result to a tautology. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided text. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard causal-inference assumptions plus regularity conditions that are invoked but not enumerated in the abstract.

axioms (1)

Domain assumption: regularity conditions on the outcome distributions and the utility functional that enable uniform deviation bounds over the product of the policy class and the quantile domain.
Explicitly referenced as a prerequisite for the leading regret term and the lower bound.

pith-pipeline@v0.9.1-grok · 5745 in / 1318 out tokens · 25102 ms · 2026-06-26T19:40:39.505249+00:00 · methodology


2023

[29] [29]

arXiv preprint arXiv:2401.17909 , year=

Regularizing Discrimination in Optimal Policy Learning with Distributional Targets , author=. arXiv preprint arXiv:2401.17909 , year=

arXiv

[30] [30]

The Japanese Economic Review , volume=

Treatment choice, mean square regret and partial identification , author=. The Japanese Economic Review , volume=. 2023 , publisher=

2023

[31] [31]

The Japanese Economic Review , volume=

Statistical decision theory respecting stochastic dominance , author=. The Japanese Economic Review , volume=. 2023 , publisher=

2023

[32] [32]

arXiv preprint arXiv:2406.19604 , year=

Geodesic causal inference , author=. arXiv preprint arXiv:2406.19604 , year=

arXiv

[33] [33]

arXiv preprint arXiv:2503.05024 , year=

Kernel-based estimators for functional causal effects , author=. arXiv preprint arXiv:2503.05024 , year=

arXiv

[34] [34]

arXiv preprint arXiv:2506.22754 , year=

Doubly robust estimation of causal effects for random object outcomes with continuous treatments , author=. arXiv preprint arXiv:2506.22754 , year=

arXiv

[35] [35]

Journal of the American Statistical Association , volume=

Learning optimal distributionally robust individualized treatment rules , author=. Journal of the American Statistical Association , volume=. 2021 , publisher=

2021

[36] [36]

International Conference on Machine Learning , pages=

Doubly robust distributionally robust off-policy evaluation and learning , author=. International Conference on Machine Learning , pages=. 2022 , organization=

2022

[37] [37]

Advances in Neural Information Processing Systems , volume=

Factored DRO: Factored distributionally robust policies for contextual bandits , author=. Advances in Neural Information Processing Systems , volume=

[38] [38]

Management Science , volume=

Distributionally robust batch contextual bandits , author=. Management Science , volume=. 2023 , publisher=

2023

[39] [39]

arXiv preprint arXiv:2205.05561 , volume=

Externally valid treatment choice , author=. arXiv preprint arXiv:2205.05561 , volume=

arXiv

[40] [40]

arXiv preprint arXiv:2205.04637 , year=

Distributionally robust policy learning with wasserstein distance , author=. arXiv preprint arXiv:2205.04637 , year=

arXiv

[41] [41]

Transactions on Machine Learning Research , issn=

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits , author=. Transactions on Machine Learning Research , issn=. 2024 , url=

2024

[42] [42]

arXiv preprint arXiv:2402.02535 , year=

Data-driven Policy Learning for a Continuous Treatment , author=. arXiv preprint arXiv:2402.02535 , year=

arXiv

[43] [43]

Handbook of econometrics , volume=

Empirical process methods in econometrics , author=. Handbook of econometrics , volume=. 1994 , publisher=

1994

[44] [44]

2013 , publisher=

Probability theory: a comprehensive course , author=. 2013 , publisher=

2013

[45] [45]

Publications Math

Concentration of measure and isoperimetric inequalities in product spaces , author=. Publications Math. 1995 , publisher=

1995

[46] [46]

Econometrica , volume=

Model selection for treatment choice: Penalized welfare maximization , author=. Econometrica , volume=. 2021 , publisher=

2021

[47] [47]

Causal Inference on Distribution Functions , publisher =

Lin, Zhenhua and Kong, Dehan and Wang, Linbo , keywords =. Causal Inference on Distribution Functions , publisher =. 2021 , copyright =. doi:10.48550/ARXIV.2101.01599 , url =

work page doi:10.48550/arxiv.2101.01599 2021

[48] [48]

Stat , volume=

Variable selection in function-on-scalar regression , author=. Stat , volume=. 2016 , publisher=

2016

[49] [49]

Journal of the American statistical association , volume=

Functional data analysis for sparse longitudinal data , author=. Journal of the American statistical association , volume=. 2005 , publisher=

2005

[50] [50]

Journal of the American Statistical Association , volume=

An accelerated-time model for response curves , author=. Journal of the American Statistical Association , volume=. 1997 , publisher=

1997

[51] [51]

arXiv preprint arXiv:1410.8516 , year=

Nice: Non-linear independent components estimation , author=. arXiv preprint arXiv:1410.8516 , year=

Pith/arXiv arXiv

[52] [52]

International conference on machine learning , pages=

Variational inference with normalizing flows , author=. International conference on machine learning , pages=. 2015 , organization=

2015

[53] [53]

Advances in neural information processing systems , volume=

Neural ordinary differential equations , author=. Advances in neural information processing systems , volume=

[54] [54]

International Conference on Learning Representations , year=

FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models , author=. International Conference on Learning Representations , year=

[55] [55]

The Econometrics Journal , volume=

Double/debiased machine learning for treatment and structural parameters: Double/debiased machine learning , author=. The Econometrics Journal , volume=. 2018 , publisher=

2018

[56] [56]

International Conference on Machine Learning , pages=

Orthogonal machine learning: Power and limitations , author=. International Conference on Machine Learning , pages=. 2018 , organization=

2018

[57] [57]

Advances in neural information processing systems , volume=

Optimization over continuous and multi-dimensional decisions with observational data , author=. Advances in neural information processing systems , volume=

[58] [58]

The Econometrics Journal , volume=

Debiased machine learning of conditional average treatment effects and other causal functions , author=. The Econometrics Journal , volume=. 2021 , publisher=

2021

[59] [59]

2017 , institution=

Efficient Policy Learning , author=. 2017 , institution=

2017

[60] [60]

Operations Research , year=

Offline multi-action policy learning: Generalization and optimization , author=. Operations Research , year=

[61] [61]

Journal of Machine Learning Research , volume=

Rademacher and Gaussian complexities: Risk bounds and structural results , author=. Journal of Machine Learning Research , volume=

[62] [62]

Advances in neural information processing systems , volume=

On the complexity of linear prediction: Risk bounds, margin bounds, and regularization , author=. Advances in neural information processing systems , volume=

[63] [63]

Optimization Online , volume=

Kullback-Leibler divergence constrained distributionally robust optimization , author=. Optimization Online , volume=

[64] [64]

2013 , publisher=

Perturbation analysis of optimization problems , author=. 2013 , publisher=

2013

[65] [65]

1980 , publisher=

The Central Limit Theorem for Real and Banach Valued Random Variables , author=. 1980 , publisher=

1980

[66] [66]

2019 , publisher=

High-dimensional statistics: A non-asymptotic viewpoint , author=. 2019 , publisher=

2019

[67] [67]

2000 , publisher=

Asymptotic statistics , author=. 2000 , publisher=

2000

[68] [68]

Journal of Machine Learning Research , volume=

Covering number bounds of certain regularized linear function classes , author=. Journal of Machine Learning Research , volume=

[69] [69]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Multinomial goodness-of-fit tests , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 1984 , publisher=

1984

[70] [70]

The Annals of Statistics , volume=

Learning models with uniform performance via distributionally robust optimization , author=. The Annals of Statistics , volume=. 2021 , publisher=

2021

[71] [71]

Journal of Machine Learning Research , volume=

Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks , author=. Journal of Machine Learning Research , volume=

[72] [72]

The Review of Economic Studies , volume =

Schennach, Susanne M , title =. The Review of Economic Studies , volume =. 2020 , month =. doi:10.1093/restud/rdz065 , url =

work page doi:10.1093/restud/rdz065 2020

[73] [73]

Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

Hypothesis testing using pairwise distances and associated kernels , author=. Proceedings of the 29th International Coference on International Conference on Machine Learning , pages=

[74] [74]

Machine Learning , volume=

On learning sets and functions , author=. Machine Learning , volume=. 1989 , publisher=

1989

[75] [75]

2005 , publisher=

Introduction to nonparametric regression , author=. 2005 , publisher=

2005

[76] [76]

Journal of Combinatorial Theory, Series A , volume=

A generalization of Sauer's lemma , author=. Journal of Combinatorial Theory, Series A , volume=. 1995 , publisher=

1995

[77] [77]

2015 , publisher=

Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling, volume 87 of Progress in Nonlinear Differential Equations and Their Applications , author=. 2015 , publisher=

2015

[78] [78]

2022 , institution =

The Dynamics of the Racial Wealth Gap , author =. 2022 , institution =

2022

[79] [79]

Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=

Dynamic modelling of sparse longitudinal data and functional snippets with stochastic differential equations , author=. Journal of the Royal Statistical Society Series B: Statistical Methodology , volume=. 2025 , publisher=

2025

[80] [80]

Annual review of statistics and its application , volume=

Statistical aspects of Wasserstein distances , author=. Annual review of statistics and its application , volume=. 2019 , publisher=

2019