pith. machine review for the scientific record.

arXiv: 2605.08485 · v1 · submitted 2026-05-08 · 📊 stat.ML · cs.LG · math.ST · stat.ME · stat.TH

Recognition: 2 theorem links · Lean Theorem

Sinkhorn Treatment Effects: A Causal Optimal Transport Measure

Alex Luedtke, Medha Agarwal

Pith reviewed 2026-05-12 01:25 UTC · model grok-4.3

classification 📊 stat.ML · cs.LG · math.ST · stat.ME · stat.TH
keywords Sinkhorn treatment effect · entropic optimal transport · counterfactual distributions · distributional treatment effects · debiased estimation · pathwise differentiability · causal inference · regularized optimal transport

The pith

The Sinkhorn treatment effect measures divergence between entire counterfactual distributions via entropic optimal transport and admits debiased estimators for valid tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Sinkhorn treatment effect as an entropic optimal transport divergence between the counterfactual outcome distributions under treatment and control. Unlike average treatment effects, this quantity registers any difference in the full shapes of those distributions. The authors express the divergence as a smooth transformation of counterfactual mean embeddings, which establishes pathwise differentiability and permits construction of debiased estimators. These estimators yield asymptotically normal limits, supporting valid tests of the null that the counterfactual distributions coincide, at a fixed regularization level. An aggregated test combines results over a grid of regularization values to guard against power loss from a poor choice of that level. If the development is correct, analysts obtain a nonparametric, distribution-wide tool for detecting treatment effects that standard mean-based methods would miss.
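A minimal numerical sketch of the quantity at the center of the paper: a log-domain Sinkhorn estimate of the entropic transport cost between two samples, debiased into a divergence that vanishes when the samples coincide. The squared-Euclidean cost, the sample sizes, and the `eps` value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_cost(x, y, eps, n_iter=300):
    """Entropic OT transport cost <pi, C> between two empirical samples,
    computed with log-domain Sinkhorn iterations for numerical stability."""
    n, m = len(x), len(y)
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)  # squared Euclidean cost
    log_a = np.full(n, -np.log(n))  # uniform source weights
    log_b = np.full(m, -np.log(m))  # uniform target weights
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        # dual potential updates in the log domain
        f = -eps * logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    # transport plan pi_ij = a_i b_j exp((f_i + g_j - C_ij) / eps)
    log_pi = log_a[:, None] + log_b[None, :] + (f[:, None] + g[None, :] - C) / eps
    return float(np.sum(np.exp(log_pi) * C))

def sinkhorn_divergence(x, y, eps):
    """Debiased divergence: exactly zero when the two samples coincide."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))
```

Applied to estimated counterfactual outcome samples for the two arms, a strictly positive value signals a distributional treatment effect even when the arms' means agree.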

Core claim

The Sinkhorn treatment effect is introduced as the entropic optimal transport divergence between counterfactual distributions. This functional is shown to equal a smooth map applied to the counterfactual mean embeddings under an appropriate kernel. The smoothness yields first-order pathwise differentiability in general and second-order pathwise differentiability under the null of equal counterfactual distributions. These properties allow construction of debiased estimators that are asymptotically normal, thereby delivering asymptotically valid tests for distributional treatment effects at any fixed entropic regularization parameter. An aggregated test is further proposed that pools evidence across a grid of regularization values, since test power depends on the unknown optimal level.
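For concreteness, one standard formulation of the debiased entropic (Sinkhorn) divergence consistent with this description (the notation here is assumed; the paper's exact conventions may differ):

```latex
S_\varepsilon(P_0, P_1)
  = \mathrm{OT}_\varepsilon(P_0, P_1)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(P_0, P_0)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(P_1, P_1),
\qquad
\mathrm{OT}_\varepsilon(\mu, \nu)
  = \min_{\pi \in \Pi(\mu,\nu)}
      \int c \, d\pi + \varepsilon\, \mathrm{KL}\!\left(\pi \,\middle\|\, \mu \otimes \nu\right).
```

The debiasing terms make $S_\varepsilon(P, P) = 0$, so the null of equal counterfactual distributions corresponds to $S_\varepsilon(P_0, P_1) = 0$.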

What carries the argument

The Sinkhorn treatment effect, defined as the entropic optimal transport divergence between counterfactual outcome distributions and represented as a smooth functional of their mean embeddings.

If this is right

  • Debiased estimators for the Sinkhorn treatment effect converge to a normal limit at the expected rate.
  • Hypothesis tests for equality of counterfactual distributions control type-I error asymptotically at a fixed regularization level.
  • An aggregated test over a grid of regularization values combines evidence without requiring knowledge of the optimal level in advance.
  • The procedure detects distributional shifts on both simulated data and real image datasets.
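The aggregation claim in the third bullet can be illustrated with a simple Bonferroni rule over the regularization grid. The paper's actual combining rule is not specified on this page, so the rule, grid, and p-values below are placeholders.

```python
import numpy as np

def aggregated_test(p_values_by_eps, alpha=0.05):
    """Reject the null of equal counterfactual distributions if any
    per-regularization p-value survives a Bonferroni correction
    for the size of the grid."""
    p = np.asarray(list(p_values_by_eps.values()))
    return bool(p.min() <= alpha / len(p))

# hypothetical per-epsilon p-values from tests run on a grid
p_by_eps = {0.1: 0.20, 0.5: 0.004, 1.0: 0.09, 2.0: 0.31}
reject = aggregated_test(p_by_eps)  # 0.004 <= 0.05 / 4, so the test rejects
```

The point of aggregation is exactly the bullet's claim: no single ε must be chosen in advance, at the price of a multiplicity correction.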

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same kernel-embedding route may extend inference to other causal functionals that involve optimal transport distances.
  • The method supplies a concrete way to test for treatment effects when only shape or tail differences are expected rather than mean shifts.
  • Data-driven aggregation rules could replace the fixed grid while preserving asymptotic control.

Load-bearing premise

Counterfactual mean embeddings must exist in a reproducing kernel Hilbert space so that the entropic divergence becomes a differentiable functional of those embeddings.
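The premise can be made concrete with empirical mean embeddings. Under a bounded characteristic kernel (a Gaussian kernel is used here as an assumption), each distribution maps to an RKHS element, and the squared RKHS distance between the two embeddings is the familiar (biased) MMD² statistic.

```python
import numpy as np

def gaussian_gram(x, y, bandwidth):
    """Gram matrix of the Gaussian kernel k(u, v) = exp(-||u - v||^2 / (2 h^2))."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Squared RKHS distance between the empirical mean embeddings of x and y
    (the biased V-statistic estimator of MMD^2)."""
    kxx = gaussian_gram(x, x, bandwidth).mean()
    kyy = gaussian_gram(y, y, bandwidth).mean()
    kxy = gaussian_gram(x, y, bandwidth).mean()
    return float(kxx + kyy - 2.0 * kxy)
```

Because the kernel is bounded, the embeddings always exist; the paper's differentiability argument additionally needs the Sinkhorn functional to be a smooth map of such embeddings.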

What would settle it

A large-sample simulation in which the two counterfactual distributions are identical yet the test rejects the null at a rate exceeding the nominal level would refute the claim of asymptotic validity.
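A toy version of that check, assuming a plain permutation test with a difference-in-means statistic in place of the paper's debiased STE test: draw both arms from the same distribution many times and verify the rejection rate stays near the nominal 5% level.

```python
import numpy as np

def permutation_pvalue(x, y, stat, n_perm=200, rng=None):
    """Two-sample permutation p-value for a statistic where large values
    indicate a difference between the samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    observed = stat(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        exceed += stat(perm[:n], perm[n:]) >= observed
    return (1 + exceed) / (1 + n_perm)

# Monte Carlo under the null: identical arms, nominal level 0.05
rng = np.random.default_rng(1)
stat = lambda a, b: abs(a.mean() - b.mean())
n_sims, rejections = 200, 0
for _ in range(n_sims):
    x, y = rng.normal(size=30), rng.normal(size=30)
    rejections += permutation_pvalue(x, y, stat, rng=rng) <= 0.05
type_one_error = rejections / n_sims  # should land near 0.05
```

A rate materially above the nominal level at large n would be exactly the refutation this section describes; the permutation statistic here is only a stand-in for the studentized STE statistic.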

Figures

Figures reproduced from arXiv: 2605.08485 by Alex Luedtke, Medha Agarwal.

Figure 1. MMD vs. Sinkhorn divergence across increasing separation θ between the counterfactual outcome distribution under control, P0 = N(0₂, I₂), and under treatment, P1,θ = ½N(−θ1₂, I₂) + ½N(θ1₂, I₂). The average treatment effect is zero for all θ > 0. Here Dθ denotes either the MMD or the Sinkhorn divergence between P0 and P1,θ. As θ increases, the distributions diverge and MMD saturates, failing to distinguish between …
Figure 2. I: Type I error of the MTE and STE under the null (θ = 0.0). II: Power of the MTE and STE under increasing separation between counterfactual distributions (increasing θ) in Exp (i). III: Power of the MTE and STE under increasing separation in Exp (ii). IV: Mean squared error of the plug-in vs. one-step STE for θ = 1.6 in Exp (i). V: Coverage of Wald-type 95% confidence intervals …
Figure 3. Mean and covariance ellipsoids (95%) of counterfactual outcome distributions under a varying gap between P0 and P1, parametrized by θ. Exp (i): mean-difference experiment, P0 = N(0₂, Σ), P1 = N(θ1₂, Σ). Exp (ii): covariance-difference experiment, P0 = N(0₂, Σ), P1 = N(0₂, Σ + θΔ). Simulations for the aggregated test replicate the setup of Exp (ii) but evaluate the tests over a grid of ε v…
Figure 4. Type-I error and power in Exp (ii) for the aggregated procedures MTE-Agg and STE-Agg, together with the corresponding MTE- and STE-based tests evaluated on a finite grid of kernel bandwidth parameters ε = ηm, where m is the median heuristic. PCam dataset: for each unit, covariates are generated as X ∼ N(0, I₅). Conditional …
Figure 5. Type I error (far-left point) and power (all other points) of the STE and MTE as a function of the treatment success probability for the PCam dataset. Compute details: the code was written in Python 3, using PyTorch for automatic differentiation. All experiments were conducted on a CUDA-enabled machine with 12 GB GPU memory, 64 GB RAM, and 24 vCPUs. Although the experiments were run on a GPU, we obse…
Figure 6. Wall-clock runtime (in seconds) and memory usage (in MB) for the simulation setup across increasing sample sizes n, averaged over 20 Monte Carlo simulations, for both GPU and CPU implementations. The source caption also summarizes the computational bottlenecks of the second-order one-step STE estimator and practical acceleration strategies. From Sec. 5, the total computational …
read the original abstract

We introduce the Sinkhorn treatment effect, an entropic optimal transport measure of divergence between counterfactual distributions. Unlike classical quantities such as the average treatment effect, this measure captures differences across entire distributions. We analyze this divergence as a statistical functional and show it can be written as a smooth transformation of counterfactual mean embeddings with an appropriate kernel. This characterization allows us to establish first-order pathwise differentiability in general, and second-order pathwise differentiability under the null hypothesis of equal counterfactual distributions. Leveraging this smoothness, we construct debiased estimators and use them to obtain asymptotically valid tests for distributional treatment effects with a fixed entropic regularization parameter. Because the power of the test depends on this unknown parameter, we further propose an aggregated test that combines evidence across a grid of regularization choices. Experiments on simulated and image data demonstrate the practical advantages of our estimator and testing procedure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces the Sinkhorn treatment effect, an entropic optimal transport divergence between counterfactual distributions, as a measure of distributional treatment effects. It characterizes the quantity as a smooth transformation of counterfactual mean embeddings under a suitable kernel, establishes first-order pathwise differentiability in general and second-order pathwise differentiability under the null of equal counterfactual distributions, constructs debiased estimators, and derives asymptotically valid tests for fixed entropic regularization. An aggregated test over a grid of regularization values is proposed to mitigate power dependence on the unknown parameter. The approach is illustrated on simulated data and image data.

Significance. If the differentiability and asymptotic results hold, the work supplies a computationally tractable, kernel-based causal OT functional that enables rigorous inference on full distributional shifts rather than moments alone. The explicit treatment of the regularization parameter via aggregation and the construction of debiased estimators are practical strengths that could support applications in causal machine learning where testing equality of counterfactual laws is required.

major comments (1)
  1. The abstract asserts first- and second-order pathwise differentiability together with asymptotic validity of the debiased estimators, yet the provided text supplies no explicit conditions on the kernel, no error bounds, and no verification steps for the second-order expansion under the null. Without these details the support for the central claims on differentiability and test validity cannot be fully assessed.
minor comments (2)
  1. The dependence of test power on the regularization parameter is acknowledged, but the precise aggregation procedure (weights, grid construction) would benefit from an explicit algorithmic statement.
  2. Notation for the counterfactual mean embeddings and the Sinkhorn divergence should be introduced with a self-contained definition before the differentiability arguments.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We have revised the manuscript to supply the missing explicit conditions, error bounds, and verification steps for the differentiability claims.

read point-by-point responses
  1. Referee: The abstract asserts first- and second-order pathwise differentiability together with asymptotic validity of the debiased estimators, yet the provided text supplies no explicit conditions on the kernel, no error bounds, and no verification steps for the second-order expansion under the null. Without these details the support for the central claims on differentiability and test validity cannot be fully assessed.

    Authors: We agree that the original submission did not provide sufficient explicit conditions or verification details. In the revised manuscript we have added a dedicated subsection (Section 3.2) stating the required kernel assumptions (bounded, continuous, and characteristic kernels with finite RKHS norm), derived explicit first- and second-order pathwise derivative bounds under these conditions, and included a full verification of the second-order expansion under the null (Appendix B.3). These additions directly support the asymptotic validity of the debiased estimators and the proposed tests. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper defines the Sinkhorn treatment effect directly as an entropic OT divergence on counterfactual distributions, represents it as a smooth functional of mean embeddings via an appropriate kernel, and derives first- and second-order pathwise differentiability from that representation using standard functional analysis. Debiased estimators and asymptotic tests follow from the differentiability, with the fixed regularization parameter explicitly acknowledged and addressed via a separate aggregation proposal. No load-bearing step reduces a claimed result to a fitted input, self-citation chain, or definitional tautology; all steps rest on external OT and statistical functional theory.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on the abstract alone, the central claims rest on the existence of a suitable kernel for mean embeddings and on standard regularity conditions for pathwise differentiability of statistical functionals; no explicit free parameters beyond the fixed regularization strength are named, and no new physical entities are postulated.

axioms (2)
  • domain assumption Existence of an appropriate positive definite kernel that induces the mean embeddings of counterfactual distributions
    Invoked to represent the Sinkhorn divergence as a smooth transformation of mean embeddings
  • domain assumption Standard regularity conditions for pathwise differentiability of the statistical functional
    Required to establish first- and second-order differentiability

pith-pipeline@v0.9.0 · 5449 in / 1449 out tokens · 40492 ms · 2026-05-12T01:25:00.242923+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

219 extracted references · 219 canonical work pages · 1 internal anchor
