pith. machine review for the scientific record.

arxiv: 2604.26942 · v1 · submitted 2026-04-29 · 💻 cs.LG · math.ST · q-bio.GN · stat.ME · stat.ML · stat.TH

Recognition: unknown

Hyper Input Convex Neural Networks for Shape Constrained Learning and Optimal Transport

Insung Kong, Johannes Schmidt-Hieber, Shayan Hundrieser

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 08:21 UTC · model grok-4.3

classification 💻 cs.LG · math.ST · q-bio.GN · stat.ME · stat.ML · stat.TH
keywords input convex neural networks · maxout networks · optimal transport · convex regression · shape constrained learning · neural optimal transport · high-dimensional approximation · single-cell data

The pith

HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions to a given precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Hyper Input Convex Neural Networks (HyCNNs) that combine maxout units with input convex neural network designs to guarantee convexity in the input while allowing depth to improve expressivity. The central proof shows that this structure approximates any quadratic function to fixed precision using exponentially fewer parameters than standard ICNNs. Experiments on synthetic convex regression and interpolation tasks demonstrate lower error rates than both ICNNs and ordinary MLPs. The same networks are applied to learning high-dimensional optimal transport maps, where they match or exceed ICNN-based methods on synthetic distributions and single-cell RNA data.

Core claim

HyCNNs integrate maxout activations into the ICNN framework so that the output remains convex in the input by construction while depth can be used effectively. The key theoretical result is that the parameter count needed to approximate quadratic functions to any given accuracy scales exponentially better than in prior ICNN designs. This efficiency translates into more stable training at scale and stronger performance on convex regression, interpolation, and optimal transport map estimation tasks.

What carries the argument

Hyper Input Convex Neural Networks (HyCNNs), which replace selected layers in ICNNs with maxout units to preserve guaranteed input convexity while improving parameter efficiency and depth utilization.
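
To make the construction concrete, here is a minimal editorial sketch of the generic maxout-ICNN recipe in PyTorch. This is not the paper's exact two-lane HyCNN block: the class name, initialization, and hyperparameters are invented for the example. What it shares with the paper is the convexity mechanism, namely non-negative hidden-to-hidden weights combined with a maxout over pieces that are affine in the input.

    import torch
    import torch.nn as nn

    class MaxoutICNN(nn.Module):
        """Generic maxout-ICNN sketch (editorial; not the paper's exact
        two-lane HyCNN block). Convexity in x holds by construction:
        hidden-to-hidden weights are clamped non-negative, and maxout
        (a pointwise max over pieces affine in x) is convex and
        non-decreasing in its hidden arguments."""

        def __init__(self, d, width=48, depth=4, pieces=2):
            super().__init__()
            self.pieces = pieces
            dims = [d] + [width] * depth
            # hidden-to-hidden weights, kept >= 0 at forward time
            self.Wz = nn.ParameterList(
                [nn.Parameter(0.1 * torch.rand(pieces * dims[l + 1], dims[l]))
                 for l in range(depth)])
            # unconstrained input passthroughs (affine in x is harmless)
            self.Ax = nn.ModuleList(
                [nn.Linear(d, pieces * dims[l + 1]) for l in range(depth)])
            self.out = nn.Parameter(0.1 * torch.rand(1, width))  # kept >= 0

        def forward(self, x):
            z = x
            for Wz, Ax in zip(self.Wz, self.Ax):
                pre = z @ Wz.clamp(min=0).T + Ax(x)   # affine in z and x
                pre = pre.view(x.shape[0], self.pieces, -1)
                z = pre.max(dim=1).values             # maxout over pieces
            return z @ self.out.clamp(min=0).T        # scalar, convex in x

    # spot-check midpoint convexity on random samples
    f = MaxoutICNN(d=5)
    x, y = torch.randn(8, 5), torch.randn(8, 5)
    assert (f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-5).all()

Because every clamp keeps the mixing weights non-negative and maxout is convex and non-decreasing in its hidden arguments, the output is convex in x by construction; the assertion at the end only spot-checks midpoint convexity on random samples.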

If this is right

  • HyCNNs achieve any fixed approximation error on quadratic targets with exponentially fewer parameters than ICNNs.
  • HyCNNs produce lower prediction error than ICNNs and standard MLPs on synthetic convex regression and interpolation tasks.
  • HyCNNs learn high-dimensional optimal transport maps that often outperform those obtained from ICNN-based neural optimal transport methods on both synthetic and single-cell RNA datasets.
  • HyCNN training remains reliable when the network depth and width are increased, addressing a known limitation of earlier ICNN constructions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The exponential parameter savings could enable convex-constrained models in problem dimensions where ICNNs become computationally infeasible.
  • The architecture may transfer to other tasks that require monotonicity or convexity guarantees, such as utility function estimation or certain physics-informed learning problems.
  • Practitioners could adopt HyCNNs as a drop-in replacement in existing optimal transport pipelines to gain accuracy without altering the overall optimization setup; a sketch of that usage follows below.
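
What "drop-in" means here, in an editorial sketch (not code from the paper): by Brenier's theorem, the gradient of a convex potential under quadratic cost is an optimal transport map, so only the potential network changes while the surrounding dual training loop stays as-is. Here `phi` can be the MaxoutICNN sketch above or any other input-convex scalar network.

    import torch

    def transport(phi, x):
        """Pushforward via T = grad(phi): the gradient of a convex
        potential is a monotone (Brenier-form) transport map. Swapping
        the potential network phi is the only change to an ICNN-based
        pipeline; the dual training loop is untouched."""
        x = x.clone().requires_grad_(True)
        (grad_x,) = torch.autograd.grad(phi(x).sum(), x)
        return grad_x.detach()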

Load-bearing premise

The approach assumes that the target functions of interest are well approximated by the HyCNN structure and that input convexity together with training stability continue to hold when the networks are scaled up.

What would settle it

Compare the smallest number of parameters required by a HyCNN versus a standard ICNN to approximate the squared Euclidean norm (a simple quadratic) to within a fixed small error in dimension 10 or higher; absence of an exponential gap in parameter count would refute the efficiency claim.
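
A schematic of that comparison, with hypothetical helpers `make_model` and `fit` standing in for an architecture constructor and a standard training loop:

    import torch

    def min_params_to_eps(make_model, fit, d=10, eps=1e-2,
                          widths=(4, 8, 16, 32, 64, 128)):
        """Smallest parameter count in a width sweep at which a model
        family fits f0(x) = ||x||_2^2 below mean-squared error eps."""
        x_test = torch.randn(4096, d)
        y_test = (x_test ** 2).sum(dim=1, keepdim=True)  # quadratic target
        for w in widths:
            model = fit(make_model(d, width=w))  # hypothetical helpers
            err = ((model(x_test) - y_test) ** 2).mean().item()
            if err < eps:
                return sum(p.numel() for p in model.parameters())
        return None  # sweep exhausted without reaching eps

Running this once with a HyCNN constructor and once with an ICNN constructor at matched target error is exactly the comparison described above; the input dimension d is the knob that would expose, or fail to expose, the exponential gap.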

Figures

Figures reproduced from arXiv: 2604.26942 by Insung Kong, Johannes Schmidt-Hieber, Shayan Hundrieser.

Figure 1
Figure 1. Input convex neural networks (ICNNs) versus Hyper Input Convex Neural Networks (HyCNNs). (a) Generic input convex network architecture: ICNN blocks comprise one lane, whereas HyCNN blocks involve two lanes of matrix operations. (b) Average empirical MSE over 10 repetitions with 10%–90% confidence bands for learning f₀(x) = ∥x∥₂² on ℝᵈ with d = 50, based on n = … view at source ↗
Figure 2
Figure 2. Signal propagation and gradient profiles across relative depth in HyCNNs at initialization, for width W = 48, depths L ∈ {2, 4, …, 16} (colors), and input dimension d = 50. Relative depth denotes the layer index normalized by network depth. (a) Signals from a single forward pass at initialization with standard Gaussian input X ∼ N(0, I_d); each curve is the mean hidden-state ℓ₂ … view at source ↗
Figure 3
Figure 3. Pushforwards under the learned HyCNN OT map (width 48, depth 4). The HyCNN is trained on n = m = M = 2000 data points for 2500 outer iterations for learning f, with S = 10 steps per iteration for learning g, via Adam with decaying learning rates and smoothness parameter τ. view at source ↗
Figure 4
Figure 4. Prediction MSE for different training sample sizes n on the 50-dimensional regression tasks, with 10th–90th percentile bands across 10 runs. Each panel shows HyCNNs with depths 2, 4, 8, 16 and the best MLP, ICNN, and GMN at n = 10⁴ among depths ∈ {2, 4, 8, 16}. view at source ↗
Figure 5
Figure 5. Performance of OT map estimation on synthetic and 4i single-cell RNA-seq data. (a) Prediction MSE (normalized by dimension d = 50) for n = 5000 training samples on OT map estimation tasks, with 10th–90th percentile bands across 10 runs. Both panels show the same HyCNN and the best ICNN with ReLU or leaky ReLU, or with a quadratic term in the first layer with ReLU or Softplus, and the Monge Gap MLP estima… view at source ↗
Figure 6
Figure 6. Illustration of the proof of Theorem 3.3 for L = 2 and (d₁, d₂) = (4, 4). u_ℓ(x), φ_ℓ(x), and ψ_ℓ(x) are defined by (B.4), (B.5), and (B.6), respectively. From (B.7) and (B.8), z₂,₀(x) = u₁(x) and z₂,ⱼ(x) = max(φ₁(x), ψ₁(x) + (7/16)x − 7j/256) for j ∈ {1, 2, 3}. Gray vertical lines mark the kink locations of the functions. view at source ↗
Figure 7
Figure 7. Illustration of the proof of Proposition A.2 for d = 8 and p = q = 3. view at source ↗
Figure 8
Figure 8. HyCNNs at initialization across different depths. Each panel shows a single randomly initialized HyCNN under the initialization scheme of Section 2.2 with maxout activation, input dimension d = 2, domain x ∈ [−10, 10]², width m = 32, and depths L ∈ {2, 4, 6, 8}. The surface plot is shown together with its contour projection on the base plane. The visualization demonstrates that the initialized f… view at source ↗
Figure 9
Figure 9. Univariate regression results (d = 1) for the six ground-truth functions (a–f) from Section D.2. Each panel shows the fitted curves of HyCNNs (width 48) with depths L ∈ {2, 4, 8, 16, 32} and the best MLP (width 64), ICNN (width 64), or GMN (width 48) among the same depths; green dots depict the training samples and the black curve the ground-truth function. For each function, the visualization … view at source ↗
Figure 10
Figure 10. Prediction MSE for different training sample sizes n on the 50-dimensional regression tasks, with 10th–90th percentile bands across 10 runs. Each panel shows HyCNNs with depths 2, 4, 8, 16 and the best MLP, ICNN, and GMN at n = 10⁴ among similar depths. Across all settings the key observations are consistent: HyCNNs with depth at least four consistently achieve the lowest prediction MSE, particularly in high… view at source ↗
Figure 11
Figure 11. Reverse OT map estimation in dimension d = 2. Each panel shows the pushforward of the target distribution under the learned reverse HyCNN OT map (width 48, depth 4), trained with n = m = M = 2000 data points, T = 2500 outer iterations, and S = 5 inner steps, using Adam with a cosine-decayed learning rate (λ = 10⁻², final ratio 0.01) and smoothness parameter τ₀ = 1 (cosine decay). From left to right: (i) fi… view at source ↗
Figure 12
Figure 12. Contour plots of the learned OT potential φ̂ in dimension d = 2. Each panel shows the contour lines of the HyCNN potential (width 48, depth 4) learned from n = m = M = 2000 data points for the same collection of source–target pairs as in … view at source ↗
Figure 13
Figure 13. Enlarged view of … view at source ↗
read the original abstract

We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, theoretically capable of leveraging depth, and performs reliable when trained at scale compared to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Throughout a series of synthetic experiments, we demonstrate that HyCNNs outperform existing ICNNs and MLPs in terms of predictive performance for convex regression and interpolation tasks. We further apply HyCNNs to learn high-dimensional optimal transport maps for synthetic examples and for single-cell RNA sequencing data, where they oftentimes outperform ICNN-based neural optimal transport methods and other baselines across a wide range of settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 3 minor

Summary. The paper introduces Hyper Input Convex Neural Networks (HyCNNs) that combine maxout networks with input convex neural networks (ICNNs) to produce models that are convex in the input. The central claim is a proof that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions to a given precision. The authors further report that HyCNNs outperform ICNNs and MLPs on synthetic convex regression and interpolation tasks and yield competitive or superior results when used to learn high-dimensional optimal transport maps on both synthetic data and single-cell RNA sequencing data.

Significance. If the exponential parameter-reduction result survives comparison to strengthened ICNN baselines and the reported empirical gains are statistically reliable, the architecture could improve scalability for shape-constrained regression and neural optimal transport. The explicit use of maxout units to increase expressivity while preserving convexity is a concrete technical contribution, and the real-data OT experiments provide a useful existence proof of practical applicability.

major comments (3)
  1. [§4, Theorem on quadratic approximation] The proof that HyCNNs require exponentially fewer parameters than ICNNs for quadratic approximation compares against a standard ICNN using ReLU-style activations and fixed positive weights. It does not address whether an ICNN variant that incorporates maxout units while still obeying the non-negative input-weight constraints required for convexity could achieve comparable scaling. Without ruling out or comparing against such variants, the claimed exponential gap may be an artifact of the chosen baseline rather than an intrinsic advantage of the HyCNN construction.
  2. [§5, experimental evaluation] The synthetic and real-data results claim consistent outperformance, yet the manuscript provides no error bars, standard deviations across random seeds, or detailed descriptions of hyperparameter selection and network-depth choices. For the single-cell RNA-seq optimal-transport experiments, it is unclear how the high-dimensional maps were regularized and whether post-training convexity was verified numerically.
  3. [§3, architecture definition] The precise weight constraints and activation rules that guarantee input convexity after the maxout combination are stated at a high level but lack an explicit inductive proof or set of sufficient conditions that survive depth scaling. This detail is load-bearing for the claim that HyCNNs remain convex while leveraging depth.
minor comments (3)
  1. [Abstract] 'performs reliable when trained at scale' should read 'performs reliably when trained at scale'.
  2. [§5, figures] Add captions that explicitly state the plotted metric (e.g., mean squared error, Wasserstein distance) and whether shaded regions represent standard error over multiple runs.
  3. [§3, notation] The definition of the hyper-network parameters and how they interact with the maxout units should be introduced in a single consolidated equation block rather than scattered across paragraphs.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we will make to strengthen the paper.

read point-by-point responses
  1. Referee: [§4, Theorem on quadratic approximation] The proof that HyCNNs require exponentially fewer parameters than ICNNs for quadratic approximation compares against a standard ICNN using ReLU-style activations and fixed positive weights. It does not address whether an ICNN variant that incorporates maxout units while still obeying the non-negative input-weight constraints required for convexity could achieve comparable scaling. Without ruling out or comparing against such variants, the claimed exponential gap may be an artifact of the chosen baseline rather than an intrinsic advantage of the HyCNN construction.

    Authors: We appreciate this insightful comment. Our theoretical analysis in §4 establishes the parameter efficiency of HyCNNs relative to the standard ICNN architecture as introduced in prior work. The HyCNN construction integrates maxout units in a manner that preserves input convexity through a hypernetwork parameterization, which differs from simply augmenting an ICNN with maxout while enforcing non-negative weights on input connections. We acknowledge that a direct comparison to a hypothetical maxout-enhanced ICNN variant would further strengthen the result. In the revised manuscript, we will add a discussion in §4 clarifying the distinction between our approach and such variants, and note that no such maxout-ICNN has been proposed or analyzed in the literature to date. revision: partial

  2. Referee: [§5] The synthetic and real-data results claim consistent outperformance, yet the manuscript provides no error bars, standard deviations across random seeds, or detailed descriptions of hyperparameter selection and network-depth choices. For the single-cell RNA-seq optimal-transport experiments, it is unclear how the high-dimensional maps were regularized and whether post-training convexity was verified numerically.

    Authors: Thank you for highlighting these important aspects of the experimental section. We agree that the current presentation lacks sufficient statistical rigor and implementation details. In the revised version, we will include error bars and standard deviations computed over multiple random seeds for all synthetic and real-data experiments. We will also expand the experimental setup subsection to provide full details on hyperparameter selection, network architectures, and depth choices. For the single-cell RNA-seq OT experiments, we will describe the regularization techniques employed and report numerical verification of post-training convexity, such as checking the Hessian or gradient monotonicity on held-out samples (a sketch of such a check follows after these responses). These changes will be incorporated as a full revision to §5. revision: yes

  3. Referee: [§3] The precise weight constraints and activation rules that guarantee input-convexity after the maxout combination are stated at a high level but lack an explicit inductive proof or set of sufficient conditions that survive depth scaling. This detail is load-bearing for the claim that HyCNNs remain convex while leveraging depth.

    Authors: We thank the referee for pointing out the need for greater rigor in the architectural definition. While §3 outlines the weight constraints and the use of maxout to maintain convexity, we agree that an explicit inductive proof would enhance clarity. In the revised manuscript, we will include a formal inductive proof in §3 or an appendix demonstrating that the HyCNN architecture preserves input convexity at arbitrary depth under the specified constraints. This will involve showing that each layer's output remains convex in the input when composed appropriately. This constitutes a full revision to address the concern (an induction sketch in this spirit follows after these responses). revision: yes
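
The gradient-monotonicity check proposed in response 2 can be sketched as editorial code (not from the paper): for convex f, (∇f(x) − ∇f(y))ᵀ(x − y) ≥ 0 must hold for every pair, so the fraction of violating held-out pairs should be zero up to numerical tolerance. The sketch assumes f maps (batch, d) tensors to (batch, 1) outputs.

    import torch

    def monotonicity_violation_rate(f, x, y, tol=1e-6):
        """Fraction of sample pairs (x_i, y_i) violating gradient
        monotonicity, (grad f(x) - grad f(y)) . (x - y) >= 0, which is a
        necessary condition for convexity of f."""
        def grad(u):
            u = u.clone().requires_grad_(True)
            (g,) = torch.autograd.grad(f(u).sum(), u)
            return g
        inner = ((grad(x) - grad(y)) * (x - y)).sum(dim=1)
        return (inner < -tol).float().mean().item()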
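And the induction step promised in response 3 can be stated compactly, in editorial notation rather than the paper's:

    % Induction step: if every coordinate of z_l(x) is convex in x and the
    % hidden-to-hidden weights are entrywise non-negative, then so is
    \[
    z_{\ell+1}(x) \;=\; \max_{1 \le k \le K}
      \bigl( W_\ell^{(k)} z_\ell(x) + A_\ell^{(k)} x + b_\ell^{(k)} \bigr),
    \qquad W_\ell^{(k)} \ge 0 \ \text{entrywise},
    \]
    % since each argument of the max is a non-negative combination of
    % convex functions plus an affine term, and a pointwise maximum of
    % convex functions is convex. Iterating over \ell and finishing with
    % a non-negative output layer gives convexity at any depth L.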

Circularity Check

0 steps flagged

No circularity: HyCNN parameter-efficiency claim is a new construction with independent proof content

full rationale

The paper defines HyCNNs as a novel combination of maxout units with the non-negative weight constraints of ICNNs, then states a separate theorem proving exponential parameter reduction for quadratic approximation. No quoted step reduces the claimed advantage to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled from prior work by the same authors. The baseline ICNN is the standard construction from external literature; the proof is presented as comparing against that fixed baseline rather than re-deriving it from HyCNN itself. Empirical sections on regression and OT maps are downstream validations, not load-bearing for the theoretical claim. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution is the new HyCNN architecture itself. No free parameters are mentioned. The main domain assumption is that the functions to be learned are convex; the sole invented entity is the HyCNN.

axioms (1)
  • domain assumption: Target functions are convex
    The entire architecture and all experiments are built around enforcing and learning convex functions.
invented entities (1)
  • Hyper Input Convex Neural Network (HyCNN): no independent evidence
    purpose: A neural network that is always convex in the input, leverages depth, and approximates quadratics with exponentially fewer parameters than ICNNs
    New architecture introduced and analyzed in the paper; no independent evidence outside this work is provided in the abstract.

pith-pipeline@v0.9.0 · 5474 in / 1253 out tokens · 57199 ms · 2026-05-07T08:21:08.675048+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 10 canonical work pages · 2 internal anchors

  1. [1] Amos, B., Xu, L., & Kolter, J. Z. (2017). Input convex neural networks. In International Conference on Machine Learning, pages 146--155. PMLR.
  2. [2] Balázs, G., György, A., & Szepesvári, C. (2015). Near-optimal max-affine estimators for convex regression. In Artificial Intelligence and Statistics, pages 56--64. PMLR.
  3. [3] Brenier, Y. (1991). Polar factorization and monotone rearrangement of vector-valued functions. Communications on Pure and Applied Mathematics, 44(4), 375--417.
  4. [4] Bunne, C., Stark, S. G., Gut, G., Del Castillo, J. S., Levesque, M., Lehmann, K.-V., Pelkmans, L., Krause, A., & Rätsch, G. (2023). Learning single-cell perturbation responses using neural optimal transport. Nature Methods, 20(11), 1759--1768.
  5. [5] Chen, Y., Shi, Y., & Zhang, B. (2018). Optimal control via neural networks: A convex approach. Preprint arXiv:1805.11835.
  6. [6] Courty, N., Flamary, R., Tuia, D., & Rakotomamonjy, A. (2016). Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(9), 1853--1865.
  7. [7] Cuturi, M., Meng-Papaxanthos, L., Tian, Y., Bunne, C., Davis, G., & Teboul, O. (2022). Optimal transport tools (OTT): A JAX toolbox for all things Wasserstein. Preprint arXiv:2201.12324.
  8. [8] De Lara, L., González-Sanz, A., Asher, N., Risser, L., & Loubes, J.-M. (2024). Transport-based counterfactual models. Journal of Machine Learning Research, 25(136), 1--59.
  9. [9] Deschatre, T. & Warin, X. (2025). Input convex Kolmogorov Arnold Networks. Preprint arXiv:2505.21208.
  10. [10] Divol, V., Niles-Weed, J., & Pooladian, A.-A. (2025). Optimal transport map estimation in general function spaces. The Annals of Statistics, 53(3), 963--988.
  11. [11] Eckle, K. & Schmidt-Hieber, J. (2019). A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Networks, 110, 232--242.
  12. [12] Feng, O. Y., Kao, Y.-C., Xu, M., & Samworth, R. J. (2026). Optimal convex M-estimation via score matching. The Annals of Statistics, 54(1), 408--441.
  13. [13] Feydy, J., Séjourné, T., Vialard, F.-X., Amari, S.-i., Trouvé, A., & Peyré, G. (2019). Interpolating between optimal transport and MMD using Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2681--2690. PMLR.
  14. [14] Goodfellow, I., Warde-Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In S. Dasgupta & D. McAllester (Eds.), Proceedings of the 30th International Conference on Machine Learning, number 28(3) in Proceedings of Machine Learning Research, pages 1319--1327.
  15. [15] Gordaliza, P., Del Barrio, E., Fabrice, G., & Loubes, J.-M. (2019). Obtaining fairness using optimal transport theory. In International Conference on Machine Learning, pages 2357--2365. PMLR.
  16. [16] Groppe, M. & Hundrieser, S. (2024). Lower complexity adaptation for empirical entropic optimal transport. Journal of Machine Learning Research, 25(344), 1--55.
  17. [17] Guntuboyina, A. & Sen, B. (2015). Global risk bounds and adaptation in univariate convex regression. Probability Theory and Related Fields, 163(1), 379--411.
  18. [18] Hallin, M., del Barrio, E., Cuesta-Albertos, J., & Matrán, C. (2021). Distribution and quantile functions, ranks and signs in dimension d. The Annals of Statistics, 49(2), 1139--1165.
  19. [19] Hannah, L. A. & Dunson, D. B. (2013). Multivariate convex regression with adaptive partitioning. Journal of Machine Learning Research, 14(1), 3261--3294.
  20. [20] He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pages 1026--1034.
  21. [21] Hoedt, P.-J. & Klambauer, G. (2023). Principled weight initialisation for input-convex neural networks. Advances in Neural Information Processing Systems, 36, 46093--46104.
  22. [22] Huang, C.-W., Chen, R. T., Tsirigotis, C., & Courville, A. (2020). Convex potential flows: Universal probability distributions with optimal transport and convex optimization. Preprint arXiv:2012.05942.
  23. [23] Hundrieser, S., Staudt, T., & Munk, A. (2024). Empirical optimal transport between different measures adapts to lower complexity. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, 60(2), 824--846.
  24. [24] Hütter, J.-C. & Rigollet, P. (2021). Minimax estimation of smooth optimal transport maps. The Annals of Statistics, 49(2), 1166--1194.
  25. [25] Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4).
  26. [26] Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. Preprint arXiv:1412.6980.
  27. [27] LeCun, Y., Bottou, L., Orr, G. B., & Müller, K.-R. (2002). Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9--50. Springer.
  28. [28] Liang, S. & Srikant, R. (2017). Why deep neural networks for function approximation? In International Conference on Learning Representations. https://openreview.net/forum?id=SkpSlKIel
  29. [29] Lin, A. & Ba, D. E. (2023). How to train your FALCON: Learning log-concave densities with energy-based neural networks. In Fifth Symposium on Advances in Approximate Bayesian Inference.
  30. [30] Makkuva, A., Taghvaei, A., Oh, S., & Lee, J. (2020). Optimal transport mapping via input convex neural networks. In International Conference on Machine Learning, pages 6672--6681. PMLR.
  31. [31] Manole, T., Balakrishnan, S., Niles-Weed, J., & Wasserman, L. (2024). Plugin estimation of smooth optimal transport maps. The Annals of Statistics, 52(3), 966--998.
  32. [32] McClure, D. E. (1975). Nonlinear segmented function approximation and analysis of line patterns. Quarterly of Applied Mathematics, 33(1), 1--37.
  33. [33] Nesterov, V., Arend Torres, F., Nagy-Huber, M., Samarin, M., & Roth, V. (2022). Learning invariances with generalised input-convex neural networks. Preprint arXiv:2204.07009.
  34. [34] Peyré, G. & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5-6), 355--607.
  35. [35] Pooladian, A.-A. & Niles-Weed, J. (2021). Entropic estimation of optimal transport maps. Preprint arXiv:2109.12004.
  36. [36] Rezende, D. & Mohamed, S. (2015). Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530--1538. PMLR.
  37. [37] Samworth, R. J. (2018). Recent progress in log-concave density estimation. Statistical Science, 33(4), 493.
  38. [38] Santambrogio, F. (2015). Optimal transport for applied mathematicians. Calculus of variations, PDEs, and modeling. Birkhäuser Basel.
  39. [39] Schiebinger, G., Shu, J., Tabaka, M., Cleary, B., Subramanian, V., Solomon, A., Gould, J., Liu, S., Lin, S., Berube, P., Lee, L., Chen, J., Brumbaugh, J., Rigollet, P., Hochedlinger, K., Jaenisch, R., Regev, A., & Lander, E. S. (2019). Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell, ...
  40. [40] Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4), 1875--1897.
  41. [41] Seguy, V., Damodaran, B. B., Flamary, R., Courty, N., Rolet, A., & Blondel, M. (2017). Large-scale optimal transport and mapping estimation. Preprint arXiv:1711.02283.
  42. [42] Seijo, E. & Sen, B. (2011). Nonparametric least squares estimation of a multivariate convex regression function. The Annals of Statistics, 39(3), 1633--1657.
  43. [43] Sivaprasad, S., Singh, A., Manwani, N., & Gandhi, V. (2021). The curious case of convex neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 738--754. Springer.
  44. [44] Song, Y., Durkan, C., Murray, I., & Ermon, S. (2021). Maximum likelihood training of score-based diffusion models. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, pages 1415--1428.
  45. [45] Stromme, A. J. (2024). Minimum intrinsic dimension scaling for entropic optimal transport. In International Conference on Soft Methods in Probability and Statistics, pages 491--499. Springer.
  46. [46] Taghvaei, A. & Jalali, A. (2019). 2-Wasserstein approximation via restricted convex potentials with application to improved training for GANs. Preprint arXiv:1902.07197.
  47. [47] Tameling, C., Stoldt, S., Stephan, T., Naas, J., Jakobs, S., & Munk, A. (2021). Colocalization for super-resolution microscopy via optimal transport. Nature Computational Science, 1(3), 199--211.
  48. [48] Tan, H. Y., Mukherjee, S., Tang, J., & Schönlieb, C.-B. (2023). Data-driven mirror descent with input-convex neural networks. SIAM Journal on Mathematics of Data Science, 5(2), 558--587.
  49. [49] Telgarsky, M. (2016). Benefits of depth in neural networks. In V. Feldman, A. Rakhlin, & O. Shamir (Eds.), 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 1517--1539. PMLR.
  50. [50] Thakolkaran, P., Guo, Y., Saini, S., Peirlinck, M., Alheit, B., & Kumar, S. (2025). Can KAN CANs? Input-convex Kolmogorov-Arnold networks (KANs) as hyperelastic constitutive artificial neural networks (CANs). Computer Methods in Applied Mechanics and Engineering, 443, 118089.
  51. [51] Uscidda, T. & Cuturi, M. (2023). The Monge gap: A regularizer to learn all transport maps. In International Conference on Machine Learning, pages 34709--34733. PMLR.
  52. [52] Villani, C. (2008). Optimal transport: old and new, volume 338. Springer.
  53. [53] Wang, W., Ozolek, J. A., Slepčev, D., Lee, A. B., Chen, C., & Rohde, G. K. (2010). An optimal transportation approach for nuclear structure-based pathology. IEEE Transactions on Medical Imaging, 30(3), 621--631.
  54. [54] Warin, X. (2023). The GroupMax neural network approximation of convex functions. IEEE Transactions on Neural Networks and Learning Systems, 35(8), 11608--11612.
  55. [55] Warin, X. (2024). P1-KAN: An effective Kolmogorov-Arnold network with application to hydraulic valley optimization. Preprint arXiv:2410.03801.
  56. [56] Weed, J. & Bach, F. (2019). Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli, 25(4A), 2620--2648.
  57. [57] Yarotsky, D. (2017). Error bounds for approximations with deep ReLU networks. Neural Networks, 94, 103--114.