Weighted universal approximation of differentiable maps on infinite-dimensional manifolds

Josef Teichmann; Philipp Schmocker

arxiv: 2606.09820 · v2 · pith:L2JUOU5Enew · submitted 2026-06-08 · 🧮 math.FA · cs.LG · math.PR · q-fin.MF · stat.ML


Pith reviewed 2026-06-30 11:30 UTC · model grok-4.3


keywords: universal approximation · differentiable maps · infinite-dimensional manifolds · Nachbin theorem · functional neural networks · path signatures · non-anticipative functionals · weighted approximation


The pith

A weighted Nachbin theorem establishes universal approximation for differentiable maps on infinite-dimensional manifolds including their derivatives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper generalizes the universal approximation theorem for functional input neural networks so that they approximate not only maps from weighted manifolds to Banach spaces but also the derivatives of those maps. The key step is a new weighted version of Nachbin's density theorem that works simultaneously for the function values and the derivatives. This removes the usual restriction to compact sets and yields approximation results for non-anticipative functionals together with their horizontal and vertical derivatives. One concrete application is that linear functions of the signature can approximate path-space functionals along with their directional derivatives. A reader would care because many models in stochastic analysis and infinite-dimensional settings require control over both the output and its sensitivity to changes in the input.

Core claim

By proving a weighted Nachbin theorem, the authors show that the function algebras generated by functional neural networks are dense in the space of differentiable maps on weighted manifolds, where the topology controls both the maps and their derivatives. This produces universal approximation theorems for differentiable maps that hold on infinite-dimensional weighted manifolds rather than only on compact sets, and it implies approximation results for non-anticipative functionals including horizontal and vertical derivatives. Linear functions of the signature are shown to approximate path-space functionals together with their directional derivatives.
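To make the horizontal and vertical derivatives concrete, here is a minimal Python sketch on a discretized path. It follows the usual functional Itô calculus idea (freeze the path to move in time, bump its current value to move in space). The grid, the toy functional, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
def running_integral(k, x, dt):
    """Left Riemann sum of the path over [0, k*dt]; uses only x[0..k-1]."""
    return sum(x[:k]) * dt

def functional(k, x, dt):
    """Toy non-anticipative functional F(t, x) = x_t * int_0^t x_s ds at t = k*dt."""
    return x[k] * running_integral(k, x, dt)

def vertical_derivative(F, k, x, dt, h=1e-6):
    """Bump the path at the current time and take a difference quotient in the bump size."""
    bumped = list(x[:k + 1])
    bumped[k] += h
    return (F(k, bumped, dt) - F(k, x, dt)) / h

def horizontal_derivative(F, k, x, dt):
    """Extend the path by one step frozen at its current value, then difference in time."""
    frozen = list(x[:k + 1]) + [x[k]]
    return (F(k + 1, frozen, dt) - F(k, x, dt)) / dt

# Hypothetical sample path on a grid with step dt, evaluated at time index k.
dt = 0.1
x = [0.5, 0.7, 0.6, 0.9]
k = 3

vert = vertical_derivative(functional, k, x, dt)     # close to int_0^t x_s ds
horiz = horizontal_derivative(functional, k, x, dt)  # close to x_t ** 2
```

For this toy functional the vertical derivative recovers the running integral and the horizontal derivative recovers the squared current value. An approximation result of the kind described above would have to match these quantities as well as the functional itself.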

What carries the argument

The weighted Nachbin theorem, which supplies density of suitable subalgebras in the space of differentiable functions on a weighted manifold with respect to a topology that includes derivative control.
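As a rough orientation only, a topology "with derivative control" on a weighted space could be written along the following lines. The weight $\psi$, the order $k$, the norm, and the subalgebra symbol $\mathcal{A}$ are illustrative assumptions; this page does not state the paper's actual definitions, which may differ.

```latex
% Illustrative assumption, not the paper's definition:
% \psi : M \to (0, \infty) is a weight, D^j f denotes the j-th derivative of f.
\[
  \|f\|_{\psi,k} \;:=\; \max_{0 \le j \le k} \; \sup_{x \in M} \frac{\|D^j f(x)\|}{\psi(x)}
\]
% Density of a subalgebra \mathcal{A} in this sense would then read:
\[
  \forall f, \ \forall \varepsilon > 0 \ \ \exists\, g \in \mathcal{A} : \quad \|f - g\|_{\psi,k} < \varepsilon .
\]
```

Dividing by a growing weight is what would allow errors to be controlled on all of $M$ rather than on a compact subset.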

If this is right

Functional neural networks approximate non-anticipative functionals together with their horizontal and vertical derivatives.
Linear functions of the signature approximate path-space functionals and their directional derivatives.
The approximation result applies simultaneously to a map and its derivatives on infinite-dimensional weighted manifolds.
The density holds in a topology that controls derivatives, extending beyond the usual compact-set setting.
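The phrase "linear functions of the signature" can be illustrated with a small Python sketch: compute the signature of a piecewise-linear path up to level 2 and take a linear combination of its entries. The truncation level, the coefficient layout, and the function names are assumptions for illustration; the paper's setting (including any time extension of the path) is not reproduced here.

```python
def signature_level2(path):
    """Signature up to level 2 of a piecewise-linear path given as a list of points in R^d."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [q_i - p_i for p_i, q_i in zip(p, q)]
        for i in range(d):
            for j in range(d):
                # Chen's identity for appending one straight segment
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        s1 = [s + c for s, c in zip(s1, inc)]
    return s1, s2

def linear_signature_functional(path, a0, a1, a2):
    """Evaluate a0 + sum_i a1[i] * S^(i) + sum_{i,j} a2[i][j] * S^(i,j)."""
    s1, s2 = signature_level2(path)
    d = len(s1)
    value = a0 + sum(a1[i] * s1[i] for i in range(d))
    value += sum(a2[i][j] * s2[i][j] for i in range(d) for j in range(d))
    return value

# Three sides of the unit square, traversed counterclockwise.
path = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
s1, s2 = signature_level2(path)

# Coefficients that pick out the Levy area, 0.5 * (S^(1,2) - S^(2,1)).
levy_area = linear_signature_functional(
    path, 0.0, [0.0, 0.0], [[0.0, 0.5], [-0.5, 0.0]]
)
```

Here the level-1 terms give the total increment of the path, and the antisymmetric part of level 2 gives the signed area enclosed by the path and its chord. The model is linear in the coefficients `a0`, `a1`, `a2`, so fitting them is a linear regression problem.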

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the weighted manifold condition holds for a given class of activations, the same density argument could be checked for higher-order derivatives.
The signature approximation result suggests testing the method on concrete path-dependent problems arising in stochastic processes.
The framework may connect to existing signature-based models by supplying derivative control that those models currently lack.

Load-bearing premise

The input space must admit a weighted manifold structure that makes the Nachbin-type density result hold simultaneously for the maps and their derivatives under the chosen activation functions.

What would settle it

A concrete counterexample would be a specific weighted manifold together with a differentiable map such that no sequence of functional neural networks approximates both the map and its derivative uniformly to arbitrary precision.

Figures

Figures reproduced from arXiv: 2606.09820 by Josef Teichmann, Philipp Schmocker.

**Figure 1.** A FNN $\varphi : M \to Y$ with additive family $\mathcal{A}$, activation function $\rho \in C^k(\mathbb{R})$, linear readout $\mathcal{L} \subseteq Y$, and $N = 3$ neurons. Remark 4.5. Definition 4.4 extends the notion of classical neural networks between Euclidean spaces. Indeed, let $\varphi : \mathbb{R}^d \to \mathbb{R}^m$ be a classical neural network of the form (4.2) $\mathbb{R}^d \ni x \mapsto \varphi(x) = W \rho(Ax + b) = \sum_{n=1}^{N} y_n \rho \ldots$ [PITH_FULL_IMAGE:figures/full_fig_p030_1.png]

**Figure 2.** Learning $f_1$ defined in (7.1) by path-NN $\varphi \in \mathcal{FN}^{\rho,\widetilde{\rho}}_{\Lambda^{\alpha,r}_{T,R}}$ (label FNN) and linear function of the signature $\sum_{0 \le |I| \le N_{\mathrm{Sig}}} a_I \langle e_I, S(\widehat{X})_t, e_I \rangle$ (label Sig). In (a), the weighted mean squared error (7.3) is evaluated on the training set in each epoch (continuous line) as well as on the test set after every 200th epoch (dots). In (b)–(d), three samples $x^{(m)}$ of the test set are shown together with $f_1(\cdot, x^{(m\ldots}$

**Figure 3.** Learning $f_2$ defined in (7.2) by path-NN $\varphi \in \mathcal{FN}^{\rho,\widetilde{\rho}}_{\Lambda^{\alpha,r}_{T,R}}$ (label FNN) and linear function of the signature $\sum_{0 \le |I| \le N_{\mathrm{Sig}}} a_I \langle e_I, S(\widehat{X})_t, e_I \rangle$ (label Sig). In (a), the weighted mean squared error (7.3) is evaluated on the training set in each epoch (continuous line) as well as on the test set after every 200th epoch (dots). In (b)–(d), three samples $x^{(m)}$ of the test set are shown together with $f_2(\cdot, x^{(m\ldots}$

Original abstract

We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. A FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.
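The architecture described in the abstract (input, real-valued hidden layer, scalar activation, linear readout) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is a path sampled on a finite grid, the hidden-layer maps are linear functionals given by Riemann sums, the activation is `tanh`, and the output space is R^2. None of these choices or names come from the paper. Because the hidden maps are assumed linear, the directional derivative has a closed form that can be checked against a finite difference.

```python
import math

def integral_against(kernel, dt):
    """Linear functional x -> sum_k kernel[k] * x[k] * dt (a Riemann sum)."""
    return lambda x: sum(w * x_k for w, x_k in zip(kernel, x)) * dt

def fnn(x, functionals, biases, readouts):
    """phi(x) = sum_n y_n * tanh(a_n(x) + b_n), with readout vectors y_n in R^m."""
    out = [0.0] * len(readouts[0])
    for a, b, y in zip(functionals, biases, readouts):
        hidden = math.tanh(a(x) + b)  # scalar hidden neuron
        out = [o + y_i * hidden for o, y_i in zip(out, y)]
    return out

def fnn_directional_derivative(x, v, functionals, biases, readouts):
    """D phi(x)[v] = sum_n y_n * tanh'(a_n(x) + b_n) * a_n(v), using linearity of a_n."""
    out = [0.0] * len(readouts[0])
    for a, b, y in zip(functionals, biases, readouts):
        slope = 1.0 - math.tanh(a(x) + b) ** 2  # derivative of tanh
        out = [o + y_i * slope * a(v) for o, y_i in zip(out, y)]
    return out

# Toy setup: paths sampled at 5 grid points, N = 3 neurons, output in R^2.
dt = 0.25
kernels = ([1, 1, 1, 1, 1], [0, 1, 2, 3, 4], [1, -1, 1, -1, 1])
functionals = [integral_against(k, dt) for k in kernels]
biases = [0.1, -0.2, 0.3]
readouts = [[1.0, 0.0], [0.5, -1.0], [-0.3, 0.7]]

x = [0.0, 0.2, 0.1, 0.4, 0.3]    # input path
v = [1.0, 0.5, 0.0, -0.5, -1.0]  # direction of differentiation

# Compare the closed-form derivative with a forward difference quotient.
eps = 1e-6
x_shift = [x_k + eps * v_k for x_k, v_k in zip(x, v)]
base = fnn(x, functionals, biases, readouts)
shifted = fnn(x_shift, functionals, biases, readouts)
finite_diff = [(p - q) / eps for p, q in zip(shifted, base)]
exact = fnn_directional_derivative(x, v, functionals, biases, readouts)
```

The closed-form derivative is again a finite sum over neurons, which is why approximating a map and its derivative with the same network parameters is a natural question to ask.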

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper proves a weighted Nachbin theorem that extends universal approximation to differentiable maps and their derivatives on non-compact infinite-dimensional manifolds.


The key thing to know is that they prove a weighted Nachbin theorem giving universal approximation for differentiable maps on weighted infinite-dimensional manifolds, including the derivatives, and this works beyond the usual compact-set restrictions.

What the paper does well is take the functional neural network setup and add derivative approximation while handling non-compact spaces through weights. The applications to non-anticipative functionals with horizontal and vertical derivatives, plus the result on linear signature functions approximating path space functionals with directional derivatives, are concrete and tie into existing work in rough paths and functional data analysis.

The soft spots are that the abstract leaves the precise conditions on the weights, the manifold structure, and the activation functions unspecified, so it is not obvious how restrictive they turn out to be or how the infinite-dimensional topology is controlled in the proof. Those details matter in this area and would need checking in the full text.

This is for specialists in functional analysis or machine learning theory who want rigorous approximation guarantees on infinite-dimensional spaces. A reader working on path-dependent problems or signature methods would get direct value from the applications. It deserves a serious referee because the claimed theorem is new and the applications are relevant.

Recommendation: send it to peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript generalizes the universal approximation theorem for functional neural networks (FNNs) mapping from possibly infinite-dimensional weighted manifolds to Banach spaces. It proves a weighted Nachbin theorem to establish density results that simultaneously approximate differentiable maps and their derivatives, extending beyond compact sets. Applications include approximation of non-anticipative functionals (with horizontal and vertical derivatives) and linear functionals of the signature for path-space functionals including directional derivatives.

Significance. If the weighted Nachbin theorem and its applications hold, the result would extend classical UATs to differentiable maps on non-compact infinite-dimensional spaces, providing a tool for approximation theory in functional analysis with potential relevance to stochastic analysis and signature methods. The explicit inclusion of derivative approximation and the weighted setting are the main novelties.

major comments (2)

[Abstract] The central claim rests on the existence of a weighted manifold structure compatible with the Nachbin density result for differentiable maps (abstract and the paragraph introducing the weighted Nachbin theorem). Without an explicit definition of this structure, the precise conditions on the weight function, and verification that the activation functions satisfy the required density simultaneously for the map and derivatives, the proof cannot be assessed for correctness.
The transition from the weighted Nachbin theorem to the approximation of non-anticipative functionals (including horizontal and vertical derivatives) and to signature approximations requires additional technical steps that are not verifiable from the provided abstract; these steps appear load-bearing for the applications but lack outlined proofs or references to prior results.

minor comments (1)

Clarify the precise statement of the weighted Nachbin theorem (e.g., the topology on the space of differentiable maps and the form of the weight) in the introduction or dedicated section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their report and the opportunity to clarify points from the manuscript. We address each major comment below, noting that the referee's concerns appear to stem from the abstract alone; the full text contains the requested definitions, conditions, and proof outlines.


Referee: [Abstract] The central claim rests on the existence of a weighted manifold structure compatible with the Nachbin density result for differentiable maps (abstract and the paragraph introducing the weighted Nachbin theorem). Without an explicit definition of this structure, the precise conditions on the weight function, and verification that the activation functions satisfy the required density simultaneously for the map and derivatives, the proof cannot be assessed for correctness.

Authors: The full manuscript explicitly defines the weighted manifold structure immediately after introducing the weighted Nachbin theorem, specifies the precise conditions on the weight function required for compatibility with the density result, and verifies in the proof that the chosen activation functions achieve simultaneous density for both the maps and their derivatives. Such details normally sit in the body of a paper rather than in the abstract, which is a high-level summary. Revision: no.
Referee: [—] The transition from the weighted Nachbin theorem to the approximation of non-anticipative functionals (including horizontal and vertical derivatives) and to signature approximations requires additional technical steps that are not verifiable from the provided abstract; these steps appear load-bearing for the applications but lack outlined proofs or references to prior results.

Authors: The full manuscript details the technical transitions in the dedicated applications section. The steps from the weighted Nachbin theorem to the approximation results for non-anticipative functionals (including horizontal and vertical derivatives) and signature-based approximations are outlined with explicit references to prior results on these topics, and the proofs are provided in the text. Revision: no.

Circularity Check

0 steps flagged

No significant circularity identified


The paper claims to prove a weighted Nachbin theorem that directly yields a UAT for differentiable maps on weighted infinite-dimensional manifolds, including derivative approximation. No equations, definitions, or derivation steps are exhibited that reduce a claimed result to its inputs, whether by construction, through fitted parameters renamed as predictions, or through load-bearing self-citations. The abstract and description frame the work as an independent proof extending prior results without self-referential reductions, so the derivation appears self-contained when measured against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is a pure existence proof in functional analysis; no numerical fitting or new postulated objects are described in the abstract.

axioms (1)

standard math: Standard results from functional analysis on Banach spaces, manifolds, and weighted topologies
The proof invokes known properties of infinite-dimensional manifolds and weighted approximation to establish the Nachbin-type density.




[1] [1]

Acciaio, A

B. Acciaio, A. Kratsios, and G. Pammer. Designing universal causal deep learning models: The geometric (hyper)transformer.Mathematical Finance, 34(2):671–735, 2024

2024

[2] [2]

Ambrosio, N

L. Ambrosio, N. Fusco, and D. Pallara.Functions of bounded variation and free discontinuity problems. Oxford science publications. Clarendon Press, Oxford, 2000

2000

[3] [3]

R. M. Aron and J. B. Prolla. Polynomial approximation of differentiable functions on Banach spaces.Journal für die reine und angewandte Mathematik, 313:195–216, 1980

1980

[4] [4]

A. R. Barron. Universal approximation bounds for superpositions of a sigmoidal function.IEEE Transactions on Information Theory, 39(3):930–945, 1993

1993

[5] [5]

Bastiani

A. Bastiani. Applications différentiables et variétés différentiables de dimension infinie.Journal d’Analyse Mathématique, 13:1–114, 1964

1964

[6] [6]

Bayer, P

C. Bayer, P. P. Hager, S. Riedel, and J. Schoenmakers. Optimal stopping with signatures.The Annals of Applied Probability, 33(1):238–273, 2023

2023

[7] [7]

Bayer, L

C. Bayer, L. Pelizzari, and J. Schoenmakers. Primal and dual optimal stopping with signatures.Finance and Stochastics, 29:981–1014, 2025

2025

[8] [8]

F. E. Benth, N. Detering, and L. Galimberti. Neural networks in Fréchet spaces.Annals of Mathematics and Artificial Intelligence, 91:75–103, 2023

2023

[9] [9]

M. S. Berger.Nonlinearity and functional analysis: Lectures on nonlinear problems in mathematical analysis. Pure and applied mathematics, a series of monographs and textbooks; v. 74. Academic Press, New York, 1977

1977

[10] [10]

Bernstein

S. Bernstein. Le problème de l’approximation des fonctions continues sur tout l’axe réel et l’une de ses applications.Bulletin de la Société Mathématique de France, 52:399–410, 1924

1924

[11] [11]

Bierstedt.Gewichtete Räume stetiger vektorwertiger Funktionen und das injektive Tensorprodukt

K.-D. Bierstedt.Gewichtete Räume stetiger vektorwertiger Funktionen und das injektive Tensorprodukt. PhD thesis, Johannes-Gutenberg Universität Mainz, Mainz, 1971

1971

[12] [12]

Billingsley.Convergence of probability measures

P. Billingsley.Convergence of probability measures. Wiley series in probability and statistics. Probability and statistics section. Wiley, New York, 2nd ed. edition, 1999

1999

[13] [13]

Blessing, R

J. Blessing, R. Denk, M. Kupper, and M. Nendel. Convex monotone semigroups and their generators with respect toΓ-convergence.Journal of Functional Analysis, 288(8):110841, 2025

[14] H. Boedihardjo, X. Geng, T. Lyons, and D. Yang. The signature of a rough path: Uniqueness. Advances in Mathematics, 293:720–737, 2016.

[15] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen. Optimal approximation with sparsely connected deep neural networks. SIAM Journal on Mathematics of Data Science, 1:8–45, 2019.

[16] H. Brézis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext. Springer, New York, 2011.

[17] E. J. Candès. Ridgelets: Theory and Applications. PhD thesis, Stanford University, 1998. https://candes.su.domains/publications/downloads/Thesis.pdf

[18] M. Ceylan and D. J. Prömel. Global universal approximation with Brownian signatures. Preprint arXiv:2512.16396, 2025.

[19] K.-T. Chen. Integration of paths, geometric invariants and a generalized Baker-Hausdorff formula. Annals of Mathematics, 65(1):163–178, 1957.

[20] T. Chen and H. Chen. Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks. IEEE Transactions on Neural Networks, 6(4):904–910, 1995.

[21] R. Cont. Functional Ito Calculus and functional Kolmogorov equations, pages 123–208. Advanced Courses in Mathematics. Birkhauser, Basel, 2016. Lecture Notes of the Barcelona Summer School in Stochastic Analysis, July 2012.

[22] R. Cont and D.-A. Fournié. Change of variable formulas for non-anticipative functionals on path space. Journal of Functional Analysis, 259(4):1043–1072, 2010.

[23] R. Cont and D.-A. Fournié. Functional Itô calculus and stochastic integral representation of martingales. Annals of Probability, 41(1):109–133, 2013.

[24] S. Cox, A. Khedher, and T. Maessen. Universal approximation by signatures for infinite-dimensional rough paths. Preprint arXiv:2603.03058, 2026.

[25] C. Cuchiero, G. Gazzani, and S. Svaluto-Ferro. Signature-based models: Theory and calibration. SIAM Journal on Financial Mathematics, 14(3):910–957, 2023.

[26] C. Cuchiero and J. Möller. Signature methods in stochastic portfolio theory. SIAM Journal on Financial Mathematics, 16(4):1239–1303, 2025.

[27] C. Cuchiero, F. Primavera, and S. Svaluto-Ferro. Universal approximation theorems for continuous functions of càdlàg paths and Lévy-type signature models. Finance and Stochastics, 29:289–342, 2025.

[28] C. Cuchiero, P. Schmocker, and J. Teichmann. Global universal approximation of functional input maps on weighted spaces. Constructive Approximation, 63:537–612, 2026.

[29] C. Cuchiero and J. Teichmann. Generalized Feller processes and Markovian lifts of stochastic Volterra processes: the affine case. Journal of Evolution Equations, 20:1–48, 2020.

[30] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.

[31] J. Dieudonné. Foundations of modern analysis. Enlarged and corrected printing. Academic Press, New York, London, 1969.

[32] J. Dixmier. Sur un théorème de Banach. Duke Mathematical Journal, 15(4):1057–1071, 1948.

[33] P. Dörsek and J. Teichmann. A semigroup point of view on splitting schemes for stochastic (partial) differential equations. Preprint arXiv:1011.2651, 2010.

[34] B. Dupire. Functional Itô calculus. Technical report, Bloomberg, 2009. Bloomberg Portfolio Research Paper No. 2009-04-FRONTIERS.

[35] S. N. Ethier and T. G. Kurtz. Markov processes: Characterization and convergence. John Wiley & Sons, 2005.

[36] M. Fliess. Fonctionnelles causales non linéaires et indéterminées non commutatives. Bulletin de la Société Mathématique de France, 109:3–40, 1981.

[37] G. B. Folland. Fourier analysis and its applications. Brooks/Cole Publishing Company, Belmont, California, 1st edition, 1992.

[38] H. Föllmer. Calcul d’Ito sans probabilités. Séminaire de probabilités de Strasbourg, 15:143–150, 1981.

[39] P. K. Friz and M. Hairer. A Course on Rough Paths: With an Introduction to Regularity Structures. Universitext. Springer International Publishing, Cham, 2nd edition, 2020.

[40] P. K. Friz and N. B. Victoir. Multidimensional Stochastic Processes as Rough Paths: Theory and Applications. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2010.

[41] L. Galimberti. Neural networks in non-metric spaces. Forthcoming in Analysis and Applications, 2026.

[42] L. Galimberti, A. Kratsios, and G. Livieri. Designing universal causal deep learning models: The case of infinite-dimensional dynamical systems from stochastic analysis. Forthcoming in Constructive Approximation, 2026.

[43] P. Gierjatowicz, M. Sabate-Vidales, D. Šiška, L. Szpruch, and Ž. Žurič. Robust pricing and hedging via neural SDEs. Journal of Computational Finance, 26:1–32, 2020.

[44] H. Glöckner. Infinite-dimensional Lie groups without completeness restrictions. Banach Center Publications, 55:43–59, 2002.

[45] H. Glöckner. Fundamentals of submersions and immersions between infinite-dimensional manifolds. Preprint arXiv:1502.05795, 2015.

[46] J. Gómez Gil and J. G. Llavona. Polynomial approximation of weakly differentiable functions on Banach spaces. Proceedings of the Royal Irish Academy. Section A: Mathematical and Physical Sciences, 82A(2):141–150, 1982.

[47] L. Gonon, L. Grigoryeva, and J.-P. Ortega. Infinite-dimensional reservoir computing. Neural Networks, 179:106486, 2024.

[48] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

[49] L. Grigoryeva and J.-P. Ortega. Echo state networks are universal. Neural Networks, 108:495–508, 2018.

[50] B. Hambly and T. Lyons. Uniqueness for the signature of a path of bounded variation and the reduced path group. Annals of Mathematics, 171(1):109–167, 2010.

[51] F. Hausdorff. Die symbolische Exponentialformel in der Gruppentheorie. Ber. Verh. Kgl. Sächs. Ges. Wiss., 58:19–48, 1906.

[52] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6):82–97, 2012.

[53] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.

[54] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

[55] K. Hornik, M. Stinchcombe, and H. White. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5):551–560, 1990.

[56] T. Hytönen, J. van Neerven, M. Veraar, and L. Weis. Analysis in Banach Spaces, volume 63 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. Springer, Cham, 2016.

[57] A. Iserles and S. P. Nørsett. On the solution of linear differential equations in Lie groups. Philos. Trans. Roy. Soc. A, 357:983–1019, 1999.

[58] V. Ismailov. On shallow feedforward neural networks with inputs from a topological space. Forthcoming in Annals of Mathematics and Artificial Intelligence, 2026.

[59] A. Jakubowski. The Skorokhod space in functional convergence: a short introduction. In International conference: Skorokhod Space, volume 50, pages 11–18, 2007.

[60] S. Kaijser. A note on dual Banach spaces. Mathematica Scandinavica, 41(2):325–330, 1977.

[61] H. Keller. Differential calculus in locally convex spaces. Lecture Notes in Mathematics; 417. Springer-Verlag, Berlin, Germany, 1st edition, 1974.

[62] P. Kidger, J. Foster, X. Li, and T. J. Lyons. Neural SDEs as infinite-dimensional GANs. In International Conference on Machine Learning, pages 5453–5463. PMLR, 2021.

[63] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In Y. Bengio and Y. LeCun, editors, 3rd International Conference on Learning Representations, 2015, Conference Track Proceedings, May 2015.

[64] F. J. Kiraly and H. Oberhauser. Kernels for sequentially ordered data. Journal of Machine Learning Research, 20(31):1–45, 2019.

[65] J. Korevaar. Distribution proof of Wiener’s Tauberian theorem. Proceedings of the American Mathematical Society, 16(3):353–355, 1965.

[66] N. Kovachki, Z. Li, B. Liu, K. Azizzadenesheli, K. Bhattacharya, A. Stuart, and A. Anandkumar. Neural operator: Learning maps between function spaces with applications to PDEs. Journal of Machine Learning Research, 24(89):1–97, 2023.

[67] A. Kratsios and I. Bilokopytov. Non-Euclidean universal approximation. Advances in Neural Information Processing Systems, 33:10635–10646, 2020.

[68] A. Kratsios, C. Liu, M. Lassas, M. V. de Hoop, and I. Dokmanić. An approximation theory for metric space-valued functions with a view towards deep learning. Preprint arXiv:2304.12231, 2023.

[69] A. Kratsios, A. Neufeld, and P. Schmocker. Generative neural operators of log-complexity can simultaneously solve infinitely many convex programs. Preprint arXiv:2508.14995, 2025.

[70] A. Kriegl and P. W. Michor. The convenient setting of global analysis, volume 53 of Mathematical surveys and monographs. American Mathematical Society, Providence, R.I, 1997.

[71] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.

[72] G. Lancien and E. Pernecká. Approximation properties and Schauder decompositions in Lipschitz-free spaces. Journal of Functional Analysis, 264(10):2323–2334, 2013.

[73] S. Lanthaler, S. Mishra, and G. E. Karniadakis. Error estimates for DeepONets: a deep learning framework in infinite dimensions. Transactions of Mathematics and Its Applications, 6(1), 2022.

[74] M. Leshno, Vladimir Ya. Lin, A. Pinkus, and S. Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6):861–867, 1993.

[75] D. Levin, T. Lyons, and H. Ni. Learning from the past, predicting the statistics for the future, learning an evolving system. Preprint arXiv:1309.0260, 2013.

[76] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar. Fourier neural operator for parametric partial differential equations. Preprint arXiv:2010.08895, 2020.

[77] J. Lindenstrauss and L. Tzafriri. Classical Banach spaces I and II. Classics in mathematics. Springer, Berlin, 1996.

[78] J. G. Llavona. Approximation of Continuously Differentiable Functions, volume 130 of North-Holland Mathematics Studies. North-Holland, 1986.

[79] L. Lu, P. Jin, G. Pang, Z. Zhang, and G. E. Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229, 2021.

[80] T. Lyons, S. Nejad, and I. Perez Arribas. Non-parametric pricing and hedging of exotic derivatives. Applied Mathematical Finance, 27(6):457–494, 2020.
