Weighted universal approximation of differentiable maps on infinite-dimensional manifolds
Pith reviewed 2026-06-30 11:30 UTC · model grok-4.3
The pith
A weighted Nachbin theorem establishes universal approximation for differentiable maps on infinite-dimensional manifolds including their derivatives.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By proving a weighted Nachbin theorem, the authors show that the function algebras generated by functional neural networks are dense in the space of differentiable maps on weighted manifolds, where the topology controls both the maps and their derivatives. This produces universal approximation theorems for differentiable maps that hold on infinite-dimensional weighted manifolds rather than only on compact sets, and it implies approximation results for non-anticipative functionals including horizontal and vertical derivatives. Linear functions of the signature are shown to approximate path-space functionals together with their directional derivatives.
What carries the argument
The weighted Nachbin theorem, which supplies density of suitable subalgebras in the space of differentiable functions on a weighted manifold with respect to a topology that includes derivative control.
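One way to picture "a topology that includes derivative control" is a weighted norm of C^1 type. The block below is only a sketch under assumed notation and is not the paper's definition: the weight \psi, the symbol \|\cdot\|_{C^1_\psi}, the subalgebra \mathcal{A}, and the choice of operator norm on the derivative are all hypothetical placeholders.

```latex
% Illustrative only: \psi, \|\cdot\|_{C^1_\psi}, and \mathcal{A} are assumed notation.
% X is the weighted input space, Y the Banach space of outputs.
\[
  \|f\|_{C^1_\psi}
  \;:=\;
  \sup_{x \in X} \frac{\|f(x)\|_Y + \|Df(x)\|_{\mathrm{op}}}{\psi(x)},
  \qquad \psi \colon X \to (0,\infty).
\]
% In this notation, density of a subalgebra \mathcal{A} would read:
% for every admissible f and every \varepsilon > 0,
\[
  \exists\, \varphi \in \mathcal{A} : \quad \|f - \varphi\|_{C^1_\psi} < \varepsilon .
\]
```

Dividing by the weight is what would allow the supremum to run over a non-compact space: the approximation error may grow where \psi is large, as long as the ratio stays small.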
If this is right
- Functional neural networks approximate non-anticipative functionals together with their horizontal and vertical derivatives.
- Linear functions of the signature approximate path-space functionals and their directional derivatives.
- The approximation result applies simultaneously to a map and its derivatives on infinite-dimensional weighted manifolds.
- The density holds in a topology that controls derivatives, extending beyond the usual compact-set setting.
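To make "linear functions of the signature" concrete, here is a minimal sketch that computes the signature of a piecewise-linear path up to level 2 and evaluates a linear functional of it. The function names, the truncation at level 2, and the coefficient layout (`c0`, `c1`, `c2`) are assumptions chosen for illustration; the paper's setting is more general.

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (T, d) holding the sample points.
    Returns (s1, s2) with shapes (d,) and (d, d).
    """
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # the segment itself contributes delta (level 1)
        # and delta (x) delta / 2 (level 2).
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return s1, s2

def linear_signature_functional(path, c0, c1, c2):
    """Evaluate c0 + <c1, S^1> + <c2, S^2> on the truncated signature."""
    s1, s2 = signature_level2(path)
    return c0 + c1 @ s1 + np.sum(c2 * s2)

# Example: the Levy area, a linear functional of level 2, recovers the
# area enclosed by a closed planar curve (here a polygon inscribed in
# the unit circle, so the value is approximately pi).
t = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.column_stack([np.cos(t), np.sin(t)])
levy = np.array([[0.0, 0.5], [-0.5, 0.0]])
print(linear_signature_functional(circle, 0.0, np.zeros(2), levy))
```

In a learning setup the coefficients would be fitted by linear regression on signature features; the sketch only shows the evaluation step.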
Where Pith is reading between the lines
- If the weighted manifold condition holds for a given class of activations, the same density argument could be checked for higher-order derivatives.
- The signature approximation result suggests testing the method on concrete path-dependent problems arising in stochastic processes.
- The framework may connect to existing signature-based models by supplying derivative control that those models currently lack.
Load-bearing premise
The input space must admit a weighted manifold structure that makes the Nachbin-type density result hold simultaneously for the maps and their derivatives under the chosen activation functions.
What would settle it
A concrete counterexample would be a specific weighted manifold together with a differentiable map such that no sequence of functional neural networks approximates both the map and its derivative uniformly to arbitrary precision.
Original abstract
We generalize the universal approximation theorem for functional input neural networks (FNNs) to differentiable maps by including the approximation of the derivatives. An FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.
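The architecture described in the abstract can be sketched on discretized paths. This is a toy illustration under stated assumptions, not the authors' implementation: inputs are functions sampled on a grid, each hidden unit applies an integral functional against a kernel (approximated by the trapezoid rule), the activation is tanh, and the Banach-space output is replaced by a finite-dimensional vector. All names (`trapezoid_weights`, `fnn`, `fnn_directional_derivative`) are hypothetical.

```python
import numpy as np

def trapezoid_weights(grid):
    """Quadrature weights of the trapezoid rule on a 1-D grid."""
    w = np.zeros_like(grid)
    dt = np.diff(grid)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def fnn(x, kernels, biases, readouts, w):
    """One-hidden-layer functional network evaluated at a sampled path x.

    kernels: (N, M) array, row n samples the kernel of the n-th linear functional
    biases: (N,), readouts: (N, d_out), w: (M,) quadrature weights
    """
    pre = kernels @ (w * x) + biases      # linear functionals of x, plus bias
    return np.tanh(pre) @ readouts        # scalar activation, linear readout

def fnn_directional_derivative(x, h, kernels, biases, readouts, w):
    """Derivative of fnn at x in direction h, obtained from the chain rule."""
    pre = kernels @ (w * x) + biases
    return ((1.0 - np.tanh(pre) ** 2) * (kernels @ (w * h))) @ readouts

# Compare the chain-rule derivative with a central finite difference.
grid = np.linspace(0.0, 1.0, 101)
w = trapezoid_weights(grid)
rng = np.random.default_rng(0)
kernels = rng.normal(size=(8, grid.size))
biases = rng.normal(size=8)
readouts = rng.normal(size=(8, 2))
x, h = np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)
eps = 1e-5
finite_diff = (fnn(x + eps * h, kernels, biases, readouts, w)
               - fnn(x - eps * h, kernels, biases, readouts, w)) / (2 * eps)
exact = fnn_directional_derivative(x, h, kernels, biases, readouts, w)
print(np.max(np.abs(finite_diff - exact)))
```

Because the hidden layer depends on the input only through linear functionals, the derivative of the network is again a finite sum of the same building blocks, which is the structural reason simultaneous approximation of a map and its derivative is plausible for this class.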
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript generalizes the universal approximation theorem for functional neural networks (FNNs) mapping from possibly infinite-dimensional weighted manifolds to Banach spaces. It proves a weighted Nachbin theorem to establish density results that simultaneously approximate differentiable maps and their derivatives, extending beyond compact sets. Applications include approximation of non-anticipative functionals (with horizontal and vertical derivatives) and linear functionals of the signature for path-space functionals including directional derivatives.
Significance. If the weighted Nachbin theorem and its applications hold, the result would extend classical UATs to differentiable maps on non-compact infinite-dimensional spaces, providing a tool for approximation theory in functional analysis with potential relevance to stochastic analysis and signature methods. The explicit inclusion of derivative approximation and the weighted setting are the main novelties.
major comments (2)
- [Abstract] The central claim rests on the existence of a weighted manifold structure compatible with the Nachbin density result for differentiable maps (abstract and the paragraph introducing the weighted Nachbin theorem). Without an explicit definition of this structure, the precise conditions on the weight function, and verification that the activation functions satisfy the required density simultaneously for the map and derivatives, the proof cannot be assessed for correctness.
- The transition from the weighted Nachbin theorem to the approximation of non-anticipative functionals (including horizontal and vertical derivatives) and to signature approximations requires additional technical steps that are not verifiable from the provided abstract; these steps appear load-bearing for the applications but lack outlined proofs or references to prior results.
minor comments (1)
- Clarify the precise statement of the weighted Nachbin theorem (e.g., the topology on the space of differentiable maps and the form of the weight) in the introduction or dedicated section.
Simulated Author's Rebuttal
We thank the referee for their report and the opportunity to clarify points from the manuscript. We address each major comment below, noting that the referee's concerns appear to stem from the abstract alone; the full text contains the requested definitions, conditions, and proof outlines.
- Referee: [Abstract] The central claim rests on the existence of a weighted manifold structure compatible with the Nachbin density result for differentiable maps (abstract and the paragraph introducing the weighted Nachbin theorem). Without an explicit definition of this structure, the precise conditions on the weight function, and verification that the activation functions satisfy the required density simultaneously for the map and derivatives, the proof cannot be assessed for correctness.
  Authors: The full manuscript explicitly defines the weighted manifold structure immediately after introducing the weighted Nachbin theorem, specifies the precise conditions on the weight function required for compatibility with the density result, and verifies in the proof that the chosen activation functions achieve simultaneous density for both the maps and their derivatives. These elements are standardly located in the body rather than the abstract, which is a high-level summary.
  Revision: no
- Referee: The transition from the weighted Nachbin theorem to the approximation of non-anticipative functionals (including horizontal and vertical derivatives) and to signature approximations requires additional technical steps that are not verifiable from the provided abstract; these steps appear load-bearing for the applications but lack outlined proofs or references to prior results.
  Authors: The full manuscript details the technical transitions in the dedicated applications section. The steps from the weighted Nachbin theorem to the approximation results for non-anticipative functionals (including horizontal and vertical derivatives) and signature-based approximations are outlined with explicit references to prior results on these topics, and the proofs are provided in the text.
  Revision: no
Circularity Check
No significant circularity identified
The paper claims to prove a weighted Nachbin theorem that directly yields a UAT for differentiable maps on weighted infinite-dimensional manifolds, including derivative approximation. No equations, definitions, or derivation steps are exhibited that reduce any claimed result to its inputs by construction, fitted parameters renamed as predictions, or load-bearing self-citations. The abstract and description frame the work as an independent proof extending prior results without self-referential reductions, making the derivation self-contained against external mathematical benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard results from functional analysis on Banach spaces, manifolds, and weighted topologies