Natural Riemannian gradient for learning functional tensor networks
Pith reviewed 2026-05-10 16:56 UTC · model grok-4.3
The pith
Natural Riemannian gradients yield basis-independent search directions for optimizing functional tensor networks under arbitrary loss functions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A natural Riemannian gradient descent approach, derived from Amari's natural gradient, produces a search direction that is independent of the choice of basis of the underlying functional tensor product space. This property holds for both the factorized and the manifold-based representations of low-rank functional tree tensor networks. The method applies to arbitrary loss functions and comes with a hierarchy of efficient approximations for computing the parameter updates; these updates converge measurably faster on classification tasks than standard Riemannian gradient descent.
What carries the argument
The natural Riemannian gradient computed in the functional tensor product space, which yields a basis-independent search direction for parameter updates.
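A compact way to see why such a direction cannot depend on the basis is the standard change-of-basis argument sketched below (our notation, not the paper's own derivation): expand the model function in a basis of the functional tensor product space and compare the natural direction before and after an invertible change of basis.

% Sketch in our notation: f is the model, \varphi_i the basis, G the Gram matrix of that basis.
\[
  f = \sum_i \theta_i \varphi_i, \qquad
  G_{ij} = \langle \varphi_i, \varphi_j \rangle, \qquad
  d = G^{-1} \nabla_\theta L(\theta) \quad \text{(natural direction)}.
\]
% An invertible change of basis \tilde\varphi_j = \sum_i \varphi_i B_{ij} gives new coordinates
% \tilde\theta = B^{-1}\theta, gradient \nabla_{\tilde\theta} L = B^{\top} \nabla_\theta L,
% and Gram matrix \tilde G = B^{\top} G B, so
\[
  \tilde d = \tilde G^{-1} \nabla_{\tilde\theta} L
           = B^{-1} G^{-1} B^{-\top} B^{\top} \nabla_\theta L
           = B^{-1} d,
  \qquad
  \sum_j \tilde d_j \, \tilde\varphi_j = \sum_i d_i \, \varphi_i .
\]
% The two coordinate vectors represent the same function-space direction; the plain Euclidean
% gradient, by contrast, transforms as B^{\top} \nabla_\theta L and does not.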
Load-bearing premise
The hierarchy of efficient approximations to the true natural Riemannian gradient retains enough accuracy to produce the claimed convergence gains without introducing instability or systematic bias.
What would settle it
The claim would be undercut if, on the same classification datasets, the natural-gradient updates produced convergence curves and final accuracies indistinguishable from or worse than those obtained by ordinary Riemannian gradient descent.
Original abstract
We consider machine learning tasks with low-rank functional tree tensor networks (TTN) as the learning model. While in the case of least-squares regression, low-rank functional TTNs can be efficiently optimized using alternating optimization, this is not directly possible in other problems, such as multinomial logistic regression. We propose a natural Riemannian gradient descent type approach applicable to arbitrary losses which is based on the natural gradient by Amari. In particular, the search direction obtained by the natural gradient is independent of the choice of basis of the underlying functional tensor product space. Our framework applies to both the factorized and manifold-based approach for representing the functional TTN. For practical application, we propose a hierarchy of efficient approximations to the true natural Riemannian gradient for computing the updates in the parameter space. Numerical experiments confirm our theoretical findings on common classification datasets and show that using natural Riemannian gradient descent for learning considerably improves convergence behavior when compared to standard Riemannian gradient methods.
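To make the update described in the abstract concrete, here is a minimal Python sketch of one natural-gradient step with an interchangeable approximation of the metric (an exact solve, a block-diagonal solve, or a diagonal scaling). The function name, the three levels, and the retraction comment are our illustrative assumptions; the paper's actual TTN parametrization and approximation hierarchy are not reproduced here.

import numpy as np

def natural_gradient_step(theta, euclidean_grad, gram, lr=0.1, level="exact"):
    """One natural-gradient update: theta <- theta - lr * G^{-1} grad.

    theta          : current parameter vector (e.g. flattened TTN cores)
    euclidean_grad : gradient of the loss w.r.t. theta (same shape as theta)
    gram           : Gram/metric matrix G of the chosen basis, shape (n, n)
    level          : which approximation of G^{-1} grad to use (illustrative)
    """
    if level == "exact":
        # Full linear solve with the true metric.
        direction = np.linalg.solve(gram, euclidean_grad)
    elif level == "diagonal":
        # Cheapest level: keep only diag(G), a Jacobi-style preconditioner.
        direction = euclidean_grad / np.diag(gram)
    elif level == "block":
        # Intermediate level: block-diagonal G, one block per parameter group.
        # (Block boundaries are assumed known; here we fake two equal blocks.)
        k = theta.size // 2
        direction = np.empty_like(euclidean_grad)
        direction[:k] = np.linalg.solve(gram[:k, :k], euclidean_grad[:k])
        direction[k:] = np.linalg.solve(gram[k:, k:], euclidean_grad[k:])
    else:
        raise ValueError(f"unknown level: {level}")
    # A low-rank model would additionally retract the result back to the manifold here.
    return theta - lr * direction

A caller supplies the current parameters, the Euclidean gradient of its loss, and the Gram matrix of its chosen basis, e.g. theta = natural_gradient_step(theta, grad, G, level="block").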
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a natural Riemannian gradient descent method for optimizing low-rank functional tree tensor networks (TTNs) as models in machine learning tasks. It extends Amari's natural gradient to arbitrary loss functions (beyond least-squares regression), establishes that the resulting search direction is independent of the choice of basis for the underlying functional tensor product space, covers both factorized and manifold-based TTN representations, introduces a hierarchy of efficient approximations to the true natural gradient for practical computation, and reports numerical experiments on classification datasets demonstrating improved convergence compared to standard Riemannian gradient descent.
Significance. If the central claims hold, the work offers a principled geometric optimization framework that generalizes Riemannian methods to functional TTNs with general losses while preserving intrinsic basis independence. This could enable more effective training of tensor network models in classification and other non-regression settings. The explicit use of an intrinsic Riemannian metric (rather than coordinate-dependent gradients) and the hierarchy of approximations are notable strengths; the empirical results provide concrete evidence of practical benefit. The approach builds directly on established Riemannian optimization and natural gradient literature without introducing circularity or undefined quantities.
major comments (2)
- [§4] Hierarchy of approximations: The central practical claim relies on a hierarchy of approximations to the true natural Riemannian gradient, yet no error bounds, stability analysis, or convergence-rate perturbation results are provided for these approximations. This is load-bearing because the abstract and experiments assert improved convergence; without such analysis it is unclear whether the approximations preserve the claimed advantages or introduce bias or instability for arbitrary losses.
- [§5] Numerical experiments: The experiments show faster convergence, but the text does not report statistical significance tests, variance across random seeds, or ablation studies isolating the effect of the natural gradient versus the choice of approximation level. This weakens the empirical support for the method's superiority.
minor comments (3)
- [§3] The abstract states that the method applies to both factorized and manifold-based approaches, but the main text should clarify in one place (e.g., §3) whether the basis-independence proof carries over identically to both representations or requires separate arguments.
- Notation for the functional tensor product space and the Riemannian metric could be introduced earlier and used consistently; occasional shifts between coordinate and intrinsic descriptions make the independence argument harder to follow on first reading.
- A short comparison table or paragraph contrasting the proposed natural gradient with the Euclidean gradient and with existing alternating least-squares methods for the regression case would help readers situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment, and recommendation for minor revision. We address the major comments point by point below.
Point-by-point responses
- Referee: [§4] Hierarchy of approximations: The central practical claim relies on a hierarchy of approximations to the true natural Riemannian gradient, yet no error bounds, stability analysis, or convergence-rate perturbation results are provided for these approximations. This is load-bearing because the abstract and experiments assert improved convergence; without such analysis it is unclear whether the approximations preserve the claimed advantages or introduce bias or instability for arbitrary losses.
Authors: We thank the referee for this observation. The hierarchy is introduced to ensure computational tractability for high-dimensional functional tensor spaces while preserving the key property of basis independence. General error bounds for arbitrary losses are difficult to derive without additional assumptions on the loss or the functional space, which would narrow the scope of the contribution. In the revised manuscript we will expand §4 with a qualitative discussion of the approximation levels, their design rationale, computational trade-offs, and the fact that the exact natural gradient (and its basis independence) is recovered in the limit of the hierarchy. We will also note the absence of general perturbation bounds as a limitation and direction for future work. revision: partial
- Referee: [§5] Numerical experiments: The experiments show faster convergence, but the text does not report statistical significance tests, variance across random seeds, or ablation studies isolating the effect of the natural gradient versus the choice of approximation level. This weakens the empirical support for the method's superiority.
Authors: We agree that the experimental section would benefit from additional statistical rigor. In the revised manuscript we will report results averaged over multiple random seeds together with standard deviations, include statistical significance tests (e.g., paired t-tests) on the convergence metrics, and add ablation studies that vary the approximation level while keeping the natural-gradient formulation fixed. These changes will strengthen the empirical evidence without altering the core claims. revision: yes
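For reference, the kind of reporting promised here amounts to a few lines of code. The sketch below is our own illustration with placeholder numbers (drawn at random, not the paper's results): it averages per-seed accuracies and runs a paired t-test between the two optimizers.

import numpy as np
from scipy.stats import ttest_rel

# Placeholder per-seed final accuracies; in practice these come from the actual runs.
rng = np.random.default_rng(0)
acc_natural = rng.normal(loc=0.92, scale=0.01, size=10)   # natural Riemannian GD
acc_standard = rng.normal(loc=0.89, scale=0.01, size=10)  # standard Riemannian GD

print(f"natural : {acc_natural.mean():.3f} +/- {acc_natural.std(ddof=1):.3f}")
print(f"standard: {acc_standard.mean():.3f} +/- {acc_standard.std(ddof=1):.3f}")

# Paired test, since both optimizers are run on the same seeds and data splits.
stat, pvalue = ttest_rel(acc_natural, acc_standard)
print(f"paired t-test: t = {stat:.2f}, p = {pvalue:.4f}")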
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper extends Amari's established natural gradient to the manifold of low-rank functional TTNs via the intrinsic Riemannian metric, yielding basis-independence by construction of the geometry rather than by redefinition or fitting. The hierarchy of approximations is introduced as a practical implementation choice separate from the core claim, and numerical experiments serve as external validation rather than tautological confirmation. No load-bearing step reduces to a self-citation chain, a fitted parameter renamed as prediction, or an ansatz smuggled via prior work by the same authors. The central independence property follows directly from using the manifold's own metric instead of coordinate-dependent gradients.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The space of low-rank functional tree tensor networks forms a Riemannian manifold suitable for gradient-based optimization.
- standard math: Amari's natural gradient yields a search direction independent of basis choice in the functional tensor product space.
Reference graph
Works this paper leans on
- [1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008.
- [2] E. Alpaydin and C. Kaynak. Optical Recognition of Handwritten Digits. UCI Machine Learning Repository, 1998.
- [3] S.-I. Amari. Information geometry. In Geometry and nature (Madeira, 1995), volume 203 of Contemp. Math., pages 81–95. Amer. Math. Soc., Providence, RI, 1997.
- [4] Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Comput., 10(2):251–276, 1998.
- [5] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI; Oxford University Press, Oxford, 2000.
- [6] Markus Bachmayr. Low-rank tensor methods for partial differential equations. Acta Numer., 32:1–121, 2023.
- [7] Markus Bachmayr, Reinhold Schneider, and André Uschmajew. Tensor networks and hierarchical tensors for the solution of high-dimensional partial differential equations. Found. Comput. Math., 16(6):1423–1472, 2016.
- [8] Brett W. Bader and Tamara G. Kolda. Efficient MATLAB computations with sparse and factored tensors. SIAM J. Sci. Comput., 30(1):205–231, 2007/08.
- [9] Nicolas Boumal. An introduction to optimization on smooth manifolds. Cambridge University Press, Cambridge, 2023.
- [10] Z. Chen, K. Batselier, J. A. K. Suykens, and N. Wong. Parallelized tensor train learning of polynomial classifiers. IEEE Trans. Neural Netw. Learn. Syst., 29(10):4621–4632, 2018.
- [11] Curt Da Silva and Felix J. Herrmann. Optimization on the hierarchical Tucker manifold—applications to tensor completion. Linear Algebra Appl., 481:131–173, 2015.
- [12] Martin Eigel, Charles Miranda, Anthony Nouy, and David Sommer. Approximation and learning with compositional tensor trains. arXiv:2512.18059, 2025.
- [13] Alex A. Gorodetsky and John D. Jakeman. Gradient-based optimization for regression in the functional tensor-train format. J. Comput. Phys., 374:1219–1238, 2018.
- [14] Lars Grasedyck, Daniel Kressner, and Christine Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitt., 36(1):53–78, 2013.
- [15] W. Hackbusch and S. Kühn. A new scheme for the tensor representation. J. Fourier Anal. Appl., 15(5):706–722, 2009.
- [16] Wolfgang Hackbusch. Tensor spaces and numerical tensor calculus. Springer, Cham, second edition, 2019.
- [17] Sebastian Holtz, Thorsten Rohwedder, and Reinhold Schneider. On manifolds of tensors of fixed TT-rank. Numer. Math., 120(4):701–731, 2012.
- [18] Jiang Hu, Ruicheng Ao, Anthony Man-Cho So, Minghan Yang, and Zaiwen Wen. Riemannian natural gradient methods. SIAM J. Sci. Comput., 46(1):A204–A231, 2024.
- [19] Dongseong Hwang. FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information. arXiv:2405.12807, 2024.
- [20] Boris N. Khoromskij. Tensor numerical methods in scientific computing. De Gruyter, Berlin, 2018.
- [21] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR 2015, 2015.
- [22] Stefan Klus and Patrick Gelß. Tensor-based algorithms for image classification. Algorithms, 12(11):240, 2019.
- [23] Daniel Kressner, Michael Steinlechner, and Bart Vandereycken. Low-rank tensor completion by Riemannian optimization. BIT, 54(2):447–468, 2014.
- [24] Daniel Kressner, Michael Steinlechner, and Bart Vandereycken. Preconditioned low-rank Riemannian optimization for linear systems with tensor product structure. SIAM J. Sci. Comput., 38(4):A2018–A2044, 2016.
- [25] Yann LeCun, Corinna Cortes, and Christopher J. C. Burges. The MNIST database of handwritten digits, 1998.
- [26] John M. Lee. Introduction to Riemannian manifolds. Springer, Cham, second edition, 2018.
- [27] James Martens. New insights and perspectives on the natural gradient method. Journal of Machine Learning Research, 21(146):1–76, 2020.
- [28] James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2408–2417. PMLR, 2015.
- [29] James Martens and Roger Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. arXiv:1503.05671, 2020.
- [30] A. Novikov, M. Trofimov, and I. Oseledets. Exponential machines. Bull. Pol. Acad. Sci. Tech. Sci., 66(6):789–797, 2018.
- [31] I. V. Oseledets. Tensor-train decomposition. SIAM J. Sci. Comput., 33(5):2295–2317, 2011.
- [32] C. Radhakrishna Rao. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc., 37:81–91, 1945.
- [33] Maxim Rakhuba, Alexander Novikov, and Ivan Oseledets. Low-rank Riemannian eigensolver for high-dimensional Hamiltonians. J. Comput. Phys., 396:718–737, 2019.
- [34] R. Schneider and M. Oster. Some thoughts on compositional tensor networks. In Multiscale, nonlinear and adaptive approximation II, pages 419–447. Springer, Cham.
- [35] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning. Cambridge University Press, 2014.
- [36] Michael Steinlechner. Riemannian optimization for high-dimensional tensor completion. SIAM J. Sci. Comput., 38(5):S461–S484, 2016.
- [37] E. Miles Stoudenmire. Learning relevant features of data with multi-scale tensor networks. Quantum Sci. Technol., 3(3):034003, 2018.
- [38] Edwin Stoudenmire and David J. Schwab. Supervised learning with tensor networks. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 29, pages 4799–4807. Curran Associates, Inc., 2016.
- [39] André Uschmajew and Bart Vandereycken. The geometry of algorithms using hierarchical tensors. Linear Algebra Appl., 439(1):133–166, 2013.
- [40] André Uschmajew and Bart Vandereycken. Geometric methods on low-rank matrix and tensor manifolds. In Handbook of variational methods for nonlinear geometric data, pages 261–313. Springer, Cham, 2020.
- [41] Marius Willner, Marco Trenti, and Dirk Lebiedz. Riemannian optimization on tree tensor networks with application in machine learning. arXiv:2507.21726, 2025.
- [42] Naoya Yamauchi, Hidekata Hontani, and Tatsuya Yokota. Expectation-maximization alternating least squares for tensor network logistic regression. Frontiers Appl. Math. Stat., 11, 2025.