Optimizing neural networks with Kronecker-factored approximate curvature

James Martens, Roger Grosse · 2020 · arXiv 1503.05671

5 Pith papers cite this work.

representative citing papers

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain. With 80% of parameters retained, it outperforms PEFT methods such as DoRA applied at 100% of parameters on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg).

Natural Riemannian gradient for learning functional tensor networks

math.OC · 2026-04-10 · unverdicted · novelty 6.0

Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.

Loss-aware state space geometry for quantum variational algorithms

quant-ph · 2026-04-07 · unverdicted · novelty 6.0

Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.

Natural gradient descent with momentum

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.

citing papers explorer

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences cs.LG · 2026-04-22 · unverdicted · none · ref 32
Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces cs.AI · 2026-05-04 · unverdicted · none · ref 30
Natural Riemannian gradient for learning functional tensor networks math.OC · 2026-04-10 · unverdicted · none · ref 29
Loss-aware state space geometry for quantum variational algorithms quant-ph · 2026-04-07 · unverdicted · none · ref 94
Natural gradient descent with momentum cs.LG · 2026-04-16 · unverdicted · none · ref 20