
Optimizing Neural Networks with Kronecker-factored Approximate Curvature

2015 · arXiv 1503.05671

16 Pith papers cite this work. Polarity classification is still indexing.




citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

A descent-free method recovers the singularity order k of dead directions in neural networks from the directional-Fisher rate, classifies them, and assembles global learning coefficients matching closed forms.

Dead-Direction Signatures: A Cheap Spectral Reading of Singular Complexity

cs.LG · 2026-06-19 · unverdicted · novelty 7.0

Dead-Direction Signatures provide closed-form spectral readings of dead directions in network activations and gradients that track rank deficits at singular minima, offering a cheap directional alternative to SGLD-based LLC.

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

cs.LG · 2026-06-17 · unverdicted · novelty 7.0

The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.

Dead Directions: Geometric Singular Learning

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

An Optimisation Framework for the Well-Conditioned Training of Physics-Informed Neural Networks

cs.LG · 2026-07-02 · unverdicted · novelty 6.0

The DSGNAR optimization framework for PINNs reaches relative L2 errors of 3e-16 in double precision and improves prior results by 5-8 orders of magnitude on Burgers' and high-dimensional Poisson problems while remaining faster.

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.

Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

New canary crafting via greedy influence-based init and bilevel optimization for diversity in embedding space yields stronger one-run privacy leakage estimates at lower cost.

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

cs.DC · 2026-05-15 · unverdicted · novelty 6.0

Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.

Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States

cond-mat.dis-nn · 2026-05-15 · unverdicted · novelty 6.0

Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain, outperforming 100% PEFT methods like DoRA on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg) at 80% retained parameters.

Natural Riemannian gradient for learning functional tensor networks

math.OC · 2026-04-10 · unverdicted · novelty 6.0

Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.

Loss-aware state space geometry for quantum variational algorithms

quant-ph · 2026-04-07 · unverdicted · novelty 6.0

Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · novelty 5.0

Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

Natural gradient descent with momentum

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.

Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization

cs.LG · 2026-06-29 · unverdicted · novelty 4.0

Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.

citing papers explorer

Showing 16 of 16 citing papers after filters.

Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment · cs.LG · 2026-07-01 · unverdicted · none · ref 9
Dead-Direction Signatures: A Cheap Spectral Reading of Singular Complexity · cs.LG · 2026-06-19 · unverdicted · none · ref 20
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale · cs.LG · 2026-06-17 · unverdicted · none · ref 25
Dead Directions: Geometric Singular Learning · cs.LG · 2026-06-04 · unverdicted · none · ref 27
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences · cs.LG · 2026-04-22 · unverdicted · none · ref 32
An Optimisation Framework for the Well-Conditioned Training of Physics-Informed Neural Networks · cs.LG · 2026-07-02 · unverdicted · none · ref 17
Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior · cs.LG · 2026-06-22 · unverdicted · none · ref 32
Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run · cs.LG · 2026-05-26 · unverdicted · none · ref 41
Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training · cs.DC · 2026-05-15 · unverdicted · none · ref 11
Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States · cond-mat.dis-nn · 2026-05-15 · unverdicted · none · ref 133
Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces · cs.AI · 2026-05-04 · unverdicted · none · ref 30
Natural Riemannian gradient for learning functional tensor networks · math.OC · 2026-04-10 · unverdicted · none · ref 29
Loss-aware state space geometry for quantum variational algorithms · quant-ph · 2026-04-07 · unverdicted · none · ref 94
On subspace-constrained preconditioning for randomized iterative methods · math.NA · 2026-05-28 · unverdicted · none · ref 48
Natural gradient descent with momentum · cs.LG · 2026-04-16 · unverdicted · none · ref 20
Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization · cs.LG · 2026-06-29 · unverdicted · none · ref 21
