hub
Optimizing Neural Networks with Kronecker-factored Approximate Curvature
16 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (16)
verdicts: UNVERDICTED (16)
roles: background (2)
polarities: background (2)
citing papers explorer
-
Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment
A descent-free method recovers the singularity order k of dead directions in neural networks from the directional-Fisher rate, classifies them, and assembles global learning coefficients matching closed forms.
-
Dead-Direction Signatures: A Cheap Spectral Reading of Singular Complexity
Dead-Direction Signatures provide closed-form spectral readings of dead directions in network activations and gradients that track rank deficits at singular minima, offering a cheap directional alternative to SGLD-based LLC.
-
Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale
The normalized inverse-scale direction of LayerNorm's affine parameters is an exact algebraic kernel of the post-final-norm centred activation covariance for any input distribution in LayerNorm transformers.
-
Dead Directions: Geometric Singular Learning
Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.
-
Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
-
An Optimisation Framework for the Well-Conditioned Training of Physics-Informed Neural Networks
The DSGNAR optimization framework for PINNs reaches relative L2 errors of 3e-16 in double precision and improves on prior results by 5-8 orders of magnitude on Burgers' and high-dimensional Poisson problems while remaining faster.
-
Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior
Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
-
Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run
A new canary-crafting method, using greedy influence-based initialization and bilevel optimization for diversity in embedding space, yields stronger one-run privacy leakage estimates at lower cost.
-
Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training
Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.
-
Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States
Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.
-
Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces
JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain. At 80% retained parameters it outperforms PEFT methods that keep 100% of the parameters, such as DoRA, on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg).
-
Natural Riemannian gradient for learning functional tensor networks
Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.
-
Loss-aware state space geometry for quantum variational algorithms
Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.
-
On subspace-constrained preconditioning for randomized iterative methods
Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.
-
Natural gradient descent with momentum
Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.
-
Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization
Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.