A descent-free method recovers the singularity order k of dead directions in neural networks from the directional-Fisher rate, classifies them, and assembles global learning coefficients matching closed forms.
Optimizing neural networks with Kronecker-factored approximate curvature
13 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (13)
verdicts: UNVERDICTED (13)
roles: background (2)
polarities: background (2)
representative citing papers
Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.
Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.
Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.
A new canary-crafting method, using greedy influence-based initialization and bilevel optimization for diversity in embedding space, yields stronger one-run privacy leakage estimates at lower cost.
Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.
Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.
JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain, outperforming PEFT methods such as DoRA that keep 100% of parameters on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg) while retaining only 80% of parameters.
Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.
Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.
Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.
Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.
Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.