
Optimizing neural networks with Kronecker-factored approximate curvature

URL: https://arxiv · 2015 · arXiv 1503.05671

13 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment

cs.LG · 2026-07-01 · unverdicted · novelty 7.0

A descent-free method recovers the singularity order k of dead directions in neural networks from the directional-Fisher rate, classifies them, and assembles global learning coefficients matching closed forms.

Dead Directions: Geometric Singular Learning

cs.LG · 2026-06-04 · unverdicted · novelty 7.0

Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.

Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.

Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior

cs.LG · 2026-06-22 · unverdicted · novelty 6.0

Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.

Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run

cs.LG · 2026-05-26 · unverdicted · novelty 6.0

New canary crafting via greedy influence-based init and bilevel optimization for diversity in embedding space yields stronger one-run privacy leakage estimates at lower cost.

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training

cs.DC · 2026-05-15 · unverdicted · novelty 6.0

Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.

Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States

cond-mat.dis-nn · 2026-05-15 · unverdicted · novelty 6.0

Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

cs.AI · 2026-05-04 · unverdicted · novelty 6.0

JACTUS unifies low-rank compression and task adaptation through a task-aware union of subspaces, allocating rank globally by marginal gain. At 80% retained parameters, it outperforms PEFT methods that keep 100% of parameters, such as DoRA, on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg).

Natural Riemannian gradient for learning functional tensor networks

math.OC · 2026-04-10 · unverdicted · novelty 6.0

Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.

Loss-aware state space geometry for quantum variational algorithms

quant-ph · 2026-04-07 · unverdicted · novelty 6.0

Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.

On subspace-constrained preconditioning for randomized iterative methods

math.NA · 2026-05-28 · unverdicted · novelty 5.0

Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.

Natural gradient descent with momentum

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.

Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization

cs.LG · 2026-06-29 · unverdicted · novelty 4.0

Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.
