{"total":13,"items":[
{"citing_arxiv_id":"2607.00603","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Measuring Dead Directions: Decomposing and Classifying Singular Structure off Canonical Alignment","primary_cat":"cs.LG","submitted_at":"2026-07-01T08:29:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A descent-free method recovers the singularity order k of dead directions in neural networks from the directional-Fisher rate, classifies them, and assembles global learning coefficients matching closed forms.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2606.30813","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization","primary_cat":"cs.LG","submitted_at":"2026-06-29T18:37:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Gradient Smoothing applies depth-wise smoothing to optimizer updates from base methods like Adam, yielding consistent gains in optimization and generalization on language, RL, diffusion, and vision tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2606.23591","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Quantifying the Agreement Between Data-Influence and Data-Similarity to Understand LLM Behavior","primary_cat":"cs.LG","submitted_at":"2026-06-22T17:00:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Data-similarity and data-influence produce significantly overlapping rankings of training documents for LLM outputs, with asymmetry allowing a favorable cost-accuracy trade-off.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2606.05957","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Dead Directions: Geometric Singular Learning","primary_cat":"cs.LG","submitted_at":"2026-06-04T09:54:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Dead directions recover Watanabe's RLCT contribution and triple (λ, m, ν) from directional Fisher curvature decay rates in original parameter space for singular models, extended via K-FAC to networks and gauge-equivariant optimizers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2605.29304","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"On subspace-constrained preconditioning for randomized iterative methods","primary_cat":"math.NA","submitted_at":"2026-05-28T03:36:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Refines subspace preconditioning for randomized linear solvers via QR-like factorization, enabling implicit use and proving expected linear convergence while reducing to a smaller system with good singular values.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2605.27292","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run","primary_cat":"cs.LG","submitted_at":"2026-05-26T17:06:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"New canary crafting via greedy influence-based init and bilevel optimization for diversity in embedding space yields stronger one-run privacy leakage estimates at lower cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2605.16184","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training","primary_cat":"cs.DC","submitted_at":"2026-05-15T17:03:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Asteria is a runtime system that enables second-order optimization for LLMs by dynamically distributing optimizer state across GPU, CPU, and NVMe while using asynchronous inverse-root computations and bounded-staleness synchronization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2605.15899","ref_index":133,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Solving Classical and Quantum Spin Glasses with Deep Boltzmann Quantum States","primary_cat":"cond-mat.dis-nn","submitted_at":"2026-05-15T12:30:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Deep Boltzmann Quantum States with natural-gradient optimization and annealing-like training match exact or best-known solutions for large infinite-range Ising spin glasses and solve job shop scheduling instances.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2605.02829","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces","primary_cat":"cs.AI","submitted_at":"2026-05-04T17:05:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"JACTUS unifies low-rank compression and task adaptation via a task-aware union of subspaces and global rank allocation by marginal gain, outperforming 100% PEFT methods like DoRA on ViT-Base (89.2% avg) and Llama2-7B (80.9% avg) at 80% retained parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2604.21100","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences","primary_cat":"cs.LG","submitted_at":"2026-04-22T21:38:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2604.15554","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Natural gradient descent with momentum","primary_cat":"cs.LG","submitted_at":"2026-04-16T22:09:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces natural-gradient versions of Heavy-Ball and Nesterov momentum methods for function approximation on differentiable nonlinear manifolds.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"for Parametric Manifold Optimization. Apr. 2025. doi:10.48550/arXiv.2504.05753. arXiv:2504.05753 [math]. [18] James Martens. New Insights and Perspectives on the Natural Gradient Method. Sept. 2020. arXiv:1412.1193 [cs, stat]. [19] James Martens and Roger Grosse. Optimizing Neural Networks with Kronecker-factored Approximate Curvature. June 2020. arXiv:1503.05671 [cs, stat]. [20] Johannes Müller and Marius Zeinhofer. Position: Optimization in SciML Should Employ the Function Space Geometry. May 2024. doi:10.48550/arXiv.2402.07318. arXiv:2402.07318 [math]. [21] Yurii Nesterov. \"A Method for Solving the Convex Programming Problem with Convergence Rate O(1/K^2)\". In: Proceedings of the USSR Academy of Sciences 269 (1983), pp. 543-547."},
{"citing_arxiv_id":"2604.09263","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Natural Riemannian gradient for learning functional tensor networks","primary_cat":"math.OC","submitted_at":"2026-04-10T12:25:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Natural Riemannian gradient descent enables optimization of functional tensor networks for general losses and shows improved convergence on classification tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},
{"citing_arxiv_id":"2604.05627","ref_index":94,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Loss-aware state space geometry for quantum variational algorithms","primary_cat":"quant-ph","submitted_at":"2026-04-07T09:28:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Loss-aware natural gradient variants are introduced by embedding the loss hypersurface in a statistical manifold or using quantum state overlaps, yielding conformal updates that adjust effective step size.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"on knowledge discovery & data mining (2019), pp. 2623-2631. [91] E. D. Dolan and J. J. Moré, arXiv e-prints cs/0102001 (2001), cs/0102001. [92] D. Wierichs, C. Gogolin, and M. Kastoryano, Phys. Rev. Res. 2, 043246 (2020), URL https://link.aps.org/doi/10.1103/PhysRevResearch.2.043246. [93] A. Arrasmith, M. Cerezo, P. Czarnik, L. Cincio, and P. J. Coles, Quantum 5, 558 (2021), 2011.12245. [94] J. Martens and R. Grosse, arXiv preprint arXiv:1503.05671 (2015). [95] R. Grosse and J. Martens, arXiv preprint arXiv:1602.01407 (2016). [96] J. Martens, arXiv preprint arXiv:1412.1193 (2014). [97] F. Dangel, B. Mucsányi, T. Weber, and R. Eschenhagen, arXiv preprint arXiv:2507.05127 (2025). [98] S.-i. Amari, Information geometry and its applications (Springer, 2016)."}
],"limit":50,"offset":0}