arXiv preprint arXiv:2601.21487

Manifold constrained steepest descent · 2026 · arXiv 2601.21487

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 2 · method: 1

citation-polarity summary

background: 2 · use method: 1

representative citing papers

Intrinsic Muon: Spectral Optimization on Riemannian Matrix Manifolds

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

Intrinsic Muon provides closed-form linear maximization oracles on multiple Riemannian matrix manifolds for unitarily invariant norms, with convergence rates depending only on manifold dimension or rank.

Learned Subspace Compression for Communication-Efficient Pipeline Parallelism

cs.LG · 2026-06-03 · unverdicted · novelty 6.0

MAPL learns task-specific orthogonal compression subspaces per pipeline stage via manifold-constrained optimization and recovers signals with low-overhead anchors, yielding better compression-performance tradeoffs than fixed projections on LLaMA models up to 1B parameters.

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

math.OC · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

Proposes equivariant optimizer updates matched to layer symmetries for embeddings, SwiGLU MLPs, and MoE routers, with reported gains in validation loss and training stability on several language model architectures.

Demystifying Manifold Constraints in LLM Pre-training

cs.LG · 2026-05-06 · unverdicted · novelty 6.0

Manifold constraints via the new MACRO optimizer independently bound activation scales and enforce rotational equilibrium in LLM pre-training, subsuming RMS normalization and decoupled weight decay while delivering competitive performance with convergence guarantees.

Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

cs.LG · 2026-06-11 · unverdicted · novelty 5.0

Assigning Stiefel to attention layers and DGram to MLP layers outperforms uniform or inverted manifold assignments in transformer pretraining by avoiding attention logit amplification.

Convergence of Spectral Descent for Non-smooth Optimization

cs.LG · 2026-05-26 · unverdicted · novelty 5.0

Proves linear convergence of Spectral Descent (SD) and Truncated SD for non-smooth convex problems under stated conditions, sublinear rates for regularized versions via Frank-Wolfe, and recovery guarantees for robust low-rank matrix recovery.
