High-dimensional linear regression via implicit regularization
3 Pith papers cite this work.
citing papers explorer
- HORST: Composing Optimizer Geometries for Sparse Transformer Training
  HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.
- Adversarial Contamination Meets Hard Thresholding: An Iterative Algorithm with Signal Adaptivity and Minimax Optimality
  AC-IHT is a two-stage iterative algorithm for contaminated high-dimensional regression that attains minimax near-optimal rates, signal adaptivity under suitable conditions, and the strong oracle property.
- Supporting Evidence for the Adaptive Feature Program across Diverse Models
  FEM decreases during training in linear regression and index models, providing supporting evidence for the adaptive feature program.