Avoiding Overfitting: A Survey on Regularization Methods for Convolutional Neural Networks
3 Pith papers cite this work.
Citing papers
- HORST: Composing Optimizer Geometries for Sparse Transformer Training
HORST uses non-commutative operator composition and a hyperbolic mirror map to combine stability from adaptive optimizers with L1 sparsity bias, outperforming AdamW across sparsity levels on vision and language tasks.
- A solution to generalized learning from small training sets found in infant repeated visual experiences of individual objects
Infant daily visual experiences of objects are dominated by repeated instances of few exemplars in lumpy similarity clusters, enabling category generalization from small training sets in computational models.
- Recurrent Transformer-Based Near- and Far-Field THz Wideband Channel Estimation for UM-MIMO
A single recurrent transformer block, trained once, delivers NMSE gains of 5 dB and 7.5 dB over prior methods for narrowband and wideband hybrid near- and far-field THz UM-MIMO channel estimation, respectively.