pith. machine review for the scientific record.


arXiv preprint arXiv:2512.24880

18 Pith papers cite this work. Polarity classification is still indexing.



years: 2026 (18)

representative citing papers

Transformers with Selective Access to Early Representations

cs.LG · 2026-05-05 · unverdicted · novelty 7.0 · 2 refs

SATFormer uses a context-dependent gate for selective reuse of early Transformer representations, improving validation loss and zero-shot accuracy, especially on retrieval benchmarks.
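To make the summary concrete, here is a minimal sketch of what a context-dependent gate for reusing early representations might look like. The sigmoid parameterization, the choice of conditioning the gate on the late-layer activations, and all names (`gate_early_reuse`, `W_g`, `b_g`) are assumptions for illustration, not SATFormer's actual design.

```python
import numpy as np

def gate_early_reuse(h_early, h_late, W_g, b_g):
    """Hypothetical context-dependent gate: mixes an early-layer
    representation back into a later layer, per token and per feature."""
    # Sigmoid gate computed from the late representation (an assumption;
    # the paper's exact conditioning signal is not specified here).
    g = 1.0 / (1.0 + np.exp(-(h_late @ W_g + b_g)))
    # Elementwise convex combination of early and late activations.
    return g * h_early + (1.0 - g) * h_late

rng = np.random.default_rng(0)
d = 8
h_early = rng.standard_normal((4, d))   # early-block activations (seq_len=4)
h_late = rng.standard_normal((4, d))    # late-block activations
W_g = rng.standard_normal((d, d)) * 0.1
b_g = np.zeros(d)
out = gate_early_reuse(h_early, h_late, W_g, b_g)
print(out.shape)  # (4, 8)
```

Because the gate lies in (0, 1), the output is an elementwise convex combination of the two representations, so "selective reuse" degrades gracefully to pass-through when the gate saturates.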

Can an MLP Absorb Its Own Skip Connection?

cs.LG · 2026-04-26 · accept · novelty 7.0

Skip-connected MLPs and residual-free MLPs of equal width represent generically disjoint function classes for common activations, with explicit impossibility proofs and a non-generic absorption condition for ReLU and GELU.

Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

cs.LG · 2026-04-24 · unverdicted · novelty 7.0

A new SFT framework for MoE models combines bias-driven sparsification with gated condenser experts to retain long-tailed expert information, outperforming DenseMixer and ESFT by over 2.5% on math reasoning and commonsense QA benchmarks.
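A rough sketch of how bias-driven sparsification could feed a condenser expert: rank experts by biased router scores, keep the top few, and reroute the dropped probability mass through a gated condenser slot. Every detail here (the function name, the additive bias, the single condenser slot, the scalar gate) is an assumption for illustration; the paper's framework is not reproduced.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparsify_with_condenser(logits, bias, keep, condenser_gate):
    """Hypothetical sketch: keep the `keep` experts with the largest
    (logit + bias) scores and reroute the dropped probability mass into
    a single condenser expert via a scalar gate."""
    scores = logits + bias                  # bias-driven ranking of experts
    probs = softmax(logits)
    kept = np.argsort(-scores)[:keep]
    dropped_mass = probs.sum() - probs[kept].sum()
    out = np.zeros(len(logits) + 1)         # last slot = condenser expert
    out[kept] = probs[kept]
    out[-1] = condenser_gate * dropped_mass # condenser absorbs the long tail
    return out / out.sum()                  # renormalize routing weights

rng = np.random.default_rng(0)
logits = rng.standard_normal(8)
bias = rng.standard_normal(8) * 0.5
w = sparsify_with_condenser(logits, bias, keep=2, condenser_gate=0.9)
print(np.count_nonzero(w))  # 3 active slots: 2 kept experts + condenser
```

The point of the condenser slot in this sketch is that mass routed to pruned (long-tailed) experts is preserved in a dedicated parameter pool rather than silently renormalized away.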

Optimistic Dual Averaging Unifies Modern Optimizers

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

SODA unifies several modern optimizers under optimistic dual averaging and supplies a 1/k decay wrapper that improves performance without weight decay tuning.
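For readers unfamiliar with the template being unified, here is a sketch of an optimistic dual averaging step on a toy quadratic. This is illustrative only: it is not the paper's SODA algorithm, and the 1/sqrt(k) step schedule below is a standard choice, not the paper's 1/k decay wrapper.

```python
import numpy as np

def quad_grad(x):
    # Gradient of the toy objective f(x) = 0.5 * ||x||^2.
    return x

def optimistic_dual_averaging(x0, grad, lr, steps):
    """Illustrative optimistic dual averaging: the iterate is driven by
    the running gradient sum plus an optimistic hint equal to the most
    recent gradient, stepped from the fixed prox center x0."""
    z = np.zeros_like(x0)      # dual accumulator: sum of past gradients
    hint = np.zeros_like(x0)   # optimistic guess for the next gradient
    x = x0.copy()
    for k in range(1, steps + 1):
        x = x0 - (lr / np.sqrt(k)) * (z + hint)  # optimistic dual step
        g = grad(x)
        z += g                 # fold the observed gradient into the dual
        hint = g               # next step anticipates a similar gradient
    return x

x = optimistic_dual_averaging(np.array([2.0, -1.0]), quad_grad, lr=0.5, steps=400)
print(np.linalg.norm(x))  # shrinks toward the minimizer at the origin
```

The structural point is that many modern optimizers differ only in how they transform the accumulated dual variable `z` before stepping, which is what makes a unifying dual-averaging view possible.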

Hyperloop Transformers

cs.LG · 2026-04-23 · unverdicted · novelty 5.0

Hyperloop Transformers outperform standard and mHC Transformers with roughly 50% fewer parameters by looping a middle block of layers and applying hyper-connections only after each loop.
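The looped-middle-block idea can be sketched in a few lines. The block implementation, the mixing form of the hyper-connection, and the `alphas` coefficients are all assumptions for illustration; only the overall shape (shared middle weights applied repeatedly, a connection applied after each loop rather than after each layer) follows the summary.

```python
import numpy as np

def layer(x, W):
    # Stand-in for a Transformer block: a residual tanh-MLP layer.
    return x + np.tanh(x @ W)

def hyperloop_forward(x, W_early, W_mid, W_late, alphas, loops):
    """Sketch: the middle block is applied `loops` times with shared
    weights; after each loop the output is mixed with the loop's input
    via a learned coefficient (a hypothetical hyper-connection form)."""
    h = layer(x, W_early)                 # unlooped early block
    for i in range(loops):
        h_in = h
        h = layer(h, W_mid)               # shared middle-block weights
        a = alphas[i]
        h = a * h + (1.0 - a) * h_in      # hyper-connection after the loop
    return layer(h, W_late)               # unlooped late block

rng = np.random.default_rng(0)
d = 16
x = rng.standard_normal((3, d))
Ws = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
out = hyperloop_forward(x, *Ws, alphas=[0.7, 0.7, 0.7], loops=3)
print(out.shape)  # (3, 16)
```

Reusing `W_mid` across loops is where the roughly 50% parameter saving claimed in the summary would come from: depth grows with the loop count while the weight count stays fixed.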

citing papers explorer

Showing 18 of 18 citing papers.