Advances in Neural Information Processing Systems , year=

Searching for Efficient Transformers for Language Modeling , author=

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

cs.CL · 2023-05-17 · unverdicted · novelty 5.0

PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.

cs.LG · 2026-05-11 · unverdicted · novelty 4.0

Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.

Showing 2 of 2 citing papers.

PaLM 2 Technical Report cs.CL · 2023-05-17 · unverdicted · none · ref 195
PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency over PaLM.
Can Muon Fine-tune Adam-Pretrained Models? cs.LG · 2026-05-11 · unverdicted · none · ref 67
Constraining fine-tuning updates with LoRA mitigates performance degradation when switching from Adam to Muon on pretrained models.