Methods of improving LLM training stability. arXiv preprint arXiv:2410.16682

Oleg Rybakov, Mike Chrzanowski, Peter Dykas, Jinze Xue, Ben Lanir · 2024 · arXiv 2410.16682

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

cs.LG · 2026-06-06 · unverdicted · novelty 7.0

Attention sinks reflect either adaptive nop or broadcast mechanisms, with distinct traces, synthetic diagnostics, and complementary interventions via gating plus registers.

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

Low-precision Flash Attention fails due to similar low-rank attention representations combined with biased rounding errors that accumulate and corrupt weight updates; a minimal fix to reduce rounding bias stabilizes training.

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Pion is an optimizer that preserves the singular values of weight matrices in LLM training by applying orthogonal equivalence transformations.

citing papers explorer

A Unifying View of Attention Sinks: Two Algorithms, Two Solutions

cs.LG · 2026-06-06 · unverdicted · none · ref 21

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention

cs.LG · 2025-10-05 · unverdicted · none · ref 26

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

cs.LG · 2026-05-12 · unverdicted · none · ref 66
