Nesterov method for asynchronous pipeline parallel optimization. arXiv preprint arXiv:2505.01099, 2025

Thalaiyasingam Ajanthan, Sameera Ramasinghe, Yan Zuo, Gil Avraham, Alexander Long · 2025 · arXiv 2505.01099

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

One-step gradient delay is optimizer-dependent rather than intrinsically unstable, with Muon and error-feedback correction enabling async pipeline parallelism to match synchronous performance on models up to 10B parameters.

Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency

cs.LG · 2026-06-05 · unverdicted · novelty 6.0

PACI enables bubble-free asynchronous pipeline training by bounding version drift via local gradient accumulation, matching synchronous stability with higher throughput and no extra memory.

citing papers explorer


One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining · cs.LG · 2026-06-29 · unverdicted · none · ref 2
Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency · cs.LG · 2026-06-05 · unverdicted · none · ref 1
