ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates

Tingfeng Lan, Yusen Wu, Bin Ma, Zhaoyuan Su, Rui Yang, Tekin Bicer, Masahiro Tanaka, Olatunji Ruwase, Dong Li, Yue Cheng · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

ChunkFlow achieves up to 1.28x step-time speedup and up to 49% lower peak GPU memory for DiT inference by using a first-order model to guide communication-aware chunked prefetching.

citing papers explorer

Showing 1 of 1 citing paper.

ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

cs.DC · 2026-05-11 · unverdicted · none · ref 18

ChunkFlow achieves up to 1.28x step-time speedup and up to 49% lower peak GPU memory for DiT inference by using a first-order model to guide communication-aware chunked prefetching.
