Title resolution pending

· 2019

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

ReCoVer uses fault-tolerant collectives, in-step recovery, and dynamic microbatch redistribution to maintain training trajectory equivalence under GPU failures, delivering 2.23x higher effective throughput than checkpoint-restart on up to 512 GPUs with 256 failures.

citing papers explorer

Showing 1 of 1 citing paper.

ReCoVer: Resilient LLM Pre-Training System via Fault-Tolerant Collective and Versatile Workload cs.DC · 2026-05-11 · unverdicted · none · ref 26
ReCoVer uses fault-tolerant collectives, in-step recovery, and dynamic microbatch redistribution to maintain training trajectory equivalence under GPU failures, delivering 2.23x higher effective throughput than checkpoint-restart on up to 512 GPUs with 256 failures.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer