Pingmesh: A large-scale system for data center network latency measurement and analysis. SIGCOMM Comput
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.NI (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Resilient AI Supercomputer Networking using MRC and SRv6
MRC protocol with multi-plane Clos and SRv6 enables large AI training clusters to continue jobs through network failures that previously halted training.