pith. machine review for the scientific record.

arxiv: 2510.27656 · v2 · submitted 2025-10-31 · 💻 cs.DC


fabric-lib: RDMA Point-to-Point Communication for LLM Systems

Authors: no claims on Pith yet
classification 💻 cs.DC
keywords: fabric-lib, communication, inference, nics, point-to-point, collectives, connectx-7, demonstrate
Abstract

Emerging Large Language Model (LLM) system patterns, such as disaggregated inference, Mixture-of-Experts (MoE) routing, and asynchronous reinforcement fine-tuning, require flexible point-to-point communication beyond simple collectives. Existing implementations are locked to specific Network Interface Controllers (NICs), hindering integration into inference engines and portability across hardware providers. We present fabric-lib, which bridges the functionality of common NICs to expose a uniform interface. fabric-lib exposes one-sided WriteImm operations with an ImmCounter primitive for completion notification, without ordering assumptions on the network transport, transparently managing multiple NICs per GPU. We demonstrate peak throughput of 400 Gbps on both NVIDIA ConnectX-7 and AWS Elastic Fabric Adapter (EFA). We showcase fabric-lib through three production systems: (1) KvCache transfer for disaggregated inference with dynamic scaling, (2) RL weight updates achieving 1.3 seconds for trillion-parameter models, and (3) a MoE dispatch/combine implementation exceeding DeepEP decode latency on ConnectX-7, with the first viable latencies on EFA. We demonstrate that our portable point-to-point communication complements collectives while avoiding lock-in. fabric-lib is open-sourced at https://github.com/perplexityai/pplx-garden/
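The WriteImm/ImmCounter pattern in the abstract can be sketched in miniature. The following is a hypothetical model of the semantics only, not fabric-lib's actual API: a sender issues one-sided writes that each carry an immediate value, and the receiver's counter simply tallies arrivals for a given immediate tag until an expected count is reached, so completion detection needs no ordering guarantees from the transport. All class and method names here are illustrative inventions.

```python
import threading
from collections import defaultdict

class ImmCounter:
    """Hypothetical model of an ImmCounter-style completion primitive:
    counts write-with-immediate arrivals per immediate tag,
    independent of the order in which writes land."""

    def __init__(self):
        self._counts = defaultdict(int)
        self._cond = threading.Condition()

    def on_write_imm(self, imm: int) -> None:
        # Invoked when a one-sided WriteImm lands; order-agnostic.
        with self._cond:
            self._counts[imm] += 1
            self._cond.notify_all()

    def wait(self, imm: int, expected: int, timeout=None) -> bool:
        # Block until `expected` writes tagged `imm` have arrived.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._counts[imm] >= expected, timeout)

# Usage: simulate 4 chunk writes arriving out of order
# (e.g. striped across multiple NICs serving one GPU).
counter = ImmCounter()
for chunk in (3, 0, 2, 1):        # arbitrary arrival order
    counter.on_write_imm(imm=7)   # all chunks share one immediate tag
done = counter.wait(imm=7, expected=4, timeout=1.0)
print(done)  # True: transfer complete regardless of chunk order
```

The point of the sketch is that the receiver never inspects payload ordering: counting immediates is sufficient, which is what lets the same completion logic sit on top of transports with different ordering guarantees (e.g. ConnectX-7 RC vs. EFA SRD).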

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Eliminating Hidden Serialization in Multi-Node Megakernel Communication

    cs.DC 2026-05 conditional novelty 6.0

    Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exce...

  2. UCCL-Zip: Lossless Compression Supercharged GPU Communication

    cs.DC 2026-04 unverdicted novelty 6.0

    UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.