TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Machine Learning
4 Pith papers cite this work.
All 4 citing papers are from 2026.
Citing papers
-
PCCL: Process Group-Aware Scalable and Generic Collective Algorithm Synthesizer
PCCL synthesizes near-optimal, topology-aware collective algorithms for arbitrary communication patterns while remaining process-group-aware and scaling to subsets of devices.
-
AoiZora: Topology-Aware Auto-Parallel Optimization for Inference of Diffusion Transformers
AoiZora adds topology-aware physical placement planning to auto-parallel compilation for diffusion transformer inference, cutting one-step denoising latency by up to 1.42x on TPU v5e sub-slices.
-
CLIPGen: A Chiplet Link IP Modeling and Generation Framework for 2.5D Architecture Exploration
CLIPGen is a framework for automatically generating chiplet interconnect IP, together with PPA (power, performance, area) estimates, to support exploration of 2.5D SiP architectures.
-
ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving
ELMoE-3D achieves a 6.6x average speedup and a 4.4x energy-efficiency gain for MoE serving on 3D hardware by scaling expert and bit elasticity for elastic self-speculative decoding.