Chakra: Advancing performance benchmarking and co-design using standardized execution traces

Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, et al. · 2023 · arXiv 2305.14516

7 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training

cs.DC · 2026-06-04 · conditional · novelty 7.0 · 2 refs

StageFrontier computes an exact additive accounting of exposed step time in distributed training by taking the frontier of per-rank coarse stage durations reported with unsynchronized CPU wall clocks.

Simulating Unified Tensor Resharding in heterogeneous AI systems

cs.DC · 2026-06-25 · unverdicted · novelty 6.0

Xsim is a heterogeneity-aware simulator for distributed LLM training supporting load balancing, customized collectives, tensor resharding, and pluggable network simulation, reporting under 5% error in training time predictions.

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

cs.DC · 2026-05-15 · conditional · novelty 6.0

PrismLLM constructs a sliced execution graph and uses hybrid emulation to faithfully reproduce performance and memory behavior of up to 8192-GPU LLM training runs on fewer than 1% of the original GPUs.

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

cs.DC · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Chakra introduces a standardized graph-based execution trace representation for distributed ML workloads along with supporting tools to enable benchmarking, analysis, generation, and co-design across simulators and hardware.

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

cs.DC · 2026-06-09 · unverdicted · novelty 5.0

ASTRA-sim 3.0 introduces cache-line load-store simulation, a detailed GPU execution model, and InfraGraph to support high-fidelity distributed machine learning infrastructure simulations.

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

cs.DC · 2026-04-19 · unverdicted · novelty 5.0

Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.

Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO

cs.DC · 2026-04-13 · unverdicted · novelty 4.0

StableHLO serves as a viable unified representation for cross-architecture performance modeling of distributed ML workloads, preserving relative trends while exposing fidelity trade-offs.

citing papers explorer

StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training
cs.DC · 2026-06-04 · conditional · none · ref 29 · 2 links
Simulating Unified Tensor Resharding in heterogeneous AI systems
cs.DC · 2026-06-25 · unverdicted · none · ref 57
A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM
cs.DC · 2026-05-15 · conditional · none · ref 29
MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces
cs.DC · 2026-05-11 · unverdicted · none · ref 95 · 2 links
ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling
cs.DC · 2026-06-09 · unverdicted · none · ref 44
Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML
cs.DC · 2026-04-19 · unverdicted · none · ref 28
Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO
cs.DC · 2026-04-13 · unverdicted · none · ref 32
