Chakra: Advancing performance benchmarking and co-design using standardized execution traces

doi: 10 · 2023 · arXiv 2305.14516

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training

cs.DC · 2026-06-04 · conditional · novelty 7.0

StageFrontier computes an exact additive accounting of exposed step time in distributed training by taking the frontier of per-rank coarse stage durations reported with unsynchronized CPU wall clocks.

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

cs.DC · 2026-05-15 · conditional · novelty 6.0

PrismLLM constructs a sliced execution graph and uses hybrid emulation to faithfully reproduce performance and memory behavior of up to 8192-GPU LLM training runs on fewer than 1% of the original GPUs.

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

cs.DC · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Chakra introduces a standardized graph-based execution trace representation for distributed ML workloads along with supporting tools to enable benchmarking, analysis, generation, and co-design across simulators and hardware.

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

cs.DC · 2026-04-19 · unverdicted · novelty 5.0

Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.

Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO

cs.DC · 2026-04-13 · unverdicted · novelty 4.0

StableHLO serves as a viable unified representation for cross-architecture performance modeling of distributed ML workloads, preserving relative trends while exposing fidelity trade-offs.

citing papers explorer

Showing 5 of 5 citing papers.

StageFrontier: Synchronization-Aware Stage Accounting for Distributed ML Training

cs.DC · 2026-06-04 · conditional · none · ref 30

A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

cs.DC · 2026-05-15 · conditional · none · ref 29

MLCommons Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

cs.DC · 2026-05-11 · unverdicted · none · ref 95 · 2 links

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

cs.DC · 2026-04-19 · unverdicted · none · ref 28

Evaluating Cross-Architecture Performance Modeling of Distributed ML Workloads Using StableHLO

cs.DC · 2026-04-13 · unverdicted · none · ref 32