MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.
Vidur: A large-scale simulation framework for llm inference
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 6roles
baseline 1polarities
baseline 1representative citing papers
RouteBalance fuses routing and load balancing for heterogeneous LLM serving and traces the upper quality-cost-throughput frontier on a 13-instance 28-GPU cluster.
Dooly reduces LLM inference profiling GPU-hours by 56.4% across 12 models while keeping simulation MAPE under 5% for TTFT and 8% for TPOT by making profiling configuration-agnostic and redundancy-aware.
PipeWeave predicts GPU kernel performance with 6.1% average error and end-to-end inference with 8.5% error by feeding analytical pipeline features into ML, cutting prior method errors by 4-7x across 11 GPUs.
Operator-level attention-FFN disaggregation enables ~4k tokens/s throughput for DeepSeek-V3.2 under tight TTFT/TPOT SLOs where chunked-prefill and prefill-decode baselines cannot.
Charon is a unified modular simulator that predicts LLM training and inference performance with under 5.35% error and identifies throughput improvements over baselines in a real deployment case.
citing papers explorer
-
MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference
MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.