Vidur: A large-scale simulation framework for LLM inference

2024 · arXiv 2405.05465

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

cs.AR · 2025-04-14 · unverdicted · novelty 7.0

MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

cs.DC · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Dooly reduces LLM inference profiling GPU-hours by 56.4% across 12 models while keeping simulation MAPE under 5% for TTFT and 8% for TPOT by making profiling configuration-agnostic and redundancy-aware.

PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction

cs.PF · 2026-01-21 · unverdicted · novelty 6.0

PipeWeave predicts GPU kernel performance with 6.1% average error and end-to-end inference with 8.5% error by feeding analytical pipeline features into ML, cutting prior method errors by 4-7x across 11 GPUs.

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

cs.LG · 2026-05-27 · unverdicted · novelty 5.0

Operator-level attention-FFN disaggregation enables ~4k tokens/s throughput for DeepSeek-V3.2 under tight TTFT/TPOT SLOs where chunked-prefill and prefill-decode baselines cannot.

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

cs.DC · 2026-05-16 · unverdicted · novelty 5.0

Charon is a unified modular simulator that predicts LLM training and inference performance with under 5.35% error and identifies throughput improvements over baselines in a real deployment case.

citing papers explorer

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

cs.LG · 2026-05-27 · unverdicted · none · ref 2

Operator-level attention-FFN disaggregation enables ~4k tokens/s throughput for DeepSeek-V3.2 under tight TTFT/TPOT SLOs where chunked-prefill and prefill-decode baselines cannot.
