Revisiting SLO and Goodput Metrics in LLM Serving

Zhibin Wang, Shipeng Li, Yuhang Zhou, Xue Li, Rong Gu, Nguyen Cam-Tu, Chen Tian, Sheng Zhong · 2024 · arXiv 2410.14257

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving

cs.LG · 2026-04-19 · conditional · novelty 7.0

SLO-Guard improves tuning budget consistency for SLO-constrained LLM serving by handling crashes explicitly and using a two-phase feasible-first exploration plus exploitation strategy.

CoRoVA: Compressed Representations for Vector-Augmented Code Completion

cs.CL · 2025-10-22 · unverdicted · novelty 6.0

CoRoVA compresses repository context into compact vectors for code LLMs, reducing TTFT by 20-38% versus uncompressed RAG while adding only a small projector module.

Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities

cs.DC · 2026-04-24 · unverdicted · novelty 3.0

A survey synthesizing challenges, system architectures, model optimizations, deployment methods, and resource management techniques for large language model inference at the network edge.

citing papers explorer

SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving
cs.LG · 2026-04-19 · conditional · none · ref 18
CoRoVA: Compressed Representations for Vector-Augmented Code Completion
cs.CL · 2025-10-22 · unverdicted · none · ref 29
Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
cs.DC · 2026-04-24 · unverdicted · none · ref 160
