SLO-Guard, a crash-aware two-phase autotuner for vLLM serving, achieves no best-latency improvement over random search but demonstrates more consistent budget allocation across 150 trials on Qwen2-1.5B/A100.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2026 (1)
verdicts: CONDITIONAL (1)
representative citing papers:
Copy First, Translate Later: Interpreting Translation Dynamics in Multilingual Pretraining