Festina reduces energy consumption by up to 56% for serverless LLM inference on shared GPUs while keeping TTFT/TBT SLO attainment within 2% of four state-of-the-art baselines.
Available from: https://arxiv.org/abs/2301.00407
4 Pith papers cite this work. Polarity classification is still indexing.
Citation role and polarity summary
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Roles: baseline (1)
Polarities: baseline (1)
Citing papers explorer
- Energy-Aware Scheduling for Serverless LLM Serving on Shared GPUs
  Festina reduces energy consumption by up to 56% for serverless LLM inference on shared GPUs while keeping TTFT/TBT SLO attainment within 2% of four state-of-the-art baselines.
- SMART-MIG: A Learning Framework for Scalable and Energy-Efficient GPU Scheduling
  SMART-MIG applies MF-MARL for constant-complexity MIG repartitioning plus heuristics for scheduling, reporting 18% better energy-tardiness efficiency than static partitioning and 27% above a theoretical energy lower bound.
- CompPow: A Case for Component-level GPU Power Management
  CompPow makes the case that component-aware power management inside GPUs can yield 10% higher energy efficiency and 5% better performance for ML workloads.
- A comprehensive evaluation of spatial co-execution on GPUs using MPS and MIG technologies
  MPS can boost performance up to 30% and cut energy 20% with careful provisioning but degrades sharply under memory contention, whereas MIG delivers steadier gains through hardware isolation at the cost of higher overhead and occasional performance losses.