Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency
3 Pith papers cite this work.
Years: 2026 (3)
Representative citing papers
Develops a constrained bandit algorithm for online LLM selection under packing and covering constraints with time-varying demand, claiming sublinear regret and constraint violations versus an offline full-information benchmark.
Empirical benchmarks show MoE inference cost on edge hardware tracks total parameters rather than active parameters, with OLMoE-1B-7B behind dense baselines especially on the Jetson device.
citing papers explorer
- Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters
  Variability modeling from software engineering enables systematic sampling, measurement, and prediction of LLM inference configurations for energy, latency, and accuracy trade-offs.
- Online LLM Selection via Constrained Bandits with Time-Varying Demand
  Develops a constrained bandit algorithm for online LLM selection under packing and covering constraints with time-varying demand, claiming sublinear regret and constraint violations versus an offline full-information benchmark.