Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency

Husom, E. · 2025 · arXiv:2504.03360

3 Pith papers cite this work.

representative citing papers

Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters

cs.LG · 2026-02-06 · unverdicted · novelty 7.0 · ref 32

Variability modeling from software engineering enables systematic sampling, measurement, and prediction of LLM inference configurations for energy, latency, and accuracy trade-offs.

Online LLM Selection via Constrained Bandits with Time-Varying Demand

cs.LG · 2026-06-16 · unverdicted · novelty 5.0 · ref 9

Develops a constrained bandit algorithm for online LLM selection under packing and covering constraints with time-varying demand, claiming sublinear regret and constraint violations versus an offline full-information benchmark.

Does Mixture-of-Experts Actually Help Inference on Consumer and Edge Hardware? An Empirical Study

cs.PF · 2026-06-19 · accept · novelty 4.0

Empirical benchmarks show that MoE inference cost on edge hardware tracks total parameters rather than active parameters, with OLMoE-1B-7B lagging behind dense baselines, especially on the Jetson device.
