Power capping is illusory for LLM decode: the memory-bound decode phase never approaches the 700 W limit of current GPUs, so a cap leaves the power headroom untouched and saves nothing. Locking the SM clock instead saves up to 32% energy, and three distinct DVFS behavior classes emerge across attention types.
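For readers who want to try the clock knob, here is a minimal sketch using NVML via the pynvml bindings. The 1005 MHz value and the `run_decode_workload()` helper are illustrative placeholders, not settings from the paper; locking clocks normally requires administrator privileges, and the CLI equivalent is `nvidia-smi -lgc 1005,1005`.

```python
# Minimal sketch: lock SM clocks instead of capping power. During memory-bound
# decode the GPU never approaches its power limit, so a cap does not bind,
# while a lower locked clock directly reduces energy per token.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

# 1005 MHz is an illustrative value; the best clock depends on the GPU and on
# which DVFS behavior class the attention workload falls into.
pynvml.nvmlDeviceSetGpuLockedClocks(handle, 1005, 1005)
try:
    run_decode_workload()  # hypothetical placeholder for the serving loop
finally:
    pynvml.nvmlDeviceResetGpuLockedClocks(handle)  # restore default DVFS
    pynvml.nvmlShutdown()
```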
Eight papers indexed by Pith cite this work, all from 2026; citation-polarity classification is still pending.

Representative citing papers:
Dooly reduces LLM inference profiling costs by 56.4% via configuration-agnostic taint-based labeling and selective database reuse, delivering simulation accuracy within 5% MAPE for TTFT and 8% for TPOT across 12 models.
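A hedged sketch of how taint-keyed reuse could work; the `ProfileDB` class, the taint sets, and all numbers below are my illustrative assumptions, not Dooly's actual design. The idea: key each profile entry only by the configuration parameters that an operator's latency is actually tainted by, so measurements are reused across configurations that differ only in irrelevant parameters.

```python
# Illustrative taint-keyed profiling cache (not Dooly's real data model).
from typing import Dict, FrozenSet, Tuple

class ProfileDB:
    def __init__(self) -> None:
        # key = (operator name, values of the tainted config params only)
        self._db: Dict[Tuple[str, Tuple], float] = {}

    def _key(self, op: str, taint: FrozenSet[str], config: Dict) -> Tuple[str, Tuple]:
        # Only parameters the op's latency depends on enter the key, so
        # configs differing in untainted parameters share one entry.
        return op, tuple(sorted((name, config[name]) for name in taint))

    def lookup(self, op, taint, config):
        return self._db.get(self._key(op, taint, config))

    def record(self, op, taint, config, latency_ms):
        self._db[self._key(op, taint, config)] = latency_ms

db = ProfileDB()
# In Dooly the taint set would come from the taint analysis itself; here it
# is given by hand: decode attention latency depends on batch size and KV
# length, not on the scheduler policy (illustrative).
taint = frozenset({"batch", "kv_len"})
db.record("decode_attn", taint, {"batch": 8, "kv_len": 4096, "policy": "fcfs"}, 1.7)
# A config differing only in an untainted parameter reuses the measurement:
assert db.lookup("decode_attn", taint, {"batch": 8, "kv_len": 4096, "policy": "priority"}) == 1.7
```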
SplitZip is a new GPU-friendly lossless compressor for KV cache tensors that exploits exponent redundancy to achieve over 600 GB/s compression throughput and up to 1.32x faster transfers in disaggregated LLM serving.
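The exponent-redundancy idea is easy to demonstrate on the CPU: fp16 KV-cache values occupy a narrow dynamic range, so their 5-bit exponents repeat heavily and compress far better once split from the sign/mantissa bits. The sketch below, using numpy and zlib, only illustrates the transform; SplitZip itself runs on the GPU with its own coder, and the tensor size and scale here are stand-ins.

```python
import numpy as np
import zlib

kv = (np.random.randn(4096) * 0.05).astype(np.float16)  # stand-in KV tensor
bits = kv.view(np.uint16)                   # fp16: 1 sign, 5 exp, 10 mantissa

exponents = ((bits >> 10) & 0x1F).astype(np.uint8)  # 5-bit exponent plane
rest = bits & 0x83FF                                # sign + mantissa bits

split = len(zlib.compress(exponents.tobytes())) + len(zlib.compress(rest.tobytes()))
baseline = len(zlib.compress(bits.tobytes()))
print(f"interleaved: {baseline} B, split planes: {split} B")

# The transform is lossless: recombining the planes is bit-exact.
restored = (rest | (exponents.astype(np.uint16) << 10)).view(np.float16)
assert np.array_equal(restored, kv)
```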
Sparse Prefix Caching for Hybrid and Recurrent LLM Serving places state checkpoints via dynamic programming, choosing positions that are optimal under the observed prefix-overlap distribution, and improves the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.
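One plausible dynamic-programming formulation, stated here as my own assumption rather than the paper's exact model: given `p[l]`, the probability that a future request overlaps the cached prefix by exactly `l` tokens, and a budget of `K` state checkpoints, a request with overlap `l` resumes from the deepest checkpoint at position `<= l`, and we place checkpoints to maximize the expected number of recomputed tokens saved, in O(K·N²) time.

```python
from functools import lru_cache

def place_checkpoints(p, K):
    """p[l] = P(overlap == l) for l in 0..N; returns (expected tokens saved,
    checkpoint positions)."""
    N = len(p) - 1
    suffix = [0.0] * (N + 2)              # suffix[i] = P(overlap >= i)
    for i in range(N, -1, -1):
        suffix[i] = suffix[i + 1] + p[i]

    @lru_cache(maxsize=None)
    def best(k, i):
        # Best expected savings with the k-th checkpoint at position i. A
        # checkpoint at i serves every overlap in [i, next checkpoint),
        # saving i tokens of prefix recompute each time it is hit.
        if k == K:                        # last checkpoint covers [i, N]
            return i * suffix[i], ()
        options = [(i * suffix[i], ())]   # also allow using fewer than K
        for j in range(i + 1, N + 1):
            sub, rest = best(k + 1, j)
            options.append((i * (suffix[i] - suffix[j]) + sub, (j,) + rest))
        return max(options)

    candidates = []
    for i in range(1, N + 1):
        val, rest = best(1, i)
        candidates.append((val, (i,) + rest))
    return max(candidates)

# Overlap mass concentrated near lengths 3 and 8 of a 10-token prefix:
p = [0, 0, 0.05, 0.3, 0.1, 0, 0, 0.05, 0.4, 0.05, 0.05]
print(place_checkpoints(p, K=2))          # -> (~5.35, (3, 8))
```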
Chakra introduces a portable, interoperable graph-based execution trace format for distributed ML workloads along with supporting tools to standardize performance benchmarking and software-hardware co-design.
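The core idea of a graph-based trace can be shown with a toy schema. This is only an illustration of the concept, not Chakra's actual format (Chakra defines a richer, protobuf-based schema and conversion tools): compute and communication operators become nodes, dependencies become explicit edges, and the serialized graph can be consumed by any simulator or replay tool that understands it.

```python
# Toy graph-based execution trace (illustrative schema, not Chakra's).
from dataclasses import dataclass, field
from typing import List
import json

@dataclass
class TraceNode:
    id: int
    name: str
    kind: str              # e.g. "compute" or "communication"
    duration_us: float
    deps: List[int] = field(default_factory=list)  # node ids this one waits on

trace = [
    TraceNode(0, "embedding_fwd", "compute", 120.0),
    TraceNode(1, "allreduce_grad", "communication", 850.0, deps=[0]),
    TraceNode(2, "optimizer_step", "compute", 300.0, deps=[1]),
]

# Any tool that understands the schema can replay or simulate the same trace,
# which is the interoperability point of a standardized format.
print(json.dumps([vars(n) for n in trace], indent=2))
```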
MemExplorer optimizes heterogeneous memory systems for agentic LLM inference on NPUs and reports up to 2.3x higher energy efficiency than baselines under fixed power budgets.
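A hedged sketch of the kind of search the name suggests: enumerate heterogeneous memory configurations and keep the most energy-efficient one that fits the power budget. The tiers, the bandwidth-bound cost model, and every number below are illustrative assumptions of mine, not MemExplorer's.

```python
# Illustrative design-space exploration over memory tiers (not MemExplorer's model).
from itertools import product

TIERS = [("HBM", 800, 18.0), ("LPDDR", 120, 3.5), ("Flash", 8, 1.2)]  # (name, GB/s, W)
POWER_BUDGET_W = 30.0
GB_PER_TOKEN = 0.5        # assumed memory traffic per decoded token

def efficiency(counts):
    """tokens/joule for a config with counts[i] units of TIERS[i], or None."""
    bw = sum(c * t[1] for c, t in zip(counts, TIERS))
    power = sum(c * t[2] for c, t in zip(counts, TIERS))
    if power == 0 or power > POWER_BUDGET_W:
        return None                        # empty config or over budget
    tokens_per_s = bw / GB_PER_TOKEN       # assume bandwidth-bound decode
    return tokens_per_s / power, counts

# 0..2 units of each tier -> 27 candidate configurations.
best = max(filter(None, (efficiency(c) for c in product(range(3), repeat=3))))
print(best)   # (tokens per joule, (n_HBM, n_LPDDR, n_Flash))
```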
Integrated solar-compute-radiator panels enable orbital satellites to achieve over 100 kW of AI inference compute per metric ton launched, supporting thousands of simultaneous large language model sessions.
Workload-aware optimizations for LLM serving in anti-money-laundering (AML) and fraud detection yield substantial throughput and GPU-utilization gains and latency reductions on synthetic compliance prompts.