arXiv:2502.07864

Meng, Fanxu; Yao, Zengwei; Zhang, Muhan · 2025 · arXiv 2502.07864

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

cs.CV · 2026-05-28 · unverdicted · novelty 8.0

VideoMLA applies multi-head latent attention with 3D-RoPE decoupling to autoregressive video diffusion, delivering 92.7% KV memory reduction while matching short-horizon baselines and leading long-horizon VBench scores.

The Illusion of Power Capping in LLM Decode: A Phase-Aware Energy Characterisation Across Attention Architectures

cs.DC · 2026-05-12 · unverdicted · novelty 7.0

Power capping is illusory during LLM decode because the memory-bound phase leaves power headroom untouched on 700 W GPUs; SM clock locking saves up to 32% energy, and three DVFS classes emerge across attention types.

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

cs.AR · 2025-09-11 · unverdicted · novelty 5.0

PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over A100 and 4.04x energy efficiency.

Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

cs.CL · 2026-06-13 · unverdicted · novelty 4.0

Technical report announcing Ling-2.6 and Ring-2.6 models with hybrid linear attention, evolutionary CoT, and KPop RL for efficient agentic intelligence at scale.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

cs.AR · 2025-09-11 · unverdicted · none · ref 45

PLENA introduces a co-designed system with three optimization pathways for long-context agentic LLM inference, claiming up to 2.23x throughput over A100 and 4.04x energy efficiency.
