TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text

Songshuo Lu, Hua Wang, Yutian Rong, Zhi Chen, Yaohua Tang · 2025 · DOI 10.18653/v1/2025.emnlp-main.334

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

Defines cost-aware RAG with evidence cost tiers and shows static selectors are brittle while agentic LLM-based selection is promising but model-dependent.

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference

cs.CL · 2026-06-18 · unverdicted · novelty 4.0

CacheWeaver is a lightweight scheduling layer that orders evidence to exploit prefix caching, reducing median TTFT by 20-33% across vLLM setups while preserving answer quality.

citing papers explorer

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding
cs.CL · 2026-06-03 · unverdicted · none · ref 5

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation
cs.CL · 2026-06-01 · unverdicted · none · ref 13

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference
cs.CL · 2026-06-18 · unverdicted · none · ref 28
