In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text · 2025 · DOI 10.18653/v1/2025.emnlp-main.334

5 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LazyAttention kernelizes deferred positional encoding to enable zero-copy, position-agnostic KV cache reuse, delivering 1.37× lower TTFT and 1.40× higher throughput than Block-Attention under skewed document distributions while preserving output quality.

When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

Defines cost-aware RAG with evidence cost tiers and shows static selectors are brittle while agentic LLM-based selection is promising but model-dependent.

HYPIC: Accelerating Hybrid-Attention LLM Serving with Position-Independent Caching

cs.DC · 2026-07-01 · unverdicted · novelty 6.0

HYPIC enables position-independent KV caching for hybrid-attention models via segment-cumulative operators and boundary seam recomputation, delivering 2.45× average TTFT reduction and up to 2.0× throughput gain.

Latent Bridges for Multi-Table Question Answering

cs.CL · 2026-06-27 · unverdicted · novelty 5.0

GRAB improves multi-table QA performance by encoding relational data as graphs and bridging structural signals to frozen LLMs through latent tokens.

CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference

cs.CL · 2026-06-18 · unverdicted · novelty 4.0

CacheWeaver is a lightweight scheduling layer that orders evidence to exploit prefix caching, reducing median TTFT by 20-33% across vLLM setups while preserving answer quality.

citing papers explorer

LazyAttention: Efficient Retrieval-Augmented Generation with Deferred Positional Encoding · cs.CL · 2026-06-03 · unverdicted · none · ref 5
When Knowledge Is Not Free: Cost-Aware Evidence Selection in Retrieval-Augmented Generation · cs.CL · 2026-06-01 · unverdicted · none · ref 13
HYPIC: Accelerating Hybrid-Attention LLM Serving with Position-Independent Caching · cs.DC · 2026-07-01 · unverdicted · none · ref 29
Latent Bridges for Multi-Table Question Answering · cs.CL · 2026-06-27 · unverdicted · none · ref 18
CacheWeaver: Cache-Aware Evidence Ordering for Efficient Grounded RAG Inference · cs.CL · 2026-06-18 · unverdicted · none · ref 28
