Towards understanding systems trade-offs in retrieval-augmented generation model inference. arXiv preprint arXiv:2412.11854, 2024

Michael Shen, Muhammad Umar, Kiwan Maeng, G Edward Suh, Udit Gupta · 2024 · arXiv 2412.11854

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Integrating Domain-Specialized Language Models with AI Measurement Tools for Deterministic Atomic-Resolution Experimentation

physics.app-ph · 2026-02-24 · unverdicted · novelty 7.0

Domain-specialized small language models enable deterministic atomic-resolution scanning probe microscopy control with 99.3% command accuracy, lower computational cost, and better domain performance than larger general models.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

CacheClip: Accelerating RAG with Effective KV Cache Reuse

cs.LG · 2025-10-11 · unverdicted · novelty 6.0

CacheClip accelerates RAG prefill by up to 3.33x via auxiliary-model-guided selective KV recomputation while retaining 85-91% of full-attention quality on NIAH and LongBench.

GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search

cs.IR · 2026-02-17 · unverdicted · novelty 5.0

GaiaFlow combines semantic-guided diffusion tuning with early-exit and quantization methods to lower carbon emissions in neural information retrieval while maintaining competitive effectiveness.

citing papers explorer

Showing 4 of 4 citing papers.

Integrating Domain-Specialized Language Models with AI Measurement Tools for Deterministic Atomic-Resolution Experimentation · physics.app-ph · 2026-02-24 · unverdicted · none · ref 25
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs · cs.CR · 2026-05-15 · unverdicted · none · ref 44
CacheClip: Accelerating RAG with Effective KV Cache Reuse · cs.LG · 2025-10-11 · unverdicted · none · ref 21
GaiaFlow: Semantic-Guided Diffusion Tuning for Carbon-Frugal Search · cs.IR · 2026-02-17 · unverdicted · none · ref 10
