Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context LLMs for persistent agents

2026 · arXiv 2603.04814

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Knowledge Compounding: An Empirical Economic Analysis of Self-Evolving Knowledge Wikis under the Agentic ROI Framework

econ.EM · 2026-04-13 · unverdicted · novelty 5.0

A four-query experiment demonstrates 84.6% token savings through knowledge compounding in self-evolving wikis compared to standard RAG, by amortizing ingestion costs and reusing synthesized knowledge over time.

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · novelty 5.0

The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

cs.CL · 2026-05-21 · unverdicted · novelty 4.0

Introduces the Efficiency Frontier framework for deployment-aware cost-performance optimization of LLM context strategies, reporting ~25% token reduction at F1≈0.78 on 5,000 HotpotQA instances.

citing papers explorer

Showing 3 of 3 citing papers.

Knowledge Compounding: An Empirical Economic Analysis of Self-Evolving Knowledge Wikis under the Agentic ROI Framework

econ.EM · 2026-04-13 · unverdicted · none · ref 9

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project

cs.LG · 2026-03-22 · unverdicted · none · ref 30

The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management

cs.CL · 2026-05-21 · unverdicted · none · ref 11
