A four-query experiment demonstrates 84.6% token savings through knowledge compounding in self-evolving wikis compared to standard RAG, by amortizing ingestion costs and reusing synthesized knowledge over time.
Beyond the context window: A cost-performance analysis of fact-based memory vs. long-context llms for persistent agents
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
Introduces Efficiency Frontier framework for deployment-aware cost-performance optimization of LLM context strategies, reporting ~25% token reduction at F1≈0.78 on 5,000 HotpotQA instances.
citing papers explorer
-
Knowledge Compounding: An Empirical Economic Analysis of Self-Evolving Knowledge Wikis under the Agentic ROI Framework
A four-query experiment demonstrates 84.6% token savings through knowledge compounding in self-evolving wikis compared to standard RAG, by amortizing ingestion costs and reusing synthesized knowledge over time.
-
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
The Workload-Router-Pool architecture is a 3D framework for LLM inference optimization that synthesizes prior vLLM work into a 3x3 interaction matrix and proposes 21 research directions at the intersections.
-
The Efficiency Frontier: A Unified Framework for Cost-Performance Optimization in LLM Context Management
Introduces Efficiency Frontier framework for deployment-aware cost-performance optimization of LLM context strategies, reporting ~25% token reduction at F1≈0.78 on 5,000 HotpotQA instances.