Sleep-time compute: Beyond inference scaling at test-time
4 Pith papers cite this work.
Citing papers
- LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
  LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by having the model state its answer first and reason afterwards, reversing the usual reason-then-answer order, and delivers mean relative gains of 17.37% across 117 model-benchmark pairs at zero extra cost.
- BALAR: A Bayesian Agentic Loop for Active Reasoning
  BALAR is a task-agnostic Bayesian loop that maintains structured beliefs over latent states, selects questions via expected mutual information, and expands its state space when needed, delivering accuracy gains of 14.6-38.5% over baselines on detective, puzzle, and clinical-diagnosis benchmarks.
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
  A survey organizing techniques for efficient reasoning in LLMs, chiefly by shortening chain-of-thought outputs.
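BALAR's question-selection step, as summarized above, scores each candidate question by the expected mutual information between its answer and the latent state under the current belief. A minimal sketch of that scoring rule, assuming discrete beliefs and a known answer model (all names and the toy example are illustrative, not BALAR's actual API):

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def expected_information_gain(belief, answer_model, question):
    """EIG = H(belief) - E_a[H(belief | a)], i.e. the mutual information
    between the answer to `question` and the latent state.

    belief:       dict state -> P(state)
    answer_model: callable (question, state) -> dict answer -> P(answer | state, question)
    """
    # Marginal over answers: P(a) = sum_s P(a | s, q) P(s)
    answer_probs = {}
    for s, ps in belief.items():
        for a, pa in answer_model(question, s).items():
            answer_probs[a] = answer_probs.get(a, 0.0) + pa * ps

    eig = entropy(belief)
    for a, pa in answer_probs.items():
        if pa == 0:
            continue
        # Posterior over states given answer a, via Bayes' rule
        posterior = {s: answer_model(question, s).get(a, 0.0) * ps / pa
                     for s, ps in belief.items()}
        eig -= pa * entropy(posterior)
    return eig

def select_question(belief, answer_model, questions):
    """Pick the candidate question with the highest expected information gain."""
    return max(questions, key=lambda q: expected_information_gain(belief, answer_model, q))

# Toy detective example: "alibi" discriminates the suspects, "weather" does not.
belief = {"suspect_A": 0.5, "suspect_B": 0.5}

def answer_model(question, state):
    if question == "alibi":
        return {"yes": 1.0} if state == "suspect_A" else {"no": 1.0}
    return {"sunny": 1.0}  # same answer regardless of state: zero information

best = select_question(belief, answer_model, ["weather", "alibi"])
```

In the toy example, "alibi" yields one full bit of expected information gain (it resolves the two-suspect belief completely), while "weather" yields zero, so the loop asks about the alibi first.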