IdleSpec improves LLM agent accuracy by generating and aggregating speculative plans during idle time between tool calls and observations using complementary drafting strategies.
Scaling test-time compute for LLM agents
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 8representative citing papers
TClone introduces low-latency forking of live GUI environments via sibling containers, copy-on-write sharing, and asynchronous checkpointing, achieving 1.9x and 1.5x lower task latency than KVM and CRIU.
PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.
A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
ExComm adds cross-agent conflict detection and soft belief correction plus trajectory diversification to agentic test-time scaling, yielding 5-6% gains over baselines on AIME and GAIA benchmarks.
citing papers explorer
-
IdleSpec: Exploiting Idle Time via Speculative Planning for LLM Agents
IdleSpec improves LLM agent accuracy by generating and aggregating speculative plans during idle time between tool calls and observations using complementary drafting strategies.
-
TClone: Low-Latency Forking of Live GUI Environments for Computer-Use Agents
TClone introduces low-latency forking of live GUI environments via sibling containers, copy-on-write sharing, and asynchronous checkpointing, achieving 1.9x and 1.5x lower task latency than KVM and CRIU.
-
The Granularity Mismatch in Agent Security: Argument-Level Provenance Solves Enforcement and Isolates the LLM Reasoning Bottleneck
PACT achieves perfect security and utility under oracle provenance by enforcing argument-level trust contracts based on semantic roles and cross-step provenance tracking, outperforming invocation-level monitors in AgentDojo evaluations.
-
Inference-Time Budget Control for LLM Search Agents
A VOI-based controller for dual inference budgets improves multi-hop QA performance by prioritizing search actions and selectively finalizing answers.
-
Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility
SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.
-
A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability
LLM reliability techniques are unified as communication channel operators, with a new cost-aware router achieving superior quality-cost tradeoffs on hard tasks.
-
Evaluation-driven Scaling for Scientific Discovery
SimpleTES scales test-time evaluation in LLMs to discover state-of-the-art solutions on 21 scientific problems across six domains, outperforming frontier models and optimization pipelines with examples like 2x faster LASSO and new Erdos constructions.
-
ExComm: Exploration-Stage Communication for Error-Resilient Agentic Test-Time Scaling
ExComm adds cross-agent conflict detection and soft belief correction plus trajectory diversification to agentic test-time scaling, yielding 5-6% gains over baselines on AIME and GAIA benchmarks.