Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.
hub Mixed citations
Resum: Unlocking long-horizon search intelligence via context summarization
Mixed citation behavior. Most common role is background (67%).
hub tools
citation-role summary
citation-polarity summary
representative citing papers
MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
AdaCoM trains an external context manager with RL to improve long-horizon LLM agent performance via adaptive pruning and preservation, revealing a fidelity-reliability trade-off across agents.
PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.
Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.
Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.
PRISM is a new inference-time retrieval system that achieves higher accuracy than baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
AggAgent uses an agent with inspection tools to aggregate parallel trajectories for agentic tasks, outperforming prior methods by up to 5.3% on average across benchmarks.
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
RSE distills search trajectories into an experience bank for positive and negative recycling, yielding efficiency gains over independent sampling on math reasoning benchmarks.
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.
Kimi K2.5 combines joint text-vision training with an Agent Swarm parallel orchestration framework to reach claimed state-of-the-art results on coding, vision, reasoning, and agent tasks while cutting latency up to 4.5 times.
Agent-Radar is a training-free context management technique applying temporal and spatial decay to focus multi-agent LLM attention on relevant history, delivering up to 7.64 point gains on five benchmarks.
citing papers explorer
-
Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models
Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.
-
Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation
MemDocAgent generates consistent hierarchical repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.
-
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
-
Learning Agent-Compatible Context Management for Long-Horizon Tasks
AdaCoM trains an external context manager with RL to improve long-horizon LLM agent performance via adaptive pruning and preservation, revealing a fidelity-reliability trade-off across agents.
-
PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents
PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.
-
Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs
Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer sequences.
-
Argus: Evidence Assembly for Scalable Deep Research Agents
Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.
-
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents
PRISM is a new inference-time retrieval system that achieves higher accuracy than baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.
-
Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
-
When Less is Enough: Efficient Inference via Collaborative Reasoning
A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.
-
Towards Long-horizon Agentic Multimodal Search
LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.
-
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
AggAgent uses an agent with inspection tools to aggregate parallel trajectories for agentic tasks, outperforming prior methods by up to 5.3% on average across benchmarks.
-
MEMENTO: Teaching LLMs to Manage Their Own Context
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
-
LightThinker++: From Reasoning Compression to Memory Management
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
-
Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling
RSE distills search trajectories into an experience bank for positive and negative recycling, yielding efficiency gains over independent sampling on math reasoning benchmarks.
-
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.
-
ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning
ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.
-
Kimi K2.5: Visual Agentic Intelligence
Kimi K2.5 combines joint text-vision training with an Agent Swarm parallel orchestration framework to reach claimed state-of-the-art results on coding, vision, reasoning, and agent tasks while cutting latency up to 4.5 times.
-
Enhancing Multi-Agent Communication through Attention Steering with Context Relevance
Agent-Radar is a training-free context management technique applying temporal and spatial decay to focus multi-agent LLM attention on relevant history, delivering up to 7.64 point gains on five benchmarks.