{"total":31,"items":[{"citing_arxiv_id":"2607.00508","ref_index":40,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When RAG Meets Query Planning: Logical Query Trees for Resolving Exploratory Reasoning Problems","primary_cat":"cs.IR","submitted_at":"2026-07-01T06:43:55+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31650","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ECHO: Prune to act, trace to learn with selective turn memory in agentic RL","primary_cat":"cs.LG","submitted_at":"2026-06-30T13:29:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ECHO is a selective turn-memory framework for agentic RL that compresses turns into indexed records, selects them for bounded contexts, and uses source indices to assign outcome credit to supporting evidence, reaching 43.4% accuracy on BrowseComp-Plus versus 28.9% for GRPO and 36.1% for SUPO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.31564","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ACE: Pluggable Adaptive Context Elasticizer across Agents","primary_cat":"cs.AI","submitted_at":"2026-06-30T12:20:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ACE is a pluggable module that elastically orchestrates historical agent steps as raw, abstract, or dropped to maintain compact yet recoverable context for LLM agents handling long 
trajectories.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30005","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM Agents Are Latent Context Managers: Eliciting Self-Managed Context via a Proprioceptive Dashboard","primary_cat":"cs.CL","submitted_at":"2026-06-29T09:13:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VISTA supplies LLM agents with a visible proprioceptive dashboard of typed context blocks, enabling untrained self-management that lifts performance on long-horizon tool-use benchmarks across multiple model scales.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.28434","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents","primary_cat":"cs.SE","submitted_at":"2026-06-26T04:55:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.13316","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning","primary_cat":"cs.AI","submitted_at":"2026-06-11T13:10:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ReSum trains LLMs via RLVR to self-summarize reasoning 
trajectories, yielding 4% average performance gains and 18.6% shorter rollouts through contrastive rollout branches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12837","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling","primary_cat":"cs.CL","submitted_at":"2026-06-11T03:04:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LoHoSearch is a new benchmark of 544 KG-constructed questions across 11 domains where the strongest search agent scores 34.74% and context strategies add at most 6.8%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11680","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents","primary_cat":"cs.AI","submitted_at":"2026-06-10T05:49:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HORMA builds a hierarchical memory structure from agent experiences and trains a lightweight RL navigator to retrieve minimal sufficient context, yielding better task performance with at most 22.17% of baseline token usage on ALFWorld, LoCoMo, and LongMemEval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10532","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning","primary_cat":"cs.AI","submitted_at":"2026-06-09T08:03:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ActiveMem proposes a 
heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30785","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Agent-Compatible Context Management for Long-Horizon Tasks","primary_cat":"cs.AI","submitted_at":"2026-05-29T03:21:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AdaCoM trains an external context manager with RL to improve long-horizon LLM agent performance via adaptive pruning and preservation, revealing a fidelity-reliability trade-off across agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30136","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Enhancing Multi-Agent Communication through Attention Steering with Context Relevance","primary_cat":"cs.AI","submitted_at":"2026-05-28T16:02:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Agent-Radar is a training-free context management technique applying temporal and spatial decay to focus multi-agent LLM attention on relevant history, delivering up to 7.64 point gains on five benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27141","ref_index":81,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User 
Interactions","primary_cat":"cs.AI","submitted_at":"2026-05-26T15:07:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VitaBench 2.0 introduces a benchmark for long-term personalized and proactive agent behavior, with results indicating substantial gaps in current frontier LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25693","ref_index":97,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents","primary_cat":"cs.CL","submitted_at":"2026-05-25T10:48:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoleMemo dataset and DualMem dual-memory framework let role-playing agents interpret facts through personas, with a 4B model beating larger zero-shot systems on fidelity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24486","ref_index":61,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgentFugue: Agent Scaling for Long-Horizon Tasks through Collective Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-23T09:21:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AgentFugue introduces a plug-in shared reasoning hub trained with SFT and RL that enables peer agents to share intermediate reasoning, yielding gains on long-horizon tasks over strong baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24468","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SAM: State-Adaptive Memory for Long-Horizon Reasoning 
Agent","primary_cat":"cs.AI","submitted_at":"2026-05-23T08:37:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAM is a standalone memory framework for long-horizon LLM agents that creates state-adaptive cues from interactions, preserves raw trajectories for intent-driven recall, and optimizes the module via expert supervision and RL, outperforming baselines on BrowseComp and related benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19932","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents","primary_cat":"cs.AI","submitted_at":"2026-05-19T14:51:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEEK maintains a constant-sized context map via a programmable cache policy to give LLM agents persistent orientation knowledge about recurring external contexts, yielding 6-34% gains and lower cost than prior prompt-learning methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18165","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Elastic-dLLM: Position Preserving Context Compression and Augmentation of Diffusion LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-18T10:09:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Position-preserving MASK token compression reduces redundancy in diffusion LLMs to accelerate parallel decoding and enable context folding for longer 
sequences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16725","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models","primary_cat":"cs.AI","submitted_at":"2026-05-16T00:18:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16217","ref_index":55,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Argus: Evidence Assembly for Scalable Deep Research Agents","primary_cat":"cs.CL","submitted_at":"2026-05-15T17:29:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Argus coordinates a Navigator and multiple Searchers via an evidence graph for deep research, reporting average gains of 5.5 points with one Searcher and 12.7 points with eight parallel Searchers across eight benchmarks, reaching 86.2 on BrowseComp with 64 Searchers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14563","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation","primary_cat":"cs.SE","submitted_at":"2026-05-14T08:35:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemDocAgent generates consistent hierarchical 
repository-level code documentation by combining dependency-aware traversal with memory-guided agent interactions that accumulate work traces.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 32779-32798, 2025. [41] Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, et al. Resum: Unlocking long-horizon search intelligence via context summarization.arXiv preprint arXiv:2509.13313, 2025. [42] Minki Kang, Wei-Ning Chen, Dongge Han, Huseyin A Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, and Saravan Rajmohan. Acon: Optimizing context compression for long-horizon llm agents.arXiv preprint arXiv:2510.00615, 2025. [43] Mo Li, LH Xu, Qitai Tan, Long Ma, Ting Cao, and Yunxin Liu. Sculptor: Empowering llms with cognitive agency via active context management."},{"citing_arxiv_id":"2605.12260","ref_index":24,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents","primary_cat":"cs.CL","submitted_at":"2026-05-12T15:28:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PRISM is a new inference-time retrieval system that achieves higher accuracy than baselines on long-horizon agent tasks while using an order of magnitude less context by combining hierarchical graph search, intent-based costing, compression, and adaptive routing over structured memory.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"long-context inference of large language models.arXiv preprint arXiv:2406.13035, 2024. [23] Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, and Li Yuan. 
Look-m: Look-once optimization in kv cache for efficient multimodal long-context inference. InFindings of the Association for Computational Linguistics: EMNLP 2024, pages 4065-4078, 2024. [24] Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, et al. Resum: Unlocking long-horizon search intelligence via context summarization.arXiv preprint arXiv:2509.13313, 2025. [25] Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for llm agents."},{"citing_arxiv_id":"2605.08580","ref_index":94,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents","primary_cat":"cs.MA","submitted_at":"2026-05-09T00:47:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04496","ref_index":66,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States","primary_cat":"cs.CL","submitted_at":"2026-05-06T04:55:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic 
state.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01111","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Less is Enough: Efficient Inference via Collaborative Reasoning","primary_cat":"cs.LG","submitted_at":"2026-05-01T21:31:59+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A large model generates a compact reasoning signal that a small model uses to solve tasks, reducing the large model's output tokens by up to 60% on benchmarks like AIME and GPQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12890","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Long-horizon Agentic Multimodal Search","primary_cat":"cs.CV","submitted_at":"2026-04-14T15:40:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"multimodal large language models: Are we solving the right problem? InFindings of the Association for Computational Linguistics: ACL 2025, pages 15537-15549, 2025. [14] Lingrui Mei, Jiayu Yao, Yuyao Ge, Yiwei Wang, Baolong Bi, Yujun Cai, Jiazhi Liu, Mingyu Li, Zhong-Zhi Li, Duzhen Zhang, et al. A survey of context engineering for large language models. arXiv preprint arXiv:2507.13334, 2025. [15] Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, et al. 
Resum: Unlocking long-horizon search intelligence via context summarization.arXiv preprint arXiv:2509.13313, 2025. [16] Guoxin Chen, Zile Qiao, Xuanzhong Chen, Donglei Yu, Haotian Xu, Wayne Xin Zhao, Ruihua Song, Wenbiao Yin, Huifeng Yin, Liwen Zhang, et al."},{"citing_arxiv_id":"2604.11753","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks","primary_cat":"cs.CL","submitted_at":"2026-04-13T17:26:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AggAgent uses an agent with inspection tools to aggregate parallel trajectories for agentic tasks, outperforming prior methods by up to 5.3% on average across benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09852","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MEMENTO: Teaching LLMs to Manage Their Own Context","primary_cat":"cs.AI","submitted_at":"2026-04-10T19:30:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03679","ref_index":73,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LightThinker++: From Reasoning Compression to Memory Management","primary_cat":"cs.CL","submitted_at":"2026-04-04T10:46:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LightThinker++ adds explicit adaptive memory management and a 
trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"forcement learning to maintain a fixed-size internal memory, allowing agents to handle long-term tasks by retaining essential information and discarding redundant data. ReSum [71] addresses context constraints by periodically summarizing interaction histories, enabling agents to resume exploration from compact, state-based representations. Further advancing this paradigm, AgentFold [72] and Context-Folding [73], introduce a \"folding\" mechanism that compresses detailed interaction histories into compact reasoning states. Compared to token-level KV-cache pruning, these semantic-level methods better preserve the task-critical logic required for complex reasoning scenarios. 8 Conclusion In this paper, we present LightThinker, a new approach to enhance the efficiency of LLMs in complex reasoning"},{"citing_arxiv_id":"2602.02276","ref_index":74,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Kimi K2.5: Visual Agentic Intelligence","primary_cat":"cs.CL","submitted_at":"2026-02-02T16:17:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Kimi K2.5 combines joint text-vision training with an Agent Swarm parallel orchestration framework to reach claimed state-of-the-art results on coding, vision, reasoning, and agent tasks while cutting latency up to 4.5 times.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"normally for tokens with log-ratios within the interval [α, β], while gradients for tokens falling outside this range are zeroed out.
Notably, a key distinction from standard PPO clipping [50] is that our method relies strictly on the log-ratio to explicitly bound off-policy drift, regardless of the sign of the advantages. This approach aligns with recent strategies proposed to stabilize large-scale RL training [74, 78]. Empirically, we find this mechanism essential for maintaining training stability in complex domains requiring long-horizon, multi-step tool-use reasoning. We employ the MuonClip optimizer [30, 34] to minimize this objective. Reward FunctionWe apply a rule-based outcome reward for tasks with verifiable solutions, such as reasoning and agentic tasks."},{"citing_arxiv_id":"2601.21684","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Do Not Waste Your Rollouts: Recycling Search Experience for Efficient Test-Time Scaling","primary_cat":"cs.CL","submitted_at":"2026-01-29T13:18:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RSE distills search trajectories into an experience bank for positive and negative recycling, yielding efficiency gains over independent sampling on math reasoning benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":140,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"M+ [135] Latent Token Scalable 
memory tokens for long-context tracking IMM [136] Latent Token Decouples word representations and latent memory Memory [137] Latent Token Forget-resistant memory tokens for evolving context MemGen†[138] Latent Token Context-sensitive latent token as memory carriers Structured Memory Zep [139] Temporal Graph Temporal knowledge graph enabling structured retrieval A-MEM [140] Atomic Memory Notes Symbolic atomic memory units; structured storage G-Memory [141] Hierarchical Graph Multi-level memory graph with topological structure Mem0 [142] Structured Graph Agent memory with full-stack graph-based design to human-readable text but rather constitute a machine-native form of memory. Related efforts include IMM [136] and Memory [137]."}],"limit":50,"offset":0}