Scaling LLM Multi-Turn RL with End-to-End Summarization-Based Context Management
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4 verdicts, all unverdicted). 4 representative citing papers.
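The paper's title refers to summarization-based context management for multi-turn agents. As a rough illustration of the general idea only (not this paper's algorithm), the sketch below compacts a conversation by replacing the oldest turns with a single summary message once a token budget is exceeded; the budget, the word-count token proxy, and the naive summarizer are all placeholder assumptions.

```python
# Illustrative sketch of summarization-based context management.
# When the running transcript exceeds a token budget, all but the
# most recent turns are collapsed into one summary message.

def estimate_tokens(messages):
    """Crude token proxy: whitespace-delimited word count."""
    return sum(len(m["content"].split()) for m in messages)

def naive_summarize(messages):
    """Stand-in for an LLM summarizer: keep each turn's first 5 words."""
    parts = [" ".join(m["content"].split()[:5]) for m in messages]
    return "[summary] " + " | ".join(parts)

def compact_context(messages, budget=50, keep_recent=2):
    """If over budget, replace older turns with a single summary message."""
    if estimate_tokens(messages) <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system", "content": naive_summarize(old)}
    return [summary] + recent

if __name__ == "__main__":
    history = [{"role": "user", "content": "word " * 20} for _ in range(6)]
    compacted = compact_context(history, budget=50, keep_recent=2)
    print(len(compacted))  # 3: one summary message plus the 2 recent turns
```

In a real agent loop the summarizer would itself be an LLM call, and compaction could run asynchronously so the agent never blocks on it.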
Citing Papers Explorer
- Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents
  Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
- LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents
  Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker, fine-tuned on 10k trajectories, achieves 61.5% and 62.5% on the BrowseComp benchmarks, outperforming prior agents.
- ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting
  ScrapMem introduces optical forgetting to compress multimodal memories for LLM agents on edge devices, cutting storage by up to 93% while reaching 51.0% Joint@10 and 70.3% Recall@10 on ATM-Bench.
- A Brief Overview: Agentic Reinforcement Learning in Large Language Models
  The paper surveys the conceptual foundations, methodological innovations, challenges, and future directions of agentic reinforcement learning frameworks that embed cognitive capabilities like meta-reasoning and self-reflection into LLM-based agents.