Scaling llm multi-turn rl with end-to-end summarization-based context management

Miao Lu, Weiwei Sun, Weihua Du, Zhan Ling, Xuesong Yao, Kang Liu, Jiecao Chen · 2025 · arXiv 2510.06727

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

cs.MA · 2026-05-09 · unverdicted · novelty 6.0

Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.

SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents

cs.SE · 2026-06-26 · unverdicted · novelty 5.0

SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.

Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning

cs.AI · 2026-05-14 · unverdicted · novelty 5.0

LaMR decomposes code context pruning into two rubrics using dedicated CRFs, a mixture-of-experts gate, and AST-derived labels to filter noise and often match or beat full-context baselines on coding benchmarks.

R$^2$-Searcher: Calibrating Retrieval and Reasoning Boundaries for Agentic Search

cs.IR · 2026-06-26 · unverdicted · novelty 4.0

R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.

Rethinking Agentic Reinforcement Learning In Large Language Models

cs.AI · 2026-04-30 · unverdicted · novelty 3.0 · 3 refs

The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

cs.AI · 2026-05-05

citing papers explorer

Showing 7 of 7 citing papers.

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents cs.MA · 2026-05-09 · unverdicted · none · ref 81
Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents cs.AI · 2026-05-06 · unverdicted · none · ref 6
Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.
SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents cs.SE · 2026-06-26 · unverdicted · none · ref 26
SWE-MeM introduces adaptive memory management for coding agents via synthesized trajectories and Memory-aware GRPO, reporting 43.4% and 60.2% resolve rates on SWE-Bench Verified for 4B and 30B models while beating baselines on performance and token use.
Context Pruning for Coding Agents via Multi-Rubric Latent Reasoning cs.AI · 2026-05-14 · unverdicted · none · ref 30
LaMR decomposes code context pruning into two rubrics using dedicated CRFs, a mixture-of-experts gate, and AST-derived labels to filter noise and often match or beat full-context baselines on coding benchmarks.
R$^2$-Searcher: Calibrating Retrieval and Reasoning Boundaries for Agentic Search cs.IR · 2026-06-26 · unverdicted · none · ref 29
R²-Searcher introduces fine-grained evidence modeling, retrieval reflection, and R²PO RL to calibrate retrieval-reasoning boundaries and improve multi-hop QA performance.
Rethinking Agentic Reinforcement Learning In Large Language Models cs.AI · 2026-04-30 · unverdicted · none · ref 54 · 3 links
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting cs.AI · 2026-05-05 · unreviewed · ref 71

Scaling llm multi-turn rl with end-to-end summarization-based context management

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer