citation dossier

Memagent: Reshaping long-context llm with multi-conv rl-based memory agent

Yu, H · 2025 · arXiv 2507.02259

17Pith papers citing it

17reference links

cs.AItop field · 6 papers

UNVERDICTEDtop verdict bucket · 15 papers

This arXiv-backed work is queued for full Pith review when it crosses the high-inbound sweep. That review runs reader · skeptic · desk-editor · referee · rebuttal · circularity · lean confirmation · RS check · pith extraction.

read on arXiv PDF

why this work matters in Pith

Pith has found this work in 17 reviewed papers. Its strongest current cluster is cs.AI (6 papers). The largest review-status bucket among citing papers is UNVERDICTED (15 papers). For highly cited works, this page shows a dossier first and a bounded explorer second; it never tries to render every citing paper at once.

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

cs.CL · 2026-05-12 · unverdicted · novelty 8.0

Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent

cs.AI · 2026-04-08 · unverdicted · novelty 7.0

PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.

LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.

An Agentic Approach to Metadata Reasoning

cs.DB · 2026-04-22 · unverdicted · novelty 6.0

Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

MEMENTO: Teaching LLMs to Manage Their Own Context

cs.AI · 2026-04-10 · unverdicted · novelty 6.0

MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.

In-Place Test-Time Training

cs.LG · 2026-04-07 · conditional · novelty 6.0

In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.

Decocted Experience Improves Test-Time Inference in LLM Agents

cs.AI · 2026-04-06 · unverdicted · novelty 6.0

Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.

LightThinker++: From Reasoning Compression to Memory Management

cs.CL · 2026-04-04 · unverdicted · novelty 6.0

LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

cs.AI · 2026-04-03 · unverdicted · novelty 6.0

GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.

MemFactory: Unified Inference & Training Framework for Agent Memory

cs.CL · 2026-03-31 · unverdicted · novelty 6.0

MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.

MiA-Signature: Approximating Global Activation for Long-Context Understanding

cs.CL · 2026-05-07 · unverdicted · novelty 5.0

MiA-Signature approximates the global activation state induced by a query via submodular concept selection to enable tractable long-context understanding in LLMs.

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

LongAct uses saliency from high-magnitude activations to guide sparse weight updates in long-context RL, yielding about 8% gains on LongBench v2 across multiple algorithms.

Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference

cs.DC · 2026-03-30 · unverdicted · novelty 5.0

Unifying LLM memory optimizations into a Prepare-Compute-Retrieve-Apply pipeline and accelerating it on GPU-FPGA hardware yields up to 2.2x faster inference and 4.7x less energy than GPU-only baselines.

citing papers explorer

Showing 17 of 17 citing papers.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 39
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty cs.CL · 2026-05-12 · unverdicted · none · ref 21
Agent-BRACE improves LLM agent performance on long-horizon partially observable tasks by 5.3-14.5% through a decoupled belief state of verbalized atomic claims with certainty labels that keeps context length constant.
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States cs.CL · 2026-05-06 · unverdicted · none · ref 42
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 172
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent cs.AI · 2026-04-08 · unverdicted · none · ref 23
PRIME enables agents to proactively reason in user-centric tasks by iteratively evolving structured memories from interaction trajectories without gradient-based training.
LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents cs.AI · 2026-05-06 · unverdicted · none · ref 15
Context-ReAct enables agents to dynamically manage context via five atomic operations, and LongSeeker fine-tuned on 10k trajectories achieves 61.5% and 62.5% on BrowseComp benchmarks, outperforming prior agents.
An Agentic Approach to Metadata Reasoning cs.DB · 2026-04-22 · unverdicted · none · ref 41
Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 58
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
MEMENTO: Teaching LLMs to Manage Their Own Context cs.AI · 2026-04-10 · unverdicted · none · ref 35
MEMENTO trains LLMs to segment reasoning into blocks, generate mementos as dense summaries, and reason forward using only mementos and KV states, cutting peak KV cache by ~2.5x while preserving benchmark accuracy.
In-Place Test-Time Training cs.LG · 2026-04-07 · conditional · none · ref 65
In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.
Decocted Experience Improves Test-Time Inference in LLM Agents cs.AI · 2026-04-06 · unverdicted · none · ref 4
Decocted experience—extracting and organizing the essence from accumulated interactions—enables more effective context construction that improves test-time inference in LLM agents on math, web, and software tasks.
LightThinker++: From Reasoning Compression to Memory Management cs.CL · 2026-04-04 · unverdicted · none · ref 72
LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning cs.AI · 2026-04-03 · unverdicted · none · ref 37
GrandCode is the first AI system to consistently beat all human participants and place first in live Codeforces competitive programming contests.
MemFactory: Unified Inference & Training Framework for Agent Memory cs.CL · 2026-03-31 · unverdicted · none · ref 17
MemFactory is a new unified modular framework for memory-augmented LLM agent inference and training that integrates GRPO and reports up to 14.8% relative gains on MemAgent evaluations.
MiA-Signature: Approximating Global Activation for Long-Context Understanding cs.CL · 2026-05-07 · unverdicted · none · ref 36
MiA-Signature approximates the global activation state induced by a query via submodular concept selection to enable tractable long-context understanding in LLMs.
LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning cs.LG · 2026-04-16 · unverdicted · none · ref 31
LongAct uses saliency from high-magnitude activations to guide sparse weight updates in long-context RL, yielding about 8% gains on LongBench v2 across multiple algorithms.
Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference cs.DC · 2026-03-30 · unverdicted · none · ref 29
Unifying LLM memory optimizations into a Prepare-Compute-Retrieve-Apply pipeline and accelerating it on GPU-FPGA hardware yields up to 2.2x faster inference and 4.7x less energy than GPU-only baselines.

Memagent: Reshaping long-context llm with multi-conv rl-based memory agent

why this work matters in Pith

fields

years

verdicts

representative citing papers

citing papers explorer