hub

URL https://openreview.net/forum?id=ehfRiF0R3a

Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley · 2025 · arXiv 2509.25911

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

cs.CL · 2026-05-12 · unverdicted · novelty 7.0

LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.

DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

cs.CL · 2026-05-12 · unverdicted · novelty 6.0

SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.

Tree-based Credit Assignment for Multi-Agent Memory System

cs.MA · 2026-05-06 · unverdicted · novelty 6.0

TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.

Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents

cs.AI · 2026-04-17 · conditional · novelty 6.0

The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.

POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.

Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards

cs.AI · 2026-04-11 · unverdicted · novelty 6.0

Introduces MemHome benchmark and RL with multi-dimensional rewards for memory-driven smart home device control.

TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.

HyperMem: Hypergraph Memory for Long-Term Conversations

cs.CL · 2026-04-09 · unverdicted · novelty 5.0

HyperMem is a hypergraph memory architecture that groups related conversation episodes and facts via hyperedges and reports 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

cs.AI · 2026-05-08 · 2 refs

citing papers explorer

Showing 14 of 14 citing papers.

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare cs.AI · 2026-05-12 · conditional · none · ref 33
MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.
EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 8
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues cs.CL · 2026-05-12 · unverdicted · none · ref 95
LongMemEval-V2 is a new benchmark where AgentRunbook-C reaches 72.5% accuracy on long-term agent memory tasks, beating RAG baselines at 48.5% and basic coding agents at 69.3%.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning cs.CL · 2026-05-11 · unverdicted · none · ref 23
DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory cs.CL · 2026-05-01 · unverdicted · none · ref 159
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs cs.CL · 2026-05-12 · unverdicted · none · ref 14
SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.
Tree-based Credit Assignment for Multi-Agent Memory System cs.MA · 2026-05-06 · unverdicted · none · ref 17
TreeMem assigns credit to agents in multi-agent memory systems by expanding outputs into a tree and using Monte Carlo averaging of final rewards to optimize each agent's policy.
Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents cs.AI · 2026-04-17 · conditional · none · ref 21
The Experience Compression Spectrum unifies memory, skills, and rules in LLM agents along increasing compression levels and identifies the absence of adaptive cross-level compression as the missing diagonal.
POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch cs.CV · 2026-04-15 · unverdicted · none · ref 46
POINTS-Seeker-8B is an 8B multimodal model trained from scratch for agentic search that uses seeding and visual-space history folding to outperform prior models on six visual reasoning benchmarks.
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards cs.AI · 2026-04-11 · unverdicted · none · ref 21
Introduces MemHome benchmark and RL with multi-dimensional rewards for memory-driven smart home device control.
TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation cs.CL · 2026-04-09 · unverdicted · none · ref 71
TSUBASA improves long-horizon personalization in LLMs via dynamic memory evolution for writing and context-distillation self-learning for reading, outperforming Mem0 and Memory-R1 on Qwen-3 benchmarks while reducing token use.
HyperMem: Hypergraph Memory for Long-Term Conversations cs.CL · 2026-04-09 · unverdicted · none · ref 8
HyperMem is a hypergraph memory architecture that groups related conversation episodes and facts via hyperedges and reports 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 152
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs cs.AI · 2026-05-08 · unreviewed · ref 37 · 2 links

URL https://openreview.net/forum?id=ehfRiF0R3a

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer