ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
hub
Llm-based multi-agent reinforcement learning: Current and future directions
13 Pith papers cite this work. Polarity classification is still indexing.
hub tools
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
MACA frames multi-agent coordination as posterior inference, learns a structural prior to guide orchestration, and reports 8.42% higher performance with 43.19% fewer tokens than adaptive baselines on benchmarks.
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.
AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
WebSailor trains open-source web agents to match proprietary performance on complex information-seeking tasks by generating high-uncertainty scenarios and using a new RL method called DUPO.
CoEvolve improves LLM agent performance by 15-19% on AppWorld and BFCL benchmarks through mutual evolution of the agent and data distribution using feedback-driven task synthesis.
OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
citing papers explorer
-
ReCrit: Transition-Aware Reinforcement Learning for Scientific Critic Reasoning
ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
-
Multi-Agent Coordination Adaptation via Structure-Guided Orchestration
MACA frames multi-agent coordination as posterior inference, learns a structural prior to guide orchestration, and reports 8.42% higher performance with 43.19% fewer tokens than adaptive baselines on benchmarks.
-
Robust Instruction Compliance in Cooperative Multi-Agent Reinforcement Learning
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
-
Do LLM-derived graph priors improve multi-agent coordination?
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
-
Joint Optimization of Multi-agent Memory System
CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.
-
Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
-
WebSailor: Navigating Super-human Reasoning for Web Agent
WebSailor trains open-source web agents to match proprietary performance on complex information-seeking tasks by generating high-uncertainty scenarios and using a new RL method called DUPO.
-
CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
CoEvolve improves LLM agent performance by 15-19% on AppWorld and BFCL benchmarks through mutual evolution of the agent and data distribution using feedback-driven task synthesis.
-
Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming
OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
-
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.
-
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
-
Multi-Agent Collaboration Mechanisms: A Survey of LLMs
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
-
Large Language Model-Brained GUI Agents: A Survey
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.