ReCrit frames critic interaction as a correctness-transition problem and uses quadrant-based RL rewards to improve LLM performance on scientific reasoning benchmarks by rewarding corrections and robustness while penalizing sycophancy.
hub
LLM-based multi-agent reinforcement learning: Current and future directions
13 Pith papers cite this work. Polarity classification is still indexing.
roles
background 3
polarities
background 3
representative citing papers
MACA frames multi-agent coordination as posterior inference, learns a structural prior to guide orchestration, and reports 8.42% higher performance with 43.19% fewer tokens than adaptive baselines on benchmarks.
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as small as 1.5B parameters proving effective.
CoMAM jointly optimizes agents in multi-agent LLM memory systems via end-to-end RL and adaptive credit assignment to improve collaboration and performance.
AgeMem unifies long-term and short-term memory management in LLM agents by exposing memory operations as learnable tool actions trained via three-stage progressive reinforcement learning, outperforming baselines on long-horizon tasks.
WebSailor trains open-source web agents to match proprietary performance on complex information-seeking tasks by generating high-uncertainty scenarios and using a new RL method called DUPO.
CoEvolve improves LLM agent performance by 15-19% on AppWorld and BFCL benchmarks through mutual evolution of the agent and data distribution using feedback-driven task synthesis.
OATH combines adaptive Halton sampling, obstacle-aware clustering with auctions, and LLM-based instruction interpretation to improve task assignment and planning for heterogeneous robot teams in obstacle-rich environments.
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enable semantic-level collaboration and greater adaptability.
The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.
The survey organizes LLM-based multi-agent collaboration mechanisms into a framework with dimensions of actors, types, structures, strategies, and coordination protocols, reviews applications across domains, and identifies challenges for future research.
A survey consolidating frameworks, data practices, large action models, benchmarks, applications, and research gaps in LLM-brained GUI agents.
citing papers explorer
- Multi-Agent Coordination Adaptation via Structure-Guided Orchestration
- Joint Optimization of Multi-agent Memory System