Learning to Cooperate via Policy Search

Kee-Eung Kim; Leonid Peshkin; Leslie Pack Kaelbling; Nicolas Meuleau

arxiv: cs/0105032 · v1 · pith:VXPGMA4Enew · submitted 2001-05-25 · 💻 cs.LG · cs.MA

classification 💻 cs.LG cs.MA

keywords cooperative · games · observable · agents · learning · method · methods · partially

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
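
As a rough illustration of what a gradient-based distributed policy search can look like, the sketch below runs a REINFORCE-style update for two agents in a one-shot cooperative matrix game. It is not the paper's algorithm or its soccer domain: the game is stateless and has no partial observability, and the payoff values, learning rate, episode count, and all function names are assumptions chosen for illustration. Each agent adjusts only its own softmax parameters, using only its own action and the shared reward.

```python
import math
import random

# Hypothetical shared payoff table: both agents receive PAYOFF[a1][a2].
# Joint action (0, 0) is the global optimum; (1, 1) is a weaker local optimum.
PAYOFF = [[1.0, 0.0],
          [0.0, 0.5]]


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Sample action 0 or 1 from a two-action distribution."""
    return 0 if rng.random() < probs[0] else 1


def expected_payoff(theta1, theta2):
    """Exact expected shared payoff under the two independent policies."""
    p1, p2 = softmax(theta1), softmax(theta2)
    return sum(p1[i] * p2[j] * PAYOFF[i][j]
               for i in range(2) for j in range(2))


def train(episodes=20000, lr=0.1, seed=0):
    """Each agent ascends its own REINFORCE gradient estimate."""
    rng = random.Random(seed)
    theta1, theta2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(episodes):
        p1, p2 = softmax(theta1), softmax(theta2)
        a1, a2 = sample(p1, rng), sample(p2, rng)
        r = PAYOFF[a1][a2]  # the same reward goes to both agents
        # Local update: an agent sees only its own action and the reward.
        for theta, probs, a in ((theta1, p1, a1), (theta2, p2, a2)):
            for k in range(2):
                indicator = 1.0 if k == a else 0.0
                # d/d theta_k of log softmax(theta)[a] is indicator - probs[k]
                theta[k] += lr * r * (indicator - probs[k])
    return theta1, theta2


initial_value = expected_payoff([0.0, 0.0], [0.0, 0.0])
theta1, theta2 = train()
final_value = expected_payoff(theta1, theta2)
print("initial expected payoff:", initial_value)
print("final expected payoff:", final_value)
print("agent 1 policy:", softmax(theta1))
print("agent 2 policy:", softmax(theta2))
```

Depending on the random seed, a run like this may settle on either coordinated joint action. Both are Nash equilibria of this toy game, but only (0, 0) is globally optimal, which gives a small concrete picture of the distinction between local optima and equilibria that the abstract mentions.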

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
cs.AI 2026-01 unverdicted novelty 6.0

Multi-agent actor-critic methods with a centralized critic improve decentralized LLM collaboration over Monte Carlo baselines in long-horizon and sparse-reward settings.

Cross-Modal Navigation with Multi-Agent Reinforcement Learning
cs.RO 2026-05 unverdicted novelty 5.0

CRONA is a MARL framework that uses modality-specialized agents with auxiliary beliefs and a centralized multi-modal critic to achieve better performance and efficiency than single-agent baselines on visual-acoustic n...