Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Joel Z. Leibo , Vinicius Zambaldi , Marc Lanctot , Janusz Marecki , Thore Graepel

classification cs.MA cs.AI cs.GT cs.LG

keywords dilemmas, social, game, policies, sequential, agents, games, introduce

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
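
The "mixed incentive structure of matrix game social dilemmas" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the conventional four-payoff formulation of a symmetric two-player matrix game, which the abstract itself does not spell out: R (reward for mutual cooperation), P (punishment for mutual defection), S (sucker payoff for cooperating against a defector), and T (temptation payoff for defecting against a cooperator). The function names and the example payoff values are hypothetical and chosen only for illustration.

```python
def is_social_dilemma(R, P, S, T):
    """Check the usual matrix-game social dilemma inequalities (assumed formulation)."""
    prefers_mutual_cooperation = R > P   # mutual cooperation beats mutual defection
    prefers_coop_to_exploited = R > S    # mutual cooperation beats being exploited
    no_gain_from_alternating = 2 * R > T + S  # cooperation beats taking turns exploiting
    greed = T > R                        # temptation to exploit a cooperator
    fear = P > S                         # incentive to defect against a defector
    return (prefers_mutual_cooperation
            and prefers_coop_to_exploited
            and no_gain_from_alternating
            and (greed or fear))


def classify(R, P, S, T):
    """Name the dilemma by which incentives to defect (greed, fear) are present."""
    if not is_social_dilemma(R, P, S, T):
        return "not a social dilemma"
    greed, fear = T > R, P > S
    if greed and fear:
        return "Prisoner's Dilemma"
    if greed:
        return "Chicken"
    return "Stag Hunt"


if __name__ == "__main__":
    # Illustrative payoff values only; they are not taken from the paper.
    for payoffs in [(3, 1, 0, 5), (3, 0, 1, 4), (4, 1, 0, 3), (5, 1, 2, 3)]:
        print(payoffs, classify(*payoffs))
```

In a matrix game these four numbers are given directly. In the sequential setting the abstract describes, cooperation and defection are properties of learned policies, so any comparable payoffs would have to be estimated from the returns those policies obtain against each other.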

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Reciprocity Gradient
cs.LG 2026-05 unverdicted novelty 7.0

The reciprocity gradient allows agents to learn near-optimal context-sensitive policies by analytically propagating reward gradients through reputation chains in multi-agent settings.

Investigating the Impact of Subgraph Social Structure Preference on the Strategic Behavior of Networked Mixed-Motive Learning Agents
cs.MA 2026-04 unverdicted novelty 6.0

Preferences over local subgraph structures cause distinct changes in reward collection and strategic actions for agents playing sequential social dilemmas in Harvest and Cleanup environments.

Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems
cs.MA 2026-04 unverdicted novelty 6.0

LLM agent societies develop power-law coordination cascades and intellectual elites through an integration bottleneck that grows with system size.