Intrinsic motivation and automatic curricula via asymmetric self-play

Sukhbaatar,S · 2017 · cs.LG · arXiv 1703.05407

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be reset. Alice will "propose" the task by doing a sequence of actions and then Bob must undo or repeat them, respectively. Via an appropriate reward structure, Alice and Bob automatically generate a curriculum of exploration, enabling unsupervised training of the agent. When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Dota 2 with Large Scale Deep Reinforcement Learning

cs.LG · 2019-12-13 · accept · novelty 7.0

OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.

Scaling Self-Play with Self-Guidance

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.

Growing Action Spaces

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.

citing papers explorer

Showing 3 of 3 citing papers.

Dota 2 with Large Scale Deep Reinforcement Learning cs.LG · 2019-12-13 · accept · none · ref 36
OpenAI Five achieved superhuman performance in Dota 2 by defeating the world champions using scaled self-play reinforcement learning.
Scaling Self-Play with Self-Guidance cs.LG · 2026-04-22 · unverdicted · none · ref 34
SGS adds self-guidance to LLM self-play for Lean4 theorem proving, surpassing RL baselines and enabling a 7B model to outperform a 671B model after 200 rounds.
Growing Action Spaces cs.LG · 2019-06-28 · unverdicted · none · ref 10 · internal anchor
A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.

Intrinsic motivation and automatic curricula via asymmetric self-play

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer