Approximate exploitability: Learning a best response in large games

Edward Lockhart; Finbarr Timbers; Julian Schrittwieser; Marc Lanctot; Martin Schmid; Michael Bowling; Neil Burch; Nolan Bard; Thomas Hubert

arxiv: 2004.09677 · v5 · pith:7AAOBRT5new · submitted 2020-04-20 · 💻 cs.LG · stat.ML




keywords: agents, games, learning, worst-case, agent, best, evaluation, outcomes


Abstract

Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZero-based agents.
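
To make "worst-case performance" concrete, here is a minimal sketch of an exact best response in a tiny two-player zero-sum matrix game. It is not ISMCTS-BR and does not come from the paper: it only illustrates the quantity that a learned best response approximates when exact computation is infeasible. The function name `best_response_value`, the rock-paper-scissors payoff matrix, and the two example strategies are all assumptions chosen for illustration.

```python
import numpy as np

def best_response_value(payoff, opponent_strategy):
    """Return (best pure row action, its expected payoff) against a fixed
    column-player strategy in a zero-sum matrix game.

    payoff[i, j] is the row player's payoff when row plays i and column plays j.
    """
    payoff = np.asarray(payoff, dtype=float)
    opponent_strategy = np.asarray(opponent_strategy, dtype=float)
    # Expected payoff of each pure row action against the fixed opponent.
    expected = payoff @ opponent_strategy
    best_action = int(np.argmax(expected))
    return best_action, float(expected[best_action])

# Illustrative game: rock-paper-scissors, actions ordered rock, paper, scissors.
RPS = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])

# Two hypothetical fixed agents to evaluate.
uniform = np.array([1 / 3, 1 / 3, 1 / 3])   # the equilibrium strategy
rock_heavy = np.array([0.5, 0.25, 0.25])    # over-plays rock

uniform_action, uniform_value = best_response_value(RPS, uniform)
heavy_action, heavy_value = best_response_value(RPS, rock_heavy)
```

In this toy game the best responder gains nothing against the uniform strategy, while the rock-heavy agent loses 0.25 per game on average to an opponent that always plays paper. In large games the expected-payoff vector cannot be enumerated like this, which is the gap a learned best response is meant to fill.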



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

How Much Due Diligence Before You Bid? Learning in Intractable Takeover Auctions
cs.AI 2026-06 unverdicted novelty 6.0

Self-play RL in a takeover auction model shows optimal due diligence is modest and finite, decreasing with cost and competition, while simple general methods outperform specialized ones in large intractable games.