Bayes' Bluff: Opponent Modelling in Poker

Bryce Larson; Carmelo Piccione; Chris Rayner; Darse Billings; Finnegan Southey; Michael P. Bowling; Neil Burch

arxiv: 1207.1411 · v1 · pith:6CJSNUEZnew · submitted 2012-07-04 · 💻 cs.GT · cs.AI

keywords: opponent, poker, demonstrate, distribution, dynamics, game, modelling, playing

Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior.
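The two subproblems named in the abstract, inferring a posterior from observed play and responding to it, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a toy setting in which the opponent's information set is fully observed (for example, after a showdown), so that a Dirichlet prior over action probabilities updates by simple counting. The names `DirichletActionModel`, `best_response`, the three-action set, and the payoff table are all hypothetical.

```python
from dataclasses import dataclass, field

# Assumed action set for a toy betting game (hypothetical, not from the paper).
ACTIONS = ("fold", "call", "raise")


@dataclass
class DirichletActionModel:
    """Independent symmetric Dirichlet prior over the opponent's action
    probabilities at each information set, updated by counting observations."""

    prior_count: float = 1.0  # pseudo-count per action (assumption)
    counts: dict = field(default_factory=dict)

    def _row(self, info_set):
        # Lazily create the pseudo-count row for an unseen information set.
        return self.counts.setdefault(
            info_set, {a: self.prior_count for a in ACTIONS}
        )

    def observe(self, info_set, action):
        """Conjugate update: add one to the count of the observed action."""
        self._row(info_set)[action] += 1.0

    def posterior_mean(self, info_set):
        """Mean of the Dirichlet posterior: normalised counts."""
        row = self._row(info_set)
        total = sum(row.values())
        return {a: c / total for a, c in row.items()}


def best_response(model, info_set, payoff):
    """Pick our action with the highest expected payoff under the posterior.

    payoff[my_action][opp_action] is a hypothetical payoff table. For a single
    decision the expected payoff is linear in the opponent's action
    probabilities, so evaluating it at the posterior mean is sufficient.
    """
    probs = model.posterior_mean(info_set)
    ev = {
        mine: sum(probs[opp] * value for opp, value in row.items())
        for mine, row in payoff.items()
    }
    return max(ev, key=ev.get), ev


if __name__ == "__main__":
    model = DirichletActionModel()
    for action in ("raise", "raise", "raise", "fold"):
        model.observe("holds K, facing bet", action)
    toy_payoff = {
        "bluff": {"fold": 1.0, "call": -2.0, "raise": -2.0},
        "check": {"fold": 0.0, "call": 0.0, "raise": 0.0},
    }
    print(model.posterior_mean("holds K, facing bet"))
    print(best_response(model, "holds K, facing bet", toy_payoff))
```

In real poker the opponent's cards are often hidden (for instance, when a hand ends in a fold), so the posterior is generally not a simple count-based Dirichlet; the sketch deliberately sidesteps that difficulty.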

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On-line Learning in Tree MDPs by Treating Policies as Bandit Arms
cs.AI 2026-05 unverdicted novelty 7.0

Bandit algorithms can be adapted to Tree MDPs by treating policies as arms with shared-data confidence bounds, achieving polynomial memory and instance-dependent bounds on sample complexity and regret that depend on t...

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
cs.AI 2025-06 unverdicted novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasonin...

AlphaExploitem: Going Beyond the Nash Equilibrium in Poker by Learning to Exploit Suboptimal Play
cs.LG 2026-05 unverdicted novelty 5.0

AlphaExploitem adds a hierarchical transformer encoder and a diverse pool of exploitable opponents to AlphaHoldem, enabling exploitation of suboptimal poker play while preserving performance against Nash-equilibrium o...