Application of Self-Play Reinforcement Learning to a Four-Player Game of Imperfect Information

Henry Charlesworth

arXiv: 1808.10442 · v1 · pith:TUCT2YAZ · submitted 2018-08-30 · cs.LG · cs.AI · stat.ML

Abstract

We introduce a new virtual environment for simulating a card game known as "Big 2". This is a four-player game of imperfect information with a relatively complicated action space (players may play combinations of 1, 2, 3, 4 or 5 cards from an initial starting hand of 13 cards). As such, it poses a challenge for many current reinforcement learning methods. We then use the recently proposed "Proximal Policy Optimization" algorithm to train a deep neural network to play the game, learning purely via self-play, and find that it is able to reach a level which outperforms amateur human players after only a relatively short amount of training time and without needing to search a tree of future game states.
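To give a rough sense of how large this action space is, the sketch below counts every subset of 1 to 5 cards that can be drawn from a 13-card hand. This is only an upper bound under a simplifying assumption: it ignores the Big 2 rules that decide which subsets form legal combinations and whether a play beats the one on the table, so the paper's actual action encoding may be smaller. The names `HAND_SIZE`, `MAX_COMBO` and `subsets_by_size` are illustrative, not taken from the paper.

```python
from math import comb

HAND_SIZE = 13  # cards dealt to each of the four players
MAX_COMBO = 5   # largest combination a player may put down


def subsets_by_size(hand_size=HAND_SIZE, max_size=MAX_COMBO):
    """Count distinct card subsets of each size 1..max_size from one hand.

    Upper bound only: legality of each subset under Big 2 rules is ignored.
    """
    return {k: comb(hand_size, k) for k in range(1, max_size + 1)}


counts = subsets_by_size()
total = sum(counts.values())
print(counts)  # subsets of each size
print(total)   # total over sizes 1..5
```

Running it gives 13, 78, 286, 715 and 1,287 subsets of sizes 1 to 5, or 2,379 in total; the five-card subsets alone account for more than half.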

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work.

Self-Play Reinforcement Learning under Imperfect Information in Big 2
cs.LG 2026-05 unverdicted novelty 3.0

PPO with moderate entropy regularization and current-policy self-play outperforms Monte Carlo Q, SARSA, and Q-learning in a controlled self-play framework for the imperfect-information game Big 2.
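The summary above refers to PPO with entropy regularization. As a minimal sketch, assuming the standard clipped-surrogate form of PPO (Schulman et al., 2017) plus an entropy bonus, the policy loss for a batch can be written as follows. The function name `ppo_loss` and the default values of `clip_eps` and `ent_coef` are hypothetical placeholders rather than the hyperparameters used in either paper, and the value-function term of a full PPO implementation is omitted.

```python
import math


def ppo_loss(new_logp, old_logp, advantages, entropies, clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO surrogate loss (to be minimised) with an entropy bonus.

    All arguments are equal-length per-sample lists:
      new_logp   - log pi_theta(a|s) under the policy being updated
      old_logp   - log pi_theta_old(a|s) under the policy that collected the data
      advantages - advantage estimates A_t
      entropies  - entropy of pi_theta(.|s) at each visited state
    """
    n = len(advantages)
    if n == 0:
        raise ValueError("empty batch")
    surrogate = 0.0
    for lp, olp, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp - olp)  # probability ratio r_t
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        surrogate += min(ratio * adv, clipped * adv)
    mean_surrogate = surrogate / n
    mean_entropy = sum(entropies) / n
    # PPO maximises surrogate + entropy bonus, so the loss is its negative.
    return -(mean_surrogate + ent_coef * mean_entropy)
```

For example, a sample whose probability ratio is 2.0 with a positive advantage of 1.0 contributes only 1 + `clip_eps` = 1.2 to the surrogate, which is how the clipping discourages large policy updates. Raising `ent_coef` rewards more uncertain policies and so encourages exploration during self-play.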