OpenSpiel: A Framework for Reinforcement Learning in Games
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially and fully observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.
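As a rough illustration of the payoff categories named in the abstract (zero-sum, cooperative, general-sum), the sketch below classifies small two-player one-shot games given as payoff tables. It is plain Python and does not use the OpenSpiel API; the function name `classify`, the `Payoffs` type, and the reading of "cooperative" as "both players always receive identical payoffs" are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical representation: payoffs[row][col] = (row player's utility, column player's utility)
Payoffs = List[List[Tuple[float, float]]]


def classify(payoffs: Payoffs, tol: float = 1e-9) -> str:
    """Label a two-player one-shot game by its payoff structure.

    Assumption: "cooperative" here means identical payoffs for both players
    in every outcome. A game whose payoffs are all zero satisfies both tests
    and is reported as zero-sum because that test runs first.
    """
    cells = [cell for row in payoffs for cell in row]
    if all(abs(u0 + u1) <= tol for u0, u1 in cells):
        return "zero-sum"
    if all(abs(u0 - u1) <= tol for u0, u1 in cells):
        return "cooperative"
    return "general-sum"


# Three textbook one-shot games, one per category.
MATCHING_PENNIES = [[(1, -1), (-1, 1)], [(-1, 1), (1, -1)]]
COORDINATION = [[(1, 1), (0, 0)], [(0, 0), (1, 1)]]
PRISONERS_DILEMMA = [[(-1, -1), (-3, 0)], [(0, -3), (-2, -2)]]

if __name__ == "__main__":
    for name, game in [
        ("matching pennies", MATCHING_PENNIES),
        ("coordination", COORDINATION),
        ("prisoner's dilemma", PRISONERS_DILEMMA),
    ]:
        print(f"{name}: {classify(game)}")
```

The prisoner's dilemma falls in the general-sum category and is a standard example of the social dilemmas the abstract mentions. Sequential, simultaneous-move, and imperfect-information games need a richer state representation than a single payoff table, which this sketch does not attempt.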
Forward citations
Cited by 19 Pith papers
- Generalized Intention Modeling in Multi-Agent Reinforcement Learning
  Introduces a generalized intention modeling framework in multi-agent RL using a mixture of intent representations and a mutual information-based intent measure that improves or matches state-of-the-art performance.
- Effective, Efficient, and General Information Abstraction for Imperfect-Information Extensive-Form Games
  WEVA uses short CFR warm-up runs to build expected-value feature vectors for k-means clustering, yielding abstractions that reduce exploitability by up to 80% compared with equity- or rank-based methods across three games.
- Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition
  Coopetition-Gym v1 provides twenty calibrated environments for mixed-motive MARL with parameterized private/integrated/cooperative rewards, game-theoretic oracles, and validation against four historical coopetitive ca...
- A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets
  Introduces a differentiable dual-positive monotone parameterization for multi-segment bids and a framework to measure how close RL electricity market simulations are to Nash equilibrium.
- Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
  Solly is the first AI to achieve elite human-level play in reduced-format Liar's Poker via self-play actor-critic reinforcement learning, outperforming both world-class humans and large language models on win rate and...
- EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
  EMAgnet replaces uniform-magnet regularization in PPO self-play with an EMA of last-iterate policy parameters and reports lower exploitability on most tested zero-sum benchmarks, especially those with dominated strategies.
- Real-Time Parallel Counterfactual Regret Minimization
  Parallel CFR achieves 3.3-3.4x speedup and 47-54 ms per iteration for real-time depth-limited CFR on Heads-Up No-Limit Texas Hold'em with over one billion histories.
- GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
  GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.
- Verifiable Process Rewards for Agentic Reasoning
  Verifiable Process Rewards (VPR) converts symbolic oracles into dense turn-level supervision for reinforcement learning in agentic reasoning, outperforming outcome-only rewards and transferring to general benchmarks.
- Verifiable Process Rewards for Agentic Reasoning
  VPR converts symbolic, constraint, or posterior oracles into dense turn-level rewards for RL, improving credit assignment in agentic reasoning and transferring to general benchmarks.
- A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
  A sharp threshold at zero reach-weighted contingent action capacity governs whether self-play RL collapses to a deterministic exploitation attractor under asymmetric perturbations.
- TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning
  Presents TABX, a modular JAX-accelerated sandbox simulator enabling customizable multi-agent tasks and high-throughput evaluation for cooperative MARL.
- NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
  NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.
- VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
  VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasonin...
- Robots Need More than VLA and World Models
  The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.
- A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
  A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
- StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
  StratFormer uses a two-phase curriculum with dual-turn tokens and bucket-rate features to model and exploit opponents in Leduc Hold'em, gaining +0.106 BB/hand on average over GTO while keeping near-equilibrium safety.
- Towards Learning Representations of Policies in Two-Player Zero-Sum Imperfect-Information Games
  Basic dataset creation, embedding learning, and evaluation tasks on Kuhn and Leduc Poker demonstrate that useful behavioral representations appear in the learned embeddings.
- Distilling Game Code World Model Generation into Lightweight Large Language Models
  SFT followed by RLVR on Qwen2.5-3B-Instruct raises syntactic and execution correctness when generating Game Code World Models across 30 games.