arxiv: 2604.23819 · v1 · submitted 2026-04-26 · 💻 cs.ET · quant-ph

Recognition: unknown

Playing Dice with the Universe: Programming Quantum Computers to Play Traditional Games

Tristan Zaborniak , Vikram Khipple Mulligan

Authors on Pith no claims yet

Pith reviewed 2026-05-08 04:47 UTC · model grok-4.3

classification 💻 cs.ET quant-ph

keywords quantum annealingtic-tac-toegame rules encodingHamiltonianD-Waveoptimal play samplingquantum benchmarkingperfect-information games

0 comments

The pith

A quantum annealer programmed only with tic-tac-toe rules can represent all game paths and sample moves that raise the odds of winning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that quantum computers given solely the rules of a competitive game can encode every possible sequence of future moves leading to wins, losses, or draws. Sampling from the states that satisfy those rules then yields choices that increase the chance of victory without any separate strategy code. The authors show early evidence by casting tic-tac-toe as an optimization problem on a D-Wave device that then plays against a human. This method differs from classical approaches that rely on hardcoded tactics or machine-learned policies. If the approach holds, game play could become a practical benchmark for quantum hardware in the same way it once drove classical computing progress.

Core claim

Programmed only with the rules of a game, quantum computers should be able to implicitly represent all future paths of a game leading to wins, losses, or draws, and sample from this path set to identify moves that maximize the likelihood of a win. Early results after programming the D-Wave quantum annealer with the rules of tic-tac-toe enable it to play against a human opponent.

What carries the argument

The Hamiltonian that encodes game rules as an energy minimization problem whose low-energy configurations correspond to sequences of legal and winning moves.

If this is right

Skilled moves emerge directly from sampling the encoded rule set without any hard-coded strategy.
Game performance improves as the quantum sampler better distinguishes low-energy winning paths.
The same encoding method can be applied to other finite, perfect-information games.
Quantum hardware development gains a concrete, measurable application benchmark.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

For games larger than tic-tac-toe the method may still require hybrid classical post-processing to handle embedding and noise.
If sampling quality scales with qubit count, the approach could supply move suggestions in real time where classical exhaustive search becomes intractable.
The technique offers a route to quantum game solvers that remain rule-complete even when the full game tree exceeds classical memory.

Load-bearing premise

Mapping the game rules into the annealer's Hamiltonian produces an energy landscape in which optimal moves are the ones most readily sampled by the hardware.

What would settle it

If the programmed device repeatedly selects losing moves or fails to block an opponent's win in tic-tac-toe despite correct rule encoding, the sampling would not be identifying high-win-probability paths.

Figures

Figures reproduced from arXiv: 2604.23819 by Tristan Zaborniak, Vikram Khipple Mulligan.

**Figure 1.** Figure 1: Probability of moves leading to a win (A) or loss (B), based on probability estimation by quantum annealing (cyan) or by exhaustive enumeration of possible games subject to the assumption of minimally strategic play (orange). Pale bars show individual corner or edge squares; solid bars represent the collective probablities for a corner, edge, or centre square. Note the quantum annealer yields P ′ (win|m), … view at source ↗

**Figure 2.** Figure 2: Selected games played by the quantum annealer. Top view at source ↗

read the original abstract

The challenge of programming classical computers to play traditional, competitive games against human players has helped to advance classical hardware and software. Quantum computers have the potential to play games in a unique way: programmed only with the rules of a game, they should be able to implicitly represent all future paths of a game leading to wins, losses, or draws, and sample from this path set to identify moves that maximize the likelihood of a win. This permits skilled play without hard-coded or machine-learned strategy. As a proof of principle, we present early results obtained after programming the D-Wave quantum annealer with the rules of tic-tac-toe, enabling it to play against a human opponent. We anticipate that, as it has for classical computers, game-playing will serve as an important real-world benchmark for quantum computers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper encodes tic-tac-toe rules into a D-Wave Hamiltonian for valid game sampling but the move selection reduces to random rollouts, not skilled play.

read the letter

The key takeaway is that this paper gets a D-Wave quantum annealer to play tic-tac-toe by encoding only the game rules into the Hamiltonian, but the resulting strategy is no stronger than random Monte Carlo rollouts. They do a good job describing the mapping from tic-tac-toe rules to the quadratic terms in the Ising Hamiltonian. Penalties for invalid cell states, turn order, and terminal conditions ensure that sampled configurations correspond to complete valid games. Presenting early hardware results for play against a human is a concrete step, and suggesting games as a benchmark for quantum hardware is a reasonable parallel to classical computing history. The soft spot is the gap between the claim and what the method actually achieves. The abstract argues that sampling from the set of all future paths allows moves that maximize win likelihood, enabling skilled play without strategy. In reality, a Hamiltonian with only validity constraints produces samples over all valid trajectories, which at finite temperature is close to uniform. Selecting the move that leads to the most wins in those samples is equivalent to averaging over random opponent play. This works against weak opponents but fails to produce optimal or even strong play, such as forcing a draw in tic-tac-toe against a perfect player. No adversarial modeling or value function is included, so the premise of skilled play from rules alone doesn't hold. There are also no reported metrics on how often the system wins, draw rates, or runtime comparisons in the provided abstract, making it difficult to assess practical performance. The full paper may include these, but the current evidence is thin. This work is for quantum hardware researchers interested in applying annealing to combinatorial problems like games, and for those thinking about benchmarks beyond standard optimization tasks. Readers looking for a new application domain will get some value from the encoding details. It deserves a serious referee because the idea is fresh and the implementation is specific enough to evaluate, even with the limitations in the optimality argument. I would recommend sending it for peer review to let the authors add the necessary comparisons and clarify the performance claims.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that quantum annealers can be programmed solely with the rules of a game (e.g., tic-tac-toe) to implicitly represent all valid future paths leading to wins, losses, or draws; sampling from this set then identifies moves that maximize win likelihood, enabling skilled play without hard-coded or learned strategy. Early results from encoding tic-tac-toe rules on a D-Wave device and playing against a human are presented as a proof-of-principle, with game-playing positioned as a future benchmark for quantum hardware.

Significance. If the method could produce better-than-random play from a purely rule-based Hamiltonian without additional strategy terms, it would offer a distinctive quantum approach to combinatorial games and a concrete hardware benchmark. However, the described encoding (validity penalties only) yields approximately uniform sampling over valid trajectories, which is equivalent to Monte Carlo rollouts under random continuation and does not achieve the claimed skilled play against optimal opponents.

major comments (2)

[Abstract] Abstract: the claim that the annealer 'sample[s] from this path set to identify moves that maximize the likelihood of a win' is not supported by the method. A Hamiltonian containing only cell-occupancy, turn-order, and terminal win/loss/draw penalties has low-energy states corresponding to all valid complete trajectories; annealing therefore samples (approximately uniformly at effective temperature) from the set of all legal games, so the move with highest empirical win frequency is exactly the move that wins most often against a random opponent. This is suboptimal in general (e.g., fails to force a draw in tic-tac-toe against perfect play) and contradicts the premise of achieving skilled play from rules alone.
[Abstract] Abstract: the 'early results' are described without any performance metrics, error analysis, success rates, embedding overhead, or comparison to classical solvers (minimax, MCTS, or random baseline). Without these data it is impossible to assess whether the quantum sampling functions as intended or provides any advantage over classical random-play simulation.

minor comments (1)

The manuscript should explicitly state the form of the Hamiltonian (e.g., which penalty terms are used for illegal moves or non-terminal states) and the sampling procedure used to extract the next move from the returned samples.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. We agree with the technical analysis of the encoding and will revise the manuscript to clarify the scope and limitations of the approach. Below we respond point by point to the major comments.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that the annealer 'sample[s] from this path set to identify moves that maximize the likelihood of a win' is not supported by the method. A Hamiltonian containing only cell-occupancy, turn-order, and terminal win/loss/draw penalties has low-energy states corresponding to all valid complete trajectories; annealing therefore samples (approximately uniformly at effective temperature) from the set of all legal games, so the move with highest empirical win frequency is exactly the move that wins most often against a random opponent. This is suboptimal in general (e.g., fails to force a draw in tic-tac-toe against perfect play) and contradicts the premise of achieving skilled play from rules alone.

Authors: We agree that a Hamiltonian consisting solely of validity penalties and terminal conditions produces low-energy states corresponding to all valid trajectories and that annealing yields approximately uniform sampling over legal games. The resulting move selection therefore maximizes win frequency against a random opponent and does not achieve optimal play against a perfect opponent. This observation is correct and we will revise the abstract and main text to remove the phrasing 'skilled play' and 'maximize the likelihood of a win' in the sense of optimality, instead describing the method as providing a rule-based baseline equivalent to Monte Carlo sampling over random legal continuations. The positioning of game-playing as a future benchmark will be qualified accordingly. revision: yes
Referee: [Abstract] Abstract: the 'early results' are described without any performance metrics, error analysis, success rates, embedding overhead, or comparison to classical solvers (minimax, MCTS, or random baseline). Without these data it is impossible to assess whether the quantum sampling functions as intended or provides any advantage over classical random-play simulation.

Authors: We acknowledge that the abstract and results section do not provide the quantitative details requested. In the revised manuscript we will expand the description of the D-Wave experiments to include success rates against human play, direct comparison to a random baseline, embedding overhead statistics, and any available hardware error metrics. This will enable readers to evaluate the implementation and its relation to classical random-play simulation. revision: yes

Circularity Check

0 steps flagged

No circularity; direct rule encoding yields uniform path sampling by construction.

full rationale

The paper's core derivation encodes game rules (occupancy, turn order, terminal conditions) as penalty terms in the Hamiltonian; low-energy states are exactly the set of valid complete trajectories, and sampling followed by frequency-based move selection is a direct, non-fitted consequence of that encoding. No parameter is tuned to outcomes and then re-labeled as a prediction, no self-citation supplies a uniqueness theorem or ansatz, and the central claim does not reduce to a redefinition of its own inputs. The approach is therefore self-contained as a proof-of-principle; any shortfall in producing optimal play against skilled opponents is a question of correctness or expressivity, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that quantum annealing hardware can faithfully encode combinatorial game rules and sample winning configurations.

axioms (1)

domain assumption Quantum annealers can represent game trees via their Hamiltonian and sample from low-energy states corresponding to optimal play.
Invoked as the mechanism enabling rule-only programming to produce skilled play.

pith-pipeline@v0.9.0 · 5436 in / 1190 out tokens · 76729 ms · 2026-05-08T04:47:14.780044+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

18 extracted references · 4 canonical work pages · 1 internal anchor

[1]

Weitzner, Andrew M

Vikram Khipple Mulligan, Hans Melo, Haley Irene Merritt, Stewart Slocum, Brian D. Weitzner, Andrew M. Watkins, P. Douglas Renfrew, Craig Pelissier, Paramjit S. Arora, and Richard Bonneau. Design- ing Peptides on a Quantum Computer, September 2019.bioRxiv, doi:10.1101/752485

work page doi:10.1101/752485 2019
[2]

Mulligan

Mohit Pandey, Tristan Zaborniak, Hans Melo, Alexey Galda, and Vikram K. Mulligan. Multibody molecular docking on a quantum annealer, October 2022.arXiv:2210.11401, doi:10.48550/arXiv.2210.11401

work page doi:10.48550/arxiv.2210.11401 2022
[3]

Radio by Gordon Sinclair.The Toronto Star, page 15, 7 September 1950

Gordon Sinclair. Radio by Gordon Sinclair.The Toronto Star, page 15, 7 September 1950

1950
[4]

Digital computers applied to games.Faster than thought, 1953

Alan Turing. Digital computers applied to games.Faster than thought, 1953

1953
[5]

Some studies in machine learning using the game of checkers.IBM Journal of research and development, 3(3):210–229, 1959

Arthur L Samuel. Some studies in machine learning using the game of checkers.IBM Journal of research and development, 3(3):210–229, 1959

1959
[6]

The day that I sensed a new kind of intelligence.Time, 147(13):55–55, 1996

Garry Kasparov. The day that I sensed a new kind of intelligence.Time, 147(13):55–55, 1996

1996
[7]

Mastering the game of go without human knowledge.nature, 550(7676):354–359, 2017

David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge.nature, 550(7676):354–359, 2017

2017
[8]

Quantum games and quantum strategies.Physical Review Letters, 83(15):3077, 1999

Jens Eisert, Martin Wilkens, and Maciej Lewenstein. Quantum games and quantum strategies.Physical Review Letters, 83(15):3077, 1999

1999
[9]

A simple qubo formulation of sudoku

Sascha M ¨ucke. A simple qubo formulation of sudoku. InProceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1958–1962, 2024

1958
[10]

An in-principle super-polynomial quantum advantage for approximating combinatorial optimization problems via computational learning theory.Science Advances, March 2024

Niklas Pirnay, Vincent Ulitzsch, Frederik Wilde, Jens Eisert, and Jean- Pierre Seifert. An in-principle super-polynomial quantum advantage for approximating combinatorial optimization problems via computational learning theory.Science Advances, March 2024

2024
[11]

Convergence and efficiency proof of quantum imaginary time evolution for bounded order systems, 2025

Tobias Hartung and Karl Jansen. Convergence and efficiency proof of quantum imaginary time evolution for bounded order systems, 2025

2025
[12]

Scaling advantage in approximate optimization with quantum annealing.Physical Review Letters, 134(16), April 2025

Humberto Munoz-Bauza and Daniel Lidar. Scaling advantage in approximate optimization with quantum annealing.Physical Review Letters, 134(16), April 2025

2025
[13]

Dreiling, John P

Ruslan Shaydulin, Changhao Li, Shouvanik Chakrabarti, Matthew De- Cross, Dylan Herman, Niraj Kumar, Jeffrey Larson, Danylo Lykov, Pierre Minssen, Yue Sun, Yuri Alexeev, Joan M. Dreiling, John P. Gaebler, Thomas M. Gatterman, Justin A. Gerber, Kevin Gilmore, Dan Gresh, Nathan Hewitt, Chandler V . Horst, Shaohan Hu, Jacob Johansen, Mitchell Matheny, Tanner ...

2024
[14]

255,168 ways of playing Tic Tac Toe – The Ludol- ogist (blog), https://www.jesperjuul.net/ludologist/2003/12/28/255168- ways-of-playing-tic-tac-toe/, December 2003

Jesper Juul. 255,168 ways of playing Tic Tac Toe – The Ludol- ogist (blog), https://www.jesperjuul.net/ludologist/2003/12/28/255168- ways-of-playing-tic-tac-toe/, December 2003

2003
[15]

A Quantum Approximate Optimization Algorithm

Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A Quantum Approximate Optimization Algorithm.arXiv:1411.4028 [quant-ph], November 2014. arXiv: 1411.4028

work page internal anchor Pith review arXiv 2014
[16]

Mario Motta, Chong Sun, Adrian T. K. Tan, Matthew J. O’Rourke, Erika Ye, Austin J. Minnich, Fernando G. S. L. Brand ˜ao, and Garnet Kin- Lic Chan. Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution.Nature Physics, 16(2):205–210, February 2020

2020
[17]

Tic-tac-toe: A game theory analy- sis

Shivani Kalyan and Reetika Singh. Tic-tac-toe: A game theory analy- sis. InInternational Conference on Mathematical Modeling, Computa- tional Intelligence Techniques and Renewable Energy, pages 565–576. Springer, 2024

2024
[18]

G., and Roy, A

Jun Cai, William G. Macready, and Aidan Roy. A practical heuristic for finding graph minors, June 2014. arXiv:1406.2741 [quant-ph]. 7

work page arXiv 2014