Recognition: unknown
Playing Dice with the Universe: Programming Quantum Computers to Play Traditional Games
Pith reviewed 2026-05-08 04:47 UTC · model grok-4.3
The pith
A quantum annealer programmed only with tic-tac-toe rules can represent all game paths and sample moves that raise the odds of winning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Programmed only with the rules of a game, quantum computers should be able to implicitly represent all future paths of a game leading to wins, losses, or draws, and sample from this path set to identify moves that maximize the likelihood of a win. Early results after programming the D-Wave quantum annealer with the rules of tic-tac-toe enable it to play against a human opponent.
What carries the argument
The Hamiltonian that encodes game rules as an energy minimization problem whose low-energy configurations correspond to sequences of legal and winning moves.
If this is right
- Skilled moves emerge directly from sampling the encoded rule set without any hard-coded strategy.
- Game performance improves as the quantum sampler better distinguishes low-energy winning paths.
- The same encoding method can be applied to other finite, perfect-information games.
- Quantum hardware development gains a concrete, measurable application benchmark.
Where Pith is reading between the lines
- For games larger than tic-tac-toe the method may still require hybrid classical post-processing to handle embedding and noise.
- If sampling quality scales with qubit count, the approach could supply move suggestions in real time where classical exhaustive search becomes intractable.
- The technique offers a route to quantum game solvers that remain rule-complete even when the full game tree exceeds classical memory.
Load-bearing premise
Mapping the game rules into the annealer's Hamiltonian produces an energy landscape in which optimal moves are the ones most readily sampled by the hardware.
What would settle it
If the programmed device repeatedly selects losing moves or fails to block an opponent's win in tic-tac-toe despite correct rule encoding, the sampling would not be identifying high-win-probability paths.
Figures
read the original abstract
The challenge of programming classical computers to play traditional, competitive games against human players has helped to advance classical hardware and software. Quantum computers have the potential to play games in a unique way: programmed only with the rules of a game, they should be able to implicitly represent all future paths of a game leading to wins, losses, or draws, and sample from this path set to identify moves that maximize the likelihood of a win. This permits skilled play without hard-coded or machine-learned strategy. As a proof of principle, we present early results obtained after programming the D-Wave quantum annealer with the rules of tic-tac-toe, enabling it to play against a human opponent. We anticipate that, as it has for classical computers, game-playing will serve as an important real-world benchmark for quantum computers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that quantum annealers can be programmed solely with the rules of a game (e.g., tic-tac-toe) to implicitly represent all valid future paths leading to wins, losses, or draws; sampling from this set then identifies moves that maximize win likelihood, enabling skilled play without hard-coded or learned strategy. Early results from encoding tic-tac-toe rules on a D-Wave device and playing against a human are presented as a proof-of-principle, with game-playing positioned as a future benchmark for quantum hardware.
Significance. If the method could produce better-than-random play from a purely rule-based Hamiltonian without additional strategy terms, it would offer a distinctive quantum approach to combinatorial games and a concrete hardware benchmark. However, the described encoding (validity penalties only) yields approximately uniform sampling over valid trajectories, which is equivalent to Monte Carlo rollouts under random continuation and does not achieve the claimed skilled play against optimal opponents.
major comments (2)
- [Abstract] Abstract: the claim that the annealer 'sample[s] from this path set to identify moves that maximize the likelihood of a win' is not supported by the method. A Hamiltonian containing only cell-occupancy, turn-order, and terminal win/loss/draw penalties has low-energy states corresponding to all valid complete trajectories; annealing therefore samples (approximately uniformly at effective temperature) from the set of all legal games, so the move with highest empirical win frequency is exactly the move that wins most often against a random opponent. This is suboptimal in general (e.g., fails to force a draw in tic-tac-toe against perfect play) and contradicts the premise of achieving skilled play from rules alone.
- [Abstract] Abstract: the 'early results' are described without any performance metrics, error analysis, success rates, embedding overhead, or comparison to classical solvers (minimax, MCTS, or random baseline). Without these data it is impossible to assess whether the quantum sampling functions as intended or provides any advantage over classical random-play simulation.
minor comments (1)
- The manuscript should explicitly state the form of the Hamiltonian (e.g., which penalty terms are used for illegal moves or non-terminal states) and the sampling procedure used to extract the next move from the returned samples.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We agree with the technical analysis of the encoding and will revise the manuscript to clarify the scope and limitations of the approach. Below we respond point by point to the major comments.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that the annealer 'sample[s] from this path set to identify moves that maximize the likelihood of a win' is not supported by the method. A Hamiltonian containing only cell-occupancy, turn-order, and terminal win/loss/draw penalties has low-energy states corresponding to all valid complete trajectories; annealing therefore samples (approximately uniformly at effective temperature) from the set of all legal games, so the move with highest empirical win frequency is exactly the move that wins most often against a random opponent. This is suboptimal in general (e.g., fails to force a draw in tic-tac-toe against perfect play) and contradicts the premise of achieving skilled play from rules alone.
Authors: We agree that a Hamiltonian consisting solely of validity penalties and terminal conditions produces low-energy states corresponding to all valid trajectories and that annealing yields approximately uniform sampling over legal games. The resulting move selection therefore maximizes win frequency against a random opponent and does not achieve optimal play against a perfect opponent. This observation is correct and we will revise the abstract and main text to remove the phrasing 'skilled play' and 'maximize the likelihood of a win' in the sense of optimality, instead describing the method as providing a rule-based baseline equivalent to Monte Carlo sampling over random legal continuations. The positioning of game-playing as a future benchmark will be qualified accordingly. revision: yes
-
Referee: [Abstract] Abstract: the 'early results' are described without any performance metrics, error analysis, success rates, embedding overhead, or comparison to classical solvers (minimax, MCTS, or random baseline). Without these data it is impossible to assess whether the quantum sampling functions as intended or provides any advantage over classical random-play simulation.
Authors: We acknowledge that the abstract and results section do not provide the quantitative details requested. In the revised manuscript we will expand the description of the D-Wave experiments to include success rates against human play, direct comparison to a random baseline, embedding overhead statistics, and any available hardware error metrics. This will enable readers to evaluate the implementation and its relation to classical random-play simulation. revision: yes
Circularity Check
No circularity; direct rule encoding yields uniform path sampling by construction.
full rationale
The paper's core derivation encodes game rules (occupancy, turn order, terminal conditions) as penalty terms in the Hamiltonian; low-energy states are exactly the set of valid complete trajectories, and sampling followed by frequency-based move selection is a direct, non-fitted consequence of that encoding. No parameter is tuned to outcomes and then re-labeled as a prediction, no self-citation supplies a uniqueness theorem or ansatz, and the central claim does not reduce to a redefinition of its own inputs. The approach is therefore self-contained as a proof-of-principle; any shortfall in producing optimal play against skilled opponents is a question of correctness or expressivity, not circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Quantum annealers can represent game trees via their Hamiltonian and sample from low-energy states corresponding to optimal play.
Reference graph
Works this paper leans on
-
[1]
Vikram Khipple Mulligan, Hans Melo, Haley Irene Merritt, Stewart Slocum, Brian D. Weitzner, Andrew M. Watkins, P. Douglas Renfrew, Craig Pelissier, Paramjit S. Arora, and Richard Bonneau. Design- ing Peptides on a Quantum Computer, September 2019.bioRxiv, doi:10.1101/752485
-
[2]
Mohit Pandey, Tristan Zaborniak, Hans Melo, Alexey Galda, and Vikram K. Mulligan. Multibody molecular docking on a quantum annealer, October 2022.arXiv:2210.11401, doi:10.48550/arXiv.2210.11401
-
[3]
Radio by Gordon Sinclair.The Toronto Star, page 15, 7 September 1950
Gordon Sinclair. Radio by Gordon Sinclair.The Toronto Star, page 15, 7 September 1950
1950
-
[4]
Digital computers applied to games.Faster than thought, 1953
Alan Turing. Digital computers applied to games.Faster than thought, 1953
1953
-
[5]
Some studies in machine learning using the game of checkers.IBM Journal of research and development, 3(3):210–229, 1959
Arthur L Samuel. Some studies in machine learning using the game of checkers.IBM Journal of research and development, 3(3):210–229, 1959
1959
-
[6]
The day that I sensed a new kind of intelligence.Time, 147(13):55–55, 1996
Garry Kasparov. The day that I sensed a new kind of intelligence.Time, 147(13):55–55, 1996
1996
-
[7]
Mastering the game of go without human knowledge.nature, 550(7676):354–359, 2017
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge.nature, 550(7676):354–359, 2017
2017
-
[8]
Quantum games and quantum strategies.Physical Review Letters, 83(15):3077, 1999
Jens Eisert, Martin Wilkens, and Maciej Lewenstein. Quantum games and quantum strategies.Physical Review Letters, 83(15):3077, 1999
1999
-
[9]
A simple qubo formulation of sudoku
Sascha M ¨ucke. A simple qubo formulation of sudoku. InProceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1958–1962, 2024
1958
-
[10]
An in-principle super-polynomial quantum advantage for approximating combinatorial optimization problems via computational learning theory.Science Advances, March 2024
Niklas Pirnay, Vincent Ulitzsch, Frederik Wilde, Jens Eisert, and Jean- Pierre Seifert. An in-principle super-polynomial quantum advantage for approximating combinatorial optimization problems via computational learning theory.Science Advances, March 2024
2024
-
[11]
Convergence and efficiency proof of quantum imaginary time evolution for bounded order systems, 2025
Tobias Hartung and Karl Jansen. Convergence and efficiency proof of quantum imaginary time evolution for bounded order systems, 2025
2025
-
[12]
Scaling advantage in approximate optimization with quantum annealing.Physical Review Letters, 134(16), April 2025
Humberto Munoz-Bauza and Daniel Lidar. Scaling advantage in approximate optimization with quantum annealing.Physical Review Letters, 134(16), April 2025
2025
-
[13]
Dreiling, John P
Ruslan Shaydulin, Changhao Li, Shouvanik Chakrabarti, Matthew De- Cross, Dylan Herman, Niraj Kumar, Jeffrey Larson, Danylo Lykov, Pierre Minssen, Yue Sun, Yuri Alexeev, Joan M. Dreiling, John P. Gaebler, Thomas M. Gatterman, Justin A. Gerber, Kevin Gilmore, Dan Gresh, Nathan Hewitt, Chandler V . Horst, Shaohan Hu, Jacob Johansen, Mitchell Matheny, Tanner ...
2024
-
[14]
255,168 ways of playing Tic Tac Toe – The Ludol- ogist (blog), https://www.jesperjuul.net/ludologist/2003/12/28/255168- ways-of-playing-tic-tac-toe/, December 2003
Jesper Juul. 255,168 ways of playing Tic Tac Toe – The Ludol- ogist (blog), https://www.jesperjuul.net/ludologist/2003/12/28/255168- ways-of-playing-tic-tac-toe/, December 2003
2003
-
[15]
A Quantum Approximate Optimization Algorithm
Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A Quantum Approximate Optimization Algorithm.arXiv:1411.4028 [quant-ph], November 2014. arXiv: 1411.4028
work page internal anchor Pith review arXiv 2014
-
[16]
Mario Motta, Chong Sun, Adrian T. K. Tan, Matthew J. O’Rourke, Erika Ye, Austin J. Minnich, Fernando G. S. L. Brand ˜ao, and Garnet Kin- Lic Chan. Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution.Nature Physics, 16(2):205–210, February 2020
2020
-
[17]
Tic-tac-toe: A game theory analy- sis
Shivani Kalyan and Reetika Singh. Tic-tac-toe: A game theory analy- sis. InInternational Conference on Mathematical Modeling, Computa- tional Intelligence Techniques and Renewable Energy, pages 565–576. Springer, 2024
2024
-
[18]
Jun Cai, William G. Macready, and Aidan Roy. A practical heuristic for finding graph minors, June 2014. arXiv:1406.2741 [quant-ph]. 7
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.