Recognition: unknown
Sudoku Solving and Finding Magic Squares by Probability Models and Markov Chains
Pith reviewed 2026-05-07 13:54 UTC · model grok-4.3
The pith
A probability model on 9x9 matrices that favors good attempts, sampled by a Markov chain, finds Sudoku solutions and generates magic squares.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The sudoku puzzles can be solved by putting up a probability model on the enormous space of 9×9 matrix possibilities, constructed to favour good attempts, and then engineering a Markov chain to sample a long enough chain of sudoku table realisations from that model until the solution is found. The methods work also for other types of puzzles, like constructing magic squares with wished-for properties, as is also illustrated; via magic models and equally magic Markov chains impressively magic 8×8 and 10×10 squares are found.
What carries the argument
A probability model on the space of 9x9 matrices designed to favor good attempts, sampled via Markov chain until a valid configuration is reached.
If this is right
- Sudoku solutions appear after sufficient samples from the chain.
- Magic squares of different sizes with equal row, column, and diagonal sums can be generated by the same sampling process.
- The approach applies to other puzzles that impose similar numerical constraints on matrices.
- Solutions are located through probabilistic sampling instead of systematic enumeration.
Where Pith is reading between the lines
- The method might scale to larger Sudoku variants or other constraint problems if the weighting scheme is adjusted.
- Different choices for how to favor good attempts could shorten the time until a solution is sampled.
- This sampling view could connect to other statistical techniques for exploring discrete spaces with hard constraints.
Load-bearing premise
That a probability model can be built so valid solutions receive high enough weight and the Markov chain reaches them in feasible time without staying trapped in invalid states.
What would settle it
Running the Markov chain for millions of steps on a standard Sudoku puzzle and never obtaining a valid solution would show the model or chain does not work as claimed.
Figures
read the original abstract
The sudoku puzzles have a long history, with variations going back more than a hundred years, but its current and perhaps surprising world-wide prominence goes back to certain initiatives and then puzzle-generating computer programmes from just after 2000. To solve a sudoko puzzle, a statistician can put up a probabilitymodel on the enormous space of $9\times9$ matrix possibilities, constructed to favour `good attempts', and then engineer a Markov chain to sample a long enough chain of sudoku table realisations from that model, until the solution is found. The methods work also for other types of puzzles, like constructing `magic squares' with wished-for properties (sums of rows, columns, diagonals equal, etc.), as is also illustrated in this article; via magic models and equally magic Markov chains I find impressively magic $8\times8$ and $10\times10$ squares.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes solving Sudoku puzzles by constructing a probability model on the space of 9x9 grids that favors 'good attempts' (partially valid fillings) and then running a Markov chain to sample from this model until a valid solution is reached. The same probability-model-plus-Markov-chain framework is claimed to generate impressively magic 8x8 and 10x10 squares whose rows, columns, and diagonals sum to the same constant.
Significance. If the models were explicitly defined with verifiable stationary distributions supported only on valid solutions and if the chains were shown to mix in feasible time, the work could illustrate a probabilistic approach to constraint-satisfaction problems and supply accessible examples for MCMC teaching. No such constructions, mixing-time bounds, success rates, or comparisons with standard solvers appear in the manuscript, so the practical or theoretical contribution cannot be assessed.
major comments (2)
- No functional form, energy function, or explicit probability measure on the 9x9 (or 8x8/10x10) grids is supplied anywhere in the text. Consequently the central claim that the stationary distribution concentrates on valid Sudoku solutions (or perfect magic squares) remains an unverified assertion rather than a demonstrated construction.
- No transition kernel, acceptance probabilities, or irreducibility argument is given for the Markov chain. Without these, it is impossible to verify that the chain is irreducible on the valid set or that its mixing time is short enough for practical use, which is load-bearing for the claim that solutions can be found by sampling 'a long enough chain'.
minor comments (2)
- The abstract contains the typo 'sudoko' (should be 'Sudoku').
- The manuscript would be strengthened by at least one small-scale concrete illustration (e.g., a 4x4 Sudoku or 3x3 magic square) together with the explicit model and a few transition steps.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and precise comments, which highlight important gaps in the mathematical detail of our presentation. We agree that the manuscript would benefit from explicit constructions and will revise accordingly to strengthen the contribution.
read point-by-point responses
-
Referee: No functional form, energy function, or explicit probability measure on the 9x9 (or 8x8/10x10) grids is supplied anywhere in the text. Consequently the central claim that the stationary distribution concentrates on valid Sudoku solutions (or perfect magic squares) remains an unverified assertion rather than a demonstrated construction.
Authors: We acknowledge that the current manuscript describes the probability model conceptually without supplying an explicit functional form or energy function. In the revised version we will define the probability measure on the space of 9x9 grids (and analogously for 8x8 and 10x10 magic squares) by specifying the exact weighting that assigns higher probability to partially valid fillings, and we will prove that the resulting stationary distribution is supported exclusively on the valid solutions. revision: yes
-
Referee: No transition kernel, acceptance probabilities, or irreducibility argument is given for the Markov chain. Without these, it is impossible to verify that the chain is irreducible on the valid set or that its mixing time is short enough for practical use, which is load-bearing for the claim that solutions can be found by sampling 'a long enough chain'.
Authors: The manuscript presents the Markov-chain approach at a descriptive level but omits the explicit transition kernel and acceptance probabilities. We will add the precise transition probabilities in the revision, include a proof of irreducibility on the set of valid grids, and provide either theoretical mixing-time bounds or empirical convergence diagnostics to substantiate that sampling a sufficiently long chain yields valid solutions in practice. revision: yes
Circularity Check
No circularity: high-level description only, no equations or derivations to inspect
full rationale
The manuscript presents the core idea at the conceptual level only: a probability model is 'put up' on the space of 9x9 matrices 'constructed to favour good attempts' and a Markov chain is 'engineered' to sample until a solution appears. No functional form, energy function, transition kernel, stationary-distribution proof, or mixing-time argument is supplied in the abstract or described text. Consequently there are no equations, fitted parameters, self-citations, or ansatzes whose load-bearing steps can be checked for reduction to their own inputs. The derivation chain is empty by construction, so the circularity score is zero.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
and Hjort, N.L
Claeskens, G. and Hjort, N.L. (2008).Model Selection and Model Averaging.Cambridge University Press. Dickter, R. (2014).Number Time Archetype: The Significance of Magic Squares in China and the West. Grass Valley, Grass Valley. Edwards, A.W.F. (2025).The Latin Square: Essays in Defence of R.A. Fisher.Cam Rivers Publishing, Cambridge, UK. Franklin, B. (179...
2008
-
[2]
Geißler, A. (1878). ¨Uber die Phantasmen w¨ ahrend des Einschlafens.Philosophische Studien, vol. 1, 83–93. Hjort, N.L. (2019a). Your Mother is Alive with Probability One Half.FocuStat Blog Post. Hjort, N.L. (2019b). The Magic Square of 33.FocuStat Blog Post. Hjort, N.L. and Varin, C. (2008). ML, PL, QL in Markov chain models.Scandinavian Journal of Statis...
2008
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.