pith. machine review for the scientific record. sign in

arxiv: 2604.25402 · v1 · submitted 2026-04-28 · 📊 stat.OT

Recognition: unknown

Sudoku Solving and Finding Magic Squares by Probability Models and Markov Chains

Nils Lid Hjort

Pith reviewed 2026-05-07 13:54 UTC · model grok-4.3

classification 📊 stat.OT
keywords SudokuMarkov chainsprobability modelsmagic squaressamplingpuzzle solvingmatrix constraints
0
0 comments X

The pith

A probability model on 9x9 matrices that favors good attempts, sampled by a Markov chain, finds Sudoku solutions and generates magic squares.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to solve Sudoku by defining a probability distribution over all possible 9 by 9 grids that assigns higher weight to grids closer to meeting the rules. A Markov chain then generates successive samples from this distribution until a fully valid solution appears. The author uses the same construction for magic squares, where the model favors grids with matching row, column, and diagonal sums, and produces working 8 by 8 and 10 by 10 examples. This converts the search for a solution into repeated draws from a tailored probability model. A reader would care because it offers a probabilistic alternative to exhaustive or backtracking search for puzzles defined by constraints.

Core claim

The sudoku puzzles can be solved by putting up a probability model on the enormous space of 9×9 matrix possibilities, constructed to favour good attempts, and then engineering a Markov chain to sample a long enough chain of sudoku table realisations from that model until the solution is found. The methods work also for other types of puzzles, like constructing magic squares with wished-for properties, as is also illustrated; via magic models and equally magic Markov chains impressively magic 8×8 and 10×10 squares are found.

What carries the argument

A probability model on the space of 9x9 matrices designed to favor good attempts, sampled via Markov chain until a valid configuration is reached.

If this is right

  • Sudoku solutions appear after sufficient samples from the chain.
  • Magic squares of different sizes with equal row, column, and diagonal sums can be generated by the same sampling process.
  • The approach applies to other puzzles that impose similar numerical constraints on matrices.
  • Solutions are located through probabilistic sampling instead of systematic enumeration.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method might scale to larger Sudoku variants or other constraint problems if the weighting scheme is adjusted.
  • Different choices for how to favor good attempts could shorten the time until a solution is sampled.
  • This sampling view could connect to other statistical techniques for exploring discrete spaces with hard constraints.

Load-bearing premise

That a probability model can be built so valid solutions receive high enough weight and the Markov chain reaches them in feasible time without staying trapped in invalid states.

What would settle it

Running the Markov chain for millions of steps on a standard Sudoku puzzle and never obtaining a valid solution would show the model or chain does not work as claimed.

Figures

Figures reproduced from arXiv: 2604.25402 by Nils Lid Hjort.

Figure 1
Figure 1. Figure 1: Scary but fascinating stuff. Sudoku: the problem Here’s an example ( view at source ↗
Figure 2
Figure 2. Figure 2: My first-ever sudoku puzzle. 1 A probabilistic-statistical take I’ve never solved a sudoku puzzle† before, and to me it looks dauntingly complex (though presumably not to millions of seasoned sudoku puzzlers, and I’m told the one I grabbed here is not among the hard ones). There are 4, 5, 8 missing numbers for the first three blocks, those of the first row, which means there are 4! · 5! · 8! = 24 · 120 · 4… view at source ↗
Figure 3
Figure 3. Figure 3: Long sudoku Markov chain to see when Q(x) hits zero, after which I can read off the solution x ∗ . In the figure I have discarded the first 1000 of the half a million random iterations. 3 Finding magic squares Benjamin Franklin (1706–1790), statesman, inventor, scientist, inventor, philosopher, economist, printer, and musician (he played the guitar, the harp, the viola da gamba, and for good measure invent… view at source ↗
Figure 4
Figure 4. Figure 4: Long magical square Markov chain to see when view at source ↗
Figure 5
Figure 5. Figure 5: The sudoku Markov chain will eventually find the tallest peak. My Himalayan metaphor is misleading view at source ↗
read the original abstract

The sudoku puzzles have a long history, with variations going back more than a hundred years, but its current and perhaps surprising world-wide prominence goes back to certain initiatives and then puzzle-generating computer programmes from just after 2000. To solve a sudoko puzzle, a statistician can put up a probabilitymodel on the enormous space of $9\times9$ matrix possibilities, constructed to favour `good attempts', and then engineer a Markov chain to sample a long enough chain of sudoku table realisations from that model, until the solution is found. The methods work also for other types of puzzles, like constructing `magic squares' with wished-for properties (sums of rows, columns, diagonals equal, etc.), as is also illustrated in this article; via magic models and equally magic Markov chains I find impressively magic $8\times8$ and $10\times10$ squares.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes solving Sudoku puzzles by constructing a probability model on the space of 9x9 grids that favors 'good attempts' (partially valid fillings) and then running a Markov chain to sample from this model until a valid solution is reached. The same probability-model-plus-Markov-chain framework is claimed to generate impressively magic 8x8 and 10x10 squares whose rows, columns, and diagonals sum to the same constant.

Significance. If the models were explicitly defined with verifiable stationary distributions supported only on valid solutions and if the chains were shown to mix in feasible time, the work could illustrate a probabilistic approach to constraint-satisfaction problems and supply accessible examples for MCMC teaching. No such constructions, mixing-time bounds, success rates, or comparisons with standard solvers appear in the manuscript, so the practical or theoretical contribution cannot be assessed.

major comments (2)
  1. No functional form, energy function, or explicit probability measure on the 9x9 (or 8x8/10x10) grids is supplied anywhere in the text. Consequently the central claim that the stationary distribution concentrates on valid Sudoku solutions (or perfect magic squares) remains an unverified assertion rather than a demonstrated construction.
  2. No transition kernel, acceptance probabilities, or irreducibility argument is given for the Markov chain. Without these, it is impossible to verify that the chain is irreducible on the valid set or that its mixing time is short enough for practical use, which is load-bearing for the claim that solutions can be found by sampling 'a long enough chain'.
minor comments (2)
  1. The abstract contains the typo 'sudoko' (should be 'Sudoku').
  2. The manuscript would be strengthened by at least one small-scale concrete illustration (e.g., a 4x4 Sudoku or 3x3 magic square) together with the explicit model and a few transition steps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and precise comments, which highlight important gaps in the mathematical detail of our presentation. We agree that the manuscript would benefit from explicit constructions and will revise accordingly to strengthen the contribution.

read point-by-point responses
  1. Referee: No functional form, energy function, or explicit probability measure on the 9x9 (or 8x8/10x10) grids is supplied anywhere in the text. Consequently the central claim that the stationary distribution concentrates on valid Sudoku solutions (or perfect magic squares) remains an unverified assertion rather than a demonstrated construction.

    Authors: We acknowledge that the current manuscript describes the probability model conceptually without supplying an explicit functional form or energy function. In the revised version we will define the probability measure on the space of 9x9 grids (and analogously for 8x8 and 10x10 magic squares) by specifying the exact weighting that assigns higher probability to partially valid fillings, and we will prove that the resulting stationary distribution is supported exclusively on the valid solutions. revision: yes

  2. Referee: No transition kernel, acceptance probabilities, or irreducibility argument is given for the Markov chain. Without these, it is impossible to verify that the chain is irreducible on the valid set or that its mixing time is short enough for practical use, which is load-bearing for the claim that solutions can be found by sampling 'a long enough chain'.

    Authors: The manuscript presents the Markov-chain approach at a descriptive level but omits the explicit transition kernel and acceptance probabilities. We will add the precise transition probabilities in the revision, include a proof of irreducibility on the set of valid grids, and provide either theoretical mixing-time bounds or empirical convergence diagnostics to substantiate that sampling a sufficiently long chain yields valid solutions in practice. revision: yes

Circularity Check

0 steps flagged

No circularity: high-level description only, no equations or derivations to inspect

full rationale

The manuscript presents the core idea at the conceptual level only: a probability model is 'put up' on the space of 9x9 matrices 'constructed to favour good attempts' and a Markov chain is 'engineered' to sample until a solution appears. No functional form, energy function, transition kernel, stationary-distribution proof, or mixing-time argument is supplied in the abstract or described text. Consequently there are no equations, fitted parameters, self-citations, or ansatzes whose load-bearing steps can be checked for reduction to their own inputs. The derivation chain is empty by construction, so the circularity score is zero.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger cannot be populated with concrete free parameters, axioms, or invented entities from the manuscript. The central claim implicitly rests on the existence of a suitable probability model and rapid mixing of the chain, both of which are unstated in detail.

pith-pipeline@v0.9.0 · 5442 in / 1206 out tokens · 39226 ms · 2026-05-07T13:54:59.774944+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

2 extracted references

  1. [1]

    and Hjort, N.L

    Claeskens, G. and Hjort, N.L. (2008).Model Selection and Model Averaging.Cambridge University Press. Dickter, R. (2014).Number Time Archetype: The Significance of Magic Squares in China and the West. Grass Valley, Grass Valley. Edwards, A.W.F. (2025).The Latin Square: Essays in Defence of R.A. Fisher.Cam Rivers Publishing, Cambridge, UK. Franklin, B. (179...

  2. [2]

    Geißler, A. (1878). ¨Uber die Phantasmen w¨ ahrend des Einschlafens.Philosophische Studien, vol. 1, 83–93. Hjort, N.L. (2019a). Your Mother is Alive with Probability One Half.FocuStat Blog Post. Hjort, N.L. (2019b). The Magic Square of 33.FocuStat Blog Post. Hjort, N.L. and Varin, C. (2008). ML, PL, QL in Markov chain models.Scandinavian Journal of Statis...