
arxiv: 2606.07516 · v2 · pith:RUOQNAF4new · submitted 2026-06-05 · 🧮 math.PR

Counterintuitive problems in discrete probability

Pith reviewed 2026-06-27 21:01 UTC · model grok-4.3

classification 🧮 math.PR
keywords discrete probability · counterintuitive problems · cognitive biases · large language models · probabilistic paradoxes · reasoning evaluation · dataset

The pith

A collection of counterintuitive discrete probability problems with human solutions is released as a public dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles a dataset of discrete probability problems chosen because they reliably produce answers that feel right but are wrong. Some problems come from well-known paradoxes and bias studies, others from recreational sources, and a few were created for this work. Each problem includes a detailed solution written by humans. The explicit goal is to supply a transparent reference set that can be used to test whether large language models repeat the same kinds of errors that humans make under heuristic reasoning.
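As an illustration of an answer that feels right but is wrong, here is a minimal sketch that enumerates the two-child problem exactly. The reference list includes the "Tuesday Boy" article, but whether the dataset states the problem in this form is an assumption; the uniform, independent model of sex and weekday and all function names are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = range(7)   # weekdays labelled 0..6
TUESDAY = 1       # arbitrary label for Tuesday (assumption)


def conditional_probability(outcomes, event, given):
    """Exact P(event | given) over equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))


# Each child is an independent (sex, weekday) pair; all 14 pairs equally likely.
children = list(product(SEXES, DAYS))
families = list(product(children, repeat=2))  # 196 ordered two-child families


def both_boys(family):
    return all(sex == "boy" for sex, _ in family)


def at_least_one_boy(family):
    return any(sex == "boy" for sex, _ in family)


def at_least_one_tuesday_boy(family):
    return any(sex == "boy" and day == TUESDAY for sex, day in family)


p_plain = conditional_probability(families, both_boys, at_least_one_boy)
p_tuesday = conditional_probability(families, both_boys, at_least_one_tuesday_boy)
print(p_plain, p_tuesday)  # 1/3 13/27
```

The intuitive answer in both cases is 1/2. Under this model, the enumeration gives 1/3 when the condition is "at least one boy" and 13/27 when it is "at least one boy born on a Tuesday".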

Core claim

We have gathered and solved a set of discrete probability problems that are constructed to expose the gap between intuitive answers and correct probability calculations, making the full list and the accompanying solutions available for direct use in experiments on reasoning.

What carries the argument

The dataset itself, consisting of selected problems that each target a specific heuristic error in probability reasoning.

If this is right

  • The problems supply a ready-made benchmark for measuring how often language models produce the same incorrect answers that humans reach through heuristics.
  • Researchers can now run controlled comparisons between human performance and model performance on the same fixed list of items.
  • The public release allows other groups to extend the collection or to add new variants while keeping the original solutions as a fixed reference.
  • The problems can be used directly in teaching or in experiments that study the persistence of specific probability misconceptions.
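The benchmark use described above could be sketched as follows. This is a hypothetical scoring scheme, not the authors' evaluation protocol: the `Item` fields, the three labels, and the two toy items with their placeholder answers are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from fractions import Fraction

LABELS = ("correct", "heuristic_error", "other_error")


@dataclass(frozen=True)
class Item:
    problem_id: str
    correct: Fraction    # reference answer from the written solution
    intuitive: Fraction  # the appealing but wrong answer the item targets


def classify(item, answer):
    """Label one response relative to the reference and the intuitive answer."""
    if answer == item.correct:
        return "correct"
    if answer == item.intuitive:
        return "heuristic_error"
    return "other_error"


def error_profile(items, answers):
    """Fraction of items falling under each label."""
    counts = Counter(classify(item, answers[item.problem_id]) for item in items)
    return {label: Fraction(counts[label], len(items)) for label in LABELS}


# Toy items and placeholder responses (not data from the paper).
items = [
    Item("two_children", Fraction(1, 3), Fraction(1, 2)),
    Item("three_doors_switch", Fraction(2, 3), Fraction(1, 2)),
]
answers = {"two_children": Fraction(1, 2), "three_doors_switch": Fraction(2, 3)}

profile = error_profile(items, answers)
print(profile)
```

Separating "heuristic_error" from "other_error" is the design choice that matters here: it distinguishes a model that repeats the targeted human mistake from one that is simply wrong in some other way.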

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same selection criteria could be applied to continuous probability or to problems involving conditional independence to test whether the same pattern of model errors appears.
  • Systematic logging of which problems cause the largest divergence between model and human answers might reveal clusters of related biases that current training data do not correct.
  • Because the solutions are written out in full, the collection could serve as training material for supervised fine-tuning aimed at reducing specific probability errors.

Load-bearing premise

The selected problems do trigger the intended errors and the written solutions contain no hidden mistakes.

What would settle it

Discovery of a mathematically incorrect solution in the provided answers would show that the reference set cannot be trusted as a benchmark.
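One way to look for such a mistake is to recompute a stated answer by exhaustive enumeration. The sketch below checks the well-known claim that each of the four Efron dice beats the next one in the cycle with probability 2/3. The face values are the standard ones; the reference list includes an article on these dice, but whether the dataset contains this exact item is an assumption, and the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product


def win_probability(die_a, die_b):
    """Exact probability that a roll of die_a is strictly higher than a roll of die_b."""
    wins = sum(1 for a, b in product(die_a, die_b) if a > b)
    return Fraction(wins, len(die_a) * len(die_b))


# Standard Efron dice faces.
EFRON = {
    "A": (4, 4, 4, 4, 0, 0),
    "B": (3, 3, 3, 3, 3, 3),
    "C": (6, 6, 2, 2, 2, 2),
    "D": (5, 5, 5, 1, 1, 1),
}

claimed = Fraction(2, 3)  # the answer under test
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

results = {pair: win_probability(EFRON[pair[0]], EFRON[pair[1]]) for pair in cycle}
mismatches = [pair for pair, p in results.items() if p != claimed]
print(mismatches)  # []
```

An empty list means the enumeration agrees with the claimed value for every pair in the cycle; any pair listed would point to a discrepancy worth inspecting by hand.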

Original abstract

This manuscript contains a collection of counterintuitive problems in discrete probability, together with detailed solutions. The dataset was constructed as part of a broader research project investigating the capabilities of the latest-generation Large Language Models (LLMs) in solving discrete probability problems, in order to assess whether LLMs tend to make systematic reasoning errors associated with known cognitive biases. The problems collected here are specifically designed to challenge heuristic reasoning strategies that often lead to intuitively appealing but mathematically incorrect conclusions. The dataset combines several types of problems. Some are adapted from classical probabilistic paradoxes and cognitive-bias literature, while others originate from recreational mathematics sources or were developed by ourselves following similar principles. The primary purpose of this document is to provide a transparent and publicly accessible reference for the problems used in our experimental evaluation of language models, as well as providing detailed human-made solutions. At the same time, we believe that this collection may also prove useful for future research on probabilistic reasoning, cognitive biases, and the evaluation of reasoning capabilities in artificial intelligence systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript presents a curated collection of counterintuitive discrete probability problems accompanied by detailed human-made solutions. It is positioned as a transparent reference dataset for evaluating large language models on tasks designed to expose heuristic reasoning errors and cognitive biases, drawing from classical paradoxes, cognitive-bias literature, recreational mathematics, and original constructions.

Significance. If the supplied solutions are accurate, the collection provides a reusable benchmark resource for research on probabilistic reasoning in AI systems and for studies of cognitive biases. The explicit transparency in documenting the problem set and human solutions supports reproducibility in LLM evaluation experiments.

minor comments (1)
  1. [Abstract] The description of the dataset construction mentions adaptation from multiple sources but does not state the total number of problems or how they are distributed across categories. Both figures would help readers gauge the collection's scope and balance.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript, recognition of its potential utility as a benchmark resource, and recommendation for minor revision. The report contains no major comments to address.

Circularity Check

0 steps flagged

No circularity; descriptive problem collection with no derivations

Full rationale

The manuscript presents a curated list of counterintuitive discrete probability problems together with human solutions. It asserts no mathematical identities, scaling relations, predictions, or fitted parameters whose validity depends on unverified self-referential steps. No equations, uniqueness theorems, or ansatzes are introduced; the text is explicitly positioned as a transparent reference dataset for LLM evaluation rather than a vehicle for novel derivations. Self-citations are absent from the provided content, and the construction description (adapting classical paradoxes or developing new problems) is purely declarative with no load-bearing reduction to prior outputs. This is the normal case of a self-contained reference document.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivation is presented; the work is a curated list of problems, some adapted from existing sources and some newly constructed by the authors.

pith-pipeline@v0.9.1-grok · 5696 in / 909 out tokens · 15770 ms · 2026-06-27T21:01:00.975738+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. How reliable are LLMs when it comes to playing dice?

    cs.CL 2026-06 unverdicted novelty 5.0

    LLMs score 0.96 on standard probability exercises but 0.59 on counterintuitive ones and drop further with biased wording or misleading cues, indicating they are not genuine probabilistic reasoners.

Reference graph

Works this paper leans on

12 extracted references · 4 canonical work pages · cited by 1 Pith paper

  1. [1]

    How reliable are LLMs when it comes to playing dice? 2026

    Luca Avena, Gianmarco Bet and Bernardo Busoni. How reliable are LLMs when it comes to playing dice? 2026. arXiv: 2606.07515 [cs.CL]. URL: https://arxiv.org/abs/2606.07515

  2. [2]

    Aha! Gotcha: Paradoxes to Puzzle and Delight

    Martin Gardner. Aha! Gotcha: Paradoxes to Puzzle and Delight. W. H. Freeman, 1982, p. 164. ISBN: 978-0-7167-1361-6

  3. [3]

    Time Travel and Other Mathematical Bewilderments

    Martin Gardner. Time Travel and Other Mathematical Bewilderments. New York: W. H. Freeman, 1988, p. 295. ISBN: 978-0-7167-1925-0

  4. [4]

    Alice and Bob on X: reversal, coupling, renewal

    Geoffrey R. Grimmett. Alice and Bob on X: reversal, coupling, renewal. 2025. arXiv: 2409.00732 [math.PR]. URL: https://arxiv.org/abs/2409.00732

  5. [5]

    Absent-minded passengers

    Norbert Henze and Günter Last. Absent-minded passengers. 2018. arXiv: 1809.10192 [math.PR]. URL: https://arxiv.org/abs/1809.10192

  6. [6]

    Various probability puzzles posted on Daniel Litt’s X profile @littmath

    Daniel Litt. Various probability puzzles posted on Daniel Litt’s X profile @littmath. 2024. URL: https://x.com/littmath

  7. [7]

    Tuesday Boy

    Oliver Hawkins. Tuesday Boy. BBC News. Accessed: 2026-06-04. 2010. URL: http://news.bbc.co.uk/2/hi/programmes/more_or_less/8735812.stm

  8. [8]

    Christopher M. Rump. ‘Strategies for Rolling the Efron Dice’. In: Mathematics Magazine 74.3 (2001), pp. 212–

  9. [9]

    URL: https://www.jstor.org/stable/2690722

    DOI: 10.1080/0025570X.2001.11953065. URL: https://www.jstor.org/stable/2690722

  10. [10]

    ‘A Problem in Probability’

    Steve Selvin. ‘A Problem in Probability’. In: The American Statistician 29.1 (Feb. 1975). Letter to the editor, p. 67. DOI: 10.1080/00031305.1975.10479121. URL: https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121

  11. [11]

    E. H. Simpson. ‘The Interpretation of Interaction in Contingency Tables’. In: Journal of the Royal Statistical Society: Series B (Methodological) 13.2 (July 1951), pp. 238–241. ISSN: 0035-9246. DOI: 10.1111/j.2517-6161.1951.tb00088.x. eprint: https://academic.oup.com/jrsssb/article-pdf/13/2/238/49093972/jrsssb_13_2_238.pdf. UR...

  12. [12]

    ‘Judgment under Uncertainty: Heuristics and Biases’

    Amos Tversky and Daniel Kahneman. ‘Judgment under Uncertainty: Heuristics and Biases’. In: Science 185.4157 (Sept. 1974), pp. 1124–1131. DOI: 10.1126/science.185.4157.1124. URL: https://www.science.org/doi/10.1126/science.185.4157.1124.