
arxiv: 2605.18227 · v1 · pith:NEKHX2LDnew · submitted 2026-05-18 · 💻 cs.DC

ASSESSING THE STOCHASTIC PROPERTIES OF MODERN PSEUDO-RANDOM GENERATORS FOR PARALLEL COMPUTING

Pith reviewed 2026-05-20 00:27 UTC · model grok-4.3

classification 💻 cs.DC
keywords pseudo-random number generators · PRNG · BigCrush · TestU01 · parallel streams · statistical testing · high-performance computing · AI workloads

The pith

Modern PRNGs pass at most 72 percent of BigCrush tests when thousands of parallel streams are evaluated.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests popular pseudo-random generators including Xoshiro, Philox, PCG, and MRG32k3a by running the BigCrush battery on more than one thousand streams per generator instead of the usual single-stream checks. This multi-stream protocol exposes repeated statistical failures that the generators' original single-stream validations missed. Nearly every generator fails multiple tests, and the best performer reaches only a 72 percent success rate across the battery. The evaluation uses consistent initialization and conditions meant to mimic real parallel workloads in high-performance computing and artificial intelligence. All raw results are released in a public repository so others can reproduce or extend the checks.

Core claim

When more than 10^3 independent streams are generated and subjected to the full BigCrush test suite from TestU01, generators drawn from the Xoshiro, Philox, PCG, and MRG32k3a families all exhibit multiple statistical defects, with the highest observed success rate across all tests reaching only 72 percent.

What carries the argument

BigCrush statistical test battery applied independently to more than 1000 parallel streams per generator under uniform initialization.
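
The protocol can be sketched in miniature. The Python below is an illustration only: it replaces BigCrush with two toy tests (a chi-square frequency test and a z-test on the mean), uses Python's built-in Mersenne Twister in place of the paper's generators, and seeds each stream with its index. All of these are assumptions made for the sketch, not details from the paper. What it shows is the bookkeeping: every test is applied to every stream, and failures are tallied per test and per stream.

```python
import math
import random
from collections import Counter


def frequency_test(values, bins=16, critical=37.697):
    """Chi-square uniformity test over `bins` cells (15 degrees of freedom).

    Returns True when the sample passes at the 0.001 level.
    """
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    return chi2 <= critical


def mean_test(values, z_critical=3.2905):
    """Two-sided z-test on the mean of U(0,1) draws (0.001 level)."""
    n = len(values)
    z = (sum(values) / n - 0.5) / math.sqrt(1.0 / (12.0 * n))
    return abs(z) <= z_critical


# Toy stand-in for the battery; BigCrush itself is far larger.
BATTERY = {"frequency": frequency_test, "mean": mean_test}


def make_stream(stream_id):
    # Hypothetical initialization: one integer seed per stream.
    # The paper's generators and seeding scheme are different.
    return random.Random(stream_id)


def run_battery(stream_factory, n_streams, n_samples):
    """Apply every test to every stream and tally the failures."""
    failures_per_test = Counter()
    failed_by_stream = []
    for stream_id in range(n_streams):
        rng = stream_factory(stream_id)
        values = [rng.random() for _ in range(n_samples)]
        failed = [name for name, test in BATTERY.items() if not test(values)]
        failures_per_test.update(failed)
        failed_by_stream.append(failed)
    return failures_per_test, failed_by_stream


failures_per_test, failed_by_stream = run_battery(make_stream, 200, 2000)
clean_streams = sum(1 for failed in failed_by_stream if not failed)
print(dict(failures_per_test), clean_streams)
```

With a sound generator and a 0.001 threshold, roughly one stream in a thousand is expected to fail each test by chance. Per-test failure counts across many streams are therefore more informative than a single pass or fail verdict.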

If this is right

  • Claims of statistical quality made by generator authors on the basis of single-stream tests do not extend to parallel usage.
  • Specific failed tests for each generator family are now documented and can guide selection or mitigation in production code.
  • Large-scale simulations and training runs that rely on these generators may encounter reproducibility or bias problems not caught by prior single-stream validation.
  • Reproducible results in parallel environments require either different generators or additional safeguards beyond published single-stream performance.
  • More than 4.5 years of cumulative computing time went into the study, which shows that thorough multi-stream testing is feasible but expensive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Monte Carlo methods and stochastic gradient sampling in AI could inherit subtle biases when these generators are used across many parallel workers.
  • Future generator design and validation suites should treat multi-stream testing as a first-class requirement rather than an optional check.
  • Library maintainers might add runtime warnings or default to safer alternatives when users request thousands of independent streams.
  • Automated regression testing frameworks could incorporate periodic multi-stream BigCrush runs to catch regressions after code changes.

Load-bearing premise

BigCrush tests run on over one thousand streams with standard initialization are enough to reveal every statistical defect that would matter in actual parallel HPC or AI workloads.

What would settle it

A generator that passes every BigCrush test without any failure when more than 1000 streams are initialized and run under the same protocol used here.

Figures

Figures reproduced from arXiv: 2605.18227 by David R.C. Hill (ISIMA, INP Clermont Auvergne, LIMOS, Mines Saint-Étienne MSE) and Théau Wartel (ISIMA).

Figure 4. Distribution of PCG32 failures on the BigCrush battery. As with all the other generators, the failure peaks fall on the same tests: the latest CollisionOver, ClosePairs and RandomWalks tests. The tests with the highest failure counts are tests 12 and 25, with 33 and 32 occurrences respectively, an occurrence rate of less than 1.7%. The other tests are below 30 failures. Out of…
Figure 5. Distribution of failed tests for Philox4x32 and PCG32 (using BigCrush test numbers).
Original abstract

Pseudo-random number generators (PRNGs) are widely used in modern computing and are expected to exhibit excellent statistical performance and repeatability. This study evaluates and compares modern PRNGs used in high performance computing and artificial intelligence. Our selections comes from different families, including Xoshiro, Philox, PCG, and MRG32k3a. We systematically assess the quality of these generators; instead of testing a single stream for each generator, we test more than 10^3 streams with the BigCrush battery form the TestU01 library. The results, involving more than 4.5 years of cumulative computing time, are analyzed against the claims made by the generators' creators. The highest success rate is 72%, and all tests have been failed by almost every generator, the failed tests are documented. To ensure fairness, all tests are conducted under consistent conditions and are designed to closely simulate real-world usage. The results of each test are available, usable and reproducible with a git repository.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to evaluate the statistical quality of modern PRNGs (Xoshiro, Philox, PCG, MRG32k3a) for parallel computing by testing more than 10^3 streams per generator with the BigCrush battery from TestU01. It reports a maximum success rate of 72%, with nearly all generators failing multiple tests, based on experiments simulating real-world usage and involving over 4.5 years of compute time. Results are made reproducible via a git repository.

Significance. This empirical study highlights potential issues with PRNGs in parallel environments, which is relevant for HPC and AI applications. The extensive testing across many streams is a strength compared to standard single-stream evaluations. However, the impact is limited by insufficient details on stream initialization, which is essential for interpreting whether the failures indicate real defects in the generators or artifacts of the testing methodology.

major comments (2)
  1. The experimental design for generating the >10^3 parallel streams is not described. The abstract asserts that tests 'closely simulate real-world usage' yet provides no information on the seeding strategy, jump-ahead functions, or splitting methods used for each PRNG family. This omission is load-bearing for the central claim, as improper stream initialization (e.g., simple increment from a global seed) could induce correlations that cause test failures unrelated to the generator's quality.
  2. Results section: The reported 'highest success rate is 72%' and statement that 'all tests have been failed by almost every generator' require a precise definition of 'success rate' (e.g., average percentage of BigCrush tests passed across streams or per-generator aggregate). Without this, it is difficult to assess the quantitative strength of the conclusion that these generators are inadequate for parallel use.
minor comments (2)
  1. Grammatical error in abstract: 'Our selections comes' should be 'Our selections come'.
  2. Typo in abstract: 'form the TestU01 library' should be 'from the TestU01 library'.
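
The initialization concern in major comment 1 can be made concrete with a small sketch. The Python below contrasts seeds obtained by simple increment from a global seed with seeds scrambled through SplitMix64, a mixing function often used for seeding. SplitMix64 is swapped in here purely for illustration and is not claimed to be what the authors used; the function names and the master seed of 42 are hypothetical. The Hamming distance between consecutive seeds is only a crude indicator of how similar neighboring initial states are, not a test of stream correlation.

```python
MASK64 = (1 << 64) - 1


def splitmix64(state):
    """One SplitMix64 step: returns (next_state, 64-bit output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def naive_seeds(master_seed, n_streams):
    """Simple increment from a global seed (the pattern the referee warns about)."""
    return [(master_seed + i) & MASK64 for i in range(n_streams)]


def hashed_seeds(master_seed, n_streams):
    """Per-stream seeds drawn from a SplitMix64 sequence started at the master seed."""
    seeds = []
    state = master_seed & MASK64
    for _ in range(n_streams):
        state, out = splitmix64(state)
        seeds.append(out)
    return seeds


def mean_hamming_distance(seeds):
    """Average number of differing bits between consecutive seeds."""
    dists = [bin(a ^ b).count("1") for a, b in zip(seeds, seeds[1:])]
    return sum(dists) / len(dists)


naive = naive_seeds(42, 1000)
hashed = hashed_seeds(42, 1000)
print(mean_hamming_distance(naive), mean_hamming_distance(hashed))
```

Incremented seeds differ from their neighbors in only a couple of bits on average, while scrambled seeds differ in about half of their 64 bits. Whether closely related seeds actually produce correlated streams depends on each generator's own seeding routine, which is why the report asks for that detail.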

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful and constructive review of our manuscript. We address each of the major comments below and have revised the manuscript to improve clarity on the experimental design and quantitative definitions.

point-by-point responses
  1. Referee: The experimental design for generating the >10^3 parallel streams is not described. The abstract asserts that tests 'closely simulate real-world usage' yet provides no information on the seeding strategy, jump-ahead functions, or splitting methods used for each PRNG family. This omission is load-bearing for the central claim, as improper stream initialization (e.g., simple increment from a global seed) could induce correlations that cause test failures unrelated to the generator's quality.

    Authors: We agree that a detailed description of stream initialization is essential for interpreting the results. The revised manuscript includes a new subsection in the Methods that specifies the seeding and splitting approach for each generator family. Xoshiro streams were created using the jump-ahead function to advance the internal state by a fixed large increment per stream. Philox and PCG used their respective counter-based and linear-congruential splitting mechanisms with distinct seeds derived from a master seed. MRG32k3a streams followed the standard combination of two MRGs with unique initial states. These choices follow the generators' documented recommendations for parallel use and are accompanied by pseudocode and repository links. We believe this addition removes the possibility that failures arise from naive initialization. revision: yes

  2. Referee: Results section: The reported 'highest success rate is 72%' and statement that 'all tests have been failed by almost every generator' require a precise definition of 'success rate' (e.g., average percentage of BigCrush tests passed across streams or per-generator aggregate). Without this, it is difficult to assess the quantitative strength of the conclusion that these generators are inadequate for parallel use.

    Authors: We accept that the original wording was imprecise. The revised Results section now defines success rate explicitly as the percentage of the 160 BigCrush tests passed by an individual stream. The figure of 72% is the highest such value recorded for any single stream across all generators and all 1000+ streams tested. We have also rephrased the failure statement to indicate that every generator produced streams that failed at least one test and that the large majority of streams failed multiple tests. A supplementary table now reports the mean, median, and range of success rates per generator to give a fuller quantitative picture. revision: yes
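
The definition given in the second response can be written out as a short calculation. The sketch below assumes the count of 160 tests stated in the rebuttal and computes a per-stream success rate together with the mean, median, and range that the supplementary table is said to report. The failure counts are invented for the example and the function names are hypothetical.

```python
import statistics

N_TESTS = 160  # number of BigCrush tests, as stated in the rebuttal


def success_rate(n_failed, n_tests=N_TESTS):
    """Percentage of tests passed by one stream."""
    return 100.0 * (n_tests - n_failed) / n_tests


def summarize(failed_counts):
    """Mean, median, and range of per-stream success rates for one generator."""
    rates = [success_rate(f) for f in failed_counts]
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "min": min(rates),
        "max": max(rates),
    }


# Hypothetical number of failed tests for five streams of one generator.
example_failed_counts = [45, 60, 52, 80, 48]
summary = summarize(example_failed_counts)
print(summary)
```

In this made-up sample the best stream fails 45 of 160 tests, a success rate of 71.875 percent. Under the rebuttal's definition, the headline figure is the maximum over individual streams rather than an average over a generator.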

Circularity Check

0 steps flagged

No circularity: pure empirical measurement study with external benchmarks

full rationale

The paper conducts direct statistical testing of PRNGs using the BigCrush battery from TestU01 on >10^3 streams per generator. No derivations, predictions, or first-principles results are claimed that could reduce to fitted inputs, self-citations, or ansatzes. All results are reproducible measurements against an independent external test suite, satisfying the self-contained criterion. The skeptic's concern about unspecified stream initialization is a methodological gap but does not constitute circularity in any derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study rests on the assumption that the BigCrush battery is an adequate proxy for statistical quality in parallel settings and that the chosen initialization and stream separation method does not introduce artifacts. No free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: BigCrush from TestU01 is a sufficient and unbiased test battery for detecting defects relevant to parallel PRNG use.
    Invoked when interpreting pass/fail rates as evidence of generator quality.

pith-pipeline@v0.9.0 · 5747 in / 1258 out tokens · 30484 ms · 2026-05-20T00:27:33.025359+00:00 · methodology


