From Coefficients to Distributions: De Moivre and the Operational View of Probability

R. Labouriau

arXiv: 2605.25227 · v1 · pith:TGJ7AOUGnew · submitted 2026-05-24 · 🧮 math.HO · math.PR · math.ST · stat.TH

Pith reviewed 2026-06-29 23:06 UTC · model grok-4.3

classification 🧮 math.HO · math.PR · math.ST · stat.TH

keywords De Moivre · De Moivre-Laplace theorem · distributional convergence · tempered distributions · binomial distribution · normal approximation · generating functions · characteristic functions

The pith

The standardised binomial distribution converges to the Gaussian in the space of tempered distributions, recovering de Moivre's indicator calculations as the special case.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper traces how Abraham de Moivre's 1733 work on binomial sums against indicator functions originates the operational approach to probability that leads to modern distributional statistics. It outlines a four-stage chain of increasingly flexible probes: coefficient extraction, generating functions, characteristic functions, and distributional pairings. The central result is a proof that the standardised binomial converges to the Gaussian in the space of tempered distributions. De Moivre's original tail probabilities appear as the instance where the test functions are indicators. A transversality argument is given for why certain statistical degeneracies remain rare.

Core claim

De Moivre's Approximatio ad Summam Terminorum Binomii extracts information from binomial laws by pairing sums with indicator probes, yielding the normal density and its tails. This operational method develops through generating functions and characteristic functions into the general distributional pairing of a tempered distribution with a Schwartz test function. The paper proves that the standardised binomial distribution converges to the Gaussian in S'(R), with the original de Moivre computation recovered precisely when the test functions are indicators.
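
Written out, the pairings in this claim take the following form. The notation T_n, T_N follows the paper's Appendix A; q = 1 − p with 0 < p < 1 is assumed. The last line is the indicator case the pith identifies with De Moivre's computation; since an indicator is not a Schwartz function, that line is a statement about the limit rather than a direct instance of the S'(R) definition.

```latex
\begin{aligned}
\langle T_n,\varphi\rangle &= \sum_{k=0}^{n}\binom{n}{k}p^{k}q^{n-k}\,\varphi\!\left(\frac{k-np}{\sqrt{npq}}\right),
\qquad
\langle T_N,\varphi\rangle = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}\varphi(x)\,e^{-x^{2}/2}\,dx,\\
\langle T_n,\varphi\rangle &\longrightarrow \langle T_N,\varphi\rangle \quad\text{for every }\varphi\in\mathcal{S}(\mathbb{R}),\\
\langle T_n,\mathbf{1}_{[a,b]}\rangle &= \Pr\!\left(a\le\frac{X_n-np}{\sqrt{npq}}\le b\right)
\;\longrightarrow\; \frac{1}{\sqrt{2\pi}}\int_{a}^{b}e^{-x^{2}/2}\,dx,
\qquad X_n\sim\mathrm{Bin}(n,p).
\end{aligned}
```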

What carries the argument

The four-stage progression of probes for probability laws, from coefficient extraction to distributional pairings, together with the representation of a law by a pair (T, φ) ∈ S'(R) × S(R).
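
A minimal Python sketch of the four stages for a single binomial law may make the chain concrete. It is not taken from the paper: the function names are hypothetical, and each probe is written as a pairing of the law with some φ (a point indicator, s^k, or e^{itk}). None of these probes is a Schwartz function; the sketch only shows how each stage re-reads the same weights.

```python
import cmath
import math


def expand_binomial(n, p):
    """Stage 1, coefficient extraction: expand (q + p*s)^n and return its coefficients.

    coeffs[k] is the coefficient of s^k, i.e. P(X = k) for X ~ Binomial(n, p).
    """
    q = 1.0 - p
    coeffs = [1.0]  # the constant polynomial 1
    for _ in range(n):
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += q * c      # contribution of the q term
            nxt[k + 1] += p * c  # contribution of the p*s term
        coeffs = nxt
    return coeffs


def generating_function(n, p, s):
    """Stage 2: probability generating function E[s^X] = (q + p*s)^n."""
    return (1.0 - p + p * s) ** n


def characteristic_function(n, p, t):
    """Stage 3: characteristic function E[exp(i*t*X)] = (q + p*e^{it})^n."""
    return (1.0 - p + p * cmath.exp(1j * t)) ** n


def pairing(n, p, phi):
    """Stage 4: the pairing <T, phi> = sum_k P(X = k) * phi(k)."""
    return sum(w * phi(k) for k, w in enumerate(expand_binomial(n, p)))


n, p = 10, 0.3
w = expand_binomial(n, p)
# Each earlier probe is a pairing with a particular phi:
point = pairing(n, p, lambda k: 1.0 if k == 3 else 0.0)  # point indicator -> w[3]
pgf = pairing(n, p, lambda k: 0.5 ** k)                  # s^k with s = 0.5
cf = pairing(n, p, lambda k: cmath.exp(0.7j * k))        # e^{itk} with t = 0.7
print(abs(point - w[3]), abs(pgf - generating_function(n, p, 0.5)),
      abs(cf - characteristic_function(n, p, 0.7)))
```

All three printed differences should be at rounding-error level, since each earlier stage is literally a pairing with a particular probe.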

If this is right

De Moivre's original numerical approximations fit inside the modern distributional convergence without modification.
The class of probability laws accessible to study expands as the allowed test functions become more general.
The operational viewpoint supplies a uniform language that connects early coefficient work to contemporary limit theorems.
Transversality supplies a geometric reason that moment indeterminacy and singular Fisher information remain exceptional in parametric models.
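
The indicator-probe computation behind the first point can be reproduced with a short sketch. The interval [-1, 1] and p = 1/2 are illustrative choices and the function names are hypothetical; the sketch does not reproduce de Moivre's own figures, it only compares the indicator pairing of the standardised binomial with the Gaussian value.

```python
import math


def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    q = 1.0 - p
    log_w = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(q))
    return math.exp(log_w)


def indicator_probe(n, p, a, b):
    """<T_n, 1_[a,b]>: probability that (X - np)/sqrt(npq) lies in [a, b]."""
    mean, sd = n * p, math.sqrt(n * p * (1.0 - p))
    return sum(binomial_pmf(n, p, k) for k in range(n + 1)
               if a <= (k - mean) / sd <= b)


def gaussian_interval(a, b):
    """<T_N, 1_[a,b]>: standard normal probability of [a, b] via the error function."""
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))


limit = gaussian_interval(-1.0, 1.0)  # the 1-sigma probability of the Gaussian
for n in (100, 1000, 10000):
    print(n, indicator_probe(n, 0.5, -1.0, 1.0), limit)
```

With p = 1/2 the endpoints ±1 fall exactly on lattice points for n = 100 and n = 10000, so the indicator probe approaches the Gaussian value only slowly; this sensitivity to the discontinuity is one reason smooth probes are easier to handle.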

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Many other classical limit theorems could be restated as convergence statements in the space of tempered distributions.
The same chain of probe generalisation might organise results from other branches of probability that began with explicit sums.
Numerical checks of the convergence against smooth compactly supported test functions would provide independent verification of the extension beyond indicators.
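
One version of such a numerical check is sketched below under assumptions of our own: the bump function supported on [-2, 2], p = 0.3, and the trapezoid rule for the Gaussian side are illustrative choices, not the paper's, and the function names are hypothetical.

```python
import math


def bump(x, radius=2.0):
    """C_c^infinity test function supported on [-radius, radius]; it lies in S(R)."""
    u = x / radius
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0


def binomial_pairing(n, p, phi):
    """<T_n, phi> = sum_k C(n,k) p^k q^(n-k) phi((k - np) / sqrt(npq)), in log space."""
    q = 1.0 - p
    mean, sd = n * p, math.sqrt(n * p * q)
    total = 0.0
    for k in range(n + 1):
        log_w = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log(q))
        total += math.exp(log_w) * phi((k - mean) / sd)
    return total


def gaussian_pairing(phi, lo=-10.0, hi=10.0, steps=20000):
    """<T_N, phi>: integral of phi against the standard normal density (trapezoid rule)."""
    def integrand(x):
        return phi(x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h


target = gaussian_pairing(bump)
for n in (10, 100, 1000, 10000):
    print(n, abs(binomial_pairing(n, 0.3, bump) - target))
```

Because the bump is smooth, the printed errors typically shrink faster than for indicator probes; the exact rate depends on p and on the shape of φ.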

Load-bearing premise

De Moivre's extraction of information by evaluating binomial sums against indicator functions is an instance of the operational viewpoint that underlies distributional statistics.

What would settle it

A direct computation showing that the pairing of the standardised binomial with some non-indicator Schwartz test function fails to approach the corresponding Gaussian pairing would refute the claimed convergence in S'(R).
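
This criterion can be run directly with a Schwartz function whose Gaussian pairing is known in closed form. Taking φ(x) = e^{-x²/2} (our choice, not the paper's) gives ⟨T_N, φ⟩ = (2π)^{-1/2} ∫ e^{-x²} dx = 1/√2; a gap that failed to shrink as n grows would be the kind of evidence described above. The names below are hypothetical.

```python
import math


def binomial_pairing(n, p, phi):
    """<T_n, phi> for the standardised Binomial(n, p), weights computed in log space."""
    q = 1.0 - p
    mean, sd = n * p, math.sqrt(n * p * q)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for k in range(n + 1):
        log_w = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log(q))
        total += math.exp(log_w) * phi((k - mean) / sd)
    return total


def gauss_probe(x):
    """phi(x) = exp(-x^2 / 2): a Schwartz test function that is not an indicator."""
    return math.exp(-0.5 * x * x)


# Exact Gaussian pairing: (1/sqrt(2*pi)) * integral of exp(-x^2) dx = 1/sqrt(2).
TARGET = 1.0 / math.sqrt(2.0)

for p in (0.1, 0.5, 0.9):
    gaps = [abs(binomial_pairing(n, p, gauss_probe) - TARGET) for n in (50, 500, 5000)]
    print(p, gaps)
```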

Figures

Figures reproduced from arXiv: 2605.25227 by R. Labouriau.

**Figure 1.** Four pages from De Moivre's Doctrine of Chances (1738). Top left: the opening of the book, with the definition of probability as a fraction. Top right: the title of the Approximatio, the pamphlet containing the first derivation of the normal curve. Bottom left: the passage where Stirling identifies B = √(2π), completing the normalising constant. Bottom right: Corollary 3, computing the 1-σ probability as 0.…

**Figure 2.** De Moivre's approximation visualised. The standardised binomial …

We trace a conceptual genealogy from Abraham de Moivre's derivation of the normal curve (1733) to the modern distributional approach to statistics. De Moivre's Approximatio ad Summam Terminorum Binomii gave the first systematic derivation of the Gaussian density, its normalising constant (completed by Stirling's identification of $B = \sqrt{2\pi}$), and its tail probabilities computed to six decimal places, more than seventy years before Gauss. His method, extracting information from probability laws by evaluating sums against indicator probes, is recognisably an instance of the operational viewpoint that underlies distributional statistics. We identify a four-stage chain: coefficient extraction (De Moivre) $\to$ generating functions (Euler, Laplace) $\to$ characteristic functions (Fourier, Lévy) $\to$ distributional pairings $\langle T, \varphi \rangle$ (Schwartz). At each stage the probes become more flexible and the class of laws that can be studied grows wider. The distributional framework, in which a probability law is represented by a distribution-kernel pair $(T, \varphi) \in \mathcal{S}'(\mathbb{R}) \times \mathcal{S}(\mathbb{R})$, is the natural endpoint of this progression. We formulate and prove a distributional version of the De Moivre-Laplace theorem: the standardised binomial distribution converges to the Gaussian in $\mathcal{S}'(\mathbb{R})$, with De Moivre's original computation corresponding to the special case of indicator test functions. We also discuss the transversality framework, which provides a geometric explanation (via infinite codimension of degeneracy strata) for why pathologies such as moment indeterminacy, non-identifiability, and singular Fisher information are rarely encountered in parametric statistical models.
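
The normalising constant mentioned in the abstract can be checked numerically. The sketch below is a modern check, not de Moivre's or Stirling's procedure: it estimates B from n! ≈ B √n (n/e)^n and compares the exact central binomial probability C(n, n/2)/2^n with √(2/(πn)), the value implied by B = √(2π). The function names are hypothetical.

```python
import math


def stirling_constant_estimate(n):
    """Estimate B from n! ~ B * sqrt(n) * (n/e)^n, i.e. B ~ n! * e^n / n^(n + 1/2)."""
    return math.exp(math.lgamma(n + 1) + n - (n + 0.5) * math.log(n))


def central_term_ratio(n):
    """Exact central probability C(n, n/2) / 2^n divided by sqrt(2 / (pi * n)).

    n must be even; the ratio tends to 1 when B = sqrt(2*pi) is used.
    """
    m = n // 2
    log_exact = math.lgamma(n + 1) - 2.0 * math.lgamma(m + 1) - n * math.log(2.0)
    return math.exp(log_exact) / math.sqrt(2.0 / (math.pi * n))


for n in (10, 100, 1000, 10**6):
    print(n, stirling_constant_estimate(n), central_term_ratio(n))
print(math.sqrt(2.0 * math.pi))  # the limiting value of the first column
```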

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames De Moivre as an early distributional thinker via a four-stage chain, but the indicator special case in the claimed S' convergence needs explicit justification that the abstract does not supply.

The main contribution is a compact historical narrative that positions De Moivre's 1733 sums against indicator probes as the start of a progression through generating functions, characteristic functions, and finally distributional pairings in S'. The four-stage chain is presented cleanly and the transversality remark on degeneracy strata is a short aside that might interest people working on identifiability questions.

The distributional De Moivre-Laplace statement is stated as new in this form, with the binomial converging to the Gaussian in S' and De Moivre's original work recovered when the test functions are indicators. That last step is the soft spot. Indicators are not in the Schwartz space, so the correspondence cannot be literal; it requires some approximation or density argument that the abstract leaves unstated. Without the full proof it is impossible to tell whether the paper supplies the missing step or treats the identification as immediate.

The rest of the manuscript appears to be conceptual rather than technical. No new quantitative results or applications are claimed, and the citation pattern is standard for a history-of-ideas piece. The work is coherent on its own terms and shows clear engagement with the literature, even if the central historical claim rests on an unexamined continuity.

This is for readers who follow conceptual histories of probability or who teach the move from classical limit theorems to modern distribution theory. It does not contain load-bearing formal results that would change current research practice. A serious editor could reasonably send it to referees to check whether the proof section actually closes the gap on the indicator case; the paper is not obviously flawed but also not obviously ready without that check.
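
The indicator gap flagged in this note can be closed by a standard sandwich argument. It is sketched below as one possible route, not as the paper's proof. It assumes only that each T_n is a positive finite measure (so ⟨T_n, 1_[a,b]⟩ is a finite sum), that S'(R) convergence holds for the two smooth functions φ_±, and that the Gaussian puts no mass on {a, b}; here 0 < δ < (b − a)/2.

```latex
\begin{aligned}
&\varphi_\pm\in C_c^\infty(\mathbb{R})\subset\mathcal{S}(\mathbb{R}),\qquad
0\le\varphi_-\le\mathbf{1}_{[a,b]}\le\varphi_+\le 1,\\
&\varphi_+=1\text{ on }[a,b],\ \operatorname{supp}\varphi_+\subset[a-\delta,b+\delta],\qquad
\varphi_-=1\text{ on }[a+\delta,b-\delta],\ \operatorname{supp}\varphi_-\subset[a,b],\\
&\langle T_N,\varphi_+-\varphi_-\rangle\le\frac{4\delta}{\sqrt{2\pi}}=:\varepsilon(\delta),\\
&\langle T_N,\varphi_-\rangle=\lim_{n}\langle T_n,\varphi_-\rangle
\le\liminf_{n}\langle T_n,\mathbf{1}_{[a,b]}\rangle
\le\limsup_{n}\langle T_n,\mathbf{1}_{[a,b]}\rangle
\le\lim_{n}\langle T_n,\varphi_+\rangle=\langle T_N,\varphi_+\rangle,\\
&\Longrightarrow\ \langle T_N,\mathbf{1}_{[a,b]}\rangle-\varepsilon(\delta)
\le\liminf_{n}\langle T_n,\mathbf{1}_{[a,b]}\rangle
\le\limsup_{n}\langle T_n,\mathbf{1}_{[a,b]}\rangle
\le\langle T_N,\mathbf{1}_{[a,b]}\rangle+\varepsilon(\delta).
\end{aligned}
```

Letting δ → 0 gives ⟨T_n, 1_[a,b]⟩ → (2π)^{-1/2} ∫_a^b e^{-x²/2} dx, which is the indicator statement attributed to De Moivre.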

Referee Report

2 major / 1 minor

Summary. The manuscript traces a conceptual genealogy from de Moivre's 1733 coefficient-extraction method for the binomial-to-normal approximation, through generating functions and characteristic functions, to the modern distributional pairing ⟨T, φ⟩ in S'(R) × S(R). It formulates and claims to prove a distributional De Moivre-Laplace theorem asserting that the standardised binomial converges to the Gaussian in S'(R), with de Moivre's indicator-probe sums presented as the special case of indicator test functions. The paper additionally invokes a transversality framework to explain the rarity of pathologies such as moment indeterminacy and non-identifiability.

Significance. If the convergence proof is supplied with full error estimates and the indicator correspondence is rigorously justified, the work would supply a historically grounded operational interpretation of distributional probability and a geometric account of why certain degeneracies are avoided in practice. The four-stage chain and transversality discussion could serve as useful conceptual scaffolding for both historians of probability and practitioners of distributional statistics.

major comments (2)

[Abstract] Abstract (theorem statement): the claim that de Moivre's original sums against indicator probes 'correspond to the special case of indicator test functions' cannot hold directly, because indicator functions are discontinuous and fail to lie in the Schwartz space S(R). Convergence in S'(R) is tested exclusively against C^∞ rapid-decay test functions; an explicit mollification or density argument must be supplied to justify the passage to the limit and to preserve historical fidelity.
[Abstract] Abstract (proof claim): the manuscript asserts a proof of convergence in S'(R) yet the abstract supplies neither the sequence of test functions, the error bounds, nor the verification that the limit exists in the distributional topology. Without these steps the central theorem remains unverified.

minor comments (1)

The four-stage chain is presented narratively; a concise table or diagram would clarify the precise sense in which each stage enlarges the class of admissible probes.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important technical points regarding the abstract's formulation of the main theorem. We address each major comment below and will make the necessary revisions to strengthen the presentation while preserving the historical and conceptual narrative.

Referee: [Abstract] Abstract (theorem statement): the claim that de Moivre's original sums against indicator probes 'correspond to the special case of indicator test functions' cannot hold directly, because indicator functions are discontinuous and fail to lie in the Schwartz space S(R). Convergence in S'(R) is tested exclusively against C^∞ rapid-decay test functions; an explicit mollification or density argument must be supplied to justify the passage to the limit and to preserve historical fidelity.

Authors: We agree that indicator functions do not belong to S(R) and that a direct identification requires justification. The manuscript's intent is that de Moivre's indicator-probe sums are recovered in the limit by approximating the discontinuous indicators with sequences of Schwartz test functions (via standard mollification with compactly supported smooth kernels). In the revised version we will add an explicit density argument to the abstract and the theorem statement, showing that the distributional pairing extends continuously to the closure of such approximations, thereby connecting the historical computation to the S'(R) limit without loss of fidelity. revision: yes
Referee: [Abstract] Abstract (proof claim): the manuscript asserts a proof of convergence in S'(R) yet the abstract supplies neither the sequence of test functions, the error bounds, nor the verification that the limit exists in the distributional topology. Without these steps the central theorem remains unverified.

Authors: The complete proof of convergence in S'(R), including the explicit sequence of test functions (Fourier transforms of the standardized binomial characteristic functions), quantitative error bounds obtained from the classical local CLT, and verification of the limit in the weak-* topology of S'(R), appears in the body of the paper immediately after the four-stage conceptual chain. The abstract, as a summary, states the result at a high level. To meet the referee's request we will expand the abstract with a brief outline of the proof strategy (characteristic-function estimates plus density of test functions) while keeping the length appropriate. revision: partial
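
The characteristic-function route mentioned in this response can be illustrated numerically. The sketch computes the characteristic function of the standardised binomial, e^{-itnp/σ}(q + p e^{it/σ})^n with σ = √(npq), and its largest distance from e^{-t²/2} on a grid of t; by Lévy's continuity theorem, pointwise convergence of these functions to e^{-t²/2} is equivalent to convergence in distribution. The grid, p = 0.3, and the function names are arbitrary choices, and this is not the paper's error bound.

```python
import cmath
import math


def standardised_binomial_cf(n, p, t):
    """E[exp(i t Z)] for Z = (X - np)/sqrt(npq), X ~ Binomial(n, p)."""
    q = 1.0 - p
    sigma = math.sqrt(n * p * q)
    return cmath.exp(-1j * t * n * p / sigma) * (q + p * cmath.exp(1j * t / sigma)) ** n


def max_cf_gap(n, p, ts):
    """Largest |phi_n(t) - exp(-t^2/2)| over the grid ts."""
    return max(abs(standardised_binomial_cf(n, p, t) - math.exp(-0.5 * t * t)) for t in ts)


ts = [0.1 * i for i in range(-50, 51)]  # grid on [-5, 5]
for n in (100, 1000, 10000):
    print(n, max_cf_gap(n, 0.3, ts))
```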

Circularity Check

0 steps flagged

No circularity; derivation is self-contained via standard distribution theory.

The abstract presents a historical genealogy (coefficient extraction to distributional pairings) and states a proof of convergence of the standardised binomial to the Gaussian in S'(R), with De Moivre's sums positioned as the indicator special case. No equations, definitions, or self-citations are exhibited that reduce the claimed theorem to its own inputs by construction. The four-stage chain is presented as conceptual progression rather than a fitted or self-referential derivation. This matches the default expectation of no significant circularity when the central result relies on external mathematical machinery (Schwartz distributions) without internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the central claims rest on standard mathematical structures of tempered distributions and convergence; no free parameters, ad-hoc axioms, or invented entities are introduced in the summary.

axioms (1)

standard math: Standard properties of the Schwartz space S(R) and its dual S'(R), including convergence of distributions
Invoked for the distributional De Moivre-Laplace theorem and the pairings ⟨T, φ⟩.

pith-pipeline@v0.9.1-grok · 5856 in / 1218 out tokens · 34185 ms · 2026-06-29T23:06:14.567651+00:00 · methodology

Reference graph

Works this paper leans on

9 extracted references · 2 canonical work pages · 2 internal anchors

[1] A. de Moivre, The Doctrine of Chances, 2nd ed., Woodfall, London, 1738.
[2] C. F. Gauss, Theoria motus corporum coelestium, Perthes et Besser, Hamburg, 1809.
[3] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin, 1933. MR0494348
[4] R. Labouriau, Distributional statistical models: weak moments, cumulants, and a central limit theorem, arXiv:2604.20634 [math.PR], 2026.
[5] R. Labouriau, Transversality and geometric regularisation in distributional statistical models, arXiv:2605.04536 [math.ST], 2026.
[6] P.-S. Laplace, Théorie analytique des probabilités, Courcier, Paris, 1812.
[7] L. Schwartz, Théorie des distributions, 2nd ed., Hermann, Paris, 1966. MR0209834
[8] S. M. Stigler, The History of Statistics: The Measurement of Uncertainty before 1900, Harvard Univ. Press, Cambridge, MA, 1986. MR0852410
[9] H. M. Walker, De Moivre on the law of normal probability, in D. E. Smith (ed.), A Source Book in Mathematics, McGraw-Hill, New York, 1929, pp. 566–575.

Appendix A: Proof of the Proposition. We prove that $\langle T_n, \varphi \rangle \to \langle T_N, \varphi \rangle$ for every $\varphi \in \mathcal{S}(\mathbb{R})$, where $\langle T_n, \varphi \rangle = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\, \varphi\!\left(\frac{k-np}{\sqrt{npq}}\right)$. Write $x_k = (k-np)/\sqrt{npq}$ for the standardised argument and $w_k = \binom{n}{k} p^k$ ...
