From Coefficients to Distributions: De Moivre and the Operational View of Probability
Pith reviewed 2026-06-29 23:06 UTC · model grok-4.3
The pith
The standardised binomial distribution converges to the Gaussian in the space of tempered distributions, recovering de Moivre's indicator calculations as the special case.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
De Moivre's Approximatio ad Summam Terminorum Binomii extracts information from binomial laws by pairing sums with indicator probes, yielding the normal density and its tails. This operational method develops through generating functions and characteristic functions into the general distributional pairing of a tempered distribution with a Schwartz test function. The paper proves that the standardised binomial distribution converges to the Gaussian in S'(R), with the original de Moivre computation recovered precisely when the test functions are indicators.
What carries the argument
The argument rests on a four-stage progression of probes for probability laws, from coefficient extraction to distributional pairings, in which a law is represented by a pair (T, φ) in S'(R) × S(R).
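The four stages can be made concrete on a single Binomial(n, p) law. The sketch below is illustrative only: the function names are hypothetical and do not come from the paper, and the code assumes nothing beyond the standard formulas for the binomial generating function and characteristic function.

```python
import cmath
import math


def pgf_coefficients(n, p):
    """Stage 1: expand (q + p s)^n and read off the coefficient of each s^k."""
    q = 1.0 - p
    coeffs = [1.0]
    for _ in range(n):
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += q * c      # this trial fails: the power of s is unchanged
            nxt[k + 1] += p * c  # this trial succeeds: the power of s goes up by one
        coeffs = nxt
    return coeffs


def pgf(s, n, p):
    """Stage 2: the generating function G(s) = (q + p s)^n as a single object."""
    return (1.0 - p + p * s) ** n


def standardised_cf(t, n, p):
    """Stage 3: characteristic function of (X - n p) / sqrt(n p q)."""
    sd = math.sqrt(n * p * (1.0 - p))
    return cmath.exp(-1j * t * n * p / sd) * (1.0 - p + p * cmath.exp(1j * t / sd)) ** n


def pairing(phi, n, p):
    """Stage 4: <T_n, phi> = sum_k w_k * phi(x_k) for a test function phi."""
    sd = math.sqrt(n * p * (1.0 - p))
    weights = pgf_coefficients(n, p)
    return sum(w * phi((k - n * p) / sd) for k, w in enumerate(weights))


if __name__ == "__main__":
    n, p = 400, 0.5
    print("P(X = 200):", pgf_coefficients(n, p)[200])
    print("G(1):", pgf(1.0, n, p))
    print("cf at t = 1:", standardised_cf(1.0, n, p), "vs", math.exp(-0.5))
    print("pairing with exp(-x^2):", pairing(lambda x: math.exp(-x * x), n, p))
```

Each stage reuses the same underlying weights; what changes is the object they are tested against, which is the sense in which the probes become more flexible.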
If this is right
- De Moivre's original numerical approximations fit inside the modern distributional convergence without modification.
- The class of probability laws accessible to study expands as the allowed test functions become more general.
- The operational viewpoint supplies a uniform language that connects early coefficient work to contemporary limit theorems.
- Transversality supplies a geometric reason that moment indeterminacy and singular Fisher information remain exceptional in parametric models.
Where Pith is reading between the lines
- Many other classical limit theorems could be restated as convergence statements in the space of tempered distributions.
- The same chain of probe generalisation might organise results from other branches of probability that began with explicit sums.
- Numerical checks of the convergence against smooth compactly supported test functions would provide independent verification of the extension beyond indicators.
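A numerical check of the kind described in the last bullet might look like the following minimal sketch. It assumes the standard bump function exp(-1/(1 - x^2)) on (-1, 1) as the smooth compactly supported test function and a plain midpoint rule for the Gaussian side; all function names are hypothetical, and none of this is taken from the paper.

```python
import math


def bump(x):
    """Smooth test function supported on (-1, 1); zero outside that interval."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def binom_pmf(n, k, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    log_w = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_w)


def binomial_pairing(phi, n, p):
    """<T_n, phi> = sum_k w_k * phi(x_k), with x_k = (k - n p) / sqrt(n p q)."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return sum(binom_pmf(n, k, p) * phi((k - mean) / sd) for k in range(n + 1))


def gaussian_pairing(phi, lo, hi, steps=20000):
    """Midpoint-rule value of the integral of phi(x) * N(0, 1) density over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += phi(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    target = gaussian_pairing(bump, -1.0, 1.0)
    for n in (20, 200, 2000):
        gap = abs(binomial_pairing(bump, n, 0.5) - target)
        print(n, gap)
```

The log-space probability avoids overflow in the binomial coefficient for large n. The printed gaps give only numerical evidence for one test function and one value of p; they do not replace the proof.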
Load-bearing premise
De Moivre's extraction of information by evaluating binomial sums against indicator functions is an instance of the operational viewpoint that underlies distributional statistics.
What would settle it
A direct computation showing that the pairing of the standardised binomial with some non-indicator Schwartz test function fails to approach the corresponding Gaussian pairing would refute the claimed convergence in S'(R).
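As a concrete benchmark for such a computation, one can take the Schwartz function φ(x) = exp(-x^2), for which the Gaussian pairing has a closed form. The choice of φ is illustrative and is not taken from the paper; T_N denotes the standard Gaussian, as in the paper's appendix.

```latex
% Worked benchmark: \varphi(x) = e^{-x^2} is an illustrative choice, not the paper's.
\[
\langle T_N, \varphi \rangle
  = \int_{-\infty}^{\infty} e^{-x^{2}} \, \frac{e^{-x^{2}/2}}{\sqrt{2\pi}} \, dx
  = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-3x^{2}/2} \, dx
  = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{2\pi}{3}}
  = \frac{1}{\sqrt{3}} \approx 0.57735 .
\]
```

The claimed convergence predicts that the binomial pairing with this φ approaches 1/√3 as n grows; a sequence of computed pairings that stayed bounded away from that value would be the kind of counterexample described above.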
Original abstract
We trace a conceptual genealogy from Abraham de Moivre's derivation of the normal curve (1733) to the modern distributional approach to statistics. De Moivre's Approximatio ad Summam Terminorum Binomii gave the first systematic derivation of the Gaussian density, its normalising constant (completed by Stirling's identification of $B = \sqrt{2\pi}$), and its tail probabilities computed to six decimal places -- more than seventy years before Gauss. His method -- extracting information from probability laws by evaluating sums against indicator probes -- is recognisably an instance of the operational viewpoint that underlies distributional statistics. We identify a four-stage chain: coefficient extraction (De Moivre) $\to$ generating functions (Euler, Laplace) $\to$ characteristic functions (Fourier, L\'evy) $\to$ distributional pairings $\langle T, \varphi \rangle$ (Schwartz). At each stage the probes become more flexible and the class of laws that can be studied grows wider. The distributional framework, in which a probability law is represented by a distribution--kernel pair $(T, \varphi) \in \mathcal{S}'(\mathbb{R}) \times \mathcal{S}(\mathbb{R})$, is the natural endpoint of this progression. We formulate and prove a distributional version of the De Moivre--Laplace theorem: the standardised binomial distribution converges to the Gaussian in $\mathcal{S}'(\mathbb{R})$, with De Moivre's original computation corresponding to the special case of indicator test functions. We also discuss the transversality framework, which provides a geometric explanation -- via infinite codimension of degeneracy strata -- for why pathologies such as moment indeterminacy, non-identifiability, and singular Fisher information are rarely encountered in parametric statistical models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript traces a conceptual genealogy from de Moivre's 1733 coefficient-extraction method for the binomial-to-normal approximation, through generating functions and characteristic functions, to the modern distributional pairing ⟨T, ϕ⟩ in S'(R) × S(R). It formulates and claims to prove a distributional De Moivre-Laplace theorem asserting that the standardized binomial converges to the Gaussian in S'(R), with de Moivre's indicator-probe sums presented as the special case of indicator test functions. The paper additionally invokes a transversality framework to explain the rarity of pathologies such as moment indeterminacy and non-identifiability.
Significance. If the convergence proof is supplied with full error estimates and the indicator correspondence is rigorously justified, the work would supply a historically grounded operational interpretation of distributional probability and a geometric account of why certain degeneracies are avoided in practice. The four-stage chain and transversality discussion could serve as useful conceptual scaffolding for both historians of probability and practitioners of distributional statistics.
major comments (2)
- [Abstract] Abstract (theorem statement): the claim that de Moivre's original sums against indicator probes 'correspond to the special case of indicator test functions' cannot hold directly, because indicator functions are discontinuous and fail to lie in the Schwartz space S(R). Convergence in S'(R) is tested exclusively against C^∞ rapid-decay test functions; an explicit mollification or density argument must be supplied to justify the passage to the limit and to preserve historical fidelity.
- [Abstract] Abstract (proof claim): the manuscript asserts a proof of convergence in S'(R) yet the abstract supplies neither the sequence of test functions, the error bounds, nor the verification that the limit exists in the distributional topology. Without these steps the central theorem remains unverified.
minor comments (1)
- The four-stage chain is presented narratively; a concise table or diagram would clarify the precise sense in which each stage enlarges the class of admissible probes.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important technical points regarding the abstract's formulation of the main theorem. We address each major comment below and will make the necessary revisions to strengthen the presentation while preserving the historical and conceptual narrative.
Point-by-point responses
-
Referee: [Abstract] Abstract (theorem statement): the claim that de Moivre's original sums against indicator probes 'correspond to the special case of indicator test functions' cannot hold directly, because indicator functions are discontinuous and fail to lie in the Schwartz space S(R). Convergence in S'(R) is tested exclusively against C^∞ rapid-decay test functions; an explicit mollification or density argument must be supplied to justify the passage to the limit and to preserve historical fidelity.
Authors: We agree that indicator functions do not belong to S(R) and that a direct identification requires justification. The manuscript's intent is that de Moivre's indicator-probe sums are recovered in the limit by approximating the discontinuous indicators with sequences of Schwartz test functions (via standard mollification with compactly supported smooth kernels). In the revised version we will add an explicit density argument to the abstract and the theorem statement, showing that the distributional pairing extends continuously to the closure of such approximations, thereby connecting the historical computation to the S'(R) limit without loss of fidelity. revision: yes
-
Referee: [Abstract] Abstract (proof claim): the manuscript asserts a proof of convergence in S'(R) yet the abstract supplies neither the sequence of test functions, the error bounds, nor the verification that the limit exists in the distributional topology. Without these steps the central theorem remains unverified.
Authors: The complete proof of convergence in S'(R), including the explicit sequence of test functions (Fourier transforms of the standardized binomial characteristic functions), quantitative error bounds obtained from the classical local CLT, and verification of the limit in the weak-* topology of S'(R), appears in the body of the paper immediately after the four-stage conceptual chain. The abstract, as a summary, states the result at a high level. To meet the referee's request we will expand the abstract with a brief outline of the proof strategy (characteristic-function estimates plus density of test functions) while keeping the length appropriate. revision: partial
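One standard way to pass from Schwartz test functions to indicators is a sandwich argument, sketched below. This is an assumption about how the promised density step could go, not the authors' stated proof; the names φ_ε^± are hypothetical. It uses only that each T_n is a probability measure, so the pairing is monotone in the test function.

```latex
% Sandwich sketch. Assumptions (not from the paper): 0 < 2\varepsilon < b - a, and
% \varphi_\varepsilon^{\pm} \in \mathcal{S}(\mathbb{R}) take values in [0, 1] with
%   \varphi_\varepsilon^{+} = 1 on [a, b],
%   \varphi_\varepsilon^{+} = 0 outside [a - \varepsilon, b + \varepsilon],
%   \varphi_\varepsilon^{-} = 1 on [a + \varepsilon, b - \varepsilon],
%   \varphi_\varepsilon^{-} = 0 outside [a, b].
\begin{align*}
\langle T_n, \varphi_\varepsilon^{-} \rangle
  &\le \langle T_n, \mathbf{1}_{[a,b]} \rangle
   \le \langle T_n, \varphi_\varepsilon^{+} \rangle, \\
\lim_{n \to \infty} \langle T_n, \varphi_\varepsilon^{\pm} \rangle
  &= \langle T_N, \varphi_\varepsilon^{\pm} \rangle, \\
0 \le \langle T_N, \varphi_\varepsilon^{+} - \varphi_\varepsilon^{-} \rangle
  &\le \frac{4\varepsilon}{\sqrt{2\pi}} .
\end{align*}
```

The last bound holds because the difference of the two test functions lies in [0, 1] and vanishes outside two intervals of total length 4ε, while the Gaussian density is at most 1/√(2π). Since the Gaussian pairing with the indicator is itself trapped between the two Schwartz pairings, the limit superior and limit inferior of ⟨T_n, 1_[a,b]⟩ both lie within 4ε/√(2π) of it, and letting ε tend to 0 gives convergence for the indicator.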
Circularity Check
No circularity; derivation is self-contained via standard distribution theory.
Full rationale
The abstract presents a historical genealogy (coefficient extraction to distributional pairings) and states a proof of convergence of the standardised binomial to the Gaussian in S'(R), with De Moivre's sums positioned as the indicator special case. No equations, definitions, or self-citations are exhibited that reduce the claimed theorem to its own inputs by construction. The four-stage chain is presented as conceptual progression rather than a fitted or self-referential derivation. This matches the default expectation of no significant circularity when the central result relies on external mathematical machinery (Schwartz distributions) without internal reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard properties of the Schwartz space S(R) and its dual S'(R), including convergence of distributions
Reference graph
Works this paper leans on
- [1] A. de Moivre, The Doctrine of Chances, 2nd ed., Woodfall, London, 1738.
- [2] C. F. Gauss, Theoria motus corporum coelestium, Perthes et Besser, Hamburg, 1809.
- [3] A. N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung, Springer, Berlin, 1933. MR0494348
- [4] R. Labouriau, Distributional statistical models: weak moments, cumulants, and a central limit theorem, arXiv:2604.20634 [math.PR], 2026.
- [5] R. Labouriau, Transversality and geometric regularisation in distributional statistical models, arXiv:2605.04536 [math.ST], 2026.
- [6] P.-S. Laplace, Théorie analytique des probabilités, Courcier, Paris, 1812.
- [7] L. Schwartz, Théorie des distributions, 2nd ed., Hermann, Paris, 1966. MR0209834
- [8] S. M. Stigler, The History of Statistics: The Measurement of Uncertainty before 1900, Harvard Univ. Press, Cambridge, MA, 1986. MR0852410
- [9] H. M. Walker, De Moivre on the law of normal probability, in D. E. Smith (ed.), A Source Book in Mathematics, McGraw-Hill, New York, 1929, pp. 566–575.
Appendix A: Proof of the Proposition
We prove that $\langle T_n, \varphi \rangle \to \langle T_N, \varphi \rangle$ for every $\varphi \in \mathcal{S}(\mathbb{R})$, where $\langle T_n, \varphi \rangle = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} \, \varphi\!\left( \frac{k - np}{\sqrt{npq}} \right)$. Write $x_k = (k - np)/\sqrt{npq}$ for the standardised argument and $w_k = \binom{n}{k} p^k$ ...