arxiv: 2605.00247 · v1 · submitted 2026-04-30 · 📊 stat.CO · cs.DC· cs.MM· econ.EM

Recognition: unknown

2B or Not 2B: A Tale of Three Algorithms for Streaming: Covariance Estimation after Welford and Chan-Golub-LeVeque

Felix Reichel

Authors on Pith no claims yet

Pith reviewed 2026-05-09 19:28 UTC · model grok-4.3

classification 📊 stat.CO cs.DCcs.MMecon.EM

keywords streaming covariance estimationWelford algorithmChan-Golub-LeVequeconformal predictionnumerical stabilitydistributed computationunbiased sample covarianceonline algorithms

0 comments

The pith

Three streaming covariance algorithms produce identical estimators in exact arithmetic but differ in numerical stability and use cases, with conformal prediction supplying finite-sample entrywise intervals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper unifies the Gram, Welford, and Chan-Golub-LeVeque algorithms by deriving them from equivalent algebraic identities for the unbiased sample covariance. It shows that all three maintain running statistics that yield exactly the same matrix S_t = M_t/(t-1) when arithmetic is exact, while their floating-point behavior and architectural fit diverge. A conformal layer is added that constructs distribution-free confidence sets for each matrix entry at every time step without requiring a full data pass. This matters for anyone who must update covariance estimates from an unending stream or across distributed nodes while also reporting honest uncertainty.

Core claim

The three algorithms produce the same estimator S_t = M_t/(t-1) in exact arithmetic, although their finite-precision behavior differs markedly. The Chan-Golub-LeVeque identity enables exact merging of covariance statistics from separate data blocks. A conformal prediction framework supplies finite-sample, distribution-free confidence sets C_{t,jk} for each entry of S_t.

What carries the argument

The running mean m_t and correction matrix M_t updated via one-pass formulas, the block-merging identity M = M_A + M_B + (n_A n_B)/(n_A+n_B)(m_B-m_A)(m_B-m_A)^T, and the conformal nonconformity scores applied to the resulting estimator.

If this is right

The Gram algorithm is fastest for batch computation of the covariance matrix.
The Welford algorithm is uniquely robust to catastrophic cancellation under large mean shifts.
The Chan-Golub-LeVeque algorithm is optimal for distributed and map-reduce architectures.
Conformal intervals achieve the nominal coverage level across all three algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The merging identity could be reused for other one-pass moment statistics in federated settings where raw data cannot be pooled.
Numerical stability differences suggest choosing Welford when the stream is expected to exhibit large location shifts.
The conformal construction may extend directly to online correlation or precision matrices once the same running statistics are available.

Load-bearing premise

The conformal coverage guarantees assume the incoming data stream satisfies exchangeability or i.i.d. sampling.

What would settle it

Run the conformal procedure on a long stream drawn from an autoregressive process with known dependence and check whether the observed coverage rate for the intervals falls below the nominal level.

read the original abstract

We place three algorithms for computing the unbiased sample covariance matrix in streaming and distributed settings on a common algebraic, numerical, and statistical foundation. The Gram algorithm, derived from the variance reformulation, maintains the running cross-product matrix $G_t = \sum_{i=1}^t x_i x_i^\top$ and the column-sum vector $s_t = \sum_{i=1}^t x_i$, yielding the unbiased covariance estimator $S_t = (t-1)^{-1}(G_t - t^{-1}s_t s_t^\top)$ in $O(p^2)$ time per update. The Welford algorithm propagates a running mean $m_t$ and outer-product corrections $M_t$, with updates $m_t = m_{t-1} + (x_t - m_{t-1})/t$ and $M_t = M_{t-1} + (x_t - m_{t-1})(x_t - m_t)^\top$, achieving the same asymptotic cost with improved numerical stability under large data shifts. The Chan-Golub-LeVeque algorithm supports block-parallel merging through the exact identity $M = M_A + M_B + \frac{n_A n_B}{n_A+n_B}(m_B - m_A)(m_B - m_A)^\top$, making it the natural choice for distributed and map-reduce architectures. All three algorithms produce the same estimator $S_t = M_t/(t-1)$ in exact arithmetic, although their finite-precision behavior differs markedly. Beyond runtime and numerical comparisons, we introduce a conformal prediction framework for streaming covariance estimation that yields finite-sample, distribution-free confidence sets $C_{t,jk}$ for each entry $S_{t,jk}$ of the covariance matrix at any step $t$ of the data stream. Experiments confirm that the Gram algorithm is fastest for batch computation, Welford is uniquely robust to catastrophic cancellation under large mean shifts, CGL is optimal for distributed settings, and conformal intervals achieve the nominal coverage level across all three algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper unifies three known streaming covariance algorithms and adds conformal confidence sets, but the coverage may not hold under typical streaming dependence.

read the letter

This paper lines up the Gram, Welford, and Chan-Golub-LeVeque algorithms for streaming covariance and shows they give identical results in exact arithmetic while differing in stability and parallel use. It then wraps the estimator with conformal prediction to produce entrywise confidence sets. The unification itself recycles known summation tricks, but the side-by-side numerical tests and the conformal addition are the actual work. The experiments show Welford avoids cancellation on large shifts, CGL merges blocks cleanly, and the conformal intervals cover at the stated rate on the simulated streams. The soft spot is the exchangeability requirement for the conformal guarantees. Streaming data often arrives with serial correlation or other dependence, yet the paper only states the assumption at a high level and tests coverage on what looks like independent draws. If dependence breaks exchangeability, the finite-sample coverage can fail even if the point estimator stays unbiased. The math checks out for the equivalences, and there are no hidden parameters or self-referential claims. The citation pattern sticks to the standard references for these methods. Practitioners who maintain distributed covariance trackers in machine learning pipelines will find the algorithm comparison and stability guidance useful. Readers looking for new theory on streaming inference will get less out of it. The work is clear and honest enough to go to peer review. A referee can press on the dependence issue and verify the conformal construction, but the paper supplies enough concrete material to make that worthwhile.

Referee Report

1 major / 2 minor

Summary. The paper unifies three streaming algorithms for unbiased sample covariance estimation—Gram (via running cross-product matrix G_t and sum vector s_t), Welford (via running mean m_t and correction matrix M_t), and Chan-Golub-LeVeque (via block-merging identity)—showing they yield identical S_t = M_t/(t-1) in exact arithmetic while differing in numerical stability and distributed suitability. It further introduces a conformal prediction procedure producing finite-sample, distribution-free intervals C_{t,jk} for each matrix entry at step t, with experiments comparing runtimes, stability under mean shifts, and empirical coverage.

Significance. If the conformal coverage result holds under exchangeability, the work offers a practical contribution to computational statistics by providing numerically stable, mergeable streaming estimators together with a distribution-free uncertainty quantification method. The algebraic unification via standard summation identities and the reported differences in finite-precision behavior (e.g., Welford robustness to cancellation) are useful for large-scale and distributed implementations.

major comments (1)

[Conformal prediction framework] Conformal prediction framework (abstract and § on statistical contribution): The finite-sample coverage guarantee for the sets C_{t,jk} is stated to hold under exchangeability (or i.i.d. sampling), yet the manuscript provides no discussion or experiments examining robustness when the stream exhibits serial dependence, autocorrelation, or other violations common in streaming data. The reported nominal coverage may therefore be specific to the i.i.d. test regimes rather than a general property of the streaming estimator.

minor comments (2)

[Abstract] Abstract: The phrase 'conformal intervals achieve the nominal coverage level across all three algorithms' should be qualified by the exchangeability assumption to avoid implying unconditional validity.
[Algorithm descriptions] Notation: The relationship between M_t in the Welford and CGL sections and the final S_t = M_t/(t-1) is clear in the abstract but would benefit from an explicit cross-reference equation in the main text for readers comparing the three presentations.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their constructive feedback on the manuscript. We address the single major comment below and will incorporate the suggested clarification in the revision.

read point-by-point responses

Referee: The finite-sample coverage guarantee for the sets C_{t,jk} is stated to hold under exchangeability (or i.i.d. sampling), yet the manuscript provides no discussion or experiments examining robustness when the stream exhibits serial dependence, autocorrelation, or other violations common in streaming data. The reported nominal coverage may therefore be specific to the i.i.d. test regimes rather than a general property of the streaming estimator.

Authors: We agree that the finite-sample, distribution-free coverage of the conformal sets C_{t,jk} is guaranteed only under the exchangeability assumption (or i.i.d. sampling), which is stated in the manuscript. The experiments were performed under i.i.d. regimes to confirm that the procedure attains the nominal level when the assumption holds, consistent with standard practice in the conformal prediction literature. In the revised manuscript we will add a short paragraph in the statistical contribution section that explicitly recalls the exchangeability requirement, notes that serial dependence or autocorrelation can invalidate the guarantee, and states that extensions to dependent streams lie outside the present scope. This will make clear that the reported coverage is tied to the stated assumptions rather than being unconditional. revision: yes

Circularity Check

0 steps flagged

No significant circularity; algebraic identities and standard conformal application are self-contained

full rationale

The paper's core derivations consist of explicit update rules for the Gram, Welford, and Chan-Golub-LeVeque algorithms that are shown to equal the unbiased estimator S_t = M_t/(t-1) via direct algebraic expansion and the provided merging identity; these are independent facts, not self-definitions or fitted inputs renamed as predictions. The conformal framework is introduced as an application of the standard distribution-free coverage property under exchangeability to the streaming estimator, with no parameter fitting, self-citation chains, or ansatz smuggling required for the coverage claim. Experiments serve as external validation rather than constructed outputs. The derivation chain relies on external algebraic and statistical facts and remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper relies on standard summation identities and the exchangeability assumption underlying conformal prediction; no free parameters, invented entities, or ad-hoc constants are introduced in the abstract.

axioms (2)

domain assumption The unbiased sample covariance uses divisor (t-1) for t observations.
Standard statistical definition invoked for the estimator S_t.
domain assumption Conformal prediction yields finite-sample coverage under exchangeability.
Core property used to guarantee the confidence sets C_{t,jk}.

pith-pipeline@v0.9.0 · 5698 in / 1441 out tokens · 41277 ms · 2026-05-09T19:28:51.175472+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

11 extracted references

[1]

Angelopoulos, A. N. and Bates, S. (2023). Conformal prediction: A gentle introduction. Foun- dations and Trends in Machine Learning, 16(4):494–591

2023
[2]

F., Golub, G

Chan, T. F., Golub, G. H., and LeVeque, R. J. (1979). Updating formulae and a pairwise algo- rithm for computing sample variances. Technical Report STAN-CS-79-773, Stanford University

1979
[3]

Golub, G. H. and Van Loan, C. F. (2013). Matrix Computations, 4th ed. Johns Hopkins Univer- sity Press

2013
[4]

Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM

2002
[5]

Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer

1998
[6]

Newey, W . K. and West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55(3):703–708

1987
[7]

Papadopoulos, H., Proedrou, K., Vovk, V ., and Gammerman, A. (2002). Inductive confidence machines for regression. In Proc. ECML, pp. 345–356

2002
[8]

Reichel, F. (2025). High-Performance Variance-Covariance Matrix Construction Using an Un- centered Gram Formulation, Forthcoming in European Journal of Mathematics and Statistics (EJ-MATH). 16 2𝐵 or Not 2𝐵: A Tale of Three Algorithms for Streaming

2025
[9]

Vovk, V ., Gammerman, A., and Shafer, G. (2005). Algorithmic Learning in a Random World . Springer

2005
[10]

Welford, B. P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3):419–420

1962
[11]

White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4):817–838. 13 Ommited Proofs 13.1 Proof of Theorem 3.1: Gram Identity Proof. Expand the ( 𝑘, 𝑙) entry of the left side of ( 2) directly: 𝑡∑ 𝑖=1 ( 𝑥𝑖𝑘 − ̄𝑥𝑘)( 𝑥𝑖𝑙− ̄𝑥𝑙) = 𝑡∑ 𝑖=1 𝑥𝑖𝑘𝑥𝑖𝑙 − ̄𝑥𝑘 𝑡∑ 𝑖=1 𝑥𝑖𝑙 − ̄𝑥𝑙 𝑡∑ 𝑖=1 𝑥𝑖𝑘 +...

1980