A Unified Construction of Streaming Sketches via the L\'evy-Khintchine Representation Theorem
Pith reviewed 2026-05-23 19:23 UTC · model grok-4.3
The pith
Hashing indices to Lévy processes produces streaming sketches for any f-moment whose f is a characteristic exponent, as classified by the Lévy-Khintchine theorem.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that any f arising as the characteristic exponent of a Lévy process yields an f-moment that admits an efficient streaming sketch obtained simply by hashing each stream index to an independent copy of that Lévy process; the Lévy-Khintchine theorem then completely characterizes the admissible f by giving their integral representation in terms of a drift, a Gaussian coefficient, and a Lévy measure. The resulting sketches work in the d-dimensional turnstile model and can be made heterogeneous by drawing different Lévy processes for different indices.
What carries the argument
Hashing each index to a Lévy process whose characteristic exponent is the desired f, so that the sketch aggregates independent realizations whose expectation or variance yields an estimator of the sum; the Lévy-Khintchine theorem supplies the exact functional form of every admissible f.
If this is right
- Many existing sketches for specific moments arise as direct instances of the generic Lévy-process construction.
- A broad class of nearly periodic functions, not previously known to be tractable, now admit efficient sketches.
- The scheme extends verbatim to d-dimensional streams via multidimensional Lévy processes.
- Heterogeneous moments are obtained by projecting different indices onto different Lévy processes.
Where Pith is reading between the lines
- The Fourier-Hahn-Lévy method conjectured in the paper may furnish a complete, parameter-free classification of all tractable f.
- Choosing Lévy measures with particular tail or periodicity properties could systematically generate sketches for new families of moment functions.
- The same hashing-to-process idea might be testable on other stochastic-process representations that admit characteristic exponents.
Load-bearing premise
Indices can be hashed to Lévy processes in the streaming model so that the sketch produces an unbiased or low-variance estimator of the characteristic exponent without further assumptions on the input distribution or on the ability to simulate the processes efficiently.
What would settle it
Exhibit an explicit function f that possesses a Lévy-Khintchine representation yet admits no small-space turnstile sketch, or conversely a function outside every Lévy-Khintchine form that nonetheless possesses such a sketch.
Figures
read the original abstract
In the $d$-dimensional turnstile streaming model, a frequency vector $\mathbf{x}=(\mathbf{x}(1),\ldots,\mathbf{x}(n))\in (\mathbb{R}^d)^n$ is updated entry-wisely over a stream. We consider the problem of $f$-moment estimation, where one wants to estimate $$f(\mathbf{x})=\sum_{v\in[n]}f(\mathbf{x}(v))$$ with a small-space sketch. In this work we present a simple and generic scheme to construct sketches with the novel idea of hashing indices to L\'evy processes, from which one can estimate the $f$-moment $f(\mathbf{x})$ where $f$ is the characteristic exponent of the L\'evy process. The fundamental L\'evy-Khintchine representation theorem completely characterizes the space of all possible characteristic exponents, which in turn characterizes the set of $f$-moments that can be estimated by this generic scheme. The new scheme has strong explanatory power. It unifies the construction of many existing sketches and it implies the tractability of many nearly periodic functions that were previously unclassified. Furthermore, the scheme can be conveniently generalized to multidimensional cases ($d\geq 2$) by considering multidimensional L\'evy processes and can be further generalized to estimate heterogeneous moments by projecting different indices with different L\'evy processes. We conjecture that the set of tractable functions can be characterized using the L\'evy-Khintchine representation theorem via what we called the Fourier-Hahn-L\'evy method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a generic construction for sketches estimating f-moments f(x) = sum_v f(x(v)) in the d-dimensional turnstile streaming model. Indices are hashed to Lévy processes whose characteristic exponent is the target f; the Lévy-Khintchine theorem is invoked to characterize the admissible f. The scheme is claimed to unify existing sketches, render many nearly periodic functions tractable, extend immediately to d ≥ 2 via multidimensional Lévy processes, and support heterogeneous moments by using distinct processes per index. A conjecture asserts that the set of tractable f is characterized by the Lévy-Khintchine representation via a “Fourier-Hahn-Lévy method.”
Significance. If the reduction from an arbitrary Lévy triplet to a constant-time, unbiased, low-variance streaming update is supplied and verified, the work would supply a single probabilistic lens that explains the design of many known sketches and predicts new ones. The appeal to a fundamental theorem rather than ad-hoc parameters is a conceptual strength; the multidimensional and heterogeneous extensions follow naturally once the scalar case is settled.
major comments (3)
- [Abstract] Abstract and conjecture paragraph: the claim that “hashing indices to Lévy processes” yields an estimator of f(x(v)) whose expectation recovers the desired moment is stated without an explicit reduction showing how an arbitrary Lévy triplet (drift, Brownian coefficient, Lévy measure) produces a constant-time update rule and an unbiased statistic in the turnstile model for arbitrary real-valued updates.
- [Conjecture] The paragraph introducing the Fourier-Hahn-Lévy method: no formal definition of the method, no statement of the precise mapping from Lévy triplets to estimators, and no proof sketch or variance calculation are supplied, rendering the conjecture unverifiable from the given text.
- [Abstract] The unification claim: while the abstract asserts that the scheme “unifies the construction of many existing sketches,” no concrete example is worked out (e.g., which existing sketch corresponds to which Lévy triplet and how the update matches the known algorithm).
minor comments (2)
- Notation: the vector x is written both as x ∈ (R^d)^n and as x(v) ∈ R^d; a single consistent definition would remove ambiguity.
- [Abstract] The phrase “nearly periodic functions that were previously unclassified” is used without citing the prior classification attempts or the precise sense in which the new functions become tractable.
Simulated Author's Rebuttal
We thank the referee for their thorough review and valuable suggestions. We will revise the manuscript to provide the explicit details requested, strengthening the presentation of our results.
read point-by-point responses
-
Referee: [Abstract] Abstract and conjecture paragraph: the claim that “hashing indices to Lévy processes” yields an estimator of f(x(v)) whose expectation recovers the desired moment is stated without an explicit reduction showing how an arbitrary Lévy triplet (drift, Brownian coefficient, Lévy measure) produces a constant-time update rule and an unbiased statistic in the turnstile model for arbitrary real-valued updates.
Authors: We agree that the manuscript would benefit from an explicit reduction. In the revised version, we will add a dedicated subsection detailing how any Lévy triplet can be used to construct a constant-time update rule that produces an unbiased estimator for the corresponding f-moment in the turnstile model, including handling of arbitrary real-valued updates. This will make the connection between the Lévy process and the streaming sketch fully rigorous and verifiable. revision: yes
-
Referee: [Conjecture] The paragraph introducing the Fourier-Hahn-Lévy method: no formal definition of the method, no statement of the precise mapping from Lévy triplets to estimators, and no proof sketch or variance calculation are supplied, rendering the conjecture unverifiable from the given text.
Authors: The conjecture is presented at a high level in the current manuscript. We will revise to include a formal definition of the Fourier-Hahn-Lévy method, the precise mapping from Lévy triplets to the corresponding estimators, a proof sketch for the characterization, and initial variance bounds. This will allow readers to verify the conjecture's scope. revision: yes
-
Referee: [Abstract] The unification claim: while the abstract asserts that the scheme “unifies the construction of many existing sketches,” no concrete example is worked out (e.g., which existing sketch corresponds to which Lévy triplet and how the update matches the known algorithm).
Authors: We acknowledge the need for concrete examples to illustrate the unification. The revised manuscript will include a new section or subsection providing specific mappings, for instance, showing how the AMS sketch or Count-Sketch corresponds to particular choices of Lévy triplets (e.g., stable distributions or Brownian motion components), and demonstrating that the resulting update rules coincide with the standard implementations. revision: yes
Circularity Check
No significant circularity; relies on external Lévy-Khintchine theorem
full rationale
The paper presents a generic sketching scheme based on hashing to Lévy processes whose characteristic exponents are given by the external Lévy-Khintchine representation theorem. The abstract explicitly invokes this known theorem from probability theory to characterize the admissible f functions, without any self-citation, fitted parameters, or self-referential definitions that reduce the claimed result to its inputs. No equations or steps in the provided text exhibit a reduction by construction, and the conjecture is presented as such rather than as a derived claim.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math The Lévy-Khintchine representation theorem completely characterizes all possible characteristic exponents of Lévy processes.
invented entities (1)
-
Fourier-Hahn-Lévy method
no independent evidence
Reference graph
Works this paper leans on
-
[1]
The space complexity of approximating the frequency moments.J
Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments.J. Comput. Syst. Sci., 58(1):137–147, 1999
work page 1999
-
[2]
Ziv Bar-Yossef, Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. An information statistics approach to data stream and communication complexity.Journal of Computer and System Sciences, 68(4):702–732, 2004
work page 2004
-
[3]
Vladimir Braverman and Stephen R Chestnut. Universal sketches for the frequency negative moments and other decreasing streaming sums.Approximation, Randomization, and Combi- natorial Optimization. Algorithms and Techniques, page 591, 2015
work page 2015
-
[4]
Vladimir Braverman, Stephen R. Chestnut, David P. Woodruff, and Lin F. Yang. Streaming space complexity of nearly all functions of one variable on frequency vectors. InProceedings 35th ACM Symposium on Principles of Database Systems (PODS), pages 261–276, 2016
work page 2016
-
[5]
Vladimir Braverman and Rafail Ostrovsky. Zero-one frequency laws. InProceedings 42nd ACM Symposium on Theory of Computing (STOC), pages 281–290, 2010
work page 2010
-
[6]
Vladimir Braverman and Rafail Ostrovsky. Generalizing the layering method of Indyk and Woodruff: Recursive sketches for frequency-based vectors on streams. In Proceedings 16th International Workshop on Approximation, Randomization, and Combinatorial Optimization (APPROX), volume 8096 ofLecture Notes in Computer Science, pages 58–70. Springer, 2013
work page 2013
-
[7]
PhD thesis, Johns Hopkins University, 2015
Stephen Robert Chestnut.Stream sketches, sampling, and sabotage. PhD thesis, Johns Hopkins University, 2015
work page 2015
-
[8]
Comparing data streams using Hamming norms (how to zero in)
Graham Cormode, Mayur Datar, Piotr Indyk, and Shanmugavelayutham Muthukrishnan. Comparing data streams using Hamming norms (how to zero in). IEEE Transactions on Knowledge and Data Engineering, 15(3):529–540, 2003
work page 2003
-
[9]
A unifying framework forℓ0-sampling algorithms
Graham Cormode and Donatella Firmani. A unifying framework forℓ0-sampling algorithms. Distributed Parallel Databases, 32(3):315–335, 2014
work page 2014
-
[10]
Cambridge University Press, 2019
Rick Durrett.Probability: theory and examples. Cambridge University Press, 2019
work page 2019
-
[11]
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. InProceedings of the 18th Inter- national Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA), pages 127–146, 2007
work page 2007
-
[12]
Philippe Flajolet and G. Nigel Martin. Probabilistic counting algorithms for data base appli- cations. Journal of Computer and System Sciences, 31(2):182–209, 1985
work page 1985
-
[13]
Estimating frequency moments of data streams using random linear com- binations
Sumit Ganguly. Estimating frequency moments of data streams using random linear com- binations. In Proceedings 8th International Workshop on Randomization and Computation (RANDOM), volume 3122 ofLecture Notes in Computer Science, pages 369–380, 2004
work page 2004
-
[14]
Estimating hybrid frequency moments of data streams
Sumit Ganguly, Mohit Bansal, and Shruti Dube. Estimating hybrid frequency moments of data streams. Journal of Combinatorial Optimization, 23(3):373–394, 2012. 21
work page 2012
-
[15]
Stable distributions, pseudorandom generators, embeddings, and data stream computation
Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. J. ACM, 53(3):307–323, 2006
work page 2006
-
[16]
Kane, Jelani Nelson, and David P
Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings 29th ACM Symposium on Principles of Database Systems (PODS), pages 41–52, 2010
work page 2010
-
[17]
Kevin J. Lang. Back to the future: an even more nearly optimal cardinality estimation algo- rithm. CoRR, abs/1708.06839, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[18]
Ping Li. Estimators and tail bounds for dimension reduction inlα (0 < α ≤ 2) using stable ran- dom projections. InProceedings 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 10–19, 2008
work page 2008
-
[19]
Yi Li, Huy L. Nguyen, and David P. Woodruff. Turnstile streaming algorithms might as well be linear sketches. In Proceedings 46th Annual ACM Symposium on Theory of Computing (STOC), pages 174–183, 2014
work page 2014
-
[20]
Counting large number of events in small registers.Communications of the ACM, 21(10):840–842, 1978
Robert Morris. Counting large number of events in small registers.Communications of the ACM, 21(10):840–842, 1978
work page 1978
-
[21]
J. Ian Munro and Mike Paterson. Selection and sorting with limited storage.Theor. Comput. Sci., 12:315–323, 1980
work page 1980
-
[22]
Information theoretic limits of cardinality estimation: Fisher meets Shannon
Seth Pettie and Dingyu Wang. Information theoretic limits of cardinality estimation: Fisher meets Shannon. In Proceedings 53rd Annual ACM Symposium on Theory of Computing (STOC), pages 556–569, 2021
work page 2021
-
[23]
Cambridge University Press, 1999
Ken-Iti Sato.Lévy processes and infinitely divisible distributions. Cambridge University Press, 1999
work page 1999
-
[24]
Better cardinality estimators for HyperLogLog, PCSA, and beyond
Dingyu Wang and Seth Pettie. Better cardinality estimators for HyperLogLog, PCSA, and beyond. In Proceedings 42nd ACM Symposium on Principles of Database Systems (PODS), pages 317–327, 2023. A Analysis of gnp,w Proof of Lemma 7.This proof is by induction onw. It is true forw = 1 since gnp,1(x) = 2 + 1 3 · 2 (1 − cos(πx)) = 1 (2 ∤ x) . Observe that forx su...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.