A scheme that hashes indices to Lévy processes and invokes the Lévy-Khintchine theorem unifies prior sketches for f-moment estimation and extends tractability to many nearly periodic functions.
Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
We describe a new cardinality estimation algorithm that is extremely space-efficient. It applies one of three novel estimators to the compressed state of the Flajolet-Martin-85 coupon collection process. In an apples-to-apples empirical comparison against compressed HyperLogLog sketches, the new algorithm simultaneously wins on all three dimensions of the time/space/accuracy tradeoff. Our prototype uses the zstd compression library, and produces sketches that are smaller than the entropy of HLL, so no possible implementation of compressed HLL can match its space efficiency. The paper's technical contributions include analyses and simulations of the three new estimators, accurate values for the entropies of FM85 and HLL, and a non-trivial method for estimating a double asymptotic limit via simulation.
fields
cs.DS 1years
2024 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
A Unified Construction of Streaming Sketches via the L\'evy-Khintchine Representation Theorem
A scheme that hashes indices to Lévy processes and invokes the Lévy-Khintchine theorem unifies prior sketches for f-moment estimation and extends tractability to many nearly periodic functions.