pith. sign in

arxiv: 1810.13187 · v1 · pith:2UJUDAZAnew · submitted 2018-10-31 · 💻 cs.DS

Non-Empty Bins with Simple Tabulation Hashing

classification 💻 cs.DS
keywords hashingbinsbloomnon-emptysimpletabulationfalse-positivefilter
0
0 comments X
read the original abstract

We consider the hashing of a set $X\subseteq U$ with $|X|=m$ using a simple tabulation hash function $h:U\to [n]=\{0,\dots,n-1\}$ and analyse the number of non-empty bins, that is, the size of $h(X)$. We show that the expected size of $h(X)$ matches that with fully random hashing to within low-order terms. We also provide concentration bounds. The number of non-empty bins is a fundamental measure in the balls and bins paradigm, and it is critical in applications such as Bloom filters and Filter hashing. For example, normally Bloom filters are proportioned for a desired low false-positive probability assuming fully random hashing (see \url{en.wikipedia.org/wiki/Bloom_filter}). Our results imply that if we implement the hashing with simple tabulation, we obtain the same low false-positive probability for any possible input.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.