Optimal Data-Dependent Hashing for Approximate Near Neighbors

Alexandr Andoni; Ilya Razenshteyn

arxiv: 1501.01062 · v3 · pith:PHQTRXDDnew · submitted 2015-01-06 · 💻 cs.DS

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Alexandr Andoni , Ilya Razenshteyn This is my paper

classification 💻 cs.DS

keywords hashingoptimalspacedatadata-dependentainr14approximateapproximation

0 comments

read the original abstract

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space our data structure achieves query time $O(d n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + dn)$, where $\rho=\tfrac{1}{2c^2-1}$ for the Euclidean space and approximation $c>1$. For the Hamming space, we obtain an exponent of $\rho=\tfrac{1}{2c-1}$. Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c>1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

On the practicality of quantum sieving algorithms for the shortest vector problem
quant-ph 2024-10 unverdicted novelty 6.0

Quantum sieving for SVP in dimension 400 needs ~10^13 physical qubits and ~10^31 years under optimistic assumptions, offering no practical speedup over classical methods.