Recognition: no theorem link
On the LSH Distortion of Ulam and Cayley Similarities
Pith reviewed 2026-05-13 04:49 UTC · model grok-4.3
The pith
Ulam similarity on permutations has LSH distortion O(n / sqrt(log n)) with lower bound Omega(n to the 0.12), while Cayley similarity has distortion Theta(n).
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper shows that the Ulam similarity admits a sublinear LSH distortion of O(n / √log n) and proves a matching-style lower bound of Ω(n^{0.12}) on the best achievable distortion. It further shows that the LSH distortion of the Cayley similarity is Θ(n).
What carries the argument
LSH distortion: the smallest multiplicative scaling factor c such that the scaled similarity c·S admits an exact locality-sensitive hash family where the collision probability equals the scaled similarity value.
If this is right
- An LSH family exists for the Ulam similarity whose collision probabilities match the similarity up to a factor O(n / sqrt(log n)).
- No LSH family for the Ulam similarity can achieve collision probabilities within a factor o(n^{0.12}) of the true similarity.
- Any LSH family for the Cayley similarity must distort the similarity by a factor Theta(n).
- The overhead of LSH-based approximate nearest-neighbor search under Ulam similarity grows sublinearly in n, while under Cayley similarity it grows linearly.
Where Pith is reading between the lines
- The polynomial gap between the Ulam upper and lower bounds indicates that further tightening of the distortion analysis remains possible.
- For data sets of permutations, Cayley-based search may require index structures other than LSH to avoid linear blowup.
- Similar distortion calculations could be performed for other common permutation distances such as Kendall tau.
Load-bearing premise
The multiplicative definition of LSH distortion, together with normalizing similarities to the interval [0,1], is assumed to capture the relevant performance limits of hashing for nearest-neighbor tasks.
What would settle it
An explicit LSH family for the Ulam similarity whose collision probabilities stay within a factor o(n / sqrt(log n)) of the true similarity values, or a direct computation for moderate n showing that the minimal required distortion grows faster than the claimed upper bound.
Figures
read the original abstract
Locality-sensitive hashing (LSH) has found widespread use as a fundamental primitive, particularly to accelerate nearest neighbor search. An LSH scheme for a similarity function $S:\mathcal{X} \times \mathcal{X} \to [0,1]$ is a distribution over hash functions on $\mathcal{X}$ with the property that the probability of collision of any two elements $x,y\in \mathcal{X}$ is exactly equal to $S(x,y)$. However, not all similarity functions admit exact LSH schemes. The notion of LSH distortion measures how multiplicatively close a similarity function is to having an LSH scheme. In this work, we study the LSH distortion of the Ulam and Cayley similarities, which are popular similarity measures on permutations of $n$ elements. We show that the Ulam similarity admits a sublinear LSH distortion of $O(n / \sqrt{\log n})$; we also prove a lower bound of $\Omega(n^{0.12})$ on the best LSH distortion achievable. On the other hand, we show that the LSH distortion of the Cayley similarity is $\Theta(n)$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the LSH distortion of Ulam and Cayley similarities on permutations of n elements. It proves that the Ulam similarity admits an LSH distortion of O(n / √log n) via an embedding construction into a space with exact LSH, together with a matching-style lower bound of Ω(n^{0.12}) obtained by a direct-product reduction; separately, it shows that the Cayley similarity has LSH distortion exactly Θ(n) by a variance argument on collision probabilities.
Significance. If the results hold, they give the first non-trivial characterization of LSH approximability for two standard permutation similarities, with the sublinear upper bound on Ulam and the explicit Ω(n^{0.12}) lower bound being technically notable; the Θ(n) result for Cayley is a clean negative statement. The proofs are internally consistent with the multiplicative distortion definition, and the embedding and reduction techniques strengthen the contribution beyond the abstract bounds.
minor comments (2)
- [Abstract] Abstract: the statement of the Ulam upper bound would be clearer if it briefly indicated the embedding technique used to obtain the O(n / √log n) factor.
- [Section 2] The normalization of similarities to [0,1] and the precise multiplicative distortion definition (referenced in Section 2) should be restated once in the main body with an explicit example for small n to aid readers unfamiliar with the LSH literature.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our work on the LSH distortion of Ulam and Cayley similarities, including recognition of the sublinear upper bound for Ulam, the Ω(n^{0.12}) lower bound, and the tight Θ(n) characterization for Cayley. The recommendation for minor revision is noted.
Circularity Check
No significant circularity detected
full rationale
The paper's upper-bound construction for Ulam similarity (via embedding into a space with exact LSH) and lower-bound reduction (via direct product yielding the n^0.12 exponent), along with the simple variance argument for Cayley similarity's Θ(n) distortion, rely on the multiplicative distortion definition from Section 2 and independent probabilistic arguments. No steps reduce by construction to fitted parameters, self-definitional equivalences, or load-bearing self-citations; the derivation chain is self-contained against the stated definitions and external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math LSH scheme requires collision probability exactly equal to similarity S(x,y)
- domain assumption Ulam and Cayley similarities are normalized to [0,1] on permutations
Reference graph
Works this paper leans on
-
[1]
Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. CACM, 51 0 (1): 0 117--122, 2008
work page 2008
-
[2]
The computational hardness of estimating edit distance
Alexandr Andoni and Robert Krauthgamer. The computational hardness of estimating edit distance. SICOMP, 39 0 (6): 0 2398--2429, 2010
work page 2010
-
[3]
Overcoming the l\( _ 1 \) non-embeddability barrier: algorithms for product metrics
Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. Overcoming the l\( _ 1 \) non-embeddability barrier: algorithms for product metrics. In SODA, pages 865--874, 2009
work page 2009
-
[4]
On the resemblance and containment of documents
Andrei Z Broder. On the resemblance and containment of documents. In SEQUENCES, pages 21--29, 1997
work page 1997
-
[5]
Broder, Moses Charikar, Alan M
Andrei Z. Broder, Moses Charikar, Alan M. Frieze, and Michael Mitzenmacher. Min-wise independent permutations. JCSS, 60 0 (3): 0 630--659, 2000
work page 2000
-
[6]
Similarity estimation techniques from rounding algorithms
Moses Charikar. Similarity estimation techniques from rounding algorithms. In STOC, pages 380--388, 2002
work page 2002
-
[7]
Embedding the U lam metric into l\( _ 1 \)
Moses Charikar and Robert Krauthgamer. Embedding the U lam metric into l\( _ 1 \). Theory Comput., 2 0 (11): 0 207--224, 2006
work page 2006
-
[8]
Moses Charikar, Ofir Geri, Michael P. Kim, and William Kuszmaul. On estimating edit distance: Alignment, dimension reduction, and embeddings. In ICALP, pages 34:1--34:14, 2018
work page 2018
-
[9]
On the distortion of locality sensitive hashing
Flavio Chierichetti, Ravi Kumar, Alessandro Panconesi, and Erisa Terolli. On the distortion of locality sensitive hashing. SICOMP, 48 0 (2): 0 350--372, 2019
work page 2019
-
[10]
Group Representations in Probability and Statistics
Persi Diaconis. Group Representations in Probability and Statistics. Institute of Mathematical Statistics, 1988
work page 1988
-
[11]
Similarity search in high dimensions via hashing
Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In VLDB, pages 518--529, 1999
work page 1999
-
[12]
Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In STOC, pages 604--613, 1998
work page 1998
-
[13]
Piotr Indyk, Rajeev Motwani, Prabhakar Raghavan, and Santosh S. Vempala. Locality-preserving hashing in multidimensional spaces. In STOC, pages 618--625, 1997
work page 1997
-
[14]
Group Theoretical Methods in Machine Learning
Imre Risi Kondor. Group Theoretical Methods in Machine Learning. Columbia University, 2008
work page 2008
-
[15]
Characters of symmetric groups: sharp bounds and applications
Michael Larsen and Aner Shalev. Characters of symmetric groups: sharp bounds and applications. Inventiones mathematicae, 174 0 (3): 0 645--687, 2008
work page 2008
-
[16]
Multi-probe LSH: efficient indexing for high-dimensional similarity search
Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. In VLDB, pages 950--961, 2007
work page 2007
-
[17]
A combinatorial proof of a symmetric group character involution
Paul Renteln. A combinatorial proof of a symmetric group character involution. The American Mathematical Monthly, 130 0 (9): 0 855--858, 2023
work page 2023
-
[18]
Upper bound on the characters of the symmetric groups
Yuval Roichman. Upper bound on the characters of the symmetric groups. Invent. Math., 125: 0 451--485, 1996
work page 1996
-
[19]
The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions
Bruce Sagan. The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions. Springer Science & Business Media, 2001
work page 2001
-
[20]
Bayesian locality sensitive hashing for fast similarity search
Venu Satuluri and Srinivasan Parthasarathy. Bayesian locality sensitive hashing for fast similarity search. Proc. VLDB Endow. , 5 0 (5): 0 430--441, 2012
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.