arxiv: cs/9901013 · v1 · submitted 1999-01-26 · 💻 cs.CG

Analysis of approximate nearest neighbor searching with clustered point sets

classification 💻 cs.CG
keywords: data, nearest, neighbor, analysis, approximate, kd-tree, method, points

We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, attempts to balance two goals: producing subdivision cells of bounded aspect ratio and producing no empty cells. The second, called the minimum-ambiguity method, is a query-based approach. In addition to the data points, it is given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that, for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching.
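
To make the sliding-midpoint idea concrete, here is a minimal Python sketch of a single split step. It follows the commonly described form of the rule, not details given in the abstract: cut the longest side of the cell at its midpoint, and if every point falls on one side, slide the cut to the nearest point so that neither child is empty. The function name `sliding_midpoint_split`, the cell bounds `lo` and `hi`, and the tie-breaking choices are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sliding_midpoint_split(
    points: Sequence[Point], lo: Point, hi: Point
) -> Tuple[int, float, List[Point], List[Point]]:
    """Split one cell [lo, hi] once; return (axis, cut, left, right).

    Illustrative sketch only: assumes at least two points in the cell.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to split a cell")

    # Cut orthogonally to the longest side to keep the aspect ratio bounded.
    axis = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    cut = (lo[axis] + hi[axis]) / 2.0

    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]

    if not left:
        # All points lie at or above the midpoint: slide the cut up to the
        # closest point and hand that point to the otherwise empty side.
        nearest = min(right, key=lambda p: p[axis])
        cut = nearest[axis]
        right.remove(nearest)
        left = [nearest]
    elif not right:
        # All points lie below the midpoint: slide the cut down instead.
        nearest = max(left, key=lambda p: p[axis])
        cut = nearest[axis]
        left.remove(nearest)
        right = [nearest]

    return axis, cut, left, right
```

For a tightly clustered point set, a plain midpoint cut would leave one child empty, while this step shrinks the cell toward the cluster. A full tree builder would apply it recursively, passing the updated cell bounds to each child.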

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Excitons in Large Disordered Boron-Nitride Layer using Linear-Scaling Bethe-Salpeter Simulations

    cond-mat.mtrl-sci 2026-06 unverdicted novelty 7.0

    A new real-space linear-scaling Bethe-Salpeter framework with sublattice-resolved decoupling and Kernel Polynomial Method enables O(N) excitonic absorption spectra for Anderson-disordered hBN, showing asymmetric broad...

  2. Automated Erythrocyte Detection and Tracking for Retinal Blood Flow Quantification in Erythrocyte-Mediated Angiography

    cs.CV 2026-05 unverdicted novelty 6.0

    EMTrack framework with flow-context detection and topology-aware tracking outperforms baselines on the new RBF-EMA dataset for erythrocyte detection, tracking, and retinal blood flow quantification.

  3. UD-DML: Uniform Design Subsampling for Double Machine Learning over Massive Data

    stat.ME 2026-05 unverdicted novelty 6.0

    UD-DML creates balanced representative subsamples via uniform design in PCA space for efficient double machine learning estimation of average treatment effects on large datasets.

  4. SciPy 1.0--Fundamental Algorithms for Scientific Computing in Python

    cs.MS 2019-07 accept novelty 2.0

    SciPy 1.0 documents a mature open-source library that has become the de facto standard for scientific algorithms in Python with broad adoption across research projects.