pith. machine review for the scientific record.

arxiv: 1109.2378 · v1 · submitted 2011-09-12 · 📊 stat.ML · cs.DS

Recognition: unknown

Modern hierarchical, agglomerative clustering algorithms

classification 📊 stat.ML cs.DS
keywords algorithms, clustering, data, agglomerative, current, efficiently, given, hierarchical

This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise dissimilarities between data points, but extensions to vector data are also discussed; (2) the output is a "stepwise dendrogram", a data structure which is shared by all implementations in current standard software. We present algorithms (old and new) which perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) We present a new algorithm which is suitable for any distance update scheme and performs significantly better than the existing algorithms. (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which is necessary in each case for different reasons. (3) We give well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
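To make the setting in the abstract concrete (pairwise dissimilarities in, a stepwise dendrogram out, a pluggable distance update scheme), here is a minimal sketch of the naive agglomerative algorithm. It is not the paper's new algorithm: it simply merges the globally closest pair at every step and runs in O(N^3) time. The update step uses the standard Lance-Williams rules for single, complete, average (UPGMA) and weighted (WPGMA) linkage. The step layout `(label_a, label_b, dissimilarity, size)` and the labelling convention (singletons are 0..N-1, the cluster created in step s gets label N+s) are assumptions modelled on SciPy's linkage matrix, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One dendrogram step (assumed layout): (label_a, label_b, merge dissimilarity, new cluster size).
Step = Tuple[int, int, float, int]


def _key(x: int, y: int) -> Tuple[int, int]:
    """Order a label pair so each pair of clusters is stored exactly once."""
    return (x, y) if x < y else (y, x)


def lance_williams(method: str, d_ik: float, d_jk: float, n_i: int, n_j: int) -> float:
    """Dissimilarity between the cluster formed by merging i and j and another cluster k."""
    if method == "single":
        return min(d_ik, d_jk)
    if method == "complete":
        return max(d_ik, d_jk)
    if method == "average":  # UPGMA: size-weighted mean of the two parents
        return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
    if method == "weighted":  # WPGMA: plain mean of the two parents
        return (d_ik + d_jk) / 2
    raise ValueError(f"unsupported method: {method}")


def primitive_clustering(D: List[List[float]], method: str = "single") -> List[Step]:
    """Naive O(N^3) agglomerative clustering on a symmetric dissimilarity matrix.

    Each step merges the globally closest pair of active clusters and updates
    the remaining dissimilarities with a Lance-Williams rule. Returns N-1 steps.
    """
    n = len(D)
    # Dissimilarities between active clusters, keyed by ordered label pairs.
    d: Dict[Tuple[int, int], float] = {
        (i, j): float(D[i][j]) for i in range(n) for j in range(i + 1, n)
    }
    size = {i: 1 for i in range(n)}
    active = set(range(n))
    steps: List[Step] = []

    for s in range(n - 1):
        # Globally closest pair; on ties, min() keeps the pair inserted first.
        (a, b), dist = min(d.items(), key=lambda kv: kv[1])
        m = n + s  # label of the newly formed cluster
        others = sorted(active - {a, b})
        # Dissimilarities from the new cluster m to every other active cluster.
        new = {
            k: lance_williams(method, d[_key(a, k)], d[_key(b, k)], size[a], size[b])
            for k in others
        }
        # Drop every entry that involves a or b, then add the entries for m.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        for k in others:
            d[(k, m)] = new[k]  # k < m always, so the key is already ordered
        size[m] = size[a] + size[b]
        active -= {a, b}
        active.add(m)
        steps.append((a, b, dist, size[m]))
    return steps


if __name__ == "__main__":
    points = [0.0, 1.0, 3.0, 7.0]
    D = [[abs(p - q) for q in points] for p in points]
    print(primitive_clustering(D, "single"))
    # [(0, 1, 1.0, 2), (2, 4, 2.0, 3), (3, 5, 4.0, 4)]
```

Because the update rule is isolated in `lance_williams`, any scheme expressible in that form can be swapped in without touching the merge loop; the cost is that finding the global minimum by a full scan at every step is what makes this version cubic.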



Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering

    cs.CL 2026-04 unverdicted novelty 7.0

    Claim2Vec is a contrastively fine-tuned multilingual encoder that improves claim clustering performance and embedding space structure on multilingual fact-check datasets.

  2. VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection

    cs.AI 2026-05 unverdicted novelty 6.0

    VecCISC filters equivalent, degenerate, or hallucinated reasoning traces via semantic clustering before critic evaluation, reducing token use by 47% with no loss in accuracy versus standard CISC.

  3. Multiscale Cochran-Mantel-Haenszel Scanning for Conditional Dependency

    stat.ME 2026-04 unverdicted novelty 6.0

    Multiscale CMH scanning generalizes the classic test to continuous spaces, achieving consistency for conditional independence testing by conditioning on marginal order statistics without requiring large stratum sizes.

  4. ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis

    q-bio.GN 2026-02 accept novelty 6.0

    ClusterChirp is a freely available web tool for scalable interactive visualization, hierarchical clustering, and natural-language-guided analysis of high-dimensional omics datasets.

  5. GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion

    cs.AI 2026-04 unverdicted novelty 5.0

    GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabu...