Modern hierarchical, agglomerative clustering algorithms
This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise dissimilarities between data points, but extensions to vector data are also discussed; (2) the output is a "stepwise dendrogram", a data structure which is shared by all implementations in current standard software. We present algorithms (old and new) which perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) We present a new algorithm which is suitable for any distance update scheme and performs significantly better than the existing algorithms. (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which is necessary in each case for different reasons. (3) We give well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
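To make the setting in the abstract concrete, here is a minimal sketch of the naive baseline: it takes pairwise dissimilarities as input and returns a stepwise dendrogram as a list of merge steps. It is not the paper's new algorithm, and it makes no attempt at efficiency. The function name `stepwise_dendrogram`, the choice of single linkage as the distance update, and the labelling convention (the cluster formed at step k gets label n + k) are assumptions made for this illustration.

```python
def stepwise_dendrogram(d):
    """Naive agglomerative clustering on a symmetric dissimilarity matrix.

    d is an n x n list of lists. Returns a list of (a, b, height) triples,
    one per merge, where a and b are cluster labels. Points are labelled
    0..n-1; the cluster created at step k is labelled n + k (an assumed
    convention for this sketch).
    """
    n = len(d)
    # Current dissimilarities between active clusters, keyed by (low, high).
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    active = list(range(n))
    steps = []
    next_label = n

    while len(active) > 1:
        # Find the globally closest pair of active clusters.
        a, b = min(dist, key=dist.get)
        steps.append((a, b, dist[(a, b)]))
        active.remove(a)
        active.remove(b)

        # Single-linkage update (assumed here): the distance from the merged
        # cluster to any other cluster c is the smaller of the two old ones.
        for c in active:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_label)] = min(d_ac, d_bc)

        del dist[(a, b)]
        active.append(next_label)
        next_label += 1

    return steps


# Four points on a line at positions 0, 1, 3, 7.
points = [0, 1, 3, 7]
matrix = [[abs(p - q) for q in points] for p in points]
print(stepwise_dendrogram(matrix))
# prints [(0, 1, 1), (2, 4, 2), (3, 5, 4)]
```

Each pass scans every remaining pair, so this baseline takes cubic time in the number of points. Replacing `min(d_ac, d_bc)` with another update formula gives other linkage schemes under the same output format.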
Forward citations
Cited by 5 Pith papers
- Claim2Vec: Embedding Fact-Check Claims for Multilingual Similarity and Clustering
  Claim2Vec is a contrastively fine-tuned multilingual encoder that improves claim clustering performance and embedding space structure on multilingual fact-check datasets.
- VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection
  VecCISC filters equivalent, degenerate, or hallucinated reasoning traces via semantic clustering before critic evaluation, reducing token use by 47% with no loss in accuracy versus standard CISC.
- Multiscale Cochran-Mantel-Haenszel Scanning for Conditional Dependency
  Multiscale CMH scanning generalizes the classic test to continuous spaces, achieving consistency for conditional independence testing by conditioning on marginal order statistics without requiring large stratum sizes.
- ClusterChirp: Scalable Interactive Exploration of Omics Data with Natural Language-Guided Analysis
  ClusterChirp is a freely available web tool for scalable interactive visualization, hierarchical clustering, and natural-language-guided analysis of high-dimensional omics datasets.
- GS-Quant: Granular Semantic and Generative Structural Quantization for Knowledge Graph Completion
  GS-Quant generates coarse-to-fine discrete codes for KG entities via semantic hierarchy injection and causal sequence reconstruction, enabling LLMs to perform knowledge graph completion by treating the codes as vocabu...