K-tree: Large Scale Document Clustering

Christopher M. De Vries; Shlomo Geva

arxiv: 1001.0830 · v1 · submitted 2010-01-06 · cs.IR · cs.AI · cs.DS

keywords: document, k-tree, clustering, collections, efficient, k-means, large, address

We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means, it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk-based implementations where space requirements exceed that of main memory.
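
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way a k-means-style cluster hierarchy can be built incrementally. It assumes a B+-tree-like layout: each vector descends to the leaf whose centroid is nearest, and any node that grows past a fixed size is split in two with k-means (k = 2). The names `KTree`, `Node`, `order`, `two_means` and `leaves` are illustrative and not taken from the paper, and the sketch uses dense in-memory vectors, with none of the sparse-representation or disk-based handling the abstract mentions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def two_means(points, iters=10):
    """Lloyd's algorithm with k=2; returns a 0/1 label per point, using both labels."""
    # Assumed seeding: the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1 for p in points]
        g0 = [p for p, l in zip(points, labels) if l == 0]
        g1 = [p for p, l in zip(points, labels) if l == 1]
        if not g0 or not g1:
            break
        c0, c1 = mean(g0), mean(g1)
    if len(set(labels)) < 2:  # degenerate case (e.g. identical points): split in half
        half = len(points) // 2
        labels = [0] * half + [1] * (len(points) - half)
    return labels


class Node:
    """A leaf holds data vectors; an internal node holds child nodes."""

    def __init__(self, leaf, entries):
        self.leaf = leaf
        self.entries = entries
        self.refresh()

    def refresh(self):
        """Recompute the count and component-wise sum of all vectors beneath this node."""
        if self.leaf:
            self.count = len(self.entries)
            self.total = [sum(col) for col in zip(*self.entries)]
        else:
            self.count = sum(ch.count for ch in self.entries)
            self.total = [sum(col) for col in zip(*(ch.total for ch in self.entries))]

    def centroid(self):
        return [t / self.count for t in self.total]


def _insert(node, vec, order):
    """Insert vec beneath node; return the 1 or 2 nodes that now replace it."""
    if node.leaf:
        node.entries.append(vec)
    else:
        # Follow the child whose centroid is nearest to the new vector.
        i = min(range(len(node.entries)),
                key=lambda j: sq_dist(node.entries[j].centroid(), vec))
        node.entries[i:i + 1] = _insert(node.entries[i], vec, order)
    node.refresh()
    if len(node.entries) <= order:
        return [node]
    # Overfull node: cluster its entries into two groups and split.
    points = node.entries if node.leaf else [ch.centroid() for ch in node.entries]
    labels = two_means(points)
    halves = [[e for e, l in zip(node.entries, labels) if l == g] for g in (0, 1)]
    return [Node(node.leaf, h) for h in halves]


class KTree:
    """Hypothetical container; `order` is the maximum number of entries per node."""

    def __init__(self, order):
        self.order = order
        self.root = None

    def insert(self, vec):
        vec = list(vec)
        if self.root is None:
            self.root = Node(True, [vec])
            return
        parts = _insert(self.root, vec, self.order)
        # A split at the root grows the tree by one level.
        self.root = parts[0] if len(parts) == 1 else Node(False, parts)


def leaves(node):
    """Yield every leaf node beneath `node`."""
    if node.leaf:
        yield node
    else:
        for child in node.entries:
            yield from leaves(child)
```

In this sketch a split at the root adds a new level, so all leaves stay at the same depth, and each insertion only touches the nodes on one root-to-leaf path. That per-path locality is the kind of property that would make a disk-based variant plausible, though the sketch itself keeps everything in memory.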
