A Random Finite Set Model for Data Clustering

Dinh Phung, Ba-Ngu Vo


classification 📊 stat.ML

keywords: data, clustering, clusters, number, existing, finite, many, mixture


The goal of data clustering is to partition data points into groups so as to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern, or set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is often not known in advance. This paper proposes a new class of models for data clustering that handles set-valued data as well as an unknown number of clusters, using a Dirichlet process mixture of Poisson random finite sets. We also develop an efficient Markov chain Monte Carlo posterior inference technique that learns the number of clusters and the mixture parameters automatically from the data. Numerical studies demonstrate the salient features of this new model, in particular its capacity to discover extremely unbalanced clusters in the data.
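To make the model described in the abstract more concrete, here is a minimal Python sketch of its generative side only. It rests on assumptions the abstract does not state: each cluster is a Poisson random finite set whose cardinality is Poisson with rate λ and whose points are drawn i.i.d. from an isotropic Gaussian, and the base measure (a Gamma prior on λ, a uniform prior on the Gaussian mean, a fixed unit variance) as well as every function name are hypothetical choices for illustration. It uses the standard Poisson RFS density f(X) = e^{-λ} ∏_{x∈X} λ p(x) and draws cluster assignments from the Chinese restaurant process representation of the Dirichlet process. It is not the authors' implementation and omits their MCMC inference procedure.

```python
import math
import random


def gaussian_logpdf(x, mean, var):
    """Log-density of an isotropic Gaussian N(mean, var * I) at point x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq / var)


def poisson_rfs_logdensity(points, rate, mean, var):
    """log f(X) = -rate + sum over x in X of [log(rate) + log p(x)],
    for a Poisson RFS with Poisson cardinality `rate` and an (assumed)
    isotropic Gaussian spatial density p."""
    return -rate + sum(math.log(rate) + gaussian_logpdf(x, mean, var) for x in points)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_dp_mixture_of_poisson_rfs(n_sets, alpha, rng, dim=2):
    """Draw n_sets point patterns from a DP mixture of Poisson RFSs via the
    Chinese restaurant process. Hypothetical base measure: rate ~ Gamma(2, scale=5),
    mean ~ Uniform([-10, 10]^dim), variance fixed at 1.0."""
    counts, params, labels, data = [], [], [], []
    for _ in range(n_sets):
        # Join existing cluster k with weight counts[k], or open a new one with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
            params.append((rng.gammavariate(2.0, 5.0),
                           [rng.uniform(-10, 10) for _ in range(dim)], 1.0))
        counts[k] += 1
        rate, mean, var = params[k]
        # A Poisson RFS: Poisson-distributed size, then i.i.d. points from p.
        n_points = sample_poisson(rate, rng)
        points = [[rng.gauss(m, math.sqrt(var)) for m in mean] for _ in range(n_points)]
        labels.append(k)
        data.append(points)
    return labels, data, params


if __name__ == "__main__":
    rng = random.Random(0)
    labels, data, params = sample_dp_mixture_of_poisson_rfs(50, alpha=1.0, rng=rng)
    print(len(set(labels)), "clusters")
    rate, mean, var = params[labels[0]]
    print(poisson_rfs_logdensity(data[0], rate, mean, var))
```

Because each datum is a whole set, the likelihood depends on both how many points a set contains (through λ) and where they lie (through p), which is what lets such a model separate clusters that differ only in typical set size.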



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Laplace Variational Inference for Dirichlet Process Mixtures of Marked Poisson Point Processes
stat.ME 2026-05 unverdicted novelty 6.0

A Dirichlet process mixture model for marked Poisson point processes with squared-link intensities and Laplace variational inference jointly infers clusters, cluster count, and continuous mark-specific intensity surfaces.