Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.
On the power of linear programming for K-means clustering
2 Pith papers cite this work. Polarity classification is still indexing.
abstract
In [SIAM J. Optim., 2022], the authors introduced a new linear programming (LP) relaxation for K-means clustering. In this paper, we further investigate both theoretical and computational properties of this relaxation. As evident from our numerical experiments with both synthetic real-world data sets, the proposed LP relaxation is almost always tight; i.e. its optimal solution is feasible for the original nonconvex problem. To better understand this unexpected behaviour, on the theoretical side, we focus on K-means clustering with two clusters, and we obtain sufficient conditions under which the LP relaxation is tight. We further analyze the sufficient conditions when the input is generated according to a popular stochastic model and obtain recovery guarantees for the LP relaxation. We conclude our theoretical study by constructing a family of inputs for which the LP relaxation is never tight. Denoting by $n$ the number of data points to be clustered, the LP relaxation contains $\Omega(n^3)$ inequalities making it impractical for large data sets. To address the scalability issue, by building upon a cutting-plane algorithm together with the GPU implementation of PDLP, a first-order method LP solver, we develop an efficient algorithm that solves the proposed LP and hence the K-means clustering problem, for up to $n \leq 4000$ data points.
verdicts
UNVERDICTED 2representative citing papers
A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.
citing papers explorer
-
Accessible Complexity Bounds for Restarted PDHG on Linear Programs with a Unique Optimizer
Derives accessible O(κΦ ln(κΦ ||w*||/ε)) iteration bound for rPDHG on unique-optima LPs, with computable Φ, two-stage performance, and equivalence to stability and sharpness.
-
From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs
A modular CPU-GPU batching framework for branch-and-bound delivers 10-100x speedups with zero optimality gap when certifying optimal cardinality-constrained GLMs.