logo
today local_bar
Poster Session 6 · Friday, December 5, 2025 4:30 PM → 7:30 PM
#815

A Beyond-Worst-Case Analysis of Greedy k-means++

NeurIPS Slides Poster OpenReview

Abstract

-means++ and the related greedy -means++ algorithm are celebrated algorithms that efficiently compute seeds for Lloyd's algorithm. Greedy -means++ is a generalization of -means++ where, in each iteration, a new seed is greedily chosen among multiple points sampled, as opposed to a single seed being sampled in -means++. While empirical studies consistently show the superior performance of greedy -means++, making it a preferred method in practice, a discrepancy exists between theory and practice. No theoretical justification currently explains this improved performance. Indeed, the prevailing theory suggests that greedy -means++ exhibits worse performance than -means++ in worst-case scenarios.
This paper presents an analysis demonstrating the outperformance of the greedy algorithm compared to -means++ for a natural class of well-separated instances with exponentially decaying distributions, such as Gaussian, specifically when , a common parameter setting in practical applications.
Poster