We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it does not make any assumption on the data nor how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.
翻译:我们引入了一种草图和解析方法,以加快K- means群集的Peng-Wei半确定性放松。当数据被适当分离时,我们确定 k- means 最佳组合。 否则,我们的方法为最佳k- means 值提供了高度信任的下限。这个较低的组合是数据驱动的,对数据及其生成方式不作任何假设。我们提供了一套代码和一系列广泛的数字实验,我们利用这个方法来验证K- means++获得的组合解决方案的近似最佳性。