$k$-means clustering is a well-studied problem due to its wide applicability. Unfortunately, there exist strong theoretical limits on the performance of any algorithm for the $k$-means problem on worst-case inputs. To overcome this barrier, we consider a scenario where "advice" is provided to help perform clustering. Specifically, we consider the $k$-means problem augmented with a predictor that, given any point, returns its cluster label in an approximately optimal clustering, up to some possibly adversarial error. We present an algorithm whose performance improves along with the accuracy of the predictor, even though na\"{i}vely following the predictor can still lead to a high clustering cost. Thus, if the predictor is sufficiently accurate, we can retrieve a close-to-optimal clustering in nearly optimal runtime, breaking known computational barriers for algorithms without access to such advice. We evaluate our algorithms on real datasets and show significant improvements in the quality of clustering.
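The predictor-augmented setting above can be illustrated with a minimal sketch (not the paper's algorithm): a simulated predictor assigns each point a cluster label with some error rate, and we measure the $k$-means cost of the clustering induced by those labels. All names and parameters here are illustrative assumptions.

```python
# Minimal sketch of k-means clustering with label advice: a simulated
# "predictor" returns each point's cluster label with some error rate,
# and we compare the k-means cost under exact vs. noisy advice.
import numpy as np

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster's centroid."""
    cost = 0.0
    for c in range(k):
        members = points[labels == c]
        if len(members) == 0:
            continue
        centroid = members.mean(axis=0)
        cost += ((members - centroid) ** 2).sum()
    return cost

rng = np.random.default_rng(0)
k = 3
# Three well-separated Gaussian clusters, 100 points each (illustrative data).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(size=(100, 2)) for c in centers])
true_labels = np.repeat(np.arange(k), 100)

# Simulate a predictor that corrupts 10% of labels uniformly at random.
predicted = true_labels.copy()
noisy = rng.random(len(predicted)) < 0.1
predicted[noisy] = rng.integers(0, k, noisy.sum())

print(kmeans_cost(points, true_labels, k))  # cost under exact advice
print(kmeans_cost(points, predicted, k))    # cost under noisy advice
```

Even a small fraction of mislabeled points inflates the cost, since each wrongly assigned point is far from the centroid it is charged to; this is why an algorithm must do more than na\"{i}vely follow the predictor's labels.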