Given a metric space, the $(k,z)$-clustering problem consists of finding $k$ centers that minimize the sum, over all points, of the distance from each point to its closest center raised to the power $z$. This encapsulates the famous $k$-median ($z=1$) and $k$-means ($z=2$) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of every solution, also known as \emph{coresets}, has been an important research direction over the last 15 years. In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, including Euclidean spaces, doubling metrics, minor-free metrics, and general metrics.
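For concreteness, the objective described above can be written out as follows. This is a sketch in commonly used notation rather than the paper's own: the symbols $P$ (input point set), $C$ (candidate set of $k$ centers), $d$ (the metric), $\Omega$ (the coreset), $w$ (its weights), and $\varepsilon$ (the accuracy parameter) are all assumptions introduced here for illustration.
\[
\mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \, \min_{c \in C} d(p, c)^z .
\]
Under the same assumed notation, a weighted set $\Omega$ with weights $w \colon \Omega \to \mathbb{R}_{\geq 0}$ is typically called an $\varepsilon$-coreset if, for every set $C$ of $k$ centers,
\[
\Bigl| \sum_{q \in \Omega} w(q) \, \min_{c \in C} d(q, c)^z \;-\; \mathrm{cost}_z(P, C) \Bigr| \;\leq\; \varepsilon \cdot \mathrm{cost}_z(P, C) .
\]
Setting $z = 1$ gives the $k$-median cost and $z = 2$ gives the $k$-means cost.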