Constructing small-sized coresets for various clustering problems in different metric spaces has attracted significant attention for the past decade. A central problem in the coreset literature is to understand what is the best possible coreset size for $(k,z)$-clustering in Euclidean space. While there has been significant progress in the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for $k$-means ($z=2$) is $\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$ [1,2], while the best known lower bound is $\Omega(k\varepsilon^{-2})$ [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., $\varepsilon, k$), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound $\Omega(k \varepsilon^{-z-2})$ for Euclidean $(k,z)$-clustering when $\varepsilon \geq \Omega(k^{-1/(z+2)})$. In view of the prior upper bound $\tilde{O}_z(k \varepsilon^{-z-2})$ [1], the bound is optimal. The new lower bound is surprising since $\Omega(k\varepsilon^{-2})$ [1] is ``conjectured" to be the correct bound in some recent works (see e.g., [1,2]]). (2) For the upper bound, we provide efficient coreset construction algorithms for $(k,z)$-clustering with improved or optimal coreset sizes in several metric spaces. [1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22.
翻译:在不同度量空间中建立小规模的储备已经引起了过去十年的极大关注。储备文献中的一个核心问题是理解欧几里得空间中$(k,z)$-聚类的最佳储备规模。尽管在该问题上已经取得了显着的进展,但目前的上限和下限仍存在差距。例如,对于$k$-means($z=2$),已知的最佳上界是$\min \{O(k^{3/2} \varepsilon^{-2}),O(k \varepsilon^{-4})\}$[1,2],而已知的最佳下界是$\Omega(k\varepsilon^{-2})$[1]。在本文中,我们在上限和下限上都取得了显着进展。对于很大范围的参数(即$\varepsilon, k$),我们完全了解了最优储备规模。特别地,我们获得了以下结果:(1)当$\varepsilon \geq \Omega(k^{-1/(z+2)})$时,我们提供欧几里得$(k,z)$-聚类的新储备下限$\Omega(k \varepsilon^{-z-2})$。考虑到以前的上限$\tilde{O}_z(k \varepsilon^{-z-2})$[1],这个下限是最优的。这个新的下限很令人惊讶,因为在一些最近的工作中(例如,[1,2]),“猜想”$\Omega(k\varepsilon^{-2})$是正确的下限。(2)关于上限,我们提供了在多个度量空间中实现$(k,z)$-聚类的有效储备构造算法,储备规模得到了改进或达到了最优。[1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar, NeurIPS'22.