FPT Approximations for Capacitated/Fair Clustering with Outliers

Rajni Dabas,Neelima Gupta,Tanmay Inamdar

From arXiv. Abstract shortened to meet arXiv requirements.

Clustering problems such as $k$-Median and $k$-Means are motivated by applications such as location planning and unsupervised learning, among others. In such applications, it is important to find a clustering of points that is not "skewed" in terms of the number of points, i.e., no cluster should contain too many points. This is modeled by capacity constraints on the sizes of clusters. In an orthogonal direction, another important consideration in clustering is how to handle the presence of outliers in the data. Indeed, these clustering problems have been generalized in the literature to handle capacity constraints and outliers separately. To the best of our knowledge, there has been very little work on the approximability of clustering problems that handle both capacities and outliers simultaneously. We initiate the study of the Capacitated $k$-Median with Outliers (C$k$MO) problem. Here, we want to cluster all except $m$ outlier points into at most $k$ clusters, such that (i) the clusters respect the capacity constraints, and (ii) the cost of clustering, defined as the sum of distances of each non-outlier point to its assigned cluster-center, is minimized. We design the first constant-factor approximation algorithms for C$k$MO. In particular, our algorithm returns a $(3+\epsilon)$-approximation for C$k$MO in general metric spaces, and a $(1+\epsilon)$-approximation in Euclidean spaces of constant dimension, and runs in time $f(k, m, \epsilon) \cdot |I_m|^{O(1)}$, where $|I_m|$ denotes the input size. We can also extend these results to a broader class of problems, including Capacitated $k$-Means/$k$-Facility Location with Outliers, and Size-Balanced Fair Clustering problems with Outliers. For each of these problems, we obtain an approximation ratio that matches the best-known guarantee of the corresponding outlier-free problem.
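To make the C$k$MO objective concrete, the following is a minimal sketch of how the cost and feasibility of a candidate solution could be checked. It is not the approximation algorithm from the paper. The function name `ckmo_cost`, the per-center `capacities` list, the `assignment` encoding (a center index, or `None` for an outlier), and the reading of "all except $m$" as "at most $m$ outliers" are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance; any other metric can be passed in


def ckmo_cost(points, centers, capacities, assignment, k, m, metric=dist):
    """Return the cost of a candidate CkMO solution, or None if infeasible.

    Hypothetical encoding: assignment[i] is the index (into `centers`) of the
    center serving point i, or None if point i is declared an outlier.
    """
    used_centers = {c for c in assignment if c is not None}
    num_outliers = sum(1 for c in assignment if c is None)
    if len(used_centers) > k or num_outliers > m:
        return None  # too many clusters, or too many outliers

    # Count how many points each used center serves.
    load = {}
    for c in assignment:
        if c is not None:
            load[c] = load.get(c, 0) + 1
    if any(load[c] > capacities[c] for c in load):
        return None  # some cluster exceeds its capacity

    # Sum of distances from each non-outlier point to its assigned center.
    return sum(metric(points[i], centers[c])
               for i, c in enumerate(assignment) if c is not None)


# Toy instance: two tight pairs of points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 100.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]
capacities = [2, 2]
assignment = [0, 0, 1, 1, None]  # the far-away point is the single outlier
cost = ckmo_cost(points, centers, capacities, assignment, k=2, m=1)
print(cost)
```

In this toy instance, discarding the far-away point as the single permitted outlier keeps both clusters within capacity; lowering a capacity or setting `m=0` makes the same assignment infeasible.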
