Center-based clustering techniques are fundamental in some areas of machine learning such as data summarization. Generic $k$-center algorithms can produce biased cluster representatives so there has been a recent interest in fair $k$-center clustering. Our main theoretical contributions are two new $(3+\epsilon)$-approximation algorithms for solving the fair $k$-center problem in (1) the dynamic incremental, i.e., one-pass streaming, model and (2) the MapReduce model. Our dynamic incremental algorithm is the first such algorithm for this problem (previous streaming algorithms required two passes) and our MapReduce one improves upon the previous approximation factor of $(17+\epsilon).$ Both algorithms work by maintaining a small coreset to represent the full point set and their analysis requires that the underlying metric has finite-doubling dimension. We also provide related heuristics for higher dimensional data and experimental results that compare the performance of our algorithms to existing ones.
翻译:以中心为基础的集群技术对于诸如数据总和等机器学习的某些领域至关重要。 通用的 $k$ 中心算法可以产生偏差的组群代表, 因此最近对公平的 $k$ 中心群集产生了兴趣。 我们的主要理论贡献是两种新的( 3 ⁇ epsilon) $- approcolation 算法, 以解决公平的 $k$ 中心问题:(1) 动态递增, 即单向流、 模型和 (2) 地图显示模型。 我们的动态递增算法是这一问题的首种( 以往的流算法需要两次通行证), 而我们的马普鲁算法在先前的近似系数( 17 ⁇ - epsilon) 上有所改进 。 两种算法都通过维持一个小核心来代表全点集, 其分析要求基本指标具有限定的多维维维维维度。 我们还提供了相关的高维数据和实验结果, 将我们的算法的性能和实验结果进行比较。