We study diversity-aware clustering, a variant of classical clustering formulations motivated by algorithmic fairness. In this variant we are given a collection of facility subsets, and a solution must contain at least a specified number of facilities from each subset while minimizing the clustering objective ($k$-median or $k$-means). We investigate the fixed-parameter tractability of these problems and show several hardness and inapproximability results that hold even when running time exponential in some of the parameters is allowed. Motivated by these results, we identify natural parameters of the problem and present fixed-parameter approximation algorithms with approximation ratios $\big(1 + \frac{2}{e} + \epsilon\big)$ and $\big(1 + \frac{8}{e} + \epsilon\big)$ for diversity-aware $k$-median and diversity-aware $k$-means, respectively. We argue that these ratios are essentially tight assuming the Gap Exponential Time Hypothesis (Gap-ETH). We also present a simple and more practical bicriteria approximation algorithm with better running-time bounds. Finally, we propose efficient and practical heuristics, and we evaluate the scalability and effectiveness of our methods in a wide variety of rigorously conducted experiments on both real and synthetic data.
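To make the problem statement concrete, the following is a minimal sketch of the diversity-aware $k$-median objective on a toy instance. It is an exhaustive search over all size-$k$ facility sets, so it takes exponential time and is not one of the algorithms referred to above. The function names (`kmedian_cost`, `is_diverse`, `brute_force_diverse_kmedian`) and the one-dimensional toy data are assumptions made purely for illustration.

```python
from itertools import combinations


def kmedian_cost(clients, centers, dist):
    """Sum over clients of the distance to the nearest chosen center."""
    return sum(min(dist(c, f) for f in centers) for c in clients)


def is_diverse(centers, groups, requirements):
    """Check that at least requirements[i] centers are chosen from groups[i]."""
    chosen = set(centers)
    return all(len(chosen & g) >= r for g, r in zip(groups, requirements))


def brute_force_diverse_kmedian(clients, facilities, groups, requirements, k, dist):
    """Try every size-k facility set; keep the cheapest one that is diverse.

    Illustrative only: the number of candidate sets grows as C(n, k).
    Returns (None, inf) when no feasible set exists.
    """
    best_cost, best = float("inf"), None
    for cand in combinations(facilities, k):
        if not is_diverse(cand, groups, requirements):
            continue
        cost = kmedian_cost(clients, cand, dist)
        if cost < best_cost:
            best_cost, best = cost, cand
    return best, best_cost


# Hypothetical toy instance on the real line.
clients = [0, 1, 9, 10]
facilities = [0, 5, 10]
groups = [{5}, {0, 10}]          # facility subsets (may overlap in general)
dist = lambda a, b: abs(a - b)

# Without requirements, the two extreme facilities are optimal.
free_sol, free_cost = brute_force_diverse_kmedian(
    clients, facilities, groups, [0, 0], 2, dist)

# Requiring one facility from each group forces facility 5 into the solution.
div_sol, div_cost = brute_force_diverse_kmedian(
    clients, facilities, groups, [1, 1], 2, dist)
```

On this instance the unconstrained optimum costs 2, while the diversity requirements raise the optimum to 10, which illustrates how the lower-bound constraints can change the solution. For $k$-means, the same sketch applies with squared distances in `dist`.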