Fair clustering is the process of grouping similar entities together, while satisfying a mathematically well-defined fairness metric as a constraint. Due to the practical challenges in precise model specification, the prescribed fairness constraints are often incomplete and act as proxies to the intended fairness requirement, leading to biased outcomes when the system is deployed. We examine how to identify the intended fairness constraint for a problem based on limited demonstrations from an expert. Each demonstration is a clustering over a subset of the data. We present an algorithm to identify the fairness metric from demonstrations and generate clusters using existing off-the-shelf clustering techniques, and analyze its theoretical properties. To extend our approach to novel fairness metrics for which clustering algorithms do not currently exist, we present a greedy method for clustering. Additionally, we investigate how to generate interpretable solutions using our approach. Empirical evaluation on three real-world datasets demonstrates the effectiveness of our approach in quickly identifying the underlying fairness and interpretability constraints, which are then used to generate fair and interpretable clusters.
翻译:公平分组是将类似实体集中在一起的过程,同时满足了数学上定义明确的公平度量作为制约。由于精确的模型规格中的实际挑战,规定的公平性约束往往不完整,并成为预期公平性要求的替代物,导致在系统部署时出现偏差结果。我们研究如何确定一个基于专家有限演示的问题的预期公平性约束。每次演示都是在数据的一个子集上进行分组。我们提出了一个算法,用以确定示范的公平度量度,并利用现有的现成集群技术生成集群,并分析其理论属性。为了扩大我们的方法,将目前不存在的集群算法的新的公平度度量度扩大,我们提出了一种贪婪的集群方法。此外,我们研究如何利用我们的方法产生可解释的解决办法。对三个真实世界数据集进行的实证性评估表明我们迅速确定基本公平性和可解释性制约的方法的有效性,然后用来产生公平和可解释的集群。