Maximizing a submodular function has a wide range of applications in machine learning and data mining. One such application is data summarization whose goal is to select a small set of representative and diverse data items from a large dataset. However, data items might have sensitive attributes such as race or gender, in this setting, it is important to design \emph{fairness-aware} algorithms to mitigate potential algorithmic bias that may cause over- or under- representation of particular groups. Motivated by that, we propose and study the classic non-monotone submodular maximization problem subject to novel group fairness constraints. Our goal is to select a set of items that maximizes a non-monotone submodular function, while ensuring that the number of selected items from each group is proportionate to its size, to the extent specified by the decision maker. We develop the first constant-factor approximation algorithms for this problem. We also extend the basic model to incorporate an additional global size constraint on the total number of selected items.
翻译:在机器学习和数据挖掘中,一个子模块函数的最大化应用范围很广,其应用范围很广。一个这样的应用是数据汇总,目标是从大型数据集中选择一组有代表性和多样性的小型数据项目。然而,在这种背景下,数据项目可能具有种族或性别等敏感属性,因此,必须设计 emph{fairness-aware} 算法,以减少可能导致特定群体代表过多或代表性不足的潜在算法偏差。为此,我们提议并研究传统的非分子子模块最大化问题,但须受新的群体公平性限制。我们的目标是选择一组项目,使非分子子模块功能最大化,同时确保每个组的选定项目数量与决策制定者规定的范围相称。我们为此开发了第一个不变要素近似算法。我们还扩展了基本模型,以纳入对选定项目总数的额外全球规模限制。