Unbiased Subdata Selection for Fair Classification: A Unified Framework and Scalable Algorithms

Classification is a central problem in modern data analytics, with applications across many domains. Unlike conventional classification approaches, fair classification addresses unintentional bias against sensitive features (e.g., gender, race). Because fairness measures are highly nonconvex, existing methods are often unable to model fairness exactly, which can lead to inferior fair classification outcomes. This paper fills the gap by developing a novel unified framework that jointly optimizes accuracy and fairness. The proposed framework is versatile: it can incorporate the different fairness measures studied in the literature exactly, and it applies to many classifiers, including deep classification models. Specifically, we first prove Fisher consistency of the proposed framework. We then show that many classification models within this framework can be recast as mixed-integer convex programs, which can be solved effectively by off-the-shelf solvers when instance sizes are moderate and can serve as benchmarks to compare the efficiency of approximation algorithms. We prove that, when the classification outcomes are known, the resulting problem, termed "unbiased subdata selection," is solvable in strongly polynomial time and can be used to enhance classification fairness by selecting more representative data points. This motivates an iterative refining strategy (IRS) for large-scale instances, in which we alternate between improving classification accuracy and performing unbiased subdata selection. We study the convergence properties of IRS and derive its approximation bound. More broadly, the framework can be leveraged to improve classification models on unbalanced data by taking the F1 score into account.
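To make the two quantities mentioned above concrete, the following is a minimal sketch of how a fairness measure and the F1 score might be evaluated on a set of predictions. It assumes demographic parity (the gap in positive-prediction rates between two groups) as one example of the fairness measures studied in the literature; the abstract does not say which measures the paper uses. The function names and the toy data are hypothetical, and this is not the paper's framework or its IRS algorithm.

```python
# Illustrative only: demographic parity is an assumed example of a fairness
# measure, and all names and data below are hypothetical.

def positive_rate(y_pred, group, value):
    """Fraction of positive predictions among points whose group equals value."""
    members = [p for p, g in zip(y_pred, group) if g == value]
    return sum(members) / len(members)

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    return abs(positive_rate(y_pred, group, 0) - positive_rate(y_pred, group, 1))

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), with 0.0 returned when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy labels, predictions, and a binary sensitive feature (made-up values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

gap = demographic_parity_gap(y_pred, group)
f1 = f1_score(y_true, y_pred)
print(f"demographic parity gap: {gap:.4f}")
print(f"F1 score: {f1:.4f}")
```

A gap of zero would mean both groups receive positive predictions at the same rate. The F1 score is shown alongside it because accuracy alone can be misleading on unbalanced data.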
