To Split or Not to Split: The Impact of Disparate Treatment in Classification

Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it may be acceptable to fit a model that exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with that of group-blind classifiers (i.e., classifiers that do not use a sensitive attribute). We introduce the benefit-of-splitting to quantify the performance improvement obtained by splitting classifiers. Computing the benefit-of-splitting directly from its definition can be intractable, since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting that can be computed efficiently by solving small-scale convex programs; and (ii) provide sharp upper and lower bounds for the benefit-of-splitting, which reveal precise conditions under which a group-blind classifier will always suffer a non-trivial performance gap relative to the split classifiers. In the finite sample regime, splitting is not necessarily beneficial, and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
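To make the split versus group-blind comparison concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical hypothesis class (one-dimensional threshold classifiers), uses pooled in-sample accuracy as a simplified stand-in for the paper's performance measures, and generates synthetic data in which the two groups have opposite labeling rules. The names `fit_stump` and `splitting_gain` are illustrative and do not come from the source.

```python
import numpy as np


def fit_stump(x, y):
    """Return (threshold, sign, accuracy) of the best 1-D threshold rule on (x, y).

    The rule predicts 1 when sign * (x - threshold) > 0. This small hypothesis
    class is an assumption made for illustration only.
    """
    best = (0.0, 1, -1.0)
    # Candidate thresholds: one below the minimum, then every observed value.
    candidates = np.concatenate(([x.min() - 1.0], np.unique(x)))
    for t in candidates:
        for sign in (1, -1):
            pred = (sign * (x - t) > 0).astype(int)
            acc = float(np.mean(pred == y))
            if acc > best[2]:
                best = (float(t), sign, acc)
    return best


def splitting_gain(x, y, s):
    """In-sample pooled accuracy of per-group stumps minus that of one group-blind stump.

    A simplified stand-in for the benefit-of-splitting, not its formal definition.
    """
    _, _, blind_acc = fit_stump(x, y)  # group-blind: never looks at s
    split_correct = 0.0
    for g in np.unique(s):
        mask = s == g
        # Split: a separate stump is fitted and evaluated on each group.
        split_correct += fit_stump(x[mask], y[mask])[2] * mask.sum()
    return split_correct / len(y) - blind_acc


# Synthetic data (assumed): group 0 is labeled 1 when x > 0, group 1 when x < 0.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=400)
s = np.repeat([0, 1], 200)
y = np.where(s == 0, x > 0, x < 0).astype(int)

gain = splitting_gain(x, y, s)
print(f"in-sample accuracy gain from splitting: {gain:.3f}")
```

Because the two groups' labeling rules conflict, no single threshold fits both, so the group-blind stump stays close to chance while the per-group stumps fit their own data. The gain here is measured in-sample, where it cannot be negative for a shared hypothesis class; on held-out data the picture can differ, which is the finite-sample effect the abstract refers to.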
