Supervised learning systems are trained on historical data and, if that data was tainted by discrimination, they may unintentionally learn to discriminate against protected groups. We propose that fair learning methods, despite training on potentially discriminatory datasets, should perform well on fair test datasets. Such dataset shifts crystallize application scenarios for specific fair learning methods. For instance, the removal of direct discrimination can be represented as a particular dataset shift problem. For this scenario, we propose a learning method that provably minimizes model error on fair datasets while blindly training on datasets poisoned with direct additive discrimination. The method is compatible with existing legal systems and addresses the widely discussed issue of the intersectionality of protected groups by striking a balance between them. Technically, the method applies probabilistic interventions, has causal and counterfactual formulations, and is computationally lightweight: it can be used with any supervised learning model to prevent discrimination via proxies while maximizing model accuracy for business necessity.
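As a rough illustration of the dataset-shift framing (not the authors' implementation), the sketch below simulates training on a dataset in which one protected group's observed outcomes have been shifted by a fixed additive penalty, then evaluates the resulting model against fair labels. The data-generating process, the binary group indicator `a`, and the penalty `delta` are assumptions introduced for this example.

```python
# Illustrative sketch only: simulate "direct additive discrimination" as a
# fixed penalty subtracted from one group's observed outcomes, train blindly
# on the tainted data, and measure error on fair labels.
# The linear data-generating process, group indicator `a`, and penalty
# `delta` are assumptions for this example, not the paper's exact setup.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, delta = 5000, 2.0                     # sample size, additive penalty
x = rng.normal(size=(n, 3))              # legitimate features
a = rng.integers(0, 2, size=n)           # protected-group indicator (assumed binary)
y_fair = x @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.5, size=n)
y_tainted = y_fair - delta * a           # observed (discriminatory) training labels

# Train blindly on the tainted data, then compare error on tainted vs. fair labels.
features = np.column_stack([x, a])
model = LinearRegression().fit(features, y_tainted)
pred = model.predict(features)
print("MSE vs. tainted labels:", np.mean((pred - y_tainted) ** 2))
print("MSE vs. fair labels:   ", np.mean((pred - y_fair) ** 2))
```

In the abstract's terms, a fair learning method would aim to keep the error against the fair labels low even though it only ever observes the tainted training data.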