As machine learning becomes prevalent, mitigating any unfairness present in the training data becomes critical. Among the various notions of fairness, this paper focuses on the well-known notion of individual fairness, which states that similar individuals should be treated similarly. While individual fairness can be improved when training a model (in-processing), we contend that fixing the data before model training (pre-processing) is a more fundamental solution. In particular, we show that label flipping is an effective pre-processing technique for improving individual fairness. Our system iFlipper solves the optimization problem of minimally flipping labels given a limit on the number of individual fairness violations, where a violation occurs when two similar examples in the training data have different labels. We first prove that this problem is NP-hard. We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips. We also propose techniques for further improving the linear programming solution without exceeding the violation limit. Experiments on real datasets show that iFlipper significantly outperforms other pre-processing baselines in terms of individual fairness and accuracy on unseen test sets. In addition, iFlipper can be combined with in-processing techniques for even better results.
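To make the optimization concrete, the following is a minimal sketch of the label-flipping problem as a linear programming relaxation, matching the abstract's description of minimizing flips under a violation budget. The function name, the use of the PuLP library, and the representation of similarity as an explicit pair list are all assumptions for illustration; the paper's actual formulation and its step for rounding the relaxed labels back to integers are not shown here.

```python
# Illustrative sketch only: the variable names, the similarity-pair input,
# and the PuLP-based solving are our assumptions, not iFlipper's actual code.
import pulp

def flip_labels_lp(y, similar_pairs, max_violations):
    """Relaxed label flipping: minimize the total label change subject to a
    budget on fairness violations (similar pairs with different labels).

    y              : list of binary labels (0/1)
    similar_pairs  : list of (i, j) index pairs deemed similar
    max_violations : upper bound on the total violation amount
    """
    n = len(y)
    prob = pulp.LpProblem("min_label_flips", pulp.LpMinimize)

    # Relaxed labels y'_i in [0, 1]; rounding them back to {0, 1} without
    # exceeding the budget is the separate post-processing step.
    y_new = [pulp.LpVariable(f"y_{i}", 0, 1) for i in range(n)]
    # d_i >= |y'_i - y_i|, linearized with two inequalities per example.
    d = [pulp.LpVariable(f"d_{i}", 0, 1) for i in range(n)]
    # v_ij >= |y'_i - y'_j| measures the violation on each similar pair.
    v = {(i, j): pulp.LpVariable(f"v_{i}_{j}", 0, 1) for (i, j) in similar_pairs}

    # Objective: total label change, i.e., the (relaxed) number of flips.
    prob += pulp.lpSum(d)

    for i in range(n):
        prob += d[i] >= y_new[i] - y[i]
        prob += d[i] >= y[i] - y_new[i]
    for (i, j) in similar_pairs:
        prob += v[(i, j)] >= y_new[i] - y_new[j]
        prob += v[(i, j)] >= y_new[j] - y_new[i]
    # Budget constraint: total violation amount stays within the limit.
    prob += pulp.lpSum(v.values()) <= max_violations

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [pulp.value(var) for var in y_new]
```

For example, with `y = [0, 1, 0]`, `similar_pairs = [(0, 1), (1, 2)]`, and `max_violations = 0`, the sketch flips example 1's label to 0, the single cheapest change that eliminates both violations.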