Algorithmic decision making driven by neural networks has become increasingly prominent in applications that directly affect people's quality of life. In this paper, we study the problem of verifying, training, and guaranteeing individual fairness of neural network models. A popular approach for enforcing fairness is to translate a fairness notion into constraints over the parameters of the model. However, such a translation does not always guarantee fair predictions from the trained neural network model. To address this challenge, we develop a counterexample-guided post-processing technique to provably enforce fairness constraints at prediction time. In contrast to prior work that enforces fairness only on points around the test or training data, we enforce and guarantee fairness on all points in the input domain. Additionally, we propose an in-processing technique that uses fairness as an inductive bias by iteratively incorporating fairness counterexamples into the learning process. We have implemented these techniques in a tool called FETA. Empirical evaluation on real-world datasets indicates that FETA not only guarantees fairness on the fly at prediction time but also trains accurate models that exhibit a much higher degree of individual fairness.
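To illustrate the flavor of the in-processing idea described above, the following is a minimal sketch of a counterexample-guided training loop, not FETA's actual algorithm. It assumes a simplified notion of individual fairness (two inputs that differ only in a binary protected attribute must receive the same predicted class), and the helper names `flip_protected`, `find_counterexamples`, and `train_with_counterexamples`, as well as the toy counterexample search, are hypothetical stand-ins for the verification-based components of the approach.

```python
# Hypothetical sketch of counterexample-guided fair training (not FETA's
# exact method). Fairness notion assumed here: inputs differing only in a
# binary protected attribute must get the same predicted class.
import torch
import torch.nn as nn

PROTECTED_IDX = 0  # assumed index of a binary protected feature


def flip_protected(x):
    """Return a copy of x with the binary protected attribute flipped."""
    x2 = x.clone()
    x2[:, PROTECTED_IDX] = 1.0 - x2[:, PROTECTED_IDX]
    return x2


def find_counterexamples(model, x):
    """Toy stand-in for a verifier: a pair (x, x') with the protected
    attribute flipped is a counterexample if the predicted classes
    disagree. The real technique searches the whole input domain."""
    with torch.no_grad():
        y1 = model(x).argmax(dim=1)
        y2 = model(flip_protected(x)).argmax(dim=1)
    mask = y1 != y2
    return x[mask], flip_protected(x)[mask], y1[mask]


def train_with_counterexamples(model, x, y, rounds=5, epochs=20, lr=1e-2):
    """In-processing sketch: after each training round, counterexample
    pairs are added with a shared label so the model is pushed to treat
    the two members of each pair identically."""
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    data_x, data_y = x, y
    for _ in range(rounds):
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(data_x), data_y)
            loss.backward()
            opt.step()
        cx, cx_flipped, cy = find_counterexamples(model, x)
        if len(cx) == 0:
            break  # no fairness violations found on this data
        data_x = torch.cat([data_x, cx, cx_flipped])
        data_y = torch.cat([data_y, cy, cy])
    return model


# Usage on synthetic toy data.
torch.manual_seed(0)
x = torch.rand(200, 4)
y = (x[:, 1] > 0.5).long()
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
train_with_counterexamples(model, x, y)
```

The sketch searches for counterexamples only among the given data points; the contribution summarized in the abstract additionally covers verification and prediction-time repair over the entire input domain.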