Machine learning models are increasingly deployed for critical decision-making tasks, making it important to verify that they do not contain gender or racial biases picked up from training data. Typical approaches to achieving fairness revolve around efforts to clean or curate training data, with post-hoc statistical evaluation of the fairness of the model on evaluation data. In contrast, we propose techniques to \emph{prove} fairness using recently developed formal methods that verify properties of neural network models. Beyond the strength of guarantee implied by a formal proof, our methods have the advantage that we do not need explicit training or evaluation data (which is often proprietary) in order to analyze a given trained model. In experiments on two familiar datasets in the fairness literature (COMPAS and ADULTS), we show that through proper training, we can reduce unfairness by an average of 65.4\% at a cost of less than 1\% in AUC score.
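As an illustration of the kind of property such verification targets, one common formalization (a sketch of a standard individual-fairness style specification, not necessarily the exact property proved here) requires that a classifier $f$ over features $x = (x_1, \dots, x_n)$ never changes its prediction when only a sensitive feature $x_s$ is altered:
\[
\forall x, x'.\;\Big(\bigwedge_{i \neq s} x_i = x'_i\Big) \;\Rightarrow\; f(x) = f(x').
\]
Properties of this form can be encoded as constraints over the network's inputs and outputs and discharged by neural network verifiers, without reference to any particular training or evaluation dataset.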