In recent years, a growing body of work has emerged on how to learn machine learning models under fairness constraints, often expressed with respect to some sensitive attributes. In this work, we consider the setting in which an adversary has black-box access to a target model and show that information about this model's fairness can be exploited by the adversary to enhance the reconstruction of the sensitive attributes of the training data. More precisely, we propose a generic reconstruction correction method, which takes as input an initial guess made by the adversary and corrects it to comply with some user-defined constraints (such as the fairness information) while minimizing the changes to the adversary's guess. The proposed method is agnostic to the type of target model, the fairness-aware learning method, and the auxiliary knowledge of the adversary. To assess the applicability of our approach, we have conducted a thorough experimental evaluation on two state-of-the-art fair learning methods, using four different fairness metrics with a wide range of tolerances and three datasets of diverse sizes and sensitive attributes. The experimental results demonstrate the effectiveness of the proposed approach in improving the reconstruction of the sensitive attributes of the training set.
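To make the correction idea concrete, the following is a minimal illustrative sketch, not the paper's actual method (which is generic across target models, fairness metrics, and learning methods). It assumes a binary sensitive attribute, binary model predictions, a demographic-parity tolerance epsilon released with the model, and a simple greedy strategy; all names (correct_guess, s_hat, y_hat, epsilon) are hypothetical.

    # Illustrative sketch of a reconstruction correction: flip as few entries of
    # the adversary's initial guess as needed so that the guess becomes
    # consistent with the released fairness information.
    import numpy as np

    def correct_guess(s_hat, y_hat, epsilon):
        """Correct the adversary's 0/1 sensitive-attribute guess `s_hat` so that
        the empirical demographic-parity gap of the target model's predictions
        `y_hat`, computed under the corrected guess, is at most `epsilon`,
        while flipping as few entries as possible (greedy heuristic)."""
        s = s_hat.astype(int).copy()
        y = y_hat.astype(int)

        def dp_gap(s):
            # |P(y=1 | s=1) - P(y=1 | s=0)| under the current guess.
            p1 = y[s == 1].mean() if (s == 1).any() else 0.0
            p0 = y[s == 0].mean() if (s == 0).any() else 0.0
            return abs(p1 - p0)

        # Greedily flip the single entry that most reduces the gap, until the
        # corrected guess complies with the fairness constraint.
        while dp_gap(s) > epsilon:
            best_i, best_gap = None, dp_gap(s)
            for i in range(len(s)):
                s[i] ^= 1
                g = dp_gap(s)
                if g < best_gap:
                    best_i, best_gap = i, g
                s[i] ^= 1
            if best_i is None:  # no single flip improves the gap; stop
                break
            s[best_i] ^= 1
        return s

In this toy version, the "user-defined constraint" is a single demographic-parity bound and the "minimal change" objective is approximated greedily; the paper's generic formulation accommodates other fairness metrics, tolerances, and forms of auxiliary knowledge.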