Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received far less attention. In this paper, we present a framework that effectively generates poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group-based fairness notions, such as demographic parity and equalized odds. We develop three online attacks: adversarial sampling, adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples by sampling, labeling, or modifying a fraction of the training data in order to reduce test accuracy. Our framework enables attackers to flexibly adjust the attack's focus between prediction accuracy and fairness, and to accurately quantify the impact of each candidate point on both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.
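To make the accuracy-fairness trade-off concrete, the following is a minimal illustrative sketch, not the paper's actual attack, of how an attacker might score candidate poisoning outcomes by combining an accuracy loss with a demographic parity violation. The weight `alpha` and all function names here are hypothetical assumptions for illustration only.

```python
# Illustrative sketch only: the paper's actual attack objective is not reproduced here.
# It scores a (possibly poisoned) model's predictions by a weighted sum of error rate
# and demographic parity gap; alpha (hypothetical) shifts the attack's focus between
# prediction accuracy (alpha -> 1) and fairness (alpha -> 0).
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive prediction rates between two groups (0/1)."""
    rate_g0 = y_pred[group == 0].mean()
    rate_g1 = y_pred[group == 1].mean()
    return abs(rate_g0 - rate_g1)

def attacker_score(y_true, y_pred, group, alpha=0.5):
    """Hypothetical attacker objective: weighted sum of error rate and fairness gap."""
    error = np.mean(y_pred != y_true)
    gap = demographic_parity_gap(y_pred, group)
    return alpha * error + (1.0 - alpha) * gap

# Toy usage: evaluate the score that a poisoned model's predictions would receive.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
group = rng.integers(0, 2, size=200)
y_pred = rng.integers(0, 2, size=200)   # stand-in for a (poisoned) model's predictions
print(attacker_score(y_true, y_pred, group, alpha=0.7))
```

Under this kind of scoring, an attacker could rank candidate poisoning points by how much each one increases the combined objective; how the framework in the paper actually performs this selection is described in its method section, not here.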