In this paper, we take a first step toward answering the question of how to design fair machine learning algorithms that are robust to adversarial attacks. Using a minimax framework, we aim to design an adversarially robust fair regression model that achieves optimal performance in the presence of an attacker who can either inject a carefully crafted adversarial data point into the dataset or perform a rank-one attack on the dataset. By solving the proposed nonsmooth nonconvex-nonconcave minimax problem, we obtain both the optimal adversary and the robust fairness-aware regression model. On both synthetic and real-world datasets, numerical results show that the proposed adversarially robust fair models outperform other fair machine learning models on poisoned datasets in terms of both prediction accuracy and group-based fairness measures.
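To make the minimax framework concrete, a generic poisoning formulation of this kind can be sketched as follows; the loss $\ell$, fairness penalty $\mathcal{F}$, trade-off weight $\lambda$, and feasible attack set $\mathcal{C}$ are illustrative assumptions rather than the paper's exact problem:
$$
\min_{\beta} \; \max_{(x_0,\, y_0) \in \mathcal{C}} \;\; \sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^{\top}\beta\right) \;+\; \ell\!\left(y_0,\, x_0^{\top}\beta\right) \;+\; \lambda\, \mathcal{F}\!\left(\beta;\, D \cup \{(x_0, y_0)\}\right),
$$
where the inner maximization models the attacker choosing the single injected point $(x_0, y_0)$ and the outer minimization fits the fairness-aware regression parameters $\beta$ on the poisoned dataset $D \cup \{(x_0, y_0)\}$.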