In this work we explore the intersection of fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, can an external adversary make the model behave unfairly without having access to the model or its training data? To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified by a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer a significant unfair advantage on people from the majority class relative to fairly-ranked baseline search results. We demonstrate that our attacks are robust across a number of variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model. Our findings highlight the danger of deploying fair machine learning algorithms in-the-wild when (1) the data necessary to achieve fairness may be adversarially manipulated, and (2) the models themselves are not robust against attacks.