In this work we explore the intersection of fairness and robustness in the context of ranking: \textit{when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data?} To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified by a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer a significant unfair advantage on people from the majority class relative to fairly-ranked baseline search results. We demonstrate that our attacks are robust across a range of experimental variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model. Our findings highlight the danger of deploying fair machine learning algorithms in the wild when (1) the data needed to achieve fairness may be adversarially manipulated, and (2) the models themselves are not robust against attacks.
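As a rough illustration of the attack mechanism described above (a conceptual sketch, not the implementation evaluated in this work), a GAP-style generator can be trained to emit small, norm-bounded perturbations that inflate the score a ranker assigns to images of the targeted subpopulation. Because the threat model denies access to the deployed model, the sketch assumes an adversary-trained surrogate scorer; the generator architecture, \texttt{surrogate\_scorer}, and the hyperparameters below are illustrative assumptions.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerturbationGenerator(nn.Module):
    """GAP-style generator: maps an image to an additive perturbation
    bounded in L-infinity norm by epsilon."""
    def __init__(self, epsilon=8 / 255):
        super().__init__()
        self.epsilon = epsilon
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, images):
        delta = self.epsilon * self.net(images)       # scale to the L-inf budget
        return torch.clamp(images + delta, 0.0, 1.0)  # stay in valid pixel range

def attack_loss(surrogate_scorer, perturbed, original, lam=1.0):
    """Raise the surrogate relevance score of perturbed images (so a
    re-ranker promotes them) while keeping them close to the originals."""
    boost = -surrogate_scorer(perturbed).mean()   # minimizing this pushes scores up
    fidelity = F.mse_loss(perturbed, original)    # visual-similarity penalty
    return boost + lam * fidelity
\end{verbatim}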