Deep neural networks (DNNs) are increasingly used in real-world applications (e.g., facial recognition). This has resulted in concerns about the fairness of decisions made by these models. Various notions and measures of fairness have been proposed to ensure that a decision-making system does not disproportionately harm (or benefit) particular subgroups of the population. In this paper, we argue that traditional notions of fairness based solely on a model's outputs are not sufficient when the model is vulnerable to adversarial attacks. We argue that in some cases it may be easier for an attacker to target a particular subgroup, resulting in a form of \textit{robustness bias}. We show that measuring robustness bias is a challenging task for DNNs and propose two methods to measure this form of bias. We then conduct an empirical study of state-of-the-art neural networks on commonly used real-world datasets such as CIFAR-10, CIFAR-100, Adience, and UTKFace, and show that in almost all cases there are subgroups (in some cases defined by sensitive attributes such as race and gender) that are less robust and are thus at a disadvantage. We argue that this kind of bias arises both from the data distribution and from the highly complex nature of the learned decision boundary of DNNs, making mitigation of such biases a non-trivial task. Our results show that robustness bias is an important criterion to consider while auditing real-world systems that rely on DNNs for decision making. Code to reproduce all our results can be found here: \url{https://github.com/nvedant07/Fairness-Through-Robustness}
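To make the notion of robustness bias concrete, the following is a minimal, illustrative sketch (it is not one of the paper's two proposed methods): it estimates each example's robustness as the smallest FGSM perturbation budget that flips the model's prediction, then compares the average over each subgroup. The classifier, synthetic data, subgroup attribute, and epsilon grid below are hypothetical placeholders.

\begin{verbatim}
# Hedged sketch: per-subgroup robustness via the smallest FGSM budget that
# flips a prediction. All names/data here are illustrative, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical classifier and data: 2-D inputs, binary labels, binary subgroup.
model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
x = torch.randn(512, 2)
y = (x[:, 0] + 0.5 * x[:, 1] > 0).long()   # class label
group = (x[:, 1] > 0).long()               # sensitive subgroup attribute (0 or 1)

# Briefly fit the model so its decision boundary is meaningful.
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

def min_fgsm_eps(model, x, y, eps_grid):
    """Per example, the smallest eps in eps_grid whose FGSM step flips the
    prediction (a crude proxy for distance to the decision boundary)."""
    x_in = x.clone().requires_grad_(True)
    loss = loss_fn(model(x_in), y)
    grad_sign = torch.autograd.grad(loss, x_in)[0].sign()
    min_eps = torch.full((x.size(0),), float(eps_grid[-1]))  # default: never flipped
    flipped = torch.zeros(x.size(0), dtype=torch.bool)
    with torch.no_grad():
        for eps in eps_grid:
            pred = model(x + eps * grad_sign).argmax(dim=1)
            newly = (~flipped) & (pred != y)
            min_eps[newly] = eps
            flipped |= newly
    return min_eps

eps_grid = torch.linspace(0.01, 2.0, 100)
robustness = min_fgsm_eps(model, x, y, eps_grid)
for g in (0, 1):
    print(f"subgroup {g}: mean minimal eps = {robustness[group == g].mean():.3f}")
\end{verbatim}

A gap between the two printed averages indicates that one subgroup sits, on average, closer to the decision boundary and is therefore easier to attack; the paper's methods measure this disparity more carefully than this single-step FGSM proxy.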