Deep feedforward neural networks (DNNs) are increasingly deployed in socioeconomically critical decision-support software systems. DNNs are exceptionally good at finding minimal, sufficient statistical patterns within their training data. Consequently, DNNs may learn to encode decisions -- amplifying existing biases or introducing new ones -- that disadvantage protected individuals or groups and may violate legal protections. While existing search-based software testing approaches have been effective in discovering fairness defects, they do not supplement these defects with debugging aids -- such as severity measures and causal explanations -- that are crucial for helping developers triage defects and decide on the next course of action. Can we measure the severity of fairness defects in DNNs? Are these defects symptomatic of improper training, or do they merely reflect biases present in the training data? To answer such questions, we present DICE: an information-theoretic testing and debugging framework to discover and localize fairness defects in DNNs. The key goal of DICE is to assist software developers in triaging fairness defects by ordering them by severity. Toward this goal, we quantify fairness in terms of the protected information (in bits) used in decision making. This quantitative view of fairness defects not only helps order them; our empirical evaluation shows that it also improves search efficiency due to the resulting smoothness of the search space. Guided by the quantitative fairness measure, we present a causal debugging framework to localize inadequately trained layers and neurons responsible for fairness defects. Our experiments on ten DNNs, developed for socially critical tasks, show that DICE efficiently characterizes the amount of discrimination, effectively generates discriminatory instances, and localizes layers and neurons with significant biases.
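To make the information-theoretic framing concrete: one simple way to measure "protected information in bits" is the mutual information between a protected attribute and a model's decisions over a set of test inputs. The sketch below is a minimal plug-in estimator for discrete samples; it is an illustration of the general idea, not DICE's actual estimator, and the function name and interface are assumptions made for this example.

```python
import numpy as np

def mutual_information_bits(protected, decisions):
    """Plug-in estimate of I(protected; decision) in bits from paired samples.

    protected: 1-D array of protected-attribute values (e.g., 0/1 for sex).
    decisions: 1-D array of the model's discrete decisions for the same inputs.
    Returns 0.0 when decisions are statistically independent of the attribute;
    for a binary attribute, the maximum possible value is 1 bit.
    """
    protected = np.asarray(protected)
    decisions = np.asarray(decisions)
    n = len(protected)

    # Empirical joint counts over (attribute, decision) pairs.
    joint = {}
    for a, d in zip(protected, decisions):
        joint[(a, d)] = joint.get((a, d), 0) + 1

    # Empirical marginals.
    p_a = {a: np.mean(protected == a) for a in np.unique(protected)}
    p_d = {d: np.mean(decisions == d) for d in np.unique(decisions)}

    # I(A; D) = sum over (a, d) of p(a, d) * log2( p(a, d) / (p(a) * p(d)) ).
    mi = 0.0
    for (a, d), count in joint.items():
        p_ad = count / n
        mi += p_ad * np.log2(p_ad / (p_a[a] * p_d[d]))
    return mi
```

Under this view, a defect's severity is the number of bits of protected information leaked into the decision, which yields a continuous quantity that a search procedure can climb, rather than a binary discriminated/not-discriminated verdict.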