The popularity of machine learning has increased the risk of unfair models being deployed in high-stakes applications, such as the justice system, drug/vaccine design, and medical diagnosis. Although there are effective methods for training fair models from scratch, how to automatically reveal and explain the unfairness of a trained model remains a challenging task. Revealing the unfairness of machine learning models in an interpretable fashion is a critical step toward fair and trustworthy AI. In this paper, we systematically tackle the novel task of revealing unfair models by mining interpretable evidence (RUMIE). The key idea is to find solid evidence in the form of a group of data instances that are most discriminated against by the model. To make the evidence interpretable, we also find a set of human-understandable key attributes and decision rules that characterize the discriminated data instances and distinguish them from the non-discriminated data. As demonstrated by extensive experiments on real-world data sets, our method finds highly interpretable and solid evidence that effectively reveals the unfairness of trained models. Moreover, it is much more scalable than all of the baseline methods.
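To make the two-step idea concrete, the sketch below is a minimal, hedged illustration rather than the paper's RUMIE procedure: it assumes a toy trained classifier, a binary sensitive attribute, a simple counterfactual-flip "discrimination score", and an arbitrary group size `k` (all assumptions introduced here for illustration). It first selects the instances that appear most disadvantaged by the model, then fits a shallow decision tree whose splits act as human-readable rules distinguishing that group from the rest.

```python
# Illustrative sketch only: NOT the paper's RUMIE algorithm.
# Assumptions (not from the paper): a trained binary classifier `model`, a
# binary sensitive attribute in column `sens_col`, and a per-instance
# "discrimination score" defined as the increase in the favourable-outcome
# probability when the sensitive attribute is flipped.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data and a toy "trained model" standing in for the model under audit.
X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
sens_col = 0
X[:, sens_col] = (X[:, sens_col] > 0).astype(float)  # binarize as a sensitive attribute
model = LogisticRegression().fit(X, y)

# Step 1: score how much each instance appears to be disadvantaged by the model.
X_flipped = X.copy()
X_flipped[:, sens_col] = 1.0 - X_flipped[:, sens_col]
disc_score = model.predict_proba(X_flipped)[:, 1] - model.predict_proba(X)[:, 1]

# Step 2: take the top-k most-discriminated instances as the evidence group.
k = 200
evidence = np.argsort(disc_score)[-k:]
group_label = np.zeros(len(X), dtype=int)
group_label[evidence] = 1

# Step 3: fit a shallow tree so its splits serve as interpretable decision rules
# characterizing the evidence group versus the remaining (non-discriminated) data.
explainer = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, group_label)
print(export_text(explainer, feature_names=[f"x{i}" for i in range(X.shape[1])]))
```

The printed tree paths play the role of the "key attributes and decision rules" mentioned above; the actual method in the paper selects the group and its characterization differently, and this sketch only conveys the overall pipeline.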