As the application of deep neural networks proliferates in areas such as medical imaging, video surveillance, and self-driving cars, the need to explain the decisions of these models has become a major research topic, at both the global and local levels. Locally, most explanation methods have focused on identifying the relevance of input features, which limits the types of explanations possible. In this paper, we investigate a new direction by leveraging latent features to generate contrastive explanations: predictions are explained not only by highlighting aspects that are in themselves sufficient to justify the classification, but also by identifying new aspects which, if added, would change the classification. The key contribution of this paper lies in how we add features to rich data in a formal yet human-interpretable way that leads to meaningful results. Our new definition of "addition" uses latent features to move beyond the limitations of previous explanations and resolves an open question laid out in Dhurandhar et al. (2018), whose method creates local contrastive explanations but is limited to simple datasets such as grayscale images. The strength of our approach in creating intuitive explanations that are also quantitatively superior to other methods is demonstrated on three diverse image datasets (skin lesions, faces, and fashion apparel). A user study with 200 participants further exemplifies the benefits of contrastive information, which can be viewed as complementary to other state-of-the-art interpretability methods.
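To make the central idea concrete, the following is a minimal sketch, not the paper's actual objective: assuming a trained `classifier`, a `decoder` that maps latent codes back to images, and a latent code `z` for the input (all hypothetical stand-ins, since the paper's exact models and loss are not given here), one could search for a sparse, purely additive latent perturbation whose decoding flips the prediction, i.e., a contrastive "pertinent negative" found by latent-feature addition:

```python
# Minimal sketch (hypothetical, not the authors' exact formulation):
# search latent space for a sparse *addition* that changes the class.
import torch

def pertinent_negative(z, classifier, decoder, orig_class,
                       steps=200, lr=0.05, kappa=0.5, gamma=0.1):
    """Find a nonnegative latent perturbation `delta` such that the
    decoded image decoder(z + delta) is classified differently."""
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # relu keeps the perturbation additive: features can only be added.
        x_new = decoder(z + torch.relu(delta))
        logits = classifier(x_new)
        orig_logit = logits[:, orig_class]
        # Best logit among all *other* classes.
        mask = torch.nn.functional.one_hot(
            torch.tensor([orig_class]), logits.size(1)).bool()
        other_logit = logits.masked_fill(mask, float('-inf')).max(dim=1).values
        # Hinge loss: push the original class below some other class by kappa.
        attack = torch.clamp(orig_logit - other_logit + kappa, min=0).sum()
        # Sparsity term: prefer adding as few latent features as possible.
        sparsity = gamma * torch.relu(delta).sum()
        loss = attack + sparsity
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.relu(delta).detach()
```

The returned `delta` indicates latent features whose addition changes the classification, the contrastive half of the explanation; the paper's actual method additionally constrains these latent features to correspond to human-interpretable attributes, which is what makes the approach applicable to rich image data.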