As deep learning models are increasingly used in safety-critical applications, explainability and trustworthiness become major concerns. For simple images, such as low-resolution face portraits, synthesizing visual counterfactual explanations has recently been proposed as a way to uncover the decision mechanisms of a trained classification model. In this work, we address the problem of producing counterfactual explanations for high-quality images and complex scenes. Leveraging recent semantic-to-image models, we propose a new generative counterfactual explanation framework that produces plausible and sparse modifications which preserve the overall scene structure. Furthermore, we introduce the concept of "region-targeted counterfactual explanations", and a corresponding framework, where users can guide the generation of counterfactuals by specifying a set of semantic regions of the query image the explanation must be about. Extensive experiments are conducted on challenging datasets including high-quality portraits (CelebAMask-HQ) and driving scenes (BDD100k).