The performance of convolutional neural networks has continued to improve over the last decade. At the same time, as model complexity grows, it becomes increasingly difficult to explain model decisions. Such explanations may be of critical importance for reliable operation of human-machine pairing setups, or for model selection when the "best" model among many equally accurate models must be established. Salience maps represent one popular way of explaining model decisions by highlighting image regions models deem important when making a prediction. However, examining salience maps at scale is not practical. In this paper, we propose five novel methods of leveraging model salience to explain model behavior at scale. These methods ask: (a) what is the average entropy of a model's salience maps, (b) how does model salience change when the model is fed out-of-set samples, (c) how closely does model salience follow geometrical transformations, (d) how stable is model salience across independent training runs, and (e) how does model salience react to salience-guided image degradations. To assess the proposed measures on a concrete and topical problem, we conducted a series of experiments for the task of synthetic face detection with two types of models: those trained traditionally with cross-entropy loss, and those guided by human salience during training to increase model generalizability. These two types of models are characterized by different, interpretable properties of their salience maps, which allows us to evaluate the correctness of the proposed measures. We offer source code for each measure along with this paper.
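To make measure (a) concrete, the sketch below shows one plausible way to compute the average entropy of a model's salience maps; it is a minimal illustration assuming salience maps are supplied as non-negative 2-D NumPy arrays and normalized into probability distributions, not the paper's exact implementation (function names and the epsilon constant are hypothetical).

```python
import numpy as np

def salience_entropy(salience_map: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of one salience map treated as a probability distribution."""
    p = salience_map.astype(np.float64).ravel()
    p = np.clip(p, 0.0, None)       # salience values are assumed non-negative
    p = p / (p.sum() + eps)         # normalize so the map sums to 1
    return float(-np.sum(p * np.log(p + eps)))

def average_salience_entropy(salience_maps) -> float:
    """Average entropy over a collection of salience maps (measure (a) above)."""
    return float(np.mean([salience_entropy(s) for s in salience_maps]))

# Usage example with synthetic stand-in maps:
maps = [np.random.rand(224, 224) for _ in range(8)]
print(average_salience_entropy(maps))
```

Lower average entropy would indicate salience concentrated on small image regions, while higher entropy would indicate diffuse, less focused salience; the remaining measures (b) through (e) compare such maps across input perturbations, transformations, and training runs.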