A large number of deep neural network based techniques have been developed to address the challenging problem of face presentation attack detection (PAD). While these techniques focus on improving PAD performance in terms of classification accuracy and robustness against unseen attacks and environmental conditions, little attention has been paid to the explainability of PAD predictions. In this paper, we tackle the problem of explaining PAD predictions through natural language. Our approach passes feature representations from a deep layer of the PAD model to a language model, which generates text describing the reasoning behind the PAD prediction. Because of the limited amount of annotated data in our study, we apply a lightweight LSTM network as our natural language generation model. We investigate how the quality of the generated explanations is affected by different loss functions, including the commonly used word-wise cross-entropy loss, a sentence discriminative loss, and a sentence semantic loss. We conduct our experiments using face images from a dataset consisting of 1,105 bona-fide and 924 presentation attack samples. Our quantitative and qualitative results demonstrate the effectiveness of our model in generating proper PAD explanations through text, as well as the power of the sentence-wise losses. To the best of our knowledge, this is the first introduction of a joint biometrics-NLP task. Our dataset is available through our GitHub page.
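To make the described pipeline concrete, the sketch below shows one plausible way to condition an LSTM decoder on deep-layer PAD features and train it with the word-wise cross-entropy loss mentioned above. This is a minimal illustration under assumed choices: the module names, feature and hidden dimensions, and vocabulary size are hypothetical and do not reproduce the paper's actual architecture or the sentence discriminative and semantic losses, which would be added on top of this objective.

```python
# Minimal sketch (PyTorch, assumed implementation): deep PAD features condition
# an LSTM decoder that generates an explanation sentence, trained with
# word-wise cross entropy. Dimensions and vocabulary size are illustrative.
import torch
import torch.nn as nn

class ExplanationDecoder(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256, hidden_dim=256, vocab_size=1000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # map PAD features to initial LSTM state
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, pad_features, captions):
        # pad_features: (B, feat_dim) deep-layer representation from the PAD model
        # captions:     (B, T) token ids of the ground-truth explanation sentence
        h0 = self.init_h(pad_features).unsqueeze(0)
        c0 = self.init_c(pad_features).unsqueeze(0)
        emb = self.embed(captions[:, :-1])              # teacher forcing: input tokens shifted by one
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                         # (B, T-1, vocab_size) next-word logits

# Word-wise cross-entropy over the generated tokens (dummy data for illustration).
decoder = ExplanationDecoder()
features = torch.randn(4, 512)                          # stand-in for deep PAD features
captions = torch.randint(0, 1000, (4, 12))              # stand-in explanation token ids
logits = decoder(features, captions)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), captions[:, 1:].reshape(-1))
```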