Presentation attacks, in which impostors attempt to bypass biometric systems, are a recurrent threat. Humans often use the background as a contextual cue for their visual system, yet in face-based systems the background is usually discarded, since face presentation attack detection (PAD) models are mostly trained on face crops. This work presents a comparative study of face PAD models (including multi-task learning, adversarial training, and dynamic frame selection) in two settings: with and without cropping. The results show that performance is consistently better when the background is present in the images. The proposed multi-task methodology beats the state-of-the-art results on the ROSE-Youtu dataset by a large margin, with an equal error rate of 0.2%. Furthermore, we analyze the models' predictions with Grad-CAM++ to investigate to what extent the models focus on background elements known to be useful for human inspection. From this analysis we conclude that background cues are not relevant across all attacks, showing that the model leverages background information only when necessary.
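The kind of Grad-CAM++ inspection described above can be reproduced with the open-source pytorch-grad-cam package. The following is a minimal sketch only, assuming a hypothetical ResNet-18 binary PAD classifier (bona fide vs. attack), a placeholder input frame `frame.jpg`, and the choice of `model.layer4[-1]` as target layer; none of these reflect the authors' actual architecture or training setup.

```python
# Minimal sketch: visualizing where a face PAD classifier looks, using Grad-CAM++.
# Assumes a ResNet-18 backbone with a 2-class head and the third-party
# pytorch-grad-cam package; both are illustrative assumptions, not the paper's setup.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.image import show_cam_on_image
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

# Hypothetical PAD model: ResNet-18 with a binary (bona fide / attack) head.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.eval()

# Load a full (uncropped) frame so the heatmap can also cover background regions.
# Input normalization is omitted for brevity.
rgb = np.array(Image.open("frame.jpg").convert("RGB").resize((224, 224))) / 255.0
input_tensor = transforms.ToTensor()(rgb.astype(np.float32)).unsqueeze(0)

# Grad-CAM++ over the last convolutional block of the backbone.
cam = GradCAMPlusPlus(model=model, target_layers=[model.layer4[-1]])
heatmap = cam(input_tensor=input_tensor,
              targets=[ClassifierOutputTarget(1)])  # class index 1 = attack (assumed)

# Overlay the class-activation map on the original frame for visual inspection.
overlay = show_cam_on_image(rgb.astype(np.float32), heatmap[0], use_rgb=True)
Image.fromarray(overlay).save("gradcam_overlay.jpg")
```

Inspecting such overlays on full frames versus face crops is one way to check whether high activation falls on background elements (e.g. screen bezels or paper edges) only for the attack types where those cues are actually present.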