In face presentation attack detection (PAD), most spoofing cues are subtle, local image patterns (e.g., local image distortions, 3D mask edges, and cut-photo edges). The representations learned by existing PAD works with simple global pooling, however, lose local feature discriminability. In this paper, the VLAD aggregation method is adopted to quantize local features with a visual vocabulary that locally partitions the feature space, and hence preserves local discriminability. We further propose vocabulary separation and adaptation methods to adapt VLAD to the cross-domain PAD task. The proposed vocabulary separation method divides the vocabulary into domain-shared and domain-specific visual words to cope with the diversity of live and attack faces under the cross-domain scenario. The proposed vocabulary adaptation method imitates the maximization step of the k-means algorithm during end-to-end training, which guarantees that each visual word stays close to the center of its assigned local features and thus yields robust similarity measurement. We give illustrations and extensive experiments to demonstrate the effectiveness of VLAD with the proposed vocabulary separation and adaptation methods on standard cross-domain PAD benchmarks. The code is available at https://github.com/Liubinggunzu/VLAD-VSA.
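To make the aggregation and the two vocabulary operations concrete, below is a minimal PyTorch sketch under our own assumptions: the class name `VLADVSA`, the dot-product soft assignment, the shared/specific word counts, and the momentum-style centroid update are illustrative choices, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VLADVSA(nn.Module):
    """NetVLAD-style aggregation with the vocabulary split into
    domain-shared and domain-specific visual words (sketch)."""

    def __init__(self, dim=512, num_shared=6, num_specific=2):
        super().__init__()
        self.num_shared = num_shared
        self.K = num_shared + num_specific
        # Separated vocabulary: shared words model domain-invariant cues,
        # specific words absorb per-domain variation.
        self.shared = nn.Parameter(0.1 * torch.randn(num_shared, dim))
        self.specific = nn.Parameter(0.1 * torch.randn(num_specific, dim))

    def vocabulary(self):
        return torch.cat([self.shared, self.specific], dim=0)   # (K, dim)

    def forward(self, x):
        """x: (B, N, dim) local features, e.g. a flattened CNN feature map."""
        B, N, D = x.shape
        C = self.vocabulary()
        # Soft-assign each local feature to every visual word.
        assign = F.softmax(x @ C.t(), dim=-1)                   # (B, N, K)
        # Aggregate assignment-weighted residuals per visual word.
        resid = x.unsqueeze(2) - C.view(1, 1, self.K, D)        # (B, N, K, D)
        vlad = (assign.unsqueeze(-1) * resid).sum(dim=1)        # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                        # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)             # (B, K*D)

    @torch.no_grad()
    def adapt_vocabulary(self, x, momentum=0.9):
        """Imitate the k-means maximization step: pull every visual word
        toward the mean of the local features hard-assigned to it."""
        feats = x.reshape(-1, x.shape[-1])                      # (B*N, D)
        hard = (feats @ self.vocabulary().t()).argmax(dim=-1)   # (B*N,)
        for k in range(self.K):
            mask = hard == k
            if mask.any():
                word = (self.shared[k] if k < self.num_shared
                        else self.specific[k - self.num_shared])
                word.mul_(momentum).add_((1 - momentum) * feats[mask].mean(dim=0))

# Usage: aggregate a batch of local features and refresh the vocabulary.
layer = VLADVSA(dim=512, num_shared=6, num_specific=2)
feats = torch.randn(4, 49, 512)      # four images, 7x7 local features each
descriptors = layer(feats)           # (4, 8 * 512) VLAD representations
layer.adapt_vocabulary(feats)        # k-means-style centroid update
```

Keeping the shared and specific words as separate parameters mirrors the separation idea: only the shared words are expected to transfer across domains. The adaptation step keeps every word near the centroid of its assigned features, so the residuals that VLAD accumulates remain a stable similarity measure during training.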