Significant progress has recently been made in face presentation attack detection (PAD), which aims to secure face recognition systems against presentation attacks. This progress has been driven by the availability of several face PAD datasets. However, all existing datasets are based on privacy- and legally-sensitive authentic biometric data and cover a limited number of subjects. To address these legal and technical challenges, this work presents the first synthetic-based face PAD dataset, named SynthASpoof, as a large-scale PAD development dataset. The bona fide samples in SynthASpoof are synthetically generated, and the attack samples are collected by presenting this synthetic data to capture systems in a real attack scenario. The experimental results demonstrate the feasibility of using SynthASpoof to develop face PAD solutions. Moreover, we boost the performance of such solutions by incorporating the domain generalization tool MixStyle. Additionally, we show that synthetic data can serve as a supplement that enriches the diversity of limited authentic training data and consistently enhances PAD performance. The SynthASpoof dataset, containing 25,000 bona fide and 78,800 attack samples, the implementation, and the pre-trained weights are publicly available.
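The abstract names MixStyle (Zhou et al., ICLR 2021) as the domain generalization tool added to the PAD solutions. As a rough illustration of what such a layer does, the sketch below normalizes each instance's feature maps by their channel-wise mean and standard deviation, then re-styles them with statistics linearly mixed with those of a randomly paired instance in the same batch. This is a minimal NumPy sketch of the general MixStyle idea, not the authors' implementation. The function name `mixstyle`, its defaults (`alpha=0.1`, `p=0.5`, `eps=1e-6`), and the `(B, C, H, W)` layout are assumptions for illustration, and where such a layer would sit inside a PAD backbone is not specified here.

```python
import numpy as np


def mixstyle(x, alpha=0.1, eps=1e-6, p=0.5, rng=None, training=True):
    """Illustrative MixStyle sketch (hypothetical helper, not the paper's code).

    x: feature maps of shape (B, C, H, W).
    Returns feature maps of the same shape with mixed channel-wise statistics.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Inactive at inference time; during training, applied with probability p.
    if not training or rng.random() > p:
        return x
    b = x.shape[0]
    # Per-instance, per-channel "style" statistics over the spatial dimensions.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sig = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    x_normed = (x - mu) / sig
    # One mixing weight per instance, drawn from Beta(alpha, alpha).
    lmda = rng.beta(alpha, alpha, size=(b, 1, 1, 1))
    # Pair each instance with a random partner from the same batch.
    perm = rng.permutation(b)
    mu_mix = lmda * mu + (1 - lmda) * mu[perm]
    sig_mix = lmda * sig + (1 - lmda) * sig[perm]
    # Re-style the normalized features with the mixed statistics.
    return x_normed * sig_mix + mu_mix


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 16, 7, 7))  # toy batch of feature maps
    mixed = mixstyle(feats, p=1.0, rng=rng)
    print(mixed.shape)
```

Because only first- and second-order feature statistics are perturbed, the content of each feature map is preserved while its "style" (e.g., capture-device or illumination characteristics) is varied, which is the intuition behind using it for domain generalization. With a batch of one, the only partner is the instance itself, so the features pass through unchanged.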