The availability of large-scale face datasets has been key to the progress of face recognition. However, due to licensing issues or copyright infringement, some datasets are no longer available (e.g. MS-Celeb-1M). Recent advances in Generative Adversarial Networks (GANs) for synthesizing realistic face images provide a pathway to replace real datasets with synthetic ones, both to train and to benchmark face recognition (FR) systems. The work presented in this paper studies the benchmarking of FR systems using a synthetic dataset. First, we introduce the proposed methodology to generate a synthetic dataset without human intervention by exploiting the latent structure of a StyleGAN2 model with multiple controlled factors of variation. Then, we confirm that (i) the generated synthetic identities are not data subjects from the GAN's training dataset, which is verified on a synthetic dataset with 10K+ identities; and (ii) the synthetic dataset is a good substitute for benchmarking, often yielding error rates and system rankings similar to those obtained on the real dataset.
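The abstract does not say how ranking similarity between the real and synthetic benchmarks is quantified. One common option, assumed here purely for illustration, is Kendall's rank correlation computed over the per-system error rates. In the sketch below, `ranking_agreement` is a hypothetical helper that computes Kendall's tau-a (no tie correction). A value of 1.0 means both benchmarks order the FR systems identically, and -1.0 means the order is fully reversed.

```python
from itertools import combinations


def ranking_agreement(real_errors, synthetic_errors):
    """Hypothetical helper: Kendall's tau-a between per-system error rates.

    real_errors[k] and synthetic_errors[k] are the error rates of the same
    FR system k on the real and synthetic benchmarks, respectively.
    Assumption: Kendall's tau-a is used as the ranking-similarity measure.
    """
    n = len(real_errors)
    if n != len(synthetic_errors) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 systems")
    concordant = discordant = 0
    # Compare every pair of systems: do both benchmarks order them the same way?
    for i, j in combinations(range(n), 2):
        s = (real_errors[i] - real_errors[j]) * (synthetic_errors[i] - synthetic_errors[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 (a tie in either list) counts as neither concordant nor discordant
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, `ranking_agreement([1, 2, 3, 4], [1, 3, 2, 4])` returns about 0.67, because exactly one of the six system pairs swaps order between the two benchmarks. The inputs here are placeholders and are not results from the paper.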