Enabling highly secure applications (such as border crossing) with face recognition requires extensive biometric performance tests on large-scale data. However, using real face images raises privacy concerns, as the law does not allow images to be used for purposes other than those originally intended. Using representative subsets of face data can also lead to unwanted demographic biases and cause an imbalance in datasets. One possible solution to these issues is to replace real face images with synthetically generated samples. While the generation of synthetic images has benefited from recent advances in computer vision, generating multiple samples of the same synthetic identity that resemble real-world variations, i.e., mated samples, remains unaddressed. This work proposes a non-deterministic method for generating mated face images by exploiting the well-structured latent space of StyleGAN. Mated samples are generated by manipulating latent vectors: we use Principal Component Analysis (PCA) to define semantically meaningful directions in the latent space, and we control the similarity between the original and the mated samples using a pre-trained face recognition system. We create a new dataset of synthetic face images (SymFace) consisting of 77,034 samples including 25,919 synthetic IDs. Through an analysis with well-established face image quality metrics, we demonstrate the differences in the biometric quality of synthetic samples that mimic the characteristics of real biometric data. The analysis and its results indicate that synthetic samples created with the proposed approach are a viable alternative to real biometric data.
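The latent-space manipulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine-similarity criterion, the threshold, and the step-shrinking loop are all assumptions made for illustration, and `embed` is a hypothetical stand-in for a pre-trained face recognition model applied to the image generated from a latent vector. The sketch only shows the general idea of drawing a random edit along a PCA direction and accepting it once the edited sample stays close enough to the original identity.

```python
import numpy as np


def pca_directions(latents, k):
    """Return the top-k principal directions (unit vectors) of sampled latents."""
    centered = latents - latents.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def generate_mated_latent(w, directions, embed, threshold, rng,
                          max_step=3.0, shrink=0.5, max_tries=20):
    """Move w along a randomly chosen PCA direction (hypothetical procedure).

    The random step is halved until the embedding of the edited latent is
    at least `threshold` similar to the embedding of the original latent.
    """
    direction = directions[rng.integers(len(directions))]
    step = rng.uniform(-max_step, max_step)
    reference = embed(w)
    for _ in range(max_tries):
        candidate = w + step * direction
        if cosine_similarity(reference, embed(candidate)) >= threshold:
            return candidate
        step *= shrink
    return w.copy()  # fall back to the unedited latent


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumption: random vectors stand in for latents sampled from StyleGAN.
    latents = rng.standard_normal((200, 512))
    directions = pca_directions(latents, k=5)
    # Assumption: a fixed random projection stands in for the face recognizer.
    projection = rng.standard_normal((512, 128))
    mated = generate_mated_latent(latents[0], directions,
                                  lambda w: w @ projection, 0.9, rng)
    print(cosine_similarity(latents[0] @ projection, mated @ projection))
```

Because the direction and step size are drawn at random, repeated calls on the same latent vector yield different mated samples, which mirrors the non-deterministic character of the method. In a real pipeline, `embed` would render the latent with the generator and pass the image through the recognition network.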