Generation of Non-Deterministic Synthetic Face Datasets Guided by Identity Priors

Enabling highly secure face recognition applications, such as border crossing, requires extensive biometric performance testing on large-scale data. However, using real face images raises privacy concerns, because the law does not allow such images to be used for purposes other than those originally intended. Selecting subsets of face data, even ones meant to be representative, can also introduce unwanted demographic biases and imbalanced datasets. One possible way to overcome these issues is to replace real face images with synthetically generated samples. While synthetic image generation has benefited from recent advances in computer vision, generating multiple samples of the same synthetic identity that reflect real-world variations, i.e., mated samples, remains unaddressed. This work proposes a non-deterministic method for generating mated face images by exploiting the well-structured latent space of StyleGAN. Mated samples are generated by manipulating latent vectors: Principal Component Analysis (PCA) defines semantically meaningful directions in the latent space, and a pre-trained face recognition system controls the similarity between the original and the mated samples. We create a new dataset of synthetic face images (SymFace) comprising 77,034 samples of 25,919 synthetic identities. Using well-established face image quality metrics, we demonstrate differences in the biometric quality of the synthetic samples, which mimic characteristics of real biometric data. The analysis and its results indicate that synthetic samples created with the proposed approach are a viable alternative to real biometric data.
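The abstract describes mated-sample generation only at a high level. The following is a minimal sketch of one way such a loop could look, not the authors' implementation: a latent vector is moved by random Gaussian steps along given PCA directions, and a candidate is accepted once its similarity to the original, as judged by a face recognition embedding, reaches a threshold. The function `embed` is a hypothetical stand-in for rendering a latent with StyleGAN and embedding the image with a pre-trained face recognition model; cosine similarity, the Gaussian step distribution, and the parameters `min_similarity`, `step_std`, and `max_tries` are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def generate_mated_latent(
    w: Sequence[float],
    directions: Sequence[Sequence[float]],
    embed: Callable[[Sequence[float]], Sequence[float]],
    min_similarity: float,
    step_std: float = 1.0,
    max_tries: int = 100,
    rng: Optional[random.Random] = None,
) -> Optional[Vector]:
    """Hypothetical sketch: perturb latent `w` along PCA `directions` with random
    Gaussian step sizes and return the first candidate whose embedding has
    cosine similarity >= `min_similarity` to the embedding of `w`.
    Returns None if no candidate passes within `max_tries` attempts."""
    if rng is None:
        rng = random.Random()
    reference = embed(w)
    for _ in range(max_tries):
        candidate = list(w)
        for direction in directions:
            # Random step size along each direction: the source of non-determinism.
            alpha = rng.gauss(0.0, step_std)
            candidate = [c + alpha * d for c, d in zip(candidate, direction)]
        # Identity check: keep the candidate only if the (stand-in) face
        # recognition embedding stays close enough to the original's.
        if cosine_similarity(reference, embed(candidate)) >= min_similarity:
            return candidate
    return None


if __name__ == "__main__":
    # Toy stand-in: treat the latent itself as the "embedding". In practice
    # `embed` would render the latent with a generator and run a face
    # recognition network on the image (both assumed, not shown here).
    rng = random.Random(42)
    w = [1.0, 0.0, 0.0, 0.0]
    directions = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    mated = generate_mated_latent(w, directions, embed=lambda z: z,
                                  min_similarity=0.8, step_std=0.3, rng=rng)
    print(mated)
```

Because the step sizes are drawn at random, calling the function twice on the same latent generally yields different mated candidates, while the similarity threshold keeps each candidate tied to the original identity. A real pipeline might also reject candidates that are too similar, so that mated samples show visible variation; that upper bound is omitted here for brevity.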


Related content

PCA


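In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional one by maximizing the variance retained along each successive dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared perpendicular distance from the points to the line. The next best-fitting line can be chosen in the same way from among the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.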
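The procedure described above can be illustrated with a short computation. The sketch below is a minimal, dependency-free example, assuming data given as lists of floats: it centers the points, forms the sample covariance matrix, and finds successive principal components by power iteration, removing the parts along components already found so that each new direction is perpendicular to the previous ones. Power iteration with deflation is only one way to compute PCA (library implementations commonly use a singular value decomposition instead), and the function names `principal_components` and `project` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def principal_components(
    points: Sequence[Sequence[float]], k: int, iters: int = 500, seed: int = 0
) -> Tuple[List[Vector], Vector]:
    """Top-k principal directions (unit vectors) and the data mean, computed by
    power iteration on the sample covariance matrix with deflation."""
    n, d = len(points), len(points[0])
    if not 1 <= k <= d or n < 2:
        raise ValueError("need 1 <= k <= dimension and at least two points")
    # Center the data so every fitted line passes through the mean.
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix C[i][j] = sum over points of x_i * x_j / (n - 1).
    cov = [[sum(x[i] * x[j] for x in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components: List[Vector] = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Deflation: remove the parts along components already found, so the
            # new direction is perpendicular to all previous ones.
            for c in components:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0.0:
                raise ValueError("no variance left in the remaining directions")
            v = [a / norm for a in w]
        components.append(v)
    return components, mean


def project(points: Sequence[Sequence[float]], components: Sequence[Vector],
            mean: Vector) -> List[Vector]:
    """Coordinates of each centered point along each principal component."""
    return [[sum((p[j] - mean[j]) * c[j] for j in range(len(mean)))
             for c in components] for p in points]


if __name__ == "__main__":
    # Points close to the line y = 2x, with a small alternating offset.
    pts = [[t, 2 * t + 0.1 * (-1) ** t] for t in range(-10, 11)]
    comps, mu = principal_components(pts, k=2)
    print([round(abs(x), 3) for x in comps[0]])  # → [0.447, 0.894]
```

The first component recovers the direction of the line y = 2x, i.e. (1, 2)/√5 up to sign. Projecting the points onto all returned components gives coordinates whose sample covariances are numerically zero, which is the "uncorrelated dimensions" property; keeping only the first k < d coordinates is the dimensionality reduction.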
