State-of-the-art face recognition systems require huge amounts of labeled training data. Because privacy is a priority in face recognition applications, the data is limited to celebrity web crawls, which suffer from skewed ethnicity distributions and limited numbers of identities. On the other hand, the self-supervised revolution in the industry motivates research on adapting the related techniques to facial recognition. One of the most popular practical tricks is to augment the dataset with samples drawn from high-resolution, high-fidelity generative models (e.g., StyleGAN-like) while preserving identity. We show that a simple approach based on fine-tuning an encoder for StyleGAN improves upon state-of-the-art facial recognition and performs better than training on synthetic face identities. We also collect large-scale unlabeled datasets with controllable ethnic composition: AfricanFaceSet-5M (5 million images of different people) and AsianFaceSet-3M (3 million images of different people). We show that pretraining on each of them improves recognition of the respective ethnicity (as well as others), while combining all unlabeled datasets yields the largest performance increase. Our self-supervised strategy is most useful when labeled training data is limited, which can benefit more tailored face recognition tasks and settings with privacy concerns. Evaluation is performed on the standard RFW dataset and a new large-scale RB-WebFace benchmark.