Good face recognition performance usually requires a large-scale training dataset. A simple yet effective way to improve recognition performance is to make the training set as large as possible by combining multiple datasets. However, naively combining different datasets is problematic because of two major issues. First, the same person may appear in more than one dataset, which creates an identity overlap between datasets. Treating the same person as a different class in each dataset during training adversely affects back-propagation and produces non-representative embeddings. Manually cleaning the labels, on the other hand, can require formidable human effort, especially when there are millions of images and thousands of identities. Second, different datasets are collected under different conditions and therefore follow different domain distributions, so naively combining them makes it difficult to learn embeddings that are domain-invariant across datasets. In this paper, we propose DAIL: Dataset-Aware and Invariant Learning to resolve both issues. To address identity overlap, we propose a dataset-aware loss for multi-dataset training that reduces the penalty when the same person appears in multiple datasets. This is readily achieved by adding a dataset-aware term to the softmax loss. To address the domain mismatch, we employ domain adaptation with gradient reversal layers for dataset-invariant learning. The proposed approach not only achieves state-of-the-art results on several commonly used face recognition validation sets, including LFW, CFP-FP, and AgeDB-30, but also shows great benefit for practical use.
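The two ingredients named above can be illustrated with a minimal sketch in plain Python. The abstract does not give the exact form of the dataset-aware term, so the version below rests on an assumption: classes that belong to a dataset other than the sample's own are simply dropped from the softmax denominator, so a possible duplicate identity in another dataset is no longer pushed away. The names `dataset_aware_softmax_loss`, `class_dataset`, `sample_dataset`, and `GradientReversal` are hypothetical and chosen for illustration only. The gradient reversal layer follows its usual definition: identity in the forward pass, gradient multiplied by a negative factor in the backward pass.

```python
import math


def dataset_aware_softmax_loss(logits, target, class_dataset, sample_dataset):
    """Cross-entropy restricted to the classes of the sample's own dataset.

    Assumption (not stated in the abstract): classes from other datasets are
    excluded from the softmax denominator, which removes the penalty against
    a possibly identical person labeled as a different class elsewhere.
    """
    if class_dataset[target] != sample_dataset:
        raise ValueError("target class must belong to the sample's dataset")
    # Keep only the logits of classes that come from the same dataset.
    kept = [z for z, d in zip(logits, class_dataset) if d == sample_dataset]
    # Numerically stable log-sum-exp over the kept logits.
    m = max(kept)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in kept))
    return log_denominator - logits[target]


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversal (hypothetical default)

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


# Three classes: 0 and 1 come from dataset 0, class 2 comes from dataset 1.
# Class 2 might be the same person as class 0, so its high logit is suspicious.
logits = [2.0, 0.5, 1.9]
aware = dataset_aware_softmax_loss(logits, 0, [0, 0, 1], 0)
# Passing a single dataset id for every class recovers the plain softmax loss.
plain = dataset_aware_softmax_loss(logits, 0, [0, 0, 0], 0)
print(aware < plain)  # the overlapping class no longer contributes a penalty

grl = GradientReversal(lam=0.5)
print(grl.forward([1.0, -2.0]), grl.backward([1.0, -2.0]))
```

In a full training setup the reversal layer would sit between the embedding network and a dataset classifier, so that minimizing the classifier's loss pushes the embeddings toward being indistinguishable across datasets. An autograd framework would replace the hand-written `backward` used here.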