Face recognition in the wild has recently achieved remarkable success, and one key driver is the growing size of training data. For example, the largest face dataset, WebFace42M, contains about 2 million identities and 42 million faces. However, such a massive number of faces tightens the constraints on training time, computing resources, and memory cost. Current research on this problem mainly focuses on designing an efficient fully-connected (FC) layer to reduce the GPU memory consumption caused by a large number of identities. In this work, we relax these constraints from the core-set selection perspective, by resolving the redundancy that greedy collection introduces into up-to-date face datasets. As the first attempt to apply this perspective to face recognition, we find that existing methods are limited in both performance and efficiency. For superior cost-efficiency, we contribute a novel filtering strategy dubbed Face-NMS. Face-NMS works in feature space and considers both local and global sparsity when generating core sets. In practice, Face-NMS is analogous to Non-Maximum Suppression (NMS) in object detection: it ranks the faces by their potential contribution to the overall sparsity and, for local sparsity, filters out the superfluous face in each pair with high similarity. With respect to efficiency, Face-NMS accelerates the whole pipeline by training the proxy model on a smaller but sufficient proxy dataset. As a result, with Face-NMS we scale the WebFace42M dataset down to 60% of its size while retaining its performance on the main benchmarks, offering a 40% resource saving and a 1.64 times acceleration. The code is publicly available at https://github.com/HuangJunJie2017/Face-NMS.
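The rank-then-suppress idea described in the abstract can be illustrated with a minimal sketch. This is not the released Face-NMS implementation. The function names, the ranking score (mean cosine similarity to all other faces, where a lower value is taken to mean a larger contribution to global sparsity), and the `sim_threshold` value are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(features):
    """Pairwise cosine similarity between all feature vectors."""
    unit = [normalize(v) for v in features]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def nms_style_filter(features, sim_threshold=0.9):
    """Return the sorted indices of the faces kept by a greedy NMS-like pass.

    Assumed ranking score: mean similarity to every other face; a lower
    mean marks a face that adds more to the overall (global) sparsity.
    Assumed suppression rule: drop a face whose similarity to an already
    kept face exceeds sim_threshold (local sparsity).
    """
    n = len(features)
    sim = cosine_matrix(features)
    # Exclude the self-similarity on the diagonal from each mean.
    mean_sim = [(sum(row) - row[i]) / max(n - 1, 1) for i, row in enumerate(sim)]
    # Visit faces from most to least "sparse" under the assumed score.
    order = sorted(range(n), key=lambda i: mean_sim[i])
    kept = []
    for i in order:
        if all(sim[i][j] <= sim_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy 2-D "features": faces 0 and 1 are near-duplicates.
faces = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]
print(nms_style_filter(faces))  # [0, 2, 3]
```

In this toy run, face 1 is suppressed because its cosine similarity to the higher-ranked face 0 is about 0.995, above the assumed threshold of 0.9. Lowering the threshold removes more faces and yields a smaller core set.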