Although deep face recognition benefits significantly from large-scale training data, labelling cost is a current bottleneck. A feasible solution is semi-supervised learning, which exploits a small portion of labelled data together with a large amount of unlabelled data. The major challenge, however, is that label errors accumulate through auto-labelling and compromise the training. This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling. Specifically, we introduce a multi-agent method, named GroupNet (GN), that gives our solution the ability to identify wrongly labelled samples and preserve clean ones. We show that GN alone achieves leading accuracy in traditional supervised face recognition even when noisy labels make up more than 50\% of the training data. Further, we develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which builds on the robust training ability provided by GN. It starts with a small amount of labelled data and subsequently performs high-confidence labelling on a large amount of unlabelled data to boost further training. The more data NRoLL labels, the higher the confidence in the labels in the dataset. To evaluate the competitiveness of our method, we run NRoLL under a harsh condition in which only one-fifth of the labelled MSCeleb data is available and the rest is used as unlabelled data. On a wide range of benchmarks, our method compares favorably against state-of-the-art methods.
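The learn-then-label loop described above can be illustrated with a minimal sketch of generic confidence-thresholded self-training. This is not the authors' NRoLL or GroupNet implementation: the nearest-centroid classifier, the function names (`fit_centroids`, `predict_proba`, `self_train`), the `threshold` value, and the assumption that inputs are precomputed feature vectors are all hypothetical choices made for illustration, and the multi-agent noise filtering is not modelled.

```python
import numpy as np

def fit_centroids(x, y, num_classes):
    # One mean feature vector per class; assumes every class has at least one sample.
    return np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])

def predict_proba(centroids, x):
    # Softmax over negative squared distances to the class centroids.
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def self_train(x_lab, y_lab, x_unlab, num_classes, threshold=0.95, max_rounds=5):
    # Hypothetical stand-in for a learn-then-label loop: train on the labelled set,
    # pseudo-label only high-confidence unlabelled samples, then retrain.
    x_train, y_train = x_lab.copy(), y_lab.copy()
    remaining = x_unlab.copy()
    centroids = fit_centroids(x_train, y_train, num_classes)
    for _ in range(max_rounds):
        if len(remaining) == 0:
            break
        proba = predict_proba(centroids, remaining)
        keep = proba.max(axis=1) >= threshold  # high-confidence samples only
        if not keep.any():
            break
        x_train = np.concatenate([x_train, remaining[keep]])
        y_train = np.concatenate([y_train, proba.argmax(axis=1)[keep]])
        remaining = remaining[~keep]
        centroids = fit_centroids(x_train, y_train, num_classes)
    return centroids, y_train, remaining
```

The threshold trades coverage against label noise: a higher value admits fewer pseudo-labels per round but lowers the chance that wrong labels accumulate in later rounds, which is the failure mode the abstract highlights.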