Deep semi-supervised learning (SSL) has received significant attention in recent years, as it leverages large amounts of unlabeled data to improve the performance of deep learning with limited labeled data. Pseudo-labeling is a popular approach to expanding the labeled dataset. However, whether there is a more effective way of labeling remains an open problem. In this paper, we propose to label only the most representative samples to expand the labeled set. Representative samples, selected according to the indegree of their corresponding nodes on a directed k-nearest-neighbor (kNN) graph, lie in the k-nearest neighborhoods of many other samples. We design a graph neural network (GNN) labeler to label them in a progressive learning manner. Aided by the progressive GNN labeler, our deep SSL approach outperforms state-of-the-art methods on several popular SSL benchmarks, including CIFAR-10, SVHN, and ILSVRC-2012. Notably, we achieve 72.1% top-1 accuracy, surpassing the previous best result by 3.3%, on the challenging ImageNet benchmark with only $10\%$ labeled data.
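To make the selection criterion concrete, the sketch below illustrates indegree-based representative-sample selection on a directed kNN graph. It is a minimal illustration, not the paper's implementation: the feature matrix, the neighborhood size `k`, and the helper `select_representative_indices` are assumptions for demonstration, and scikit-learn's `NearestNeighbors` stands in for whatever graph-construction routine is actually used.

```python
# Minimal sketch (assumed, not the paper's code): pick representative samples
# by indegree on a directed kNN graph built over feature embeddings.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_representative_indices(features, k=10, num_select=100):
    """Return indices of the num_select samples with the highest indegree
    on the directed kNN graph over `features`."""
    n = features.shape[0]
    # Each sample points to its k nearest neighbors; query k+1 because the
    # nearest neighbor of a point is the point itself.
    nn = NearestNeighbors(n_neighbors=min(k + 1, n)).fit(features)
    _, neighbor_idx = nn.kneighbors(features)

    # Indegree of node j = number of samples listing j among their k nearest neighbors.
    indegree = np.zeros(n, dtype=np.int64)
    for i in range(n):
        for j in neighbor_idx[i]:
            if j != i:  # skip the self-edge
                indegree[j] += 1

    # High-indegree samples lie in the k-nearest neighborhoods of many other
    # samples, i.e. they are the most representative under this criterion.
    return np.argsort(-indegree)[:num_select]

# Usage with random features as a stand-in for unlabeled embeddings:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 128)).astype(np.float32)
    reps = select_representative_indices(feats, k=10, num_select=50)
    print(reps[:10])
```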