Following the success of supervised learning, semi-supervised learning (SSL) is becoming increasingly popular. SSL is a family of methods that, in addition to a labeled training set, also use a sizable collection of unlabeled data to fit a model. Most recent successful SSL methods are based on pseudo-labeling: letting confident model predictions act as training labels. While these methods have shown impressive results on many benchmark datasets, a drawback of this approach is that not all unlabeled data are used during training. We propose a new SSL algorithm, DoubleMatch, which combines the pseudo-labeling technique with a self-supervised loss, enabling the model to utilize all unlabeled data in the training process. We show that this method achieves state-of-the-art accuracies on multiple benchmark datasets while also reducing training times compared to existing SSL methods. Code is available at https://github.com/walline/doublematch.
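To make the combination concrete, the following is a minimal PyTorch sketch of a loss in this spirit: a FixMatch-style confidence-thresholded pseudo-label term that trains only on confident unlabeled samples, plus a self-supervised cosine-similarity term that covers all unlabeled samples. The function and argument names here are hypothetical, and details such as DoubleMatch's projection head and loss weighting are simplified; see the paper and repository above for the actual implementation.

```python
import torch
import torch.nn.functional as F

def doublematch_style_loss(logits_l, labels, logits_w, logits_s,
                           feats_w, feats_s, threshold=0.95,
                           w_pseudo=1.0, w_self=1.0):
    """Hypothetical sketch of a combined SSL loss: supervised CE,
    confidence-thresholded pseudo-labeling on confident unlabeled
    samples, and a self-supervised term over ALL unlabeled samples.

    logits_l/labels: labeled batch; logits_w/feats_w and
    logits_s/feats_s: weakly and strongly augmented views of the
    unlabeled batch (names are illustrative, not from the paper).
    """
    # Standard supervised loss on the labeled batch.
    loss_sup = F.cross_entropy(logits_l, labels)

    # Pseudo-labels from the weak view; gradients are stopped.
    probs_w = torch.softmax(logits_w.detach(), dim=-1)
    conf, pseudo = probs_w.max(dim=-1)
    mask = (conf >= threshold).float()  # only confident predictions train

    # Pseudo-label loss on the strong view, masked by confidence,
    # so unconfident unlabeled samples contribute nothing here.
    loss_pl = (F.cross_entropy(logits_s, pseudo, reduction="none") * mask).mean()

    # Self-supervised loss: align strong-view features with (detached)
    # weak-view features via cosine similarity. No confidence mask, so
    # this term uses every unlabeled sample in the batch.
    loss_self = (1.0 - F.cosine_similarity(feats_s, feats_w.detach(), dim=-1)).mean()

    return loss_sup + w_pseudo * loss_pl + w_self * loss_self
```

Because the self-supervised term needs no confidence threshold, every unlabeled sample produces a gradient signal at every step, which is the mechanism the abstract credits for both the accuracy gains and the shorter training times.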