A common situation in classification tasks is that a large amount of data is available for training, but only a small portion of it is annotated with class labels. The goal of semi-supervised training, in this context, is to improve classification accuracy by leveraging information not only from labeled data but also from a large amount of unlabeled data. Recent works have achieved significant improvements by exploiting the consistency constraint between differently augmented labeled and unlabeled data. Following this path, we propose a novel unsupervised objective that focuses on the less studied relationship between high confidence unlabeled data points that are similar to each other. The newly proposed Pair Loss minimizes the statistical distance between high confidence pseudo labels whose similarity is above a certain threshold. Combining the Pair Loss with the techniques developed by the MixMatch family, our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods on CIFAR-10 and SVHN. Furthermore, SimPLE also outperforms the state-of-the-art methods in the transfer learning setting, where models are initialized by weights pre-trained on ImageNet or DomainNet-Real. The code is available at github.com/zijian-hu/SimPLE.
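To make the Pair Loss concrete, below is a minimal PyTorch sketch of the idea as stated above: a pair of pseudo labels contributes to the loss only when the anchor's confidence and the pair's similarity exceed their thresholds, and a statistical distance between the retained pairs is then minimized. The Bhattacharyya-coefficient similarity, the particular distance, and the threshold values are illustrative assumptions rather than the paper's exact formulation; see github.com/zijian-hu/SimPLE for the authors' implementation.

```python
# A minimal, illustrative sketch of the Pair Loss idea. The names
# conf_threshold / sim_threshold, the Bhattacharyya-coefficient similarity,
# and the (1 - similarity^2) distance are assumptions for illustration.
import torch


def pair_loss(probs: torch.Tensor,
              conf_threshold: float = 0.95,
              sim_threshold: float = 0.9) -> torch.Tensor:
    """probs: (N, C) pseudo-label distributions for N unlabeled samples."""
    # Confidence of each pseudo label (its largest class probability).
    confidence, _ = probs.max(dim=1)                                # (N,)

    # Pairwise similarity between distributions; the Bhattacharyya
    # coefficient sum_c sqrt(p_c * q_c) is used here as one plausible choice.
    sqrt_p = probs.sqrt()
    similarity = sqrt_p @ sqrt_p.t()                                # (N, N)

    # Keep only pairs whose anchor pseudo label is confident and whose
    # two distributions are similar enough; drop self-pairs on the diagonal.
    conf_mask = (confidence > conf_threshold).float().unsqueeze(1)  # (N, 1)
    sim_mask = (similarity > sim_threshold).float()                 # (N, N)
    off_diag = 1.0 - torch.eye(probs.size(0), device=probs.device)
    weight = conf_mask * sim_mask * off_diag

    # Statistical distance to minimize for the selected pairs; one minus
    # the squared Bhattacharyya coefficient serves as the distance here.
    distance = 1.0 - similarity.pow(2)

    # Average over all ordered pairs.
    num_pairs = max(probs.size(0) * (probs.size(0) - 1), 1)
    return (weight * distance).sum() / num_pairs
```

In practice one would typically detach the pseudo-label targets so that gradients flow only through the model's predictions on the augmented views, as is standard in consistency-based semi-supervised methods.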