We evaluate the effectiveness of semi-supervised learning (SSL) on a realistic benchmark where the data exhibits considerable class imbalance and contains images from novel classes. Our benchmark consists of two fine-grained classification datasets obtained by sampling classes from the Aves and Fungi taxonomies. We find that recently proposed SSL methods provide significant benefits and can effectively use out-of-class data to improve performance when deep networks are trained from scratch. Yet their performance pales in comparison to a transfer learning baseline, an alternative approach for learning from a few examples. Furthermore, in the transfer setting, while existing SSL methods provide improvements, the presence of out-of-class data is often detrimental. In this setting, standard fine-tuning followed by distillation-based self-training is the most robust. Our work suggests that semi-supervised learning with experts on realistic datasets may require different strategies than those currently prevalent in the literature.