大规模数据集是否对自我监督的预培训是必要的? (Are Large-scale Datasets Necessary for Self-Supervised Pre-training?)

Pre-training models on large scale datasets, like ImageNet, is a standard practice in computer vision. This paradigm is especially effective for tasks with small training sets, for which high-capacity models tend to overfit. In this work, we consider a self-supervised pre-training scenario that only leverages the target task data. We consider datasets, like Stanford Cars, Sketch or COCO, which are order(s) of magnitude smaller than Imagenet. Our study shows that denoising autoencoders, such as BEiT or a variant that we introduce in this paper, are more robust to the type and size of the pre-training data than popular self-supervised methods trained by comparing image embeddings.We obtain competitive performance compared to ImageNet pre-training on a variety of classification datasets, from different domains. On COCO, when pre-training solely using COCO images, the detection and instance segmentation performance surpasses the supervised ImageNet pre-training in a comparable setting.

翻译：大型数据集的培训前模型,如图像网络,是计算机愿景的标准做法。对于小型培训组的任务来说,这种模式特别有效,因为高容量模型往往会过度适应。在这项工作中,我们考虑的是自我监督的培训前假设情景,它只能利用目标任务数据。我们考虑的是像斯坦福汽车、Strach或COCO这样的数据集,它们比图像网规模小得多。我们的研究显示,取消自动编码器,例如BeiT或本文中介绍的变体,对于培训前数据的类型和规模来说,比通过比较图像嵌入所培训的流行的自我监督方法更为有力。我们取得了与图像网络关于不同领域各种分类数据集的预先培训相比的竞争性业绩。关于COCOCO,当培训前仅使用COCO图像,检测和实例分解性业绩超过在可比环境中监督的图像网络前培训。

相关内容

ImageNet (数据集)

关注 21

ImageNet项目是一个用于视觉对象识别软件研究的大型可视化数据库。超过1400万的图像URL被ImageNet手动注释，以指示图片中的对象;在至少一百万个图像中，还提供了边界框。ImageNet包含2万多个类别; [2]一个典型的类别，如“气球”或“草莓”，包含数百个图像。第三方图像URL的注释数据库可以直接从ImageNet免费获得;但是，实际的图像不属于ImageNet。自2010年以来，ImageNet项目每年举办一次软件比赛，即ImageNet大规模视觉识别挑战赛（ILSVRC），软件程序竞相正确分类检测物体和场景。 ImageNet挑战使用了一个“修剪”的1000个非重叠类的列表。2012年在解决ImageNet挑战方面取得了巨大的突破，被广泛认为是2010年的深度学习革命的开始。

【UC伯克利】自监督视觉表示学习，356页ppt，Self-Supervised Visual Learning

专知会员服务

66+阅读 · 2021年1月10日

【Facebook AI】无监督机器翻译，336页ppt，Unsupervised Machine Translation

专知会员服务

18+阅读 · 2020年11月17日

【google】监督对比学习，Supervised Contrastive Learning

专知会员服务

32+阅读 · 2020年4月23日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日