Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from Mitrovic et al. (2021), we propose ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views. ReLICv2 achieves 77.1% top-1 classification accuracy on ImageNet using linear evaluation with a ResNet50 architecture and 80.6% with larger ResNet models, outperforming previous state-of-the-art self-supervised approaches by a wide margin. Most notably, ReLICv2 is the first representation learning method to consistently outperform the supervised baseline in a like-for-like comparison using a range of standard ResNet architectures. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
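The abstract describes combining a contrastive objective with an explicit invariance loss. The following is a minimal pure-Python sketch of that general idea, not the authors' implementation: the function names, the `temperature` and `alpha` parameters, and the choice of a KL divergence between the two views' similarity distributions as the invariance term are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_distributions(anchors, targets, temperature):
    """For each anchor, a softmax over cosine similarities to every target."""
    anchors = [l2_normalize(v) for v in anchors]
    targets = [l2_normalize(v) for v in targets]
    dists = []
    for a in anchors:
        logits = [sum(x * y for x, y in zip(a, t)) / temperature for t in targets]
        dists.append(softmax(logits))
    return dists


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def contrastive_plus_invariance_loss(view_a, view_b, temperature=0.1, alpha=1.0):
    """Hypothetical combined loss: contrastive term + alpha * invariance term.

    view_a[i] and view_b[i] are embeddings of two views of the same image.
    """
    p_ab = similarity_distributions(view_a, view_b, temperature)
    p_ba = similarity_distributions(view_b, view_a, temperature)
    n = len(view_a)
    # Contrastive term: each embedding should pick out its own counterpart.
    contrastive = -sum(
        math.log(p_ab[i][i]) + math.log(p_ba[i][i]) for i in range(n)
    ) / (2 * n)
    # Invariance term (assumed form): similarity distributions should agree
    # regardless of which view serves as the anchor.
    invariance = sum(kl_divergence(p_ab[i], p_ba[i]) for i in range(n)) / n
    return contrastive + alpha * invariance, contrastive, invariance
```

When the two views produce identical embeddings, the invariance term in this sketch vanishes and only the contrastive term remains; a real training setup would compute these quantities on encoder outputs with automatic differentiation.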


Related content

ImageNet (dataset)


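The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million image URLs have been hand-annotated by ImageNet to indicate which objects are pictured; bounding boxes are also provided for at least one million of the images. ImageNet contains more than 20,000 categories; a typical category, such as "balloon" or "strawberry", contains several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, but the images themselves are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of 1,000 non-overlapping classes. A dramatic breakthrough on the ImageNet challenge in 2012 is widely regarded as the beginning of the deep learning revolution of the 2010s.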
