Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

ImageNet is arguably the most popular image classification benchmark, but it also carries a significant level of label noise. Recent studies have shown that many of its images contain multiple classes, even though ImageNet is assumed to be a single-label benchmark. These studies have therefore proposed turning ImageNet evaluation into a multi-label task, with exhaustive multi-label annotations per image. However, they have not fixed the training set, presumably because of the formidable annotation cost. We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied. With single-label annotations, a random crop of an image may contain an entirely different object from the ground truth, introducing noisy or even incorrect supervision during training. We thus re-label the ImageNet training set with multi-labels. We address the annotation cost barrier by letting a strong image classifier, trained on an extra source of data, generate the multi-labels. We use the pixel-wise multi-label predictions before the final pooling layer in order to exploit the additional location-specific supervision signals. Training on the re-labeled samples improves model performance across the board. ResNet-50 attains a top-1 classification accuracy of 78.9% on ImageNet with our localized multi-labels, which can be further boosted to 80.2% with CutMix regularization. We show that models trained with localized multi-labels also outperform the baselines on transfer learning to object detection and instance segmentation tasks, and on various robustness benchmarks. The re-labeled ImageNet training set, pre-trained weights, and the source code are available at https://github.com/naver-ai/relabel_imagenet.
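To make the idea of a localized multi-label concrete, here is a minimal NumPy sketch of how a per-crop soft label could be derived from a stored per-location score map. It is an illustration under assumptions, not the authors' implementation: the abstract only states that pixel-wise predictions before the final pooling layer are used, so the function name `crop_soft_label`, the normalized box format, the cell-center pooling rule, and the softmax normalization are all hypothetical choices made for this example.

```python
import numpy as np


def crop_soft_label(label_map, box):
    """Pool a per-location score map over a crop and return a soft label.

    label_map: array of shape (H, W, C), class scores per spatial cell
               (assumed to come from a classifier before its final pooling).
    box: (x0, y0, x1, y1) crop in normalized [0, 1] image coordinates.
    """
    h, w, _ = label_map.shape
    x0, y0, x1, y1 = box
    # Centers of the score-map cells in normalized image coordinates.
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    row_mask = (ys >= y0) & (ys <= y1)
    col_mask = (xs >= x0) & (xs <= x1)
    if not row_mask.any() or not col_mask.any():
        raise ValueError("crop does not cover any score-map cell center")
    # Keep only the cells inside the crop, then average their scores.
    region = label_map[row_mask][:, col_mask]
    scores = region.mean(axis=(0, 1))
    # Numerically stable softmax turns pooled scores into a soft label.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


# Toy 4x4 map with two classes: class 0 on the left half, class 1 on the right.
toy = np.zeros((4, 4, 2))
toy[:, :2, 0] = 5.0
toy[:, 2:, 1] = 5.0

left = crop_soft_label(toy, (0.0, 0.0, 0.5, 1.0))   # crop over the left half
right = crop_soft_label(toy, (0.5, 0.0, 1.0, 1.0))  # crop over the right half
full = crop_soft_label(toy, (0.0, 0.0, 1.0, 1.0))   # the whole image
```

In this toy case the left crop is labeled mostly as class 0, the right crop mostly as class 1, and the full image receives an even mix, which is the behavior a single global label cannot express.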


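Related content

ImageNet (dataset)

The ImageNet project is a large visual database designed for research on visual object recognition software. More than 14 million image URLs have been hand-annotated by ImageNet to indicate which objects are pictured; bounding boxes are also provided for at least one million of the images. ImageNet contains more than 20,000 categories; a typical category, such as "balloon" or "strawberry", consists of several hundred images. The database of annotations of third-party image URLs is freely available directly from ImageNet, although the actual images are not owned by ImageNet. Since 2010, the ImageNet project has run an annual software contest, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), in which software programs compete to correctly classify and detect objects and scenes. The challenge uses a "trimmed" list of 1,000 non-overlapping classes. A dramatic breakthrough on the ImageNet challenge in 2012 is widely regarded as the beginning of the deep learning revolution of the 2010s.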
