Transfer learning from large-scale pre-trained models has become essential for many computer vision tasks. Recent studies have shown that datasets like ImageNet are weakly labeled, since images containing multiple object classes are assigned a single label. This ambiguity biases models towards a single prediction, which can result in the suppression of classes that tend to co-occur in the data. Inspired by the language-emergence literature, we propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels using the framework of iterated learning. MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions through successive generations of teacher and student networks with a learning bottleneck. Experiments show that our approach exhibits systematic benefits on ImageNet accuracy as well as ReaL F1 score, which indicates that MILe deals better with label ambiguity than the standard training procedure, even when fine-tuning from self-supervised weights. We also show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy data such as WebVision. Furthermore, MILe improves performance in class-incremental settings such as IIRC, and it is robust to distribution shifts. Code: https://github.com/rajeswar18/MILe
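The teacher-student loop described above can be sketched as follows. This is a minimal illustration using a linear sigmoid classifier on synthetic data, not the authors' implementation: all sizes, thresholds, and step counts are hypothetical. A teacher produces independent per-class probabilities, these are binarized into multi-label pseudo-labels, and a freshly initialized student is trained on them for only a few gradient steps (the learning bottleneck) before replacing the teacher in the next generation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_feats, n_classes = 64, 16, 5  # hypothetical toy dimensions

X = rng.normal(size=(n_samples, n_feats))
y_single = rng.integers(0, n_classes, size=n_samples)  # weak single labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(W, X):
    # Independent per-class probabilities (multi-label, not softmax)
    return sigmoid(X @ W)

def train(W, X, targets, steps, lr=0.1):
    # Gradient descent on per-class binary cross-entropy; a small `steps`
    # acts as the iterated-learning bottleneck for the student.
    for _ in range(steps):
        p = predict(W, X)
        W = W - lr * X.T @ (p - targets) / len(X)
    return W

# Generation 0: teacher fitted on one-hot single labels.
onehot = np.eye(n_classes)[y_single]
teacher = train(np.zeros((n_feats, n_classes)), X, onehot, steps=200)

for generation in range(3):
    # Binarize teacher predictions into multi-label pseudo-labels
    # (threshold value is an assumption for illustration).
    pseudo = (predict(teacher, X) > 0.25).astype(float)
    pseudo = np.maximum(pseudo, onehot)  # never drop the original label
    # Fresh student, trained briefly on the pseudo-labels (the bottleneck),
    # then promoted to teacher for the next generation.
    teacher = train(np.zeros((n_feats, n_classes)), X, pseudo, steps=20)

print(teacher.shape)  # → (16, 5)
```

The key design point mirrored here is that the student sees binary multi-label targets rather than a single class, so co-occurring classes that the teacher assigns high probability are no longer suppressed.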