Supervised learning of image classifiers distills human knowledge into a parametric model through pairs of images and their corresponding labels (X, Y). We argue that this simple and widely used representation of human knowledge neglects rich auxiliary information from the annotation procedure, such as the time series of mouse traces and clicks left after image selection. Our insight is that such annotation byproducts Z provide approximate human attention that weakly guides the model to focus on foreground cues, reducing spurious correlations and discouraging shortcut learning. To verify this, we create ImageNet-AB and COCO-AB: the ImageNet and COCO training sets enriched with sample-wise annotation byproducts, collected by replicating the respective original annotation tasks. We refer to this new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB). We show that a simple multitask loss for regressing Z together with Y already improves the generalisability and robustness of the learned models. Compared to the original supervised learning, LUAB requires no extra annotation cost. ImageNet-AB and COCO-AB are available at https://github.com/naver-ai/NeglectedFreeLunch.
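As a rough illustration of the multitask objective described above, the sketch below combines a classification term for Y with a regression term for Z. It assumes that Z is encoded as a fixed-length vector of numbers (for instance, normalised click coordinates), that the regression term is a mean-squared error, and that the two terms are balanced by a hypothetical weight `lam`. The function names and this particular choice of losses are illustrative assumptions, not the exact formulation used for LUAB.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single sample, stabilised by subtracting the max logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between a predicted and a recorded byproduct vector."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def luab_loss(
    logits: Sequence[float],
    label: int,
    z_pred: Sequence[float],
    z_target: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Hypothetical multitask loss: classify Y while regressing the byproduct Z.

    `lam` is an assumed hyperparameter weighting the auxiliary Z term;
    setting lam=0 recovers plain supervised learning on (X, Y).
    """
    return cross_entropy(logits, label) + lam * mse(z_pred, z_target)


if __name__ == "__main__":
    # Two-class toy example with a 2-D byproduct vector (e.g. one click position).
    print(luab_loss([2.0, 0.0], 0, [0.4, 0.6], [0.5, 0.5], lam=0.5))
```

In this form the byproduct term only adds a gradient signal during training; at test time the model is used through the classification head alone, so no byproducts are needed for inference.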