This paper begins with a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, which we name Tobias (``The object is at sight'') in this paper. We further analyze this empirical inductive bias and successfully apply it to self-supervised learning: a CNN is encouraged to learn representations that focus on the foreground object by transforming every image into various versions with different backgrounds, where the foreground-background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially object detection. This paper also shows that Tobias yields consistent improvements on training sets of different sizes, and is more resilient to changes in image augmentations. Our code will be available at https://github.com/CupidJay/Tobias.
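The core observation above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' released implementation): an untrained CNN's channel-averaged activation map is thresholded to produce a coarse foreground mask, the kind of Tobias-style guidance the abstract describes. The architecture and the median threshold are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny randomly initialized backbone; its weights are NEVER trained.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)

def foreground_mask(image: torch.Tensor, quantile: float = 0.5) -> torch.Tensor:
    """Average the last feature map over channels, then threshold it
    (here at its median) to get a coarse binary foreground mask."""
    with torch.no_grad():
        feat = backbone(image)              # (1, 64, H/8, W/8)
    amap = feat.mean(dim=1, keepdim=True)   # aggregate activations over channels
    thresh = amap.flatten().quantile(quantile)
    return (amap > thresh).float()          # 1 = likely foreground pixel

img = torch.rand(1, 3, 64, 64)              # stand-in for a real image
mask = foreground_mask(img)
print(mask.shape)                           # low-resolution mask, e.g. (1, 1, 8, 8)
```

In the paper's self-supervised setting, such a mask would decide which regions count as foreground when an image is recomposed onto different backgrounds.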