This paper starts by revealing a surprising finding: without any learning, a randomly initialized CNN can localize objects surprisingly well. That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper. This empirical inductive bias is further analyzed and successfully applied to self-supervised learning (SSL). A CNN is encouraged to learn representations that focus on the foreground object by transforming every image into various versions with different backgrounds, where the foreground-background separation is guided by Tobias. Experimental results show that the proposed Tobias significantly improves downstream tasks, especially object detection. This paper also shows that Tobias yields consistent improvements across training sets of different sizes, and is more resilient to changes in image augmentations. Code is available at https://github.com/CupidJay/Tobias.
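The core observation, that the aggregated activations of a randomly initialized CNN roughly highlight the foreground, can be sketched in a few lines. The following is a hypothetical minimal NumPy version (a single random convolutional layer with made-up filter counts and sizes), not the paper's actual implementation:

```python
import numpy as np

def tobias_mask(image, num_filters=8, ksize=3, seed=0):
    """Sketch of the Tobias idea: apply randomly initialized conv
    filters (no training), average the ReLU activations over channels,
    and threshold the result to guess a binary foreground mask.
    Hypothetical toy version; the paper uses a full random CNN."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    filters = rng.standard_normal((num_filters, ksize, ksize))
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    acts = np.zeros((num_filters, h, w))
    for f in range(num_filters):
        for i in range(h):
            for j in range(w):
                acts[f, i, j] = np.sum(padded[i:i + ksize, j:j + ksize] * filters[f])
    acts = np.maximum(acts, 0).mean(axis=0)  # ReLU, then channel average
    return acts > acts.mean()  # above-average activations = "foreground"

# Toy image: a bright square "object" on a dark background.
img = np.zeros((16, 16))
img[5:11, 5:11] = 1.0
mask = tobias_mask(img)
```

In the SSL stage described above, such a mask would then guide background replacement: the pixels where the mask is true are kept, and the rest are swapped for other backgrounds to produce augmented training views.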