Localizing objects in image collections without supervision can help to avoid expensive annotation campaigns. We propose a simple approach to this problem that leverages the activation features of a vision transformer pre-trained in a self-supervised manner. Our method, LOST, does not require any external object proposals or any exploration of the image collection; it operates on a single image. Yet, we outperform state-of-the-art object discovery methods by up to 8 CorLoc points on PASCAL VOC 2012. We also show that training a class-agnostic detector on the discovered objects boosts results by another 7 points. Moreover, we show promising results on the unsupervised object discovery task. The code to reproduce our results can be found at https://github.com/valeoai/LOST.
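To make the idea concrete, below is a minimal sketch, not the authors' implementation, of the kind of single-image pipeline the abstract describes: extract patch features from a self-supervised vision transformer, pick a "seed" patch that correlates positively with few other patches, and group the patches correlated with the seed into a coarse object mask. The DINO checkpoint name, the use of `get_intermediate_layers`, the input resolution, and the seed/grouping heuristic are assumptions for illustration; the official code is at https://github.com/valeoai/LOST.

```python
# Hypothetical sketch of a LOST-like, single-image object localization step.
# Assumptions: DINO ViT-S/16 from torch.hub, patch features taken from
# get_intermediate_layers, and a simple "least-connected seed" heuristic.
import torch
from PIL import Image
from torchvision import transforms

# Self-supervised ViT backbone (DINO ViT-S/16, patch size 16).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((480, 480)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = preprocess(Image.open("image.jpg").convert("RGB")).unsqueeze(0)  # path is a placeholder

with torch.no_grad():
    tokens = model.get_intermediate_layers(img, n=1)[0]  # (1, 1 + N, D), includes CLS token
feats = tokens[0, 1:, :]                                  # drop CLS -> (N, D) patch features

# Patch-to-patch similarities within the single image: no proposals, no other images.
sims = feats @ feats.T                                    # (N, N)
degree = (sims > 0).sum(dim=1)                            # number of positively correlated patches per patch
seed = degree.argmin()                                    # seed = least-connected patch (assumed to lie on an object)
mask = sims[seed] > 0                                     # patches positively correlated with the seed
side = int(mask.numel() ** 0.5)                           # 480 / 16 = 30 patches per side
print(mask.reshape(side, side).int())                     # coarse 30x30 object mask
```

In this sketch, the bounding box of the largest connected component of `mask` would serve as the discovered object; the published method refines this with a specific choice of transformer features and a seed-expansion step.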