We propose Cut-and-LEaRn (CutLER), a simple approach for training unsupervised object detection and segmentation models. We leverage the ability of self-supervised models to 'discover' objects without supervision and amplify it to train a state-of-the-art localization model without any human labels. CutLER first uses our proposed MaskCut approach to generate coarse masks for multiple objects in an image, and then learns a detector on these masks using our robust loss function. We further improve performance by self-training the model on its own predictions. Compared to prior work, CutLER is simpler, compatible with different detection architectures, and detects multiple objects. CutLER is also a zero-shot unsupervised detector and improves detection performance (AP50) by over 2.7× on 11 benchmarks across domains such as video frames, paintings, and sketches. With finetuning, CutLER serves as a low-shot detector, surpassing MoCo-v2 by 7.3% APbox and 6.6% APmask on COCO when training with 5% labels.