In this work we present point-level region contrast, a self-supervised pre-training approach for the task of object detection. This approach is motivated by the two key factors in detection: localization and recognition. While accurate localization favors models that operate at the pixel- or point-level, correct recognition typically relies on a more holistic, region-level view of objects. Incorporating this perspective in pre-training, our approach performs contrastive learning by directly sampling individual point pairs from different regions. Compared to an aggregated representation per region, our approach is more robust to the change in input region quality, and further enables us to implicitly improve initial region assignments via online knowledge distillation during training. Both advantages are important when dealing with imperfect regions encountered in the unsupervised setting. Experiments show point-level region contrast improves on state-of-the-art pre-training methods for object detection and segmentation across multiple tasks and datasets, and we provide extensive ablation studies and visualizations to aid understanding. Code will be made available.
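The contrastive objective described above can be illustrated with a small NumPy sketch. It is not the authors' released code, and it is not necessarily the paper's exact loss. It rests on several assumptions:

- Each view's feature map is flattened to shape `(H*W, D)`.
- Region assignments are integer ids per pixel, and a given id refers to the same region in both views.
- Every region is non-empty in both views.
- A point from region *r* in one view treats every sampled point from region *r* in the other view as a positive, and points from other regions as negatives. This is a multi-positive, SupCon-style InfoNCE.

The names `sample_points`, `point_region_contrast_loss`, `points_per_region`, and `tau` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each feature vector to unit length (cosine-similarity space)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def sample_points(region_ids, num_regions, points_per_region, rng):
    """Hypothetical helper: for every region id, draw `points_per_region`
    pixel indices (with replacement). Raises ValueError if a region is empty."""
    idx = []
    for r in range(num_regions):
        candidates = np.flatnonzero(region_ids == r)
        idx.append(rng.choice(candidates, size=points_per_region, replace=True))
    return np.stack(idx)  # shape (num_regions, points_per_region)


def point_region_contrast_loss(feat_q, feat_k, regions_q, regions_k,
                               num_regions, points_per_region=4,
                               tau=0.2, seed=0):
    """Sketch of a point-level region contrastive loss (assumed form).

    feat_q, feat_k:       (H*W, D) flattened features of two augmented views
    regions_q, regions_k: (H*W,) region id per pixel, aligned across views
    """
    rng = np.random.default_rng(seed)
    # Sample individual points inside each region, separately per view.
    iq = sample_points(regions_q, num_regions, points_per_region, rng)
    ik = sample_points(regions_k, num_regions, points_per_region, rng)
    q = l2_normalize(feat_q[iq.ravel()])  # (N, D), N = num_regions * points
    k = l2_normalize(feat_k[ik.ravel()])  # (N, D)
    # Region label of every sampled point (ravel order is region-major).
    labels = np.repeat(np.arange(num_regions), points_per_region)

    logits = q @ k.T / tau                                  # (N, N) point-pair similarities
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Key points from the same region are positives; all others are negatives.
    pos_mask = labels[:, None] == labels[None, :]
    loss_per_point = -(log_prob * pos_mask).sum(axis=1) / pos_mask.sum(axis=1)
    return loss_per_point.mean()
```

For example, suppose points in the same region share an identical feature in both views. The loss then approaches its lower bound of `log(points_per_region)`. If the key view's features are assigned to the wrong regions, the loss rises sharply. The per-point formulation scores individual point pairs rather than one pooled vector per region. This is the property the abstract associates with robustness to imperfect region assignments.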