Contrastive learning has emerged as a competitive pretraining method for object detection. Despite this progress, there has been minimal investigation into the robustness of contrastively pretrained detectors when faced with domain shifts. To address this gap, we conduct an empirical study of contrastive learning and out-of-domain object detection, studying how contrastive view design affects robustness. In particular, we perform a case study of the detection-focused pretext task Instance Localization (InsLoc) and propose strategies to augment views and enhance robustness in appearance-shifted and context-shifted scenarios. Among these strategies, we propose changes to cropping such as altering the percentage used, adding IoU constraints, and integrating saliency-based object priors. We also explore the addition of shortcut-reducing augmentations such as Poisson blending, texture flattening, and elastic deformation. We benchmark these strategies on abstract, weather, and context domain shifts and illustrate robust ways to combine them, when pretraining on both single-object and multi-object image datasets. Overall, our results and insights show how to ensure robustness through the choice of views in contrastive learning.
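The IoU-constrained cropping strategy mentioned above can be illustrated as rejection sampling: candidate crops are drawn at random and kept only if they overlap sufficiently with an object prior box (e.g. one derived from a saliency map). This is a minimal sketch, not the paper's implementation; the function names, crop-size range, and threshold below are illustrative assumptions.

```python
import random

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def sample_constrained_crop(img_w, img_h, prior_box, min_iou=0.3,
                            crop_frac=(0.4, 0.9), max_tries=100, rng=None):
    """Rejection-sample a crop whose IoU with an object prior box is at
    least `min_iou`; fall back to the prior box itself if no candidate
    passes within `max_tries` attempts (hypothetical fallback policy)."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        w = int(rng.uniform(*crop_frac) * img_w)
        h = int(rng.uniform(*crop_frac) * img_h)
        x = rng.randint(0, img_w - w)
        y = rng.randint(0, img_h - h)
        crop = (x, y, x + w, y + h)
        if iou(crop, prior_box) >= min_iou:
            return crop
    return prior_box
```

In a contrastive pipeline, two such crops of the same image would form the positive pair of views; raising `min_iou` biases both views toward the salient object, which is the kind of design choice the study benchmarks under domain shift.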