Multi-view methods learn representations by aligning multiple views of the same image, and their performance largely depends on the choice of data augmentation. In this paper, we observe that some otherwise useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the exact alignment objective so that stronger augmentations can be accommodated. Taking image rotation as a case study, we develop a generic approach, Pretext-aware Residual Relaxation (Prelax), which relaxes exact alignment by allowing an adaptive residual vector between different views and encodes the semantic shift through pretext-aware learning. Extensive experiments on different backbones show that our method not only improves multi-view methods with existing augmentations, but also benefits from stronger image augmentations like rotation.
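To make the relaxed objective concrete, the following is a minimal PyTorch sketch of the idea stated above: rather than forcing exact alignment between two views, an adaptive residual vector, predicted from the pretext label (e.g. the rotation class), absorbs the semantic shift, while a pretext-aware head keeps that shift encoded in the representation. All names (`PrelaxSketch`, `residual_head`, `pretext_clf`) and the exact loss composition are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrelaxSketch(nn.Module):
    """Hypothetical sketch of residual relaxation; details differ from the paper."""

    def __init__(self, encoder: nn.Module, feat_dim: int, num_pretext: int = 4):
        super().__init__()
        self.encoder = encoder                      # shared backbone f(.)
        # Small head predicting the residual vector from the pretext label.
        self.residual_head = nn.Sequential(
            nn.Linear(num_pretext, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        # Pretext-aware classifier: the representation should still encode
        # which transformation (e.g. rotation angle) was applied.
        self.pretext_clf = nn.Linear(feat_dim, num_pretext)

    def forward(self, x1, x2, pretext_label):
        z1 = F.normalize(self.encoder(x1), dim=-1)  # anchor view
        z2 = F.normalize(self.encoder(x2), dim=-1)  # strongly augmented view
        onehot = F.one_hot(pretext_label, self.pretext_clf.out_features).float()
        r = self.residual_head(onehot)              # adaptive residual vector
        # Relaxed alignment: z1 + r should match z2, instead of z1 == z2.
        align_loss = ((z1 + r) - z2).pow(2).sum(-1).mean()
        # Pretext-aware term: recover the applied transformation from z2.
        pretext_loss = F.cross_entropy(self.pretext_clf(z2), pretext_label)
        return align_loss + pretext_loss
```

Under this reading, setting the residual to zero recovers the standard exact-alignment objective, so the relaxation strictly generalizes the usual multi-view loss.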