Self-supervision has emerged as a promising method for visual representation learning after the recent paradigm shift from handcrafted pretext tasks to instance-similarity-based approaches. Most state-of-the-art methods enforce similarity between various augmentations of a given image, while some additionally use contrastive approaches to explicitly ensure diverse representations. While these approaches have indeed shown a promising direction, they require a significantly larger number of training iterations than their supervised counterparts. In this work, we explore the reasons for the slow convergence of these methods, and further propose to strengthen them using well-posed auxiliary tasks that converge significantly faster and are also useful for representation learning. The proposed method utilizes the task of rotation prediction to improve the efficiency of existing state-of-the-art methods. We demonstrate significant performance gains with the proposed method on multiple datasets, particularly at lower training epochs.
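The rotation-prediction auxiliary task mentioned above can be sketched as a 4-way classification over rotated copies of each image, whose loss is added to the main self-supervised objective. The snippet below is a minimal illustrative sketch in PyTorch, not the paper's exact implementation; the `RotationHead` module, the `rotation_loss` helper, and the weighting scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rotate_batch(x):
    """Create four rotated copies (0, 90, 180, 270 degrees) of each image
    in x (shape B x C x H x W), with matching rotation-class labels."""
    rotations = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]
    images = torch.cat(rotations, dim=0)                   # (4B, C, H, W)
    labels = torch.arange(4).repeat_interleave(x.size(0))  # (4B,)
    return images, labels

class RotationHead(nn.Module):
    """Illustrative linear classifier over backbone features that
    predicts which of the four rotations was applied."""
    def __init__(self, feat_dim, num_classes=4):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):
        return self.fc(feats)

def rotation_loss(backbone, head, x):
    """Auxiliary rotation-prediction loss. In training it would be combined
    with the main objective, e.g. total = ssl_loss + lam * rotation_loss."""
    imgs, labels = rotate_batch(x)
    logits = head(backbone(imgs))
    return F.cross_entropy(logits, labels)
```

Because rotation prediction is a simple, well-posed classification problem, its loss converges quickly and can provide a stronger early training signal than the instance-similarity objective alone.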