Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It adopts self-defined pseudo labels as supervision and uses the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning methods for computer vision, natural language processing (NLP), and other domains. It aims to embed augmented versions of the same sample close to each other while pushing away embeddings of different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. We then compare the performance of different methods on multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of current methods and the need for further techniques and future directions to make substantial progress.
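To make the contrastive objective described above concrete, the following is a minimal sketch (not from this survey's methods themselves) of the SimCLR-style NT-Xent loss, one common instantiation of the idea of pulling together embeddings of two augmented views of the same sample while pushing apart all other pairs in a batch. The function name `nt_xent_loss` and the temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.

    z1, z2: (N, D) embeddings of two augmented views of the same N samples.
    Positive pairs are (z1[i], z2[i]); every other pair in the batch acts
    as a negative.
    """
    N = z1.size(0)
    # Stack both views and project onto the unit sphere: (2N, D).
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)
    # Pairwise cosine similarities, scaled by the temperature: (2N, 2N).
    sim = z @ z.t() / temperature
    # Mask self-similarity so an embedding cannot be its own positive.
    sim.fill_diagonal_(float('-inf'))
    # Each row i has its positive at i + N (first view) or i - N (second view).
    targets = torch.cat([torch.arange(N, 2 * N), torch.arange(0, N)])
    return F.cross_entropy(sim, targets)

# Usage with a hypothetical encoder and two augmentations of a batch x:
#   z1, z2 = encoder(aug(x)), encoder(aug(x))
#   loss = nt_xent_loss(z1, z2)
```

Treating each row of the similarity matrix as logits over candidate matches turns the instance-discrimination task into a standard cross-entropy classification, which is why in-batch negatives come for free as batch size grows.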