Self-Supervised Learning (SSL) has emerged as the solution of choice for learning transferable representations from unlabeled data. However, SSL requires building samples that are known to be semantically akin, i.e., positive views. The need for such knowledge is the main limitation of SSL and is often tackled by ad-hoc strategies, e.g., applying known data augmentations to the same input. In this work, we generalize and formalize this principle through Positive Active Learning (PAL), in which an oracle queries semantic relationships between samples. PAL achieves three main objectives. First, it unveils a theoretically grounded learning framework beyond SSL that can be extended to tackle supervised and semi-supervised learning depending on the employed oracle. Second, it provides a consistent algorithm to embed a priori knowledge, e.g., some observed labels, into any SSL loss without any change in the training pipeline. Third, it provides a proper active learning framework, yielding low-cost solutions for annotating datasets and arguably bridging the gap between the theory and practice of active learning, since PAL relies on queries about semantic relationships between inputs that are simple for non-experts to answer.
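The oracle abstraction described above can be illustrated with a minimal sketch. The function names and signatures below are illustrative assumptions, not taken from the paper: the point is only that swapping the oracle recovers SSL (positives are augmentations of the same input), supervised learning (positives share a label), or a semi-supervised mix.

```python
def augmentation_oracle(i, j, source_index):
    """SSL-style oracle: samples i and j are positive iff they are
    augmented views generated from the same underlying input."""
    return source_index[i] == source_index[j]

def label_oracle(i, j, labels):
    """Supervised-style oracle: positive iff the two samples
    share the same observed label."""
    return labels[i] == labels[j]

def semi_supervised_oracle(i, j, labels, source_index):
    """Hypothetical mixed oracle: use labels when both are observed,
    otherwise fall back to the augmentation-based answer."""
    if labels[i] is not None and labels[j] is not None:
        return labels[i] == labels[j]
    return source_index[i] == source_index[j]

# Toy data: sample 3 has no observed label.
labels = [0, 0, 1, None]
source_index = [0, 1, 1, 3]
```

Because every variant answers the same yes/no query ("are these two samples semantically related?"), the downstream SSL loss and training pipeline stay unchanged; only the source of positive pairs differs.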