Unsupervised representation learning has recently received significant interest owing to its strong generalization ability, achieved by effectively leveraging large-scale unlabeled data. Two approaches are prevalent: contrastive learning and generative pre-training, where the former learns representations from instance-wise discrimination tasks and the latter learns them by estimating the data likelihood. These seemingly orthogonal approaches have their own strengths and weaknesses. Contrastive learning tends to extract semantic information and discard details irrelevant for classifying objects, making the representations effective for discriminative tasks while degrading robustness to out-of-distribution data. On the other hand, generative pre-training directly estimates the data distribution, so the representations tend to be robust but not optimal for discriminative tasks. In this paper, we show that we can achieve the best of both worlds with a hybrid training scheme. Specifically, we demonstrate that a transformer-based encoder-decoder architecture trained with both contrastive and generative losses can learn highly discriminative and robust representations without hurting the generative performance. We extensively validate our approach on various tasks.
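To illustrate the hybrid training scheme described above, the following is a minimal sketch of how a contrastive objective and a generative (reconstruction-based likelihood) objective could be combined on a shared encoder-decoder. The function name `hybrid_loss`, the use of an InfoNCE-style contrastive term, a mean-squared-error reconstruction term, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(encoder, decoder, x, x_aug, lam=1.0, temperature=0.1):
    """Sketch of a combined contrastive + generative objective.

    encoder/decoder: torch modules sharing the learned representation.
    x, x_aug: a batch and an augmented view of the same batch, shape (B, D_in).
    lam, temperature: illustrative hyperparameters (assumed, not from the paper).
    """
    # Contrastive branch: InfoNCE over two augmented views of each instance.
    z1 = F.normalize(encoder(x), dim=-1)       # (B, D)
    z2 = F.normalize(encoder(x_aug), dim=-1)   # (B, D)
    logits = z1 @ z2.t() / temperature         # (B, B) pairwise similarities
    targets = torch.arange(x.size(0), device=x.device)
    contrastive = F.cross_entropy(logits, targets)

    # Generative branch: reconstruct the input from the encoded representation
    # (a simple likelihood-style term; the paper's decoder objective may differ).
    recon = decoder(encoder(x))
    generative = F.mse_loss(recon, x)

    # Total loss: both objectives are optimized jointly.
    return contrastive + lam * generative
```

In practice, `encoder` and `decoder` would be the transformer-based components mentioned in the abstract; the sketch only shows how the two losses are summed into a single training objective.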