Self-supervised learning is a popular and powerful method for utilizing large amounts of unlabeled data, for which a wide variety of training objectives have been proposed in the literature. In this study, we perform a Bayesian analysis of state-of-the-art self-supervised learning objectives and propose a unified formulation based on likelihood learning. Our analysis suggests a simple method for integrating self-supervised learning with generative models, allowing for the joint training of these two seemingly distinct approaches. We refer to this combined framework as GEDI, which stands for GEnerative and DIscriminative training. Additionally, we demonstrate an instantiation of the GEDI framework by integrating an energy-based model with a cluster-based self-supervised learning model. Through experiments on synthetic and real-world data, including SVHN, CIFAR10, and CIFAR100, we show that GEDI outperforms existing self-supervised learning strategies in terms of clustering performance by a wide margin. We also demonstrate that GEDI can be integrated into a neural-symbolic framework to address tasks in the small data regime, where it can use logical constraints to further improve clustering and classification performance.
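Since the abstract describes the idea only at a high level, the sketch below illustrates, under several assumptions, what a GEDI-style joint objective can look like in PyTorch: the class logits of a shared encoder are reused both as a JEM-style energy (generative term, trained with short-run SGLD) and as cluster posteriors for a self-supervised consistency term between two views. All names here (Encoder, energy, sgld_samples, gedi_loss) are hypothetical, and the consistency loss is a simple stand-in for the paper's cluster-based objective, not its exact formulation.

    import torch
    import torch.nn.functional as F

    # Hypothetical encoder: maps images to K cluster logits; the negative
    # logsumexp of the logits is read as an energy (JEM-style assumption).
    class Encoder(torch.nn.Module):
        def __init__(self, k=10):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Flatten(),
                torch.nn.Linear(3 * 32 * 32, 256), torch.nn.ReLU(),
                torch.nn.Linear(256, k),
            )

        def forward(self, x):
            return self.net(x)

    def energy(logits):
        # E(x) = -logsumexp_k f_k(x): low energy means high unnormalized density.
        return -torch.logsumexp(logits, dim=-1)

    def sgld_samples(model, shape, steps=20, step_size=1.0, noise=0.01):
        # Short-run SGLD: draw approximate samples from the model density
        # to estimate the gradient of the log-partition function.
        x = torch.rand(shape, requires_grad=True)
        for _ in range(steps):
            e = energy(model(x)).sum()
            grad, = torch.autograd.grad(e, x)
            x = (x - step_size * grad + noise * torch.randn_like(x))
            x = x.detach().requires_grad_(True)
        return x.detach()

    def gedi_loss(model, x, x_aug):
        # Generative term: maximize likelihood by pushing energy down on data
        # and up on SGLD samples (a contrastive-divergence-style estimate).
        logits = model(x)
        x_neg = sgld_samples(model, x.shape)
        gen = energy(logits).mean() - energy(model(x_neg)).mean()
        # Discriminative term: cluster-based self-supervision; the cluster
        # posterior of an augmented view is matched to that of the original.
        p = F.softmax(logits, dim=-1).detach()
        q = F.log_softmax(model(x_aug), dim=-1)
        ssl = F.kl_div(q, p, reduction="batchmean")
        return gen + ssl

A toy invocation, assuming 32x32 RGB inputs and a noise-perturbed second view in place of a real augmentation pipeline:

    model = Encoder(k=10)
    x = torch.rand(8, 3, 32, 32)
    loss = gedi_loss(model, x, x + 0.05 * torch.randn_like(x))
    loss.backward()

Sharing one set of logits between the energy and the cluster posteriors is what makes the joint training cheap: both terms update the same encoder, so no separate generative network is assumed.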