We present a simple picture of the training process of self-supervised learning methods with joint embedding networks. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, \textit{kernel PCA} may serve as a useful model of self-supervised learning.
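The following is a minimal, illustrative sketch (not the authors' code) of the kind of linearized Barlow Twins setup described above: a linear embedding trained from small initialization on an unnormalized cross-correlation loss $\|C - I\|_F^2$. The synthetic two-view data, its decaying spectrum, and all hyperparameters are assumptions chosen so that the stepwise emergence of the embedding eigenvalues is easy to observe.

```python
# Sketch of a linearized Barlow Twins model trained from small initialization.
# Assumptions: synthetic paired views, unnormalized loss ||C - I||_F^2, and
# illustrative hyperparameters; this is not the paper's experimental code.
import torch

torch.manual_seed(0)
n, d_in, d_emb = 512, 20, 5                 # samples, input dim, embedding dim

# Shared signal with a decaying spectrum; the two "views" differ only by noise.
scale = 0.8 ** torch.arange(d_in, dtype=torch.float32)
base = torch.randn(n, d_in) * scale
x1 = base + 0.1 * torch.randn(n, d_in)
x2 = base + 0.1 * torch.randn(n, d_in)

W = torch.nn.Parameter(1e-4 * torch.randn(d_emb, d_in))   # small initialization
opt = torch.optim.SGD([W], lr=0.05)

def cross_corr(z1, z2):
    """Cross-correlation matrix between the embeddings of the two views."""
    return (z1.T @ z2) / z1.shape[0]

for step in range(401):
    opt.zero_grad()
    c = cross_corr(x1 @ W.T, x2 @ W.T)
    loss = ((c - torch.eye(d_emb)) ** 2).sum()   # linearized Barlow Twins loss
    loss.backward()
    opt.step()
    if step % 25 == 0:
        # Eigenvalues of the (symmetrized) embedding cross-correlation should
        # rise from ~0 to ~1 one at a time, i.e. learning proceeds in steps.
        eigs = torch.linalg.eigvalsh((c + c.T).detach() / 2)
        print(step, [round(e.item(), 2) for e in eigs.flip(0)])
```

Under these assumptions, the printed eigenvalues jump from near zero to near one in order of the corresponding eigenvalues of the data's contrastive kernel, mirroring the stepwise learning described in the abstract.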