On the stepwise nature of self-supervised learning

We present a simple picture of the training process of self-supervised learning methods with joint embedding networks. We find that these methods learn their high-dimensional embeddings one dimension at a time in a sequence of discrete, well-separated steps. We arrive at this conclusion via the study of a linearized model of Barlow Twins applicable to the case in which the trained network is infinitely wide. We solve the training dynamics of this model from small initialization, finding that the model learns the top eigenmodes of a certain contrastive kernel in a stepwise fashion, and obtain a closed-form expression for the final learned representations. Remarkably, we then see the same stepwise learning phenomenon when training deep ResNets using the Barlow Twins, SimCLR, and VICReg losses. Our theory suggests that, just as kernel regression can be thought of as a model of supervised learning, kernel PCA may serve as a useful model of self-supervised learning.
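
To make the kernel PCA analogy concrete, here is a minimal sketch of kernel PCA in NumPy: it centers a kernel matrix, takes its eigendecomposition, and keeps the top k eigenmodes as an embedding. The RBF kernel, the random data, and the names `rbf_kernel` and `kernel_pca` are illustrative assumptions and do not come from the paper; the contrastive kernel studied in the abstract is a different object, so this only shows what "the top eigenmodes of a kernel" means in practice.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Stand-in kernel (an assumption): Gaussian kernel on pairwise squared distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_pca(K, k):
    # Double-center the kernel matrix, i.e. remove the mean in feature space.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # eigh returns eigenvalues in ascending order; reverse to get the top modes first.
    evals, evecs = np.linalg.eigh(Kc)
    evals = evals[::-1][:k]
    evecs = evecs[:, ::-1][:, :k]
    # Embedding of the training points: eigenvectors scaled by sqrt(eigenvalue).
    Z = evecs * np.sqrt(np.maximum(evals, 0.0))
    return evals, Z

# Toy data (hypothetical): 50 points in 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
evals, Z = kernel_pca(rbf_kernel(X, gamma=0.1), k=3)
```

Each column of `Z` corresponds to one eigenmode, ordered by decreasing eigenvalue, and the columns are mutually orthogonal. In the picture the abstract describes, embedding dimensions of this kind are picked up one at a time during training rather than all at once.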
