Exponential Moving Average (EMA, or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, to enhance performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component of modern SSL frameworks, the benefit brought by momentum is not well understood. We find that its success can be at least partly attributed to a stability effect. As a first attempt, we analyze how EMA affects each part of the encoder and find that the portion near the encoder's input plays an insignificant role, while the latter parts have much more influence. By monitoring the gradient of the overall loss with respect to the output of each block in the encoder, we observe that the final layers tend to fluctuate much more than the other layers during backpropagation, i.e., they are less stable. Interestingly, we show that applying EMA only to the final part of the SSL encoder, i.e., the projector, instead of to the whole deep network encoder, gives comparable or better performance. Our proposed projector-only momentum maintains the benefit of EMA while avoiding the double forward computation.
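To make the projector-only momentum idea concrete, the sketch below shows one plausible way to wire it up in PyTorch. It is not the authors' released code; the module and parameter names (backbone, projector, ema_projector, momentum m) and the exact placement of the EMA branch are illustrative assumptions. The key point it illustrates is that only the small projector keeps an EMA copy, so each view passes through the deep backbone once and the expensive second forward of a full momentum encoder is avoided.

```python
# Hypothetical sketch of projector-only EMA (momentum) in an SSL setup.
# All names here are assumptions for illustration, not the paper's implementation.
import copy
import torch
import torch.nn as nn

class ProjectorOnlyEMA(nn.Module):
    def __init__(self, backbone: nn.Module, projector: nn.Module, momentum: float = 0.99):
        super().__init__()
        self.backbone = backbone                        # shared deep encoder, trained by SGD, no EMA copy
        self.projector = projector                      # online projector, trained by SGD
        self.ema_projector = copy.deepcopy(projector)   # momentum (EMA) copy of the projector only
        for p in self.ema_projector.parameters():
            p.requires_grad = False
        self.m = momentum

    @torch.no_grad()
    def update_ema(self):
        # Standard momentum update: ema_param <- m * ema_param + (1 - m) * online_param
        for p_ema, p in zip(self.ema_projector.parameters(), self.projector.parameters()):
            p_ema.mul_(self.m).add_(p, alpha=1.0 - self.m)

    def forward(self, x1, x2):
        # Each augmented view is passed through the shared backbone exactly once;
        # only the lightweight projector is duplicated for the momentum branch.
        h1, h2 = self.backbone(x1), self.backbone(x2)
        z1 = self.projector(h1)                         # online branch
        with torch.no_grad():
            z2 = self.ema_projector(h2)                 # momentum (target) branch
        return z1, z2
```

In such a setup, update_ema() would be called after each optimizer step, analogous to the momentum update in MoCo or BYOL, while the contrastive or predictive loss is computed on (z1, z2) as in the host SSL framework.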