Despite the success of a number of recent techniques for visual self-supervised deep learning, there remains limited investigation into the representations that are ultimately learned. By using recent advances in comparing neural representations, we explore this direction by comparing a contrastive self-supervised algorithm (SimCLR) to supervised learning on simple image data in a common architecture. We find that the two methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers. We investigate this divergence, finding that it is caused by these layers strongly fitting to their distinct learning objectives. We also find that SimCLR's objective implicitly fits the supervised objective in intermediate layers, but that the reverse is not true. Our work particularly highlights the importance of the learned intermediate representations, and raises important questions for auxiliary task design.
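The abstract does not specify which representation-comparison method is used. A common choice in this literature is linear centered kernel alignment (CKA), which scores the similarity of two layers' activation matrices on the same inputs, invariant to rotation and isotropic scaling of the features. A minimal sketch (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n x d1) and Y (n x d2),
    where rows are the same n examples passed through two layers/models."""
    # Center each feature dimension
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # HSIC-style numerator and normalizing denominator
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den
```

A layer compared with itself scores 1.0, and the score is unchanged if one set of features is rotated by an orthogonal matrix, which is why CKA is suited to comparing layers trained by different objectives (e.g. SimCLR vs. supervised).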