Despite the success of a number of recent techniques for visual self-supervised deep learning, there has been limited investigation into the representations that are ultimately learned. By leveraging recent advances in the comparison of neural representations, we explore this direction, comparing a contrastive self-supervised algorithm with supervised learning on simple image data in a common architecture. We find that the two methods learn similar intermediate representations through dissimilar means, and that the representations diverge rapidly in the final few layers. We investigate this divergence, finding that these final layers strongly fit their distinct learning objectives. We also find that the contrastive objective implicitly fits the supervised objective in intermediate layers, but that the reverse is not true. Our work particularly highlights the importance of the learned intermediate representations, and raises critical questions for auxiliary task design.