Self-supervised learning (SSL) is currently one of the premier techniques for creating data representations that are actionable for transfer learning in the absence of human annotations. Despite their success, the underlying geometry of these representations remains elusive, which obfuscates the quest for more robust, trustworthy, and interpretable models. In particular, mainstream SSL techniques rely on a specific deep neural network architecture composed of two cascaded networks: the encoder and the projector. When used for transfer learning, the projector is discarded, since empirical results show that its representation generalizes more poorly than the encoder's. In this paper, we investigate this curious phenomenon and analyze how the strength of the data augmentation policy affects the data embedding. We discover a non-trivial relation between the encoder, the projector, and the data augmentation strength: with increasingly stronger augmentation policies, the projector, rather than the encoder, is more strongly driven to become invariant to the augmentations. It does so by eliminating crucial information about the data, learning to project it into a low-dimensional space that is a noisy estimate of the data manifold's tangent plane in the encoder representation. This analysis is substantiated through a geometrical perspective with theoretical and empirical results.
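To make the encoder/projector cascade concrete, the following is a minimal PyTorch-style sketch of the two-network architecture described above. It is illustrative only: the class name `SSLModel`, the MLP backbone and projector shapes, and all dimensions are assumptions for exposition, not the specific networks studied in this paper (in practice the encoder is typically a ResNet or ViT backbone).

```python
import torch
import torch.nn as nn

class SSLModel(nn.Module):
    """Hypothetical sketch of the standard SSL architecture: encoder + projector."""

    def __init__(self, input_dim=3 * 32 * 32, encoder_dim=2048, projector_dim=256):
        super().__init__()
        # Encoder: the network whose representation is kept for transfer learning.
        # A simple MLP stands in for the usual deep backbone.
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_dim, encoder_dim),
            nn.ReLU(),
        )
        # Projector: a cascaded head trained with the SSL objective and
        # discarded at transfer time; note its lower output dimension.
        self.projector = nn.Sequential(
            nn.Linear(encoder_dim, encoder_dim),
            nn.ReLU(),
            nn.Linear(encoder_dim, projector_dim),
        )

    def forward(self, x):
        h = self.encoder(x)    # representation reused for downstream tasks
        z = self.projector(h)  # representation fed to the SSL loss, then discarded
        return h, z

# Usage: the SSL loss (e.g., a contrastive or invariance criterion) acts on z,
# so z is the representation most directly pushed toward augmentation invariance.
model = SSLModel()
x = torch.randn(8, 3, 32, 32)
h, z = model(x)
```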