Self-supervised Learning (SSL) provides a strategy for constructing useful representations of images without relying on hand-assigned labels. Many such methods aim to map distinct views of the same scene or object to nearby points in representation space, while employing some constraint to prevent representational collapse. Here we recast the problem in terms of efficient coding by adopting manifold capacity, a measure that quantifies the quality of a representation by the number of linearly separable object manifolds it can support, as the efficiency metric to optimize. Specifically, we adapt manifold capacity for use as an objective function in a contrastive learning framework, yielding a Maximum Manifold Capacity Representation (MMCR). We apply this method to unlabeled images, each augmented by a set of basic transformations, and find that the learned features are meaningful under the standard linear evaluation protocol: MMCRs support object recognition performance comparable to or surpassing that of recently developed SSL frameworks, while offering greater robustness to adversarial attacks. Empirical analyses reveal differences between MMCRs and representations learned by other SSL frameworks, and suggest a mechanism by which manifold compression gives rise to class separability.
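To make the objective concrete, the sketch below illustrates one way a manifold-capacity-inspired loss could be expressed in PyTorch: augmented views of each image are embedded, normalized, and averaged into per-object centroids, and the negative nuclear norm of the centroid matrix is minimized (so that views of the same image collapse toward their centroid while the centroids spread apart). This is a minimal sketch under assumed conventions; the function name mmcr_style_loss and the (n_objects, n_views, d) tensor layout are illustrative choices, not the paper's reference implementation.

```python
import torch
import torch.nn.functional as F

def mmcr_style_loss(embeddings: torch.Tensor) -> torch.Tensor:
    """Illustrative manifold-capacity-inspired objective (not the official MMCR code).

    embeddings: tensor of shape (n_objects, n_views, d), where each slice
    along the second dimension holds the encoder outputs for augmented
    views of one image.
    """
    # Project every view onto the unit hypersphere.
    z = F.normalize(embeddings, dim=-1)
    # Per-object centroid: average over the augmented views.
    centroids = z.mean(dim=1)                      # shape: (n_objects, d)
    # Nuclear norm = sum of singular values of the centroid matrix.
    nuclear_norm = torch.linalg.svdvals(centroids).sum()
    # Maximizing the nuclear norm is done by minimizing its negative.
    return -nuclear_norm
```

In practice such a loss would be applied to the projection-head outputs of a standard SSL encoder, e.g. loss = mmcr_style_loss(z.view(batch_size, n_views, -1)), with the batch assembled so that all augmented views of an image occupy the same slice.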