Contrastive loss has significantly improved performance in supervised classification tasks by adopting a multi-view framework that leverages augmentation and label information. Augmentation enables contrast with another view of a single image but increases training time and memory usage. To exploit the strength of multiple views while avoiding the high computation cost, we introduce a multi-exit architecture that outputs multiple features of a single image in a single-view framework. To this end, we propose Self-Contrastive (SelfCon) learning, which self-contrasts among multiple outputs from different levels of a single network. The multi-exit architecture efficiently replaces multi-augmented images and leverages diverse information from different layers of the network. We demonstrate that SelfCon learning improves the classification performance of the encoder network, and we empirically analyze its advantages in terms of the single view and the sub-network. Furthermore, we provide theoretical evidence for the performance gain based on a mutual information bound. For ImageNet classification with ResNet-50, SelfCon improves accuracy by +0.6% while using 59% of the memory and 48% of the training time of Supervised Contrastive learning, and a simple ensemble of the multi-exit outputs boosts performance by up to +1.5%. Our code is available at https://github.com/raymin0223/self-contrastive-learning.
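To make the idea concrete, below is a minimal PyTorch sketch of a SelfCon-style objective, assuming the loss follows the SupCon formulation with the sub-network exit feature and the final exit feature of the same image treated as the two "views"; the function and argument names (`selfcon_style_loss`, `feat_exit`, `feat_final`) are hypothetical illustrations, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def selfcon_style_loss(feat_exit, feat_final, labels, temperature=0.07):
    """SupCon-style loss over features from two exits of one network.

    feat_exit, feat_final: [B, D] features from a sub-network exit and
        the final layer of the same backbone (hypothetical names).
    labels: [B] integer class labels.
    """
    # Stack the two "views" produced by the multi-exit architecture;
    # no second augmented image is needed.
    feats = F.normalize(torch.cat([feat_exit, feat_final], dim=0), dim=1)  # [2B, D]
    labels = torch.cat([labels, labels], dim=0)                            # [2B]

    # Cosine similarities scaled by temperature.
    logits = feats @ feats.t() / temperature                               # [2B, 2B]

    # Positives: same label (including the other exit of the same image),
    # excluding self-pairs on the diagonal.
    pos_mask = (labels[:, None] == labels[None, :]).float()
    self_mask = torch.eye(len(labels), device=feats.device)
    pos_mask = pos_mask - self_mask

    # Numerically stable log-softmax over all non-self candidates.
    logits = logits - logits.max(dim=1, keepdim=True).values.detach()
    exp_logits = torch.exp(logits) * (1 - self_mask)
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))

    # Average log-likelihood over positives, then mean over anchors.
    mean_log_prob_pos = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```

Because both "views" come from one forward pass through a single image, the batch only doubles along the feature axis rather than requiring a second augmented input, which is where the reported memory and time savings over Supervised Contrastive learning come from.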