This paper proposes integrating auxiliary information (e.g., additional attributes of data, such as the hashtags of Instagram images) into the self-supervised learning process. We first observe that auxiliary information can reveal useful information about data structure: for instance, Instagram images with the same hashtags can be semantically similar. Hence, to leverage this structural information, we construct data clusters according to the auxiliary information. We then introduce the Clustering InfoNCE (Cl-InfoNCE) objective, which learns similar representations for augmented variants of data from the same cluster and dissimilar representations for data from different clusters. Our approach contributes as follows: 1) Compared with conventional self-supervised representations, the auxiliary-information-infused self-supervised representations bring performance closer to that of supervised representations; 2) Cl-InfoNCE can also work with clusters constructed without supervision (e.g., k-means clusters) and outperforms strong clustering-based self-supervised learning approaches, such as the Prototypical Contrastive Learning (PCL) method; 3) We show that Cl-InfoNCE may be a better way to leverage data clustering information than the baseline approach of learning to predict cluster assignments with a cross-entropy loss. For analysis, we connect the quality of the learned representations with two statistical relationships: i) the mutual information between the labels and the clusters and ii) the conditional entropy of the clusters given the labels.
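To make the idea behind a cluster-aware contrastive objective concrete, the following is a minimal NumPy sketch, not the exact Cl-InfoNCE formulation. It assumes an in-batch setting in which every pair of samples sharing a cluster id is treated as a positive pair and the log-probabilities of all positives are averaged per anchor. The function name `cluster_infonce`, the `temperature` parameter, and this averaging scheme are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def cluster_infonce(z1, z2, clusters, temperature=0.1):
    """Cluster-aware InfoNCE-style loss (illustrative sketch, assumed form).

    z1, z2   : (n, d) embeddings of two augmented views of the same n samples.
    clusters : (n,) integer cluster id per sample (e.g. hashtag or k-means id).
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)

    # (n, n) matrix of scaled similarities between view 1 and view 2.
    logits = z1 @ z2.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives[i, j] is True when samples i and j share a cluster.
    # The diagonal is always True, so every anchor has at least one positive.
    clusters = np.asarray(clusters)
    positives = clusters[:, None] == clusters[None, :]

    # Average log-probability of the positives per anchor, then over anchors.
    per_anchor = (log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return -per_anchor.mean()


# Toy usage: embeddings that agree with the clusters give a lower loss
# than the same embeddings paired with mismatched cluster ids.
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_aligned = cluster_infonce(z, z, [0, 0, 1, 1])
loss_mismatched = cluster_infonce(z, z, [0, 1, 0, 1])
```

When every sample forms its own cluster, the positive mask reduces to the identity and this sketch coincides with a standard in-batch InfoNCE loss. That makes the role of the cluster assignment explicit: it only changes which pairs count as positives.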