Self-supervised learning aims to learn an embedding space in which semantically similar samples lie close together. Contrastive learning methods pull different views of the same sample together and push different samples apart, which exploits the semantic invariance of augmentation but ignores the relationships between samples. To better exploit the power of augmentation, we observe that semantically similar samples are more likely to produce similar augmented views. We can therefore treat the augmented views as a special description of a sample. In this paper, we model such a description as the augmentation distribution, which we call the augmentation feature. Similarity in the augmentation feature reflects how much the views of two samples overlap and is related to their semantic similarity. Without the computational burden of explicitly estimating the augmentation feature, we propose Augmentation Component Analysis (ACA), which uses a contrastive-like loss to learn principal components and an on-the-fly projection loss to embed data. ACA is equivalent to an efficient dimensionality reduction by PCA and extracts low-dimensional embeddings that theoretically preserve the augmentation-distribution similarity between samples. Empirical results show that our method achieves competitive results against various traditional contrastive learning methods on different benchmarks.
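To make the description above concrete, the following is a minimal PyTorch sketch of what a contrastive-like loss for learning principal components of the augmentation distribution could look like. The spectral-contrastive-style form and the function name `aca_pc_style_loss` are illustrative assumptions for exposition, not the paper's exact objective.

```python
import torch


def aca_pc_style_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Contrastive-like surrogate for recovering leading components of the
    augmentation distribution (illustrative sketch, not the exact ACA loss).

    z1, z2: (batch, dim) embeddings of two augmented views of the same samples.
    """
    n = z1.size(0)
    # Alignment term: pull embeddings of two views of the same sample together.
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()
    # Decorrelation term: penalize squared similarity between different samples,
    # pushing the learned components toward an orthogonal, PCA-like solution.
    sim = z1 @ z2.t()  # (n, n) cross-view similarity matrix
    off_diag = ~torch.eye(n, dtype=torch.bool, device=z1.device)
    neg = (sim[off_diag] ** 2).mean()
    return pos + neg
```

In practice, `z1` and `z2` would be produced by the same encoder applied to two independently augmented views of a batch, and the loss would be minimized jointly with the on-the-fly projection loss mentioned in the abstract.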