Self-supervised representation learning maps high-dimensional data into a meaningful embedding space, where samples with similar semantic content are close to each other. Most recent representation learning methods maximize the cosine similarity or minimize the distance between the embedding features of different views of the same sample, usually on the $\ell_2$-normalized unit hypersphere. To prevent the trivial solution in which all samples collapse to the same embedding feature, various techniques have been developed, such as contrastive learning, gradient stopping, and variance and covariance regularization. In this study, we propose MUlti-Segmental Informational Coding (MUSIC) for self-supervised representation learning. MUSIC divides the embedding feature into multiple segments that discriminatively partition samples into different semantic clusters, with different segments focusing on different partition principles. Information-theoretic measurements are used directly to optimize MUSIC and theoretically guarantee that trivial solutions are avoided. MUSIC does not depend on commonly used techniques such as memory banks, large batches, asymmetric networks, gradient stopping, or momentum weight updating, which makes the training framework flexible. Our experiments demonstrate that MUSIC achieves better results than the closely related Barlow Twins and VICReg methods on ImageNet classification with linear probing, while requiring neither a deep projector nor a large feature dimension. Code will be made available.
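To make the idea concrete, below is a minimal PyTorch sketch of a segment-wise, entropy-based objective of the kind the abstract describes: each segment of the embedding is softmaxed into a cluster assignment, per-sample entropy is minimized for confident assignments, and batch-averaged entropy is maximized to rule out collapse. The function name `music_loss`, the consistency term between views, and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch (assumed names/hyperparameters), not the reference implementation.
import torch
import torch.nn.functional as F

def music_loss(z1, z2, num_segments=16, eps=1e-8):
    """z1, z2: (batch, num_segments * segment_dim) embeddings of two views."""
    B, D = z1.shape
    assert D % num_segments == 0
    # View each embedding as `num_segments` segments and softmax each segment,
    # so every segment assigns the sample to one of `segment_dim` clusters.
    p1 = F.softmax(z1.view(B, num_segments, -1), dim=-1)
    p2 = F.softmax(z2.view(B, num_segments, -1), dim=-1)

    loss = 0.0
    for p in (p1, p2):
        # Low per-sample (conditional) entropy -> confident cluster assignments.
        cond_entropy = -(p * (p + eps).log()).sum(-1).mean()
        # High entropy of the batch-averaged (marginal) assignment -> clusters
        # are used evenly, which excludes the collapsed trivial solution.
        p_mean = p.mean(0)  # (num_segments, segment_dim)
        marg_entropy = -(p_mean * (p_mean + eps).log()).sum(-1).mean()
        loss = loss + cond_entropy - marg_entropy
    # Assumed view-consistency term: pull the two views' segment-wise
    # assignments together, in the spirit of the distance losses cited above.
    consistency = (p1 - p2).pow(2).sum(-1).mean()
    return loss + consistency
```

Under these assumptions, no memory bank, asymmetric network, gradient stopping, or momentum encoder is needed: the marginal-entropy term alone prevents the trivial solution, consistent with the abstract's claim.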