We approach self-supervised learning of image representations from a statistical dependence perspective, proposing Self-Supervised Learning with the Hilbert-Schmidt Independence Criterion (SSL-HSIC). SSL-HSIC maximizes dependence between representations of transformed versions of an image and the image identity, while minimizing the kernelized variance of those features. This self-supervised learning framework yields a new understanding of InfoNCE, a variational lower bound on the mutual information (MI) between different transformations. While the MI itself is known to have pathologies which can result in meaningless representations being learned, its bound is much better behaved: we show that it implicitly approximates SSL-HSIC (with a slightly different regularizer). Our approach also gives us insight into BYOL, since SSL-HSIC similarly learns local neighborhoods of samples. SSL-HSIC allows us to directly optimize statistical dependence in time linear in the batch size, without restrictive data assumptions or indirect mutual information estimators. Trained with or without a target network, SSL-HSIC matches the current state-of-the-art for standard linear evaluation on ImageNet, semi-supervised learning and transfer to other classification and vision tasks such as semantic segmentation, depth estimation and object recognition.
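To make the objective concrete, the loss described above can be sketched with a naive empirical HSIC estimator. This is an illustrative NumPy sketch only: it uses the biased O(n²) HSIC estimate with a linear kernel on features and a label-matching kernel on image identity, whereas the paper uses specific kernel choices and a linear-time approximation. The names `hsic_biased`, `ssl_hsic_loss`, and `gamma` are chosen here for illustration, not taken from the paper.

```python
import numpy as np

def hsic_biased(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2,
    where H = I - 11^T / n is the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def ssl_hsic_loss(z1, z2, gamma=1.0):
    """Sketch of the SSL-HSIC objective on one batch:
    maximize HSIC between features and image identity, while
    penalizing the kernelized variance of the features via
    sqrt(HSIC(Z, Z)). z1, z2 are features of two augmented
    views of the same n images, so each image identity label
    is shared by its two views."""
    z = np.concatenate([z1, z2], axis=0)            # (2n, d) stacked view features
    n = z1.shape[0]
    y = np.concatenate([np.arange(n), np.arange(n)])
    K = z @ z.T                                     # linear kernel on features
    L = (y[:, None] == y[None, :]).astype(float)    # identity kernel on labels
    return -hsic_biased(K, L) + gamma * np.sqrt(hsic_biased(K, K))

# Usage: features for two views of a batch of 8 images.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z2 = z1 + 0.1 * rng.normal(size=(8, 16))  # a noisy second view
loss = ssl_hsic_loss(z1, z2)
```

In practice the feature kernel, the scale of `gamma`, and the linear-time estimator all matter; this sketch only shows the structure of the dependence-maximization term and the variance penalty.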