We present TWIST, a novel self-supervised representation learning method that classifies large-scale unlabeled datasets in an end-to-end way. We employ a siamese network terminated by a softmax operation to produce twin class distributions for two augmented views of an image. Without supervision, we enforce the class distributions of different augmentations to be consistent. Meanwhile, we regularize the class distributions to make them both sharp and diverse: we minimize the entropy of each sample's distribution to make its class prediction assertive, and maximize the entropy of the mean distribution across samples to make the predictions of different samples diverse. In this way, TWIST naturally avoids trivial solutions without specific designs such as an asymmetric network, a stop-gradient operation, or a momentum encoder. Unlike clustering-based methods, which alternate between clustering and learning, our method is a single learning process guided by a unified loss function. As a result, TWIST outperforms state-of-the-art methods on a wide range of tasks, including unsupervised classification, linear classification, semi-supervised learning, transfer learning, and dense prediction tasks such as detection and segmentation.
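The three terms described above (consistency, sharpness, diversity) can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch implementation based on our reading of the abstract; the function name `twist_loss`, the `eps` smoothing constant, the symmetrized cross-entropy form of the consistency term, and the equal weighting of the three terms are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def twist_loss(logits1, logits2, eps=1e-8):
    """Illustrative sketch of a TWIST-style objective for one pair of views.

    logits1, logits2: [batch, num_classes] outputs of the siamese network
    for two augmentations of the same batch of images.
    """
    p1 = F.softmax(logits1, dim=1)
    p2 = F.softmax(logits2, dim=1)

    # Consistency: the twin class distributions of two augmentations
    # should agree (symmetrized cross-entropy between them).
    consistency = -0.5 * ((p1 * torch.log(p2 + eps)).sum(dim=1).mean()
                          + (p2 * torch.log(p1 + eps)).sum(dim=1).mean())

    # Sharpness: minimize the entropy of each sample's distribution,
    # making every per-sample class prediction assertive.
    sharpness = -0.5 * ((p1 * torch.log(p1 + eps)).sum(dim=1).mean()
                        + (p2 * torch.log(p2 + eps)).sum(dim=1).mean())

    # Diversity: maximize the entropy of the mean distribution over the
    # batch, so different samples are spread across different classes.
    # (Adding the negative entropy to the loss maximizes the entropy.)
    mean_p = 0.5 * (p1.mean(dim=0) + p2.mean(dim=0))
    diversity = (mean_p * torch.log(mean_p + eps)).sum()

    return consistency + sharpness + diversity
```

Minimizing this single loss jointly enforces agreement between views while the sharpness and diversity terms rule out the trivial solutions (a uniform output for every sample, or every sample collapsing onto one class), which is why no asymmetric network, stop-gradient, or momentum encoder is needed.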