Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks. However, doing so naively leads to ill-posed learning problems with degenerate solutions. In this paper, we propose a novel and principled learning formulation that addresses these issues. The method is obtained by maximizing the information between labels and input data indices. We show that this criterion extends standard cross-entropy minimization to an optimal transport problem, which we solve efficiently for millions of input images and thousands of labels using a fast variant of the Sinkhorn-Knopp algorithm. The resulting method is able to self-label visual data so as to train highly competitive image representations without manual labels. Compared to the best previous method in this class, namely DeepCluster, our formulation minimizes a single objective function for both representation learning and clustering; it also significantly outperforms DeepCluster on standard benchmarks and reaches the state of the art for learning a ResNet-50 in a self-supervised fashion.
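As a rough illustration of the label-assignment step described above, the sketch below runs Sinkhorn-Knopp matrix scaling on a matrix of softmax predictions to produce pseudo-labels that are equipartitioned over the classes. This is a minimal sketch under stated assumptions, not the paper's implementation: the names (sinkhorn_knopp, probs, lam, n_iters) are illustrative, and the hyperparameter values are placeholders rather than the paper's settings.

```python
import numpy as np

def sinkhorn_knopp(probs, lam=25.0, n_iters=50):
    """Balanced soft label assignment via Sinkhorn-Knopp matrix scaling.

    probs   : (N, K) array of softmax predictions for N images and K labels.
    lam     : inverse temperature of the entropic regularizer (illustrative value).
    n_iters : number of row/column rescaling passes (illustrative value).

    Returns Q, an (N, K) transport plan whose rows sum to 1/N and whose
    columns sum to 1/K, i.e. pseudo-labels spread evenly over the K classes.
    """
    N, K = probs.shape
    Q = probs ** lam          # entropic-OT kernel; a robust version would work in log space
    Q /= Q.sum()
    r = np.full(N, 1.0 / N)   # target row marginals: uniform over images
    c = np.full(K, 1.0 / K)   # target column marginals: uniform over labels
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        Q *= (r / Q.sum(axis=1))[:, None]
        Q *= (c / Q.sum(axis=0))[None, :]
    return Q

# Example: 6 images, 3 labels; hard pseudo-labels are the row-wise argmax.
rng = np.random.default_rng(0)
preds = rng.dirichlet(np.ones(3), size=6)  # stand-in for model softmax outputs
pseudo_labels = sinkhorn_knopp(preds).argmax(axis=1)
```

Unlike the separate k-means step in DeepCluster, this assignment minimizes the same cross-entropy objective used to train the network, subject to the equipartition constraint on the column marginals, which rules out the degenerate solution of assigning every image to a single cluster.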