One of the most promising approaches to unsupervised learning is the combination of deep representation learning and deep clustering. Some recent works propose to learn representations with deep neural networks and simultaneously perform clustering by defining a clustering loss on top of the embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples because they optimize clustering by pushing data toward randomly initialized cluster centers. This is problematic when the number of instances varies widely across classes, since a cluster with few samples has less chance of being assigned a good centroid. To overcome these limitations, we introduce a new unsupervised framework for joint debiased representation learning and image clustering. We simultaneously train two deep learning models: a deep representation network that captures the data distribution, and a deep clustering network that learns embedded features and performs clustering. Specifically, both the clustering network and the representation learning network take advantage of our proposed statistics pooling block, which aggregates the mean, variance, and cardinality of the embedded features to handle out-of-distribution samples and class imbalance. Our experiments show that using these representations considerably improves results on imbalanced image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to out-of-distribution datasets.
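To make the idea of a statistics pooling block concrete, the sketch below shows one plausible PyTorch realization that summarizes a set of embedded features by their per-dimension mean and variance together with the set cardinality. This is only an illustrative sketch under our own assumptions: the class name `StatisticsPooling`, the log-scaled cardinality term, and the final linear projection are choices we made for the example, not the exact block used in the paper.

```python
# A minimal sketch (assumptions noted above) of a statistics pooling block that
# summarizes a variable-sized set of D-dimensional features by mean, variance,
# and cardinality, so downstream layers can weigh how many samples support a cluster.
import torch
import torch.nn as nn


class StatisticsPooling(nn.Module):
    """Pools a set of D-dim features into a single D-dim summary vector."""

    def __init__(self, feature_dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Mean and variance contribute feature_dim values each; cardinality adds one scalar.
        self.proj = nn.Linear(2 * feature_dim + 1, feature_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (N, D) -- e.g. the embedded samples currently assigned to one cluster.
        n = features.shape[0]
        mean = features.mean(dim=0)                              # (D,)
        var = features.var(dim=0, unbiased=False) + self.eps     # (D,)
        # Log-scaled cardinality (our choice) keeps small and large clusters comparable.
        card = torch.log(torch.tensor(float(n), device=features.device)).unsqueeze(0)  # (1,)
        stats = torch.cat([mean, var, card], dim=0)              # (2D + 1,)
        return self.proj(stats)                                  # (D,)


if __name__ == "__main__":
    pool = StatisticsPooling(feature_dim=128)
    # A small cluster (7 samples) and a large cluster (500 samples) yield summaries
    # that also encode how many points support them.
    small = pool(torch.randn(7, 128))
    large = pool(torch.randn(500, 128))
    print(small.shape, large.shape)  # torch.Size([128]) torch.Size([128])
```

In this sketch, concatenating the cardinality with the first- and second-order statistics is what lets clusters with few samples be treated differently from well-populated ones, which is the intuition behind using such a block to mitigate class imbalance.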