Recently, deep clustering methods have gained momentum because of the high representational power of deep neural networks (DNNs) such as autoencoders. The key idea is that representation learning and clustering can reinforce each other: good representations lead to good clustering, while good clustering provides good supervisory signals for representation learning. Critical questions include: 1) How should representation learning and clustering be jointly optimized? 2) Should the reconstruction loss of the autoencoder always be considered? In this paper, we propose DEKM (Deep Embedded K-Means) to answer these two questions. Since the embedding space generated by an autoencoder may have no obvious cluster structure, we propose to further transform the embedding space into a new space that reveals the cluster-structure information. This is achieved by an orthonormal transformation matrix whose columns are the eigenvectors of the within-class scatter matrix of K-means; the eigenvalues indicate how much each eigenvector contributes to the cluster-structure information in the new space. Our goal is to increase this cluster-structure information. To this end, we discard the decoder and propose a greedy method to optimize the representation. DEKM alternately optimizes representation learning and clustering. Experimental results on real-world datasets demonstrate that DEKM achieves state-of-the-art performance.
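As a rough illustration of the transformation step described above, the following sketch computes the within-class scatter matrix of a K-means partition in the embedding space and rotates the embeddings into its orthonormal eigenbasis. It is a minimal sketch only; the function and variable names (`cluster_structure_transform`, `embeddings`, `n_clusters`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch, assuming embeddings of shape (n_samples, d) from an encoder.
import numpy as np
from sklearn.cluster import KMeans

def cluster_structure_transform(embeddings, n_clusters):
    """Run K-means, build the within-class scatter matrix S_w, and
    return the embeddings expressed in the eigenbasis of S_w."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    labels, centers = km.labels_, km.cluster_centers_

    d = embeddings.shape[1]
    S_w = np.zeros((d, d))
    for k in range(n_clusters):
        diff = embeddings[labels == k] - centers[k]  # deviations from centroid k
        S_w += diff.T @ diff                         # accumulate within-class scatter

    # S_w is symmetric, so its eigenvectors form an orthonormal matrix V.
    # np.linalg.eigh returns eigenvalues in ascending order.
    eigvals, V = np.linalg.eigh(S_w)

    # Directions with small eigenvalues have little within-cluster scatter,
    # i.e. they carry the most cluster-structure information in the new space.
    return embeddings @ V, eigvals, labels
```

Because the eigenvalues rank the directions by within-cluster scatter, a method in this spirit can focus its representation-learning updates on the low-eigenvalue directions to increase the cluster-structure information.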