Clustering algorithms have improved significantly alongside deep neural networks, which provide effective data representations. Existing methods build on a deep autoencoder and a self-training process that leverages the distribution of the samples' cluster assignments. However, because the autoencoder's fundamental objective is efficient data reconstruction, the learnt space may be sub-optimal for clustering. Moreover, these methods require highly effective codes (i.e., representations) of the data; otherwise the initial cluster centers often cause stability issues during self-training. Many state-of-the-art clustering algorithms use convolution operations to extract efficient codes, but their application is limited to image data. To address this, we propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC). VCC exploits the distributions of local relationships among samples near cluster boundaries, so that these samples can be properly separated and pulled toward cluster centers to form compact clusters. Experimental results on various datasets show that the proposed approach achieves clustering performance competitive with most state-of-the-art clustering methods on both image and non-image data, and that its results are easy to inspect qualitatively in the learnt low-dimensional space.
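For readers unfamiliar with the self-training step that the abstract attributes to existing methods, the following is a minimal sketch of one common formulation: soft cluster assignments from a Student's t kernel, a sharpened target distribution derived from them, and a KL-divergence loss between the two. It illustrates the baseline family only, not VCC itself. The function names, the `alpha` parameter, and the toy data are assumptions made for this example, and the encoder that would produce the codes `z` is omitted.

```python
import numpy as np


def soft_assignments(z, centers, alpha=1.0):
    """Soft cluster assignments q[i, j] from a Student's t kernel.

    z: (n_samples, dim) codes; centers: (n_clusters, dim).
    """
    sq_dist = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster size, renormalize."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


# Toy codes (hypothetical data): two Gaussian blobs in a 2-D latent space.
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(-2.0, 0.5, size=(50, 2)),
    rng.normal(2.0, 0.5, size=(50, 2)),
])
# Deliberately imperfect initial centers, as self-training would start from.
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])

q = soft_assignments(z, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
print(f"self-training loss KL(P || Q) = {loss:.4f}")
```

In a full pipeline this loss would be minimized with respect to both the encoder weights and the centers. Because the targets are computed from the current assignments, poorly placed initial centers are reinforced rather than corrected, which is the stability issue the abstract mentions.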