Clustering algorithms have improved significantly alongside deep neural networks, which provide effective data representations. Existing methods build on a deep autoencoder and a self-training process that leverages the distribution of cluster assignments of samples. However, because the autoencoder's fundamental objective is efficient data reconstruction, the learned space may be sub-optimal for clustering. Moreover, these methods require highly effective codes (i.e., representations) of the data; otherwise, the initial cluster centers often cause stability issues during self-training. Many state-of-the-art clustering algorithms use convolution operations to extract efficient codes, but their applications are limited to image data. To address these issues, we propose an end-to-end deep clustering algorithm, Very Compact Clusters (VCC), for general datasets. VCC takes advantage of the distributions of local relationships of samples near cluster boundaries, so that these samples can be properly separated and pulled toward cluster centers to form compact clusters. Experimental results on various datasets show that our approach achieves better clustering performance than most state-of-the-art clustering methods, and that the data embeddings VCC learns for image data without convolution are comparable to those of specialized convolutional methods.
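For context, the self-training step that existing methods rely on can be sketched as follows. This is a minimal illustration that assumes a DEC-style formulation (Student's t soft assignments and a squared, frequency-normalized target distribution); it is not the VCC objective, and the function names, the toy embeddings, and the `alpha` parameter are hypothetical choices made for the example.

```python
import math

# Assumption: DEC-style self-training, used only to illustrate what
# "leveraging the distribution of cluster assignments" can look like.


def soft_assignments(embeddings, centers, alpha=1.0):
    """Student's t-kernel soft assignment q[i][j] of sample i to cluster j."""
    q = []
    for z in embeddings:
        row = []
        for mu in centers:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def target_distribution(q):
    """Targets p[i][j] proportional to q[i][j]^2 / f_j, with f_j = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])  # each row sums to 1
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over samples and clusters (the self-training loss)."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0.0
    )


# Hypothetical 2-D codes (e.g., autoencoder outputs) and two initial centers.
embeddings = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.5, 3.2]]
centers = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assignments(embeddings, centers)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Because the targets are computed from the current assignments, the quality of the initial codes and centers directly shapes what the model is trained toward, which is the stability concern raised above.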