Deep clustering outperforms conventional clustering by mutually promoting representation learning and cluster assignment. However, most existing deep clustering methods suffer from two major drawbacks. First, most cluster assignment schemes rely on simple distance comparisons and depend heavily on a target distribution generated by a handcrafted nonlinear mapping, which largely limits the performance that deep clustering methods can reach. Second, the clustering results can easily be driven in the wrong direction by misassigned samples within each cluster, and existing deep clustering methods are incapable of discriminating such samples. To address these issues, a novel modular Self-Evolutionary Clustering (Self-EvoC) framework is constructed, which boosts clustering performance through classification in a self-supervised manner. Fuzzy theory is used to assign each sample a probabilistic membership score that evaluates the certainty of its intermediate clustering result. Based on these scores, the most reliable samples are selected and augmented. The augmented data, labeled by the clustering, are used to fine-tune an off-the-shelf deep network classifier, yielding a model that generates the target distribution. The proposed framework can efficiently discriminate sample outliers and generate a better target distribution with the assistance of the self-supervised classifier. Extensive experiments indicate that Self-EvoC remarkably outperforms state-of-the-art deep clustering methods on three benchmark datasets.
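The following is a minimal, illustrative sketch of the loop the abstract describes, not the paper's exact algorithm: fuzzy membership scores rate the certainty of each intermediate assignment, the most reliable samples are kept, and a classifier fine-tuned on those samples with their cluster pseudo-labels supplies the next target distribution. Here scikit-learn's `KMeans` and `LogisticRegression` stand in for the deep embedding network and the off-the-shelf deep classifier, and names such as `fuzzy_memberships`, the fuzzifier `m=2`, and the 50% reliability quantile are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression


def fuzzy_memberships(X, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means style membership matrix U (n_samples x n_clusters)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))          # closer centers get larger weight
    return inv / inv.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))             # stand-in for learned embeddings
k = 3                                      # number of clusters (assumed)

# Initial cluster assignment from a conventional method.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_
labels = km.labels_

for it in range(5):                        # self-evolutionary refinement rounds
    U = fuzzy_memberships(X, centers)      # per-sample membership scores
    conf = U.max(axis=1)                   # certainty of each assignment
    pseudo = U.argmax(axis=1)              # cluster pseudo-labels
    reliable = conf >= np.quantile(conf, 0.5)   # keep the more certain half
    if np.unique(pseudo[reliable]).size < k:    # need all clusters represented
        break
    # Fine-tune a classifier on the reliable subset with cluster pseudo-labels
    # (logistic regression stands in for the deep network classifier).
    clf = LogisticRegression(max_iter=1000).fit(X[reliable], pseudo[reliable])
    target = clf.predict_proba(X)          # classifier output as target distribution
    labels = target.argmax(axis=1)         # refined cluster assignment
    # Update centers as membership-weighted means (fuzzy c-means update).
    w = U ** 2.0
    centers = (w.T @ X) / w.T.sum(axis=1, keepdims=True)

print("final cluster sizes:", np.bincount(labels, minlength=k))
```

In the actual framework the reliable samples would also be augmented before fine-tuning and the embeddings would come from a deep network; this sketch only illustrates the select-then-classify feedback loop on fixed features.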