Existing deep clustering methods rely on either contrastive or non-contrastive representation learning for the downstream clustering task. Contrastive methods, thanks to negative pairs, learn uniform representations for clustering; those negative pairs, however, may inevitably introduce the class collision issue and consequently compromise the clustering performance. Non-contrastive methods, on the other hand, avoid the class collision issue, but the resulting non-uniform representations may cause clustering to collapse. To enjoy the strengths of both worlds, this paper presents a novel end-to-end deep clustering method with prototype scattering and positive sampling, termed ProPos. Specifically, we first maximize the distance between prototypical representations via a prototype scattering loss, which improves the uniformity of representations. Second, we align one augmented view of an instance with the sampled neighbors of another view -- assumed to form a truly positive pair in the embedding space -- to improve within-cluster compactness, which we term positive sampling alignment. The strengths of ProPos are thus avoidance of the class collision issue, uniform representations, well-separated clusters, and within-cluster compactness. By optimizing ProPos in an end-to-end expectation-maximization framework, we demonstrate through extensive experiments that ProPos achieves competitive performance on moderate-scale clustering benchmark datasets and establishes new state-of-the-art performance on large-scale datasets. Source code is available at \url{https://github.com/Hzzone/ProPos}.
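The two objectives can be illustrated with a minimal NumPy sketch. This is only a conceptual simplification, not the paper's exact formulation: the function names are illustrative, prototype scattering is reduced here to mean pairwise cosine similarity between distinct prototypes (minimizing it pushes prototypes apart), and positive sampling alignment is reduced to a mean squared distance between normalized embeddings of one view and sampled neighbors of the other view.

```python
import numpy as np


def l2_normalize(x):
    """Project rows onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def prototype_scattering(prototypes):
    """Mean pairwise cosine similarity between distinct prototypes.

    Minimizing this quantity scatters the prototypes apart, improving
    uniformity without instance-level negative pairs (illustrative
    surrogate for the paper's prototype scattering loss).
    """
    p = l2_normalize(prototypes)          # (k, d) cluster prototypes
    sim = p @ p.T                         # pairwise cosine similarities
    k = p.shape[0]
    off_diag = sim[~np.eye(k, dtype=bool)]  # drop self-similarities
    return off_diag.mean()


def positive_sampling_alignment(view1, sampled_neighbors_view2):
    """Mean squared distance between one augmented view and sampled
    neighbors of the other view, assumed to form truly positive pairs;
    minimizing it improves within-cluster compactness (illustrative
    surrogate for positive sampling alignment)."""
    z1 = l2_normalize(view1)
    z2 = l2_normalize(sampled_neighbors_view2)
    return ((z1 - z2) ** 2).sum(axis=1).mean()
```

For intuition: orthogonal prototypes yield a scattering value of 0 (well separated), while identical prototypes yield 1 (collapsed), and the alignment term vanishes when the two views' embeddings coincide.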