Contrastive deep clustering has recently gained significant attention for its ability to jointly perform contrastive learning and clustering with deep neural networks. Despite rapid progress, most previous works require both positive and negative sample pairs for contrastive clustering, which relies on a relatively large batch size. Moreover, they typically adopt a two-stream architecture with two augmented views, which overlooks the possibility and potential benefits of multi-stream architectures (especially with heterogeneous or hybrid networks). In light of this, this paper presents a new end-to-end deep clustering approach termed Heterogeneous Tri-stream Clustering Network (HTCN). The tri-stream architecture in HTCN consists of three main components: two weight-sharing online networks and a target network, where the parameters of the target network are the exponential moving average of those of the online networks. Notably, the two online networks are trained by simultaneously (i) predicting the instance representations of the target network and (ii) enforcing consistency between the cluster representations of the target network and those of the two online networks. Experimental results on four challenging image datasets show that HTCN outperforms state-of-the-art deep clustering approaches. The code is available at https://github.com/dengxiaozhi/HTCN.
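
The exponential-moving-average relation between the target network and the online networks can be illustrated with a minimal sketch. This is not taken from the HTCN repository: the function name `ema_update`, the dictionary-of-lists parameter layout, and the `momentum` value are assumptions made for illustration only. Because the two online networks share weights, a single set of online parameters is assumed.

```python
def ema_update(target, online, momentum=0.99):
    """Return target parameters moved toward the online parameters.

    target, online: dicts mapping parameter names to lists of floats
        (a hypothetical stand-in for real network weights).
    momentum: hypothetical decay rate; the abstract does not state a value.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    updated = {}
    for name, t_values in target.items():
        o_values = online[name]
        # Each target weight keeps a `momentum` share of its old value and
        # takes the remaining share from the matching online weight.
        updated[name] = [
            momentum * t + (1.0 - momentum) * o
            for t, o in zip(t_values, o_values)
        ]
    return updated


# Toy usage: the target drifts slowly toward the online weights.
online_params = {"w": [1.0, 2.0], "b": [0.5]}
target_params = {"w": [0.0, 0.0], "b": [0.0]}
target_params = ema_update(target_params, online_params, momentum=0.9)
print(target_params)
```

In a setup of this kind, only the online parameters would receive gradients, while the target parameters would change solely through such an update after each training step.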