Deep clustering (DC) leverages the representation power of deep architectures to learn embedding spaces that are optimal for cluster analysis. This approach filters out low-level information that is irrelevant for clustering and has proven remarkably successful for high-dimensional data spaces. Some DC methods employ Generative Adversarial Networks (GANs), motivated by the powerful latent representations these models learn implicitly. In this work, we propose HC-MGAN, a new technique based on GANs with multiple generators (MGANs), which have not previously been explored for clustering. Our method is inspired by the observation that each generator of an MGAN tends to generate data that correlates with a sub-region of the real data distribution. We use this clustered generation to train a classifier that infers which generator a given image came from, thus providing a semantically meaningful clustering of the real distribution. Additionally, we design our method so that it builds a top-down hierarchical clustering tree, which makes it, to the best of our knowledge, the first hierarchical DC method. We conduct several experiments to evaluate the proposed method against recent DC methods, obtaining competitive results. Finally, we perform an exploratory analysis of the hierarchical clustering tree that highlights how accurately it organizes the data into a hierarchy of semantically coherent patterns.
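The control flow described in the abstract (several generators, a classifier trained to tell which generator produced a sample, real data clustered by that classifier, and the procedure repeated top-down) can be illustrated with a toy sketch. This is not the HC-MGAN implementation: no GAN is trained here. As a loudly labeled assumption, each "generator" is replaced by a Gaussian sampler centered on a 2-means centroid, and the classifier is a nearest-centroid rule. All function names (`fit_toy_generators`, `train_generator_classifier`, `build_tree`, `leaves`, `toy_blobs`) are hypothetical and chosen only for illustration.

```python
import random


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_toy_generators(real, rng, iters=10, noise=0.1):
    """ASSUMPTION: stand-in for MGAN training. Two centroids are found with
    2-means, and each 'generator' samples Gaussian noise around one of them."""
    far = max(real, key=lambda p: sq_dist(p, real[0]))
    centers = [list(real[0]), list(far)]
    for _ in range(iters):
        groups = [[], []]
        for p in real:
            nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearest].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]

    def make_generator(center):
        return lambda: [x + rng.gauss(0.0, noise) for x in center]

    return [make_generator(c) for c in centers]


def train_generator_classifier(generators, n_per_gen=200):
    """Fit a nearest-centroid classifier on generated samples, where the
    label of each sample is the index of the generator that produced it."""
    centroids = [mean([g() for _ in range(n_per_gen)]) for g in generators]
    return lambda p: min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def build_tree(real, rng, depth):
    """Top-down split: the generator classifier assigns each real point to a
    child node, and each child is then split again until depth runs out."""
    node = {"points": real, "children": []}
    if depth == 0 or len(real) < 2:
        return node
    classify = train_generator_classifier(fit_toy_generators(real, rng))
    parts = [[], []]
    for p in real:
        parts[classify(p)].append(p)
    if parts[0] and parts[1]:  # keep the node as a leaf if the split is degenerate
        node["children"] = [build_tree(part, rng, depth - 1) for part in parts]
    return node


def leaves(node):
    """Collect the point lists stored at the leaf nodes of the tree."""
    if not node["children"]:
        return [node["points"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


def toy_blobs(rng, n=25, noise=0.05):
    """Four well-separated 2-D blobs used only to exercise the sketch."""
    centers = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    return [[cx + rng.gauss(0.0, noise), cy + rng.gauss(0.0, noise)]
            for cx, cy in centers for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    tree = build_tree(toy_blobs(rng), rng, depth=2)
    print([len(leaf) for leaf in leaves(tree)])
```

The point of the sketch is the division of labor: the generators never see cluster labels, the classifier only ever sees generated samples, and the real data is partitioned purely by asking the classifier which generator each point most resembles. In the method described above, both stand-ins would be replaced by trained neural networks.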