Work in deep clustering focuses on finding a single partition of data. However, high-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over. For example, images of objects against a background could be clustered by the shape of the object and, separately, by the colour of the background. In this paper, we introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE), a novel class of variational autoencoders with a hierarchy of latent variables, each with a Mixture-of-Gaussians prior, that learns multiple clusterings simultaneously and is trained fully unsupervised and end-to-end. MFCVAE uses a progressively-trained ladder architecture, which leads to highly stable performance. We provide novel theoretical results for optimising the ELBO analytically with respect to the categorical variational posterior distribution, and correct earlier influential theoretical work. On image benchmarks, we demonstrate that our approach separates out and clusters over different aspects of the data in a disentangled manner. We also show two further advantages of our model: its latent space is compositional, and it enables controlled generation of samples.
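The idea of optimising the ELBO analytically with respect to the categorical posterior can be illustrated with the standard mean-field argument. Holding q(z|x) fixed, the part of the ELBO that depends on the cluster variable c is E_{q(c|x)}[log p(c) + E_{q(z|x)}[log p(z|c)] - log q(c|x)], which is maximised by q(c|x) ∝ p(c) exp(E_{q(z|x)}[log p(z|c)]). The sketch below computes this for a single facet, assuming a diagonal-Gaussian q(z|x) and diagonal-Gaussian mixture components. The function name `optimal_cluster_posterior` and its argument layout are hypothetical, and this illustrates the general technique rather than reproducing the paper's exact theorem or code.

```python
import numpy as np


def optimal_cluster_posterior(m, s2, pi, mu, sigma2):
    """Closed-form maximiser of the ELBO over q(c|x) for one facet (illustrative sketch).

    Assumes (hypothetically) q(z|x) = N(m, diag(s2)) and p(z|c) = N(mu[c], diag(sigma2[c])).
    m, s2:      mean and variance of q(z|x), shape (D,)
    pi:         mixture weights p(c), shape (K,)
    mu, sigma2: component means and variances, shape (K, D)
    Returns q(c|x), shape (K,).
    """
    # E_{q(z|x)}[log N(z; mu_c, diag(sigma2_c))], evaluated analytically for each component;
    # the expectation of (z - mu_c)^2 under q(z|x) is (m - mu_c)^2 + s2.
    expected_log_pz_given_c = -0.5 * np.sum(
        np.log(2 * np.pi * sigma2) + ((m - mu) ** 2 + s2) / sigma2, axis=1
    )
    logits = np.log(pi) + expected_log_pz_given_c
    logits -= logits.max()  # subtract the max for numerical stability before exponentiating
    q = np.exp(logits)
    return q / q.sum()  # normalise so that q(c|x) sums to one
```

If the prior and the variational posterior both factorise across facets, i.e. p(z, c) = Π_j p(c_j) p(z_j|c_j) and q(z, c|x) = Π_j q(z_j|x) q(c_j|x), the c_j-dependent ELBO terms separate, so under that assumption this function could simply be applied to each facet's latent variable independently.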