Work in deep clustering focuses on finding a single partition of data. However, high-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over. For example, images of objects against a background could be clustered over the shape of the object and separately by the colour of the background. In this paper, we introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE), a novel class of variational autoencoders with a hierarchy of latent variables, each with a Mixture-of-Gaussians prior, that learns multiple clusterings simultaneously, and is trained fully unsupervised and end-to-end. MFCVAE uses a progressively-trained ladder architecture which leads to highly stable performance. We provide novel theoretical results for optimising the ELBO analytically with respect to the categorical variational posterior distribution, correcting earlier influential theoretical work. On image benchmarks, we demonstrate that our approach separates out and clusters over different aspects of the data in a disentangled manner. We also show other advantages of our model: the compositionality of its latent space and that it provides controlled generation of samples.
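To make the multi-facet prior concrete, the following is a minimal sketch of sampling from several latent facets, each with its own Mixture-of-Gaussians prior. It is an illustration under stated assumptions, not the MFCVAE implementation: the function name `sample_facets`, the assumption that facets are independent under the prior, the diagonal covariances, and all parameter values are hypothetical, and the encoder, decoder, and ladder architecture are omitted.

```python
import numpy as np


def sample_facets(rng, weights, means, stds, fixed=None):
    """Draw one latent vector per facet from per-facet Mixture-of-Gaussians priors.

    weights[j]: mixing weights of facet j, shape (K_j,)
    means[j], stds[j]: component means and standard deviations, shape (K_j, D_j)
    fixed: optional dict mapping a facet index to a cluster index to hold fixed
    (Assumption: facets are sampled independently of one another.)
    """
    fixed = fixed or {}
    clusters, latents = [], []
    for j, (pi, mu, sigma) in enumerate(zip(weights, means, stds)):
        if j in fixed:
            c = fixed[j]                         # cluster chosen by the caller
        else:
            c = int(rng.choice(len(pi), p=pi))   # c_j ~ Categorical(pi_j)
        # z_j | c_j ~ Normal(mu_{j,c}, diag(sigma_{j,c}^2))
        z = mu[c] + sigma[c] * rng.standard_normal(mu[c].shape)
        clusters.append(c)
        latents.append(z)
    return clusters, latents


# Hypothetical two-facet setup: facet 0 could capture object shape (3 clusters),
# facet 1 could capture background colour (2 clusters).
rng = np.random.default_rng(0)
weights = [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])]
means = [
    np.array([[-4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]),
    np.array([[-3.0, -3.0], [3.0, 3.0]]),
]
stds = [np.full((3, 2), 0.5), np.full((2, 2), 0.5)]

# Hold facet 0 at cluster 2 and let facet 1 vary freely.
clusters, latents = sample_facets(rng, weights, means, stds, fixed={0: 2})
```

Holding one facet's cluster fixed while resampling the others is one simple way to picture the compositionality and controlled generation described above. In a full model, the sampled latents would be passed through a decoder to produce an image.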