The Variational Autoencoder (VAE) has become a powerful tool for modeling the non-linear generative process of data from a low-dimensional latent space. Recently, several studies have proposed using VAEs for unsupervised clustering, with mixture models capturing the multi-modal structure of the latent representations. This strategy, however, is ineffective when there are outlier samples whose latent representations are meaningless yet contaminate the estimation of the major clusters in the latent space. This exact problem arises in resting-state fMRI (rs-fMRI) analysis, where clustering the major functional connectivity patterns is often hindered by the heavy noise in rs-fMRI data and by many minor clusters (rare connectivity patterns) that are of no interest to the analysis. In this paper we propose a novel generative process in which a Gaussian mixture models a few major clusters in the data and a non-informative uniform distribution captures the remaining data. We embed this truncated Gaussian-Mixture model in a Variational Autoencoder framework to obtain a general approach for joint clustering and outlier detection, called tGM-VAE. We demonstrate the applicability of tGM-VAE on the MNIST dataset and further validate it in the context of rs-fMRI connectivity analysis.
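To make the mixture described above concrete, the following is a minimal sketch of how a latent point might be assigned either to one of the Gaussian clusters or to the uniform outlier component. It is not the authors' implementation: the function name `responsibilities`, the diagonal covariances, the fixed mixing weights, and the box-shaped support of the uniform component (`box_low`, `box_high`) are all assumptions made for illustration, and the encoder and decoder networks of the VAE are omitted entirely.

```python
import numpy as np

def responsibilities(z, means, variances, weights, box_low, box_high):
    """Posterior over K Gaussian clusters plus one uniform outlier component.

    z:         (N, D) latent points
    means:     (K, D) cluster means
    variances: (K, D) diagonal cluster variances (an assumption of this sketch)
    weights:   (K + 1,) mixing weights; the last entry is the uniform component
    box_low, box_high: hypothetical bounds of the uniform component's support
    Returns an (N, K + 1) array whose rows sum to 1.
    """
    z = np.atleast_2d(np.asarray(z, dtype=float))
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    box_low = np.asarray(box_low, dtype=float)
    box_high = np.asarray(box_high, dtype=float)

    # Log density of each point under each diagonal Gaussian: shape (N, K).
    diff = z[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * np.sum(
        diff ** 2 / variances[None] + np.log(2.0 * np.pi * variances[None]),
        axis=2,
    )

    # Log density under the uniform component: constant inside the box,
    # zero probability (log = -inf) outside it.
    volume = np.prod(box_high - box_low)
    inside = np.all((z >= box_low) & (z <= box_high), axis=1)
    log_unif = np.where(inside, -np.log(volume), -np.inf)

    # Add log mixing weights, then normalise each row in a numerically stable way.
    log_joint = np.concatenate([log_gauss, log_unif[:, None]], axis=1)
    log_joint = log_joint + np.log(np.asarray(weights, dtype=float))[None, :]
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

# Two major clusters in a 2-D latent space, plus a uniform "everything else" component.
means = np.array([[-3.0, 0.0], [3.0, 0.0]])
variances = np.full((2, 2), 0.25)
weights = np.array([0.45, 0.45, 0.10])
points = np.array([[-3.0, 0.0], [0.0, 8.0], [3.0, 0.0]])

post = responsibilities(points, means, variances, weights, [-10.0, -10.0], [10.0, 10.0])
labels = post.argmax(axis=1)  # index 2 marks a point as an outlier
```

In this sketch a point far from every cluster mean falls to the uniform component because its Gaussian densities decay rapidly while the uniform density stays constant, which is the intuition behind using a flat component to absorb outliers.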