Mixed membership models extend finite mixture models by allowing each observation to belong partially to more than one mixture component. We introduce a probabilistic framework for mixed membership models of high-dimensional continuous data, with a focus on scalability and interpretability. We derive a novel probabilistic representation of mixed membership based on direct convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured by approximating a tensor covariance structure with multivariate eigen-approximations, with adaptive regularization imposed through shrinkage priors. Conditional posterior consistency is established on an unconstrained model, which permits a simple posterior sampling scheme while retaining many of the desired theoretical properties of our model. Our work is motivated by two biomedical case studies: one on functional brain imaging of children with autism spectrum disorder (ASD) and one on gene expression data from breast cancer tissue. Through these applications, we highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, may often be restrictive in BioX applications, leading to unnatural interpretations of data features.