Conditional correlation networks, within Gaussian Graphical Models (GGM), are widely used to describe the direct interactions between the components of a random vector. For an unlabelled heterogeneous population, Expectation Maximisation (EM) algorithms for Mixtures of GGM have been proposed to estimate both the graph of each sub-population and the class labels. However, we argue that for most real data, class membership cannot be described by a Mixture of Gaussians, which mostly groups data points according to their geometric proximity. In particular, there often exist external co-features whose values affect the average value of the features, scattering data points that belong to the same sub-population across the feature space. Additionally, if the effect of the co-features on the features is heterogeneous, its estimation cannot be separated from the identification of the sub-populations. In this article, we propose a Mixture of Conditional GGM (CGGM) that subtracts the heterogeneous effects of the co-features in order to regroup the data points into clusters corresponding to the sub-populations. We develop a penalised EM algorithm to estimate graph-sparse model parameters. We demonstrate on synthetic and real data that this method achieves its goal, identifying the sub-populations in cases where Mixtures of GGM are disrupted by the effect of the co-features.
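The core idea, removing a class-specific co-feature effect before assigning points to sub-populations, can be illustrated with a minimal sketch. It assumes that, within sub-population k, the features follow `x | z ~ N(mu_k + B_k^T z, Sigma_k)`, so the M-step reduces to a weighted linear regression of the features on the co-features for each class. This is not the authors' penalised algorithm: the graph-sparsity penalty on the precision matrices (for instance a graphical-lasso step) is omitted and replaced by a small diagonal loading `reg`, and the function name `mixture_cggm_em` and all of its parameters are hypothetical.

```python
import numpy as np


def mixture_cggm_em(X, Z, K, n_iter=100, reg=1e-3, init_resp=None, seed=0):
    """Hypothetical sketch: EM for a mixture of conditional Gaussians.

    Assumed class-k model: x | z ~ N(mu_k + B_k^T z, Sigma_k).
    X: (n, p) features, Z: (n, q) co-features, K: number of classes.
    No sparsity penalty is applied; `reg` only adds a small ridge /
    diagonal loading for numerical stability.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Zt = np.hstack([np.ones((n, 1)), Z])  # intercept column carries mu_k
    # EM is sensitive to initialisation: use a given guess or random soft labels.
    resp = init_resp if init_resp is not None else rng.dirichlet(np.ones(K), size=n)
    for _ in range(n_iter):
        # M-step: class weights, per-class regression on z, residual covariance.
        pis = resp.mean(axis=0)
        coefs, covs = [], []
        for k in range(K):
            w = resp[:, k]
            Zw = Zt * w[:, None]
            # Weighted least squares: (Zt^T W Zt) C = Zt^T W X
            C = np.linalg.solve(Zt.T @ Zw + reg * np.eye(Zt.shape[1]), Zw.T @ X)
            R = X - Zt @ C  # features with the class-k co-feature effect removed
            S = (R * w[:, None]).T @ R / w.sum()
            coefs.append(C)
            covs.append(S + reg * np.eye(p))
        # E-step: Gaussian log-likelihood of the residuals under each class.
        logp = np.empty((n, K))
        for k in range(K):
            R = X - Zt @ coefs[k]
            _, logdet = np.linalg.slogdet(covs[k])
            prec = np.linalg.inv(covs[k])
            maha = np.einsum("ij,jk,ik->i", R, prec, R)
            logp[:, k] = np.log(pis[k]) - 0.5 * (maha + logdet + p * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, pis, coefs, covs
```

In a setting where two sub-populations share the same average features but respond to z with opposite slopes, a Gaussian mixture fitted on x alone has no mean difference to separate them by, whereas the per-class regression in the M-step does. Because EM can stall in local optima, `init_resp` (one row of class probabilities per observation) allows starting from a prior guess of the labels.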