Finite Gaussian mixture models provide a powerful and widely employed probabilistic approach for clustering multivariate continuous data. However, the practical usefulness of these models is jeopardized in high-dimensional spaces, where they tend to be over-parameterized. As a consequence, different solutions have been proposed, often relying on matrix decompositions or variable selection strategies. Recently, a methodological link between Gaussian graphical models and finite mixtures has been established, paving the way for penalized model-based clustering in the presence of large precision matrices. Nevertheless, current methodologies implicitly assume similar levels of sparsity across the classes, and thus do not account for different degrees of association between the variables across groups. We overcome this limitation by deriving group-wise penalty factors, which automatically enforce under- or over-connectivity in the estimated graphs. The approach is entirely data-driven and does not require additional hyper-parameter specification. Analyses on synthetic and real data showcase the validity of our proposal.