We introduce a novel prior distribution for modelling the weights in mixture models, based on a generalisation of the Dirichlet distribution: the Selberg Dirichlet distribution. This distribution contains a repulsive term that naturally penalises weights lying close to each other on the simplex, thus encouraging a few dominating clusters. The repulsive behaviour induces additional sparsity in the number of components. We refer to this construction as the sparsity-inducing partition (SIP) prior. By highlighting its differences from the conventional Dirichlet distribution, we present relevant properties of the SIP prior and demonstrate their implications across a variety of mixture models, including finite mixtures with a fixed or random number of components, as well as repulsive mixtures. We propose an efficient posterior sampling algorithm and validate our model through an extensive simulation study and an application to a biomedical dataset describing children's Body Mass Index and eating behaviour.
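To make the effect of the repulsive term concrete, the following is a minimal sketch only. The abstract does not state the density, so the functional form used here is an assumption modelled on the Selberg integral: a symmetric Dirichlet kernel multiplied by pairwise factors |w_j - w_k|^(2*gamma). The function name `log_unnormalised_density` and the parameters `alpha` and `gamma` are hypothetical and may not match the paper's parametrisation.

```python
import math
from itertools import combinations


def log_unnormalised_density(w, alpha=1.0, gamma=1.0):
    """Log of an ASSUMED Selberg-type kernel on the simplex (not taken from the paper):

        prod_k w_k^(alpha - 1) * prod_{j<k} |w_j - w_k|^(2 * gamma)

    Setting gamma = 0 recovers the symmetric Dirichlet kernel.
    """
    if any(x <= 0 for x in w) or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("w must lie in the interior of the simplex")

    # Standard symmetric Dirichlet part.
    dirichlet_term = (alpha - 1.0) * sum(math.log(x) for x in w)

    # Repulsive part: pairs of nearly equal weights contribute a large penalty.
    repulsion = 0.0
    if gamma != 0:
        for a, b in combinations(w, 2):
            gap = abs(a - b)
            if gap == 0.0:
                return -math.inf  # identical weights have zero density
            repulsion += 2.0 * gamma * math.log(gap)

    return dirichlet_term + repulsion


balanced = [0.35, 0.33, 0.32]  # weights close to each other
uneven = [0.70, 0.20, 0.10]    # a few dominating weights

print(log_unnormalised_density(balanced), log_unnormalised_density(uneven))
```

Under this assumed form, the nearly balanced weight vector receives a much lower log-density than the uneven one, which is the qualitative behaviour the abstract attributes to the repulsive term. With `gamma=0` and `alpha=1` the two vectors are scored identically, as under a flat Dirichlet.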