The Dirichlet Process Mixture Model (DPMM) is a Bayesian non-parametric approach widely used for density estimation and clustering. In this manuscript, we study the choice of prior for the variance or precision matrix when Gaussian kernels are adopted. Typically, in the relevant literature, the assessment of mixture models is done by considering observations in a space of only a handful of dimensions. Instead, we are concerned with more realistic problems of higher dimensionality, in a space of up to 20 dimensions. We observe that the choice of prior is increasingly important as the dimensionality of the problem increases. After identifying certain undesirable properties of standard priors in problems of higher dimensionality, we review and implement possible alternative priors. The most promising priors are identified, as well as other factors that affect the convergence of MCMC samplers. Our results show that the choice of prior is critical for deriving reliable posterior inferences. This manuscript offers a thorough overview and comparative investigation into possible priors, with detailed guidelines for their implementation. Although our work focuses on the use of the DPMM in clustering, it is also applicable to density estimation.
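As a rough illustration of why the prior on the covariance matrix may matter more as dimensionality grows, the sketch below draws covariance matrices from an inverse-Wishart prior and reports their median condition number for several dimensions. The abstract does not name the standard priors it examines, so the inverse-Wishart family, the degrees of freedom `d + 2`, the identity scale matrix, and the function name `prior_condition_numbers` are all assumptions made for this example and not the authors' setup.

```python
import numpy as np


def prior_condition_numbers(dims=(2, 5, 20), n_draws=200, seed=0):
    """Median condition number of covariance draws from an inverse-Wishart prior.

    Assumed hyperparameters (not taken from the manuscript):
    degrees of freedom nu = d + 2 and an identity scale matrix.
    """
    rng = np.random.default_rng(seed)
    medians = {}
    for d in dims:
        nu = d + 2
        # If G has nu standard-normal columns, W = G G^T ~ Wishart(nu, I_d).
        g = rng.standard_normal((n_draws, d, nu))
        w = g @ g.transpose(0, 2, 1)
        # The inverse of a Wishart(nu, I) draw is an inverse-Wishart(nu, I) draw.
        sigma = np.linalg.inv(w)
        # np.linalg.cond handles the stack of (d, d) matrices, one value per draw.
        medians[d] = float(np.median(np.linalg.cond(sigma)))
    return medians


if __name__ == "__main__":
    for d, c in prior_condition_numbers().items():
        print(f"d={d:2d}  median condition number = {c:.1f}")
```

Under these assumed hyperparameters, prior draws in 20 dimensions tend to be far more ill-conditioned than draws in 2 dimensions, which is one way a default prior can behave differently in higher-dimensional problems. Other hyperparameter choices would change the picture.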