Bayesian nonparametric mixture models are commonly used to model complex data. While these models are well suited for density estimation, their use for clustering has some limitations. Miller and Harrison (2014) proved that, for Dirichlet process and Pitman--Yor process mixture models, the posterior on the number of clusters is inconsistent when the true number of clusters is finite. In this work, we extend this result to additional Bayesian nonparametric priors, such as Gibbs-type processes and their finite-dimensional representations. The latter include the Dirichlet multinomial process and the recently proposed Pitman--Yor and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters, and we discuss possible solutions. Notably, we show that a post-processing algorithm introduced by Guha et al. (2021) for the Dirichlet process extends to more general models and provides a consistent method to estimate the number of components.