Dirichlet process mixtures are flexible non-parametric models, particularly suited to density estimation and probabilistic clustering. In this work we study the posterior distribution induced by Dirichlet process mixtures as the sample size increases, focusing specifically on consistency for the unknown number of clusters when the observed data are generated from a finite mixture. Crucially, we consider the situation where a prior is placed on the concentration parameter of the underlying Dirichlet process. Previous findings in the literature suggest that Dirichlet process mixtures are typically not consistent for the number of clusters if the concentration parameter is held fixed and the data come from a finite mixture. Here we show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way, as is commonly done in practice. Our results hold for data generated from a class of finite mixtures, under mild assumptions on the prior for the concentration parameter, and for a variety of choices of likelihood kernel for the mixture.
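As a rough illustration of what "placing a prior on the concentration parameter" usually looks like in practice (this is not claimed to be the construction analysed in this work), a common choice is a Gamma(a, b) prior on α, updated inside a Gibbs sampler with the auxiliary-variable step of Escobar and West (1995). The sketch below assumes a shape–rate Gamma prior, and the helper names `expected_num_clusters` and `update_alpha` are hypothetical. It also computes the prior mean of the number of occupied clusters under a Chinese restaurant process with fixed α, which grows roughly like α log n.

```python
import numpy as np


def expected_num_clusters(alpha: float, n: int) -> float:
    """Prior mean of the number of occupied clusters K_n under a
    Chinese restaurant process with fixed concentration alpha and
    n observations: E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    i = np.arange(1, n + 1)
    return float(np.sum(alpha / (alpha + i - 1)))


def update_alpha(alpha: float, k: int, n: int, a: float, b: float,
                 rng: np.random.Generator) -> float:
    """One Escobar-West auxiliary-variable Gibbs step for alpha.

    Assumes (hypothetically, for illustration) alpha ~ Gamma(shape=a, rate=b),
    with k occupied clusters among n observations in the current state.
    """
    # Auxiliary variable eta | alpha, k ~ Beta(alpha + 1, n)
    eta = rng.beta(alpha + 1.0, n)
    # Updated rate; log(eta) < 0, so rate > b
    rate = b - np.log(eta)
    # Mixing weight pi from the odds pi / (1 - pi) = (a + k - 1) / (n * rate)
    odds = (a + k - 1.0) / (n * rate)
    pi = odds / (1.0 + odds)
    # alpha | eta, k is a two-component mixture of Gamma distributions
    shape = a + k if rng.uniform() < pi else a + k - 1.0
    # NumPy parametrises the Gamma by scale = 1 / rate
    return float(rng.gamma(shape, 1.0 / rate))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Prior growth of E[K_n] for a fixed alpha
    for n in (10, 100, 1000):
        print(n, expected_num_clusters(1.0, n))
    # A short chain for alpha, holding k fixed at 3 clusters among 100 points
    alpha, draws = 1.0, []
    for _ in range(2000):
        alpha = update_alpha(alpha, k=3, n=100, a=1.0, b=1.0, rng=rng)
        draws.append(alpha)
    print("posterior mean of alpha (approx.):", np.mean(draws[500:]))
```

In a full sampler, `k` would be recomputed from the current cluster allocation at every sweep, so α and the partition inform each other. Holding `k` fixed here simply isolates the α update.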