One of the most widely used priors in Bayesian clustering is the Dirichlet prior, which can be expressed as a Chinese Restaurant Process. This process allows nonparametric estimation of the number of clusters when partitioning datasets. Its key feature is the "rich-get-richer" property, which assumes that the a priori probability of choosing a cluster depends linearly on its population. In this paper, we show that such a prior is not always the best choice for modeling data. To address this problem, we derive the Powered Chinese Restaurant Process from a modified version of the Dirichlet-Multinomial distribution. We then develop some of its fundamental properties (expected number of clusters, convergence). Unlike state-of-the-art efforts in this direction, this new formulation allows direct control over the importance of the "rich-get-richer" prior.
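To make the "rich-get-richer" property concrete, the following minimal sketch simulates table assignments in which a customer joins an existing table with weight proportional to the table's population raised to a power. The function name `sample_powered_crp`, the exponent `r`, the concentration `alpha`, and the choice to leave the new-table weight as `alpha` are illustrative assumptions, not the definitions used in the paper. Setting `r = 1` gives the standard Chinese Restaurant Process, where the weight is linear in population.

```python
import random


def sample_powered_crp(n_customers, alpha=1.0, r=1.0, seed=None):
    """Simulate table sizes under a powered rich-get-richer rule.

    Assumed (hypothetical) rule: an existing table with population n
    gets weight n ** r, and a new table gets weight alpha (alpha > 0).
    With r = 1 this is the standard Chinese Restaurant Process.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers at table k
    for _ in range(n_customers):
        # One weight per existing table, plus one for opening a new table.
        weights = [c ** r for c in counts] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1    # join an existing table
    return counts


if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0, 2.0):
        sizes = sample_powered_crp(1000, alpha=1.0, r=r, seed=0)
        print(f"r={r}: {len(sizes)} clusters, largest has {max(sizes)} members")
```

Under this assumed rule, `r = 0` removes the population dependence entirely (every existing table is equally likely), while values of `r` above 1 strengthen the preference for large tables.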