Discrete random structures are important tools in Bayesian nonparametrics, and the resulting models have proven effective in density estimation, clustering, topic modeling and prediction, among other tasks. In this paper, we consider nested processes and study the dependence structures they induce. Dependence ranges between homogeneity, corresponding to full exchangeability, and maximum heterogeneity, corresponding to (unconditional) independence across samples. The popular nested Dirichlet process is shown to degenerate to the fully exchangeable case when there are ties across samples at the observed or latent level. To overcome this drawback, which is inherent to nesting general discrete random measures, we introduce a novel class of latent nested processes. These are obtained by adding common and group-specific completely random measures and then normalising to yield dependent random probability measures. We provide results on the partition distributions induced by latent nested processes, and develop a Markov chain Monte Carlo sampler for Bayesian inference. A test for distributional homogeneity across groups is obtained as a by-product. The results and their inferential implications are showcased on synthetic and real data.
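The additive construction mentioned above (a common completely random measure plus a group-specific one, then normalisation) can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the sampler or prior developed in the paper: each completely random measure is replaced by a finite set of atoms with independent gamma jumps, the atoms are drawn from a standard normal base measure, and the helper names `gamma_crm` and `normalised_sum` are hypothetical. The nesting layer itself is not modelled here.

```python
import numpy as np


def gamma_crm(rng, total_mass, n_atoms):
    """Finite-atom stand-in for a gamma completely random measure (an assumption).

    Returns atom locations and independent Gamma(total_mass / n_atoms, 1) jumps.
    """
    atoms = rng.standard_normal(n_atoms)  # assumed base measure: N(0, 1)
    jumps = rng.gamma(shape=total_mass / n_atoms, scale=1.0, size=n_atoms)
    return atoms, jumps


def normalised_sum(shared, specific):
    """Add a shared and a group-specific measure, then normalise the jumps."""
    atoms = np.concatenate([shared[0], specific[0]])
    jumps = np.concatenate([shared[1], specific[1]])
    return atoms, jumps / jumps.sum()


rng = np.random.default_rng(0)
n_atoms = 50

# One common measure, reused by every group.
shared = gamma_crm(rng, total_mass=5.0, n_atoms=n_atoms)

# Two groups, each with its own idiosyncratic measure added to the common one.
groups = [
    normalised_sum(shared, gamma_crm(rng, total_mass=5.0, n_atoms=n_atoms))
    for _ in range(2)
]

for atoms, weights in groups:
    # Each group's weights form a discrete probability distribution.
    print(len(atoms), weights.sum())
```

In this sketch the two random probability measures share the first `n_atoms` atoms, which is what makes them dependent, while the remaining atoms are specific to each group, so the two measures are not forced to coincide.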