The Pólya tree (PT) process is a general-purpose Bayesian nonparametric model that has found wide application in a range of inference problems. It has a simple analytic form, and posterior computation reduces to beta-binomial conjugate updates along a partition tree over the sample space. Recent developments in PT models show that their performance can be substantially improved by (i) allowing the partition tree to adapt to the structure of the underlying distributions and (ii) incorporating latent state variables that characterize local features of the underlying distributions. However, important limitations of the PT remain, including (i) the sensitivity of posterior inference to the choice of the partition tree and (ii) the lack of scalability with respect to the dimensionality of the sample space. We consider a modeling strategy for PT models that incorporates a flexible prior on the partition tree along with latent states with Markov dependency. We introduce a hybrid algorithm for posterior sampling that combines sequential Monte Carlo (SMC) with recursive message passing and can scale up to 100 dimensions. While our description of the algorithm assumes a single-computer environment, it has the potential to be implemented on distributed systems to further enhance scalability. Moreover, we investigate the large-sample properties of the tree structures and latent states under the posterior model. We carry out extensive numerical experiments in density estimation and two-group comparison, which show that flexible partitioning can substantially improve the performance of PT models in both inference tasks. We demonstrate an application to a mass cytometry data set with 19 dimensions and over 200,000 observations.
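
As a rough illustration of the beta-binomial conjugate updates mentioned above, the following minimal sketch evaluates the posterior mean density of a classical Pólya tree on [0, 1) with a fixed dyadic partition. It is not the flexible-partition model or the SMC algorithm proposed here: the function name `polya_tree_density`, the truncation `depth`, and the prior choice Beta(c k^2, c k^2) at level k are all assumptions made for the example.

```python
import numpy as np

def polya_tree_density(data, x, depth=8, c=1.0):
    """Posterior mean density at x for a fixed dyadic Polya tree on [0, 1).

    Hypothetical helper for illustration: the tree is truncated at `depth`
    levels and each split uses a symmetric Beta(c*k^2, c*k^2) prior.
    """
    inside = np.asarray(data, dtype=float)  # observations in the current node
    lo, hi = 0.0, 1.0
    density = 1.0
    for k in range(1, depth + 1):
        mid = 0.5 * (lo + hi)
        a = c * k**2  # assumed prior pseudo-count at level k
        n_left = int(np.sum(inside < mid))
        n_right = inside.size - n_left
        # Conjugate update: the left-child probability has posterior
        # Beta(a + n_left, a + n_right); we use its mean.
        p_left = (a + n_left) / (2.0 * a + n_left + n_right)
        if x < mid:
            density *= 2.0 * p_left  # factor 2 = parent width / child width
            hi = mid
            inside = inside[inside < mid]
        else:
            density *= 2.0 * (1.0 - p_left)
            lo = mid
            inside = inside[inside >= mid]
    return density

# Usage: data concentrated on the left half pull the estimate to the left.
sample = np.array([0.05, 0.10, 0.12, 0.20, 0.30, 0.41])
left, right = polya_tree_density(sample, 0.25), polya_tree_density(sample, 0.75)
```

With no observations every split keeps probability 1/2, so the estimate stays at the uniform base density; each observation only changes the counts along its own root-to-leaf path, which is why the update is cheap for a fixed tree and why the result depends on where the partition boundaries fall.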