Hidden Markov Pólya trees for high-dimensional distributions

The Pólya tree (PT) process is a general-purpose Bayesian nonparametric model that has found wide application in a range of inference problems. It has a simple analytic form, and posterior computation boils down to beta-binomial conjugate updates along a partition tree over the sample space. Recent developments in PT models show that their performance can be substantially improved by (i) allowing the partition tree to adapt to the structure of the underlying distribution and (ii) incorporating latent state variables that characterize local features of the underlying distribution. However, important limitations of the PT remain, including (i) the sensitivity of posterior inference to the choice of the partition tree and (ii) the lack of scalability with respect to the dimensionality of the sample space. We consider a modeling strategy for PT models that incorporates a flexible prior on the partition tree along with latent states with Markov dependency. We introduce a hybrid algorithm combining sequential Monte Carlo (SMC) and recursive message passing for posterior sampling that can scale up to 100 dimensions. While our description of the algorithm assumes a single-computer environment, it has the potential to be implemented on distributed systems to further enhance scalability. Moreover, we investigate the large-sample properties of the tree structures and latent states under the posterior model. We carry out extensive numerical experiments in density estimation and two-group comparison, which show that flexible partitioning can substantially improve the performance of PT models in both inference tasks. We demonstrate an application to a mass cytometry data set with 19 dimensions and over 200,000 observations.
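To make the phrase "beta-binomial conjugate updates along a partition tree" concrete, here is a minimal sketch of a plain Pólya tree on [0, 1), not the hidden Markov model proposed in the paper. It assumes a fixed dyadic partition, a uniform base measure, truncation at a finite depth, and the commonly used Beta(c·j², c·j²) prior on the left-branch probability at level j; the function name `polya_tree_posterior_density` and the parameters `depth` and `c` are hypothetical choices for illustration. At each node the observations falling into the left and right children update the Beta prior to Beta(a + n_left, b + n_right), and the posterior-mean density at x is the product of the posterior-mean branch probabilities along x's path, rescaled by the leaf width.

```python
from typing import List


def polya_tree_posterior_density(data: List[float], x: float,
                                 depth: int = 8, c: float = 1.0) -> float:
    """Posterior-mean density at x of a Pólya tree on [0, 1).

    Illustrative assumptions (not the paper's model): fixed dyadic partition,
    uniform base measure, tree truncated at `depth`, and a Beta(c*j^2, c*j^2)
    prior on the probability of branching left at level j.
    """
    if not 0.0 <= x < 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    pts = [d for d in data if 0.0 <= d < 1.0]  # observations in the current node
    density = 1.0
    for j in range(1, depth + 1):
        mid = (lo + hi) / 2.0
        left = [d for d in pts if d < mid]
        right = [d for d in pts if d >= mid]
        a = b = c * j * j  # symmetric Beta(a, b) prior at level j
        # Beta-binomial conjugate update: posterior is Beta(a + n_left, b + n_right);
        # we only need its mean for the posterior-mean density.
        p_left = (a + len(left)) / (a + b + len(left) + len(right))
        if x < mid:
            density *= 2.0 * p_left  # child mass spread over half the parent's length
            pts, hi = left, mid
        else:
            density *= 2.0 * (1.0 - p_left)
            pts, lo = right, mid
    return density


if __name__ == "__main__":
    sample = [0.1 + 0.001 * i for i in range(50)]  # toy data clustered near 0.1
    for point in (0.12, 0.5, 0.9):
        print(point, polya_tree_posterior_density(sample, point))
```

Because the branch probabilities are independent Beta variables a posteriori, the posterior mean of a leaf's mass is simply the product of the per-level posterior means, which is why one pass down the tree suffices. The flexible partition prior and Markov-dependent latent states described in the abstract would replace the fixed midpoint split and the fixed per-level prior; neither is shown here.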
