There is a rich literature on Bayesian nonparametric methods for estimating unknown densities. The most popular approach relies on Dirichlet process mixture models, which characterize the unknown density as the convolution of a kernel with an unknown, almost surely discrete mixing measure that is given a Dirichlet process prior. Such models are very flexible and perform well in many settings, but posterior computation typically relies on Markov chain Monte Carlo algorithms that can be complex and inefficient. As a simple alternative, we propose a class of nearest neighbor-Dirichlet processes. The approach starts by grouping the data into neighborhoods using standard nearest neighbor algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels yields a tractable pseudo-posterior for the weights and kernel parameters. We propose a straightforward, embarrassingly parallel Monte Carlo algorithm to sample from the resulting pseudo-posterior for the unknown density. We establish desirable asymptotic properties, evaluate the methods in simulation studies, and apply them to a motivating data set in the context of classification.
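To make the sampling scheme described above concrete, here is a minimal sketch for one-dimensional data. It is an illustration under stated assumptions, not the authors' implementation: the brute-force k-nearest-neighbor rule (each point plus its k-1 closest neighbors), the conjugate normal-inverse-gamma prior with hypothetical hyperparameters `m0`, `kappa0`, `a0`, `b0`, the Dirichlet(α/n + 1, …, α/n + 1) parameterization of the weights, and all function names are choices made here for illustration. Each Monte Carlo draw samples weights from the Dirichlet, samples a Gaussian kernel from each neighborhood's local posterior, and evaluates the resulting mixture; averaging the draws gives a pseudo-posterior mean density.

```python
import math
import random


def knn_neighborhoods(data, k):
    """For each observation, return the indices of its k nearest points
    (the point itself included), using brute-force 1-D distances."""
    return [
        sorted(range(len(data)), key=lambda j: abs(data[j] - x))[:k]
        for x in data
    ]


def local_posterior(ys, m0=0.0, kappa0=0.01, a0=1.0, b0=1.0):
    """Conjugate normal-inverse-gamma update for one neighborhood.

    Assumed prior (hypothetical hyperparameters, chosen for illustration):
    mu | sigma^2 ~ N(m0, sigma^2 / kappa0), sigma^2 ~ InvGamma(a0, b0).
    """
    k = len(ys)
    ybar = sum(ys) / k
    ss = sum((y - ybar) ** 2 for y in ys)
    kappa_n = kappa0 + k
    m_n = (kappa0 * m0 + k * ybar) / kappa_n
    a_n = a0 + k / 2
    b_n = b0 + ss / 2 + kappa0 * k * (ybar - m0) ** 2 / (2 * kappa_n)
    return m_n, kappa_n, a_n, b_n


def sample_density(params, alpha, rng):
    """Draw one density from the pseudo-posterior, returned as a callable f(x)."""
    n = len(params)
    # Dirichlet(alpha/n + 1, ..., alpha/n + 1) weights via normalized gamma
    # variates; this parameterization is an assumption of the sketch.
    g = [rng.gammavariate(alpha / n + 1, 1.0) for _ in range(n)]
    total = sum(g)
    weights = [x / total for x in g]
    kernels = []
    for m_n, kappa_n, a_n, b_n in params:
        # sigma^2 ~ InvGamma(a_n, b_n): reciprocal of Gamma(shape a_n, scale 1/b_n)
        var = 1.0 / rng.gammavariate(a_n, 1.0 / b_n)
        mu = rng.normalvariate(m_n, math.sqrt(var / kappa_n))
        kernels.append((mu, var))

    def density(x):
        # Weighted mixture of the sampled local Gaussian kernels
        return sum(
            w * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
            for w, (mu, var) in zip(weights, kernels)
        )

    return density


def nn_dp_density(data, grid, k=10, alpha=1.0, n_draws=50, seed=0):
    """Pseudo-posterior mean density on a grid, averaged over n_draws
    independent Monte Carlo draws."""
    rng = random.Random(seed)
    neighborhoods = knn_neighborhoods(data, k)
    params = [local_posterior([data[j] for j in nb]) for nb in neighborhoods]
    mean = [0.0] * len(grid)
    for _ in range(n_draws):
        f = sample_density(params, alpha, rng)  # draws do not depend on each other
        for t, x in enumerate(grid):
            mean[t] += f(x) / n_draws
    return mean
```

Because each draw needs only the precomputed local posterior parameters and no Markov chain state, the loop over draws could be split across independent workers, which is the sense in which such a sampler is embarrassingly parallel; the sketch runs sequentially for simplicity.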