There is a rich literature on Bayesian methods for density estimation that characterize the unknown density as a mixture of kernels. Such methods provide uncertainty quantification in estimation while adapting to a rich variety of densities. However, relative to frequentist locally adaptive kernel methods, Bayesian approaches can be slow and unstable to implement because they rely on Markov chain Monte Carlo algorithms. To retain most of the strengths of Bayesian approaches without these computational disadvantages, we propose a class of nearest neighbor-Dirichlet mixtures. The approach starts by grouping the data into neighborhoods using standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. We establish desirable asymptotic properties, evaluate the methods in simulation studies, and apply them to a motivating data set in the context of classification.
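The pipeline described above (neighborhoods, local Bayesian Gaussians, Dirichlet weights, independent Monte Carlo draws) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes univariate data, one kernel per observation whose neighborhood is that point plus its k-1 nearest neighbors, a conjugate Normal-Inverse-Gamma prior on each local Gaussian (hyperparameters `m0`, `kappa0`, `a0`, `b0` chosen arbitrarily), and pseudo-posterior weights drawn as Dirichlet(alpha + 1, ..., alpha + 1). All function names are hypothetical.

```python
import math
import random


def knn_neighborhoods(x, k):
    """Index sets: each point i together with its k-1 nearest neighbors (1-D, brute force)."""
    n = len(x)
    hoods = []
    for i in range(n):
        # Sort by distance to x[i]; the tie-breaker puts i itself first.
        order = sorted(range(n), key=lambda j: (abs(x[j] - x[i]), j != i))
        hoods.append(order[:k])
    return hoods


def local_posteriors(x, hoods, m0, kappa0, a0, b0):
    """Conjugate Normal-Inverse-Gamma update of a Gaussian kernel within each neighborhood."""
    post = []
    for idx in hoods:
        y = [x[j] for j in idx]
        k = len(y)
        ybar = sum(y) / k
        ss = sum((v - ybar) ** 2 for v in y)
        kn = kappa0 + k
        mn = (kappa0 * m0 + k * ybar) / kn
        an = a0 + k / 2.0
        bn = b0 + 0.5 * ss + kappa0 * k * (ybar - m0) ** 2 / (2.0 * kn)
        post.append((mn, kn, an, bn))
    return post


def draw_density(post, alpha, rng):
    """One independent pseudo-posterior draw: mixture weights plus (mu, sigma) per kernel."""
    # ASSUMPTION: weights ~ Dirichlet(alpha + 1, ..., alpha + 1), via normalized gamma draws.
    g = [rng.gammavariate(alpha + 1.0, 1.0) for _ in post]
    total = sum(g)
    kernels = []
    for gi, (mn, kn, an, bn) in zip(g, post):
        sigma2 = 1.0 / rng.gammavariate(an, 1.0 / bn)  # sigma^2 ~ Inverse-Gamma(an, bn)
        mu = rng.gauss(mn, math.sqrt(sigma2 / kn))      # mu | sigma^2 ~ N(mn, sigma^2 / kn)
        kernels.append((gi / total, mu, math.sqrt(sigma2)))
    return kernels


def mixture_pdf(t, kernels):
    """Evaluate a Gaussian mixture density at point t."""
    return sum(w * math.exp(-0.5 * ((t - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
               for w, mu, s in kernels)


def nn_dirichlet_density(x, grid, k=10, alpha=1.0, n_draws=50, seed=0):
    """Pointwise pseudo-posterior mean and 95% band of the density on `grid`."""
    rng = random.Random(seed)
    m0 = sum(x) / len(x)
    post = local_posteriors(x, knn_neighborhoods(x, k), m0, kappa0=0.01, a0=1.0, b0=0.1)
    curves = []
    # Draws are mutually independent (no Markov chain), so this loop parallelizes trivially.
    for _ in range(n_draws):
        kernels = draw_density(post, alpha, rng)
        curves.append([mixture_pdf(t, kernels) for t in grid])
    mean, lo, hi = [], [], []
    for col in zip(*curves):
        s = sorted(col)
        mean.append(sum(s) / len(s))
        lo.append(s[int(0.025 * (len(s) - 1))])
        hi.append(s[int(0.975 * (len(s) - 1))])
    return mean, lo, hi


if __name__ == "__main__":
    data_rng = random.Random(1)
    x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
    grid = [-4.0 + 0.05 * i for i in range(161)]
    mean, lo, hi = nn_dirichlet_density(x, grid)
    print(f"density at 0: {mean[80]:.3f} (band {lo[80]:.3f} to {hi[80]:.3f})")
```

Because no draw depends on the previous one, the loop in `nn_dirichlet_density` could be split across workers with separate seeds; this independence is what distinguishes the approach from an MCMC sampler.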