Bayesian methods of sampling from a posterior distribution are becoming increasingly popular due to their ability to accurately quantify the uncertainty of a model fit. Classical methods based on iterative random sampling and posterior evaluation, such as Metropolis-Hastings, are known to have desirable long-run mixing properties, but are slow to converge. Gradient-based methods, such as Langevin dynamics (and its stochastic-gradient counterpart), exhibit favorable dimension dependence and fast mixing times for log-concave and "close to" log-concave distributions, but suffer from long escape times from local modes. Many contemporary applications, such as Bayesian neural networks, are both high-dimensional and highly multimodal. In this paper we investigate the performance of a hybrid Metropolis and Langevin sampling method, akin to jump diffusion, on a range of synthetic and real data, indicating that careful calibration of mixing sampling jumps with gradient-based chains significantly outperforms either pure gradient-based or pure sampling-based schemes.
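To make the hybrid scheme concrete, the following is a minimal sketch (not the paper's actual implementation): Metropolis-adjusted Langevin (MALA) steps handle local mixing, while occasional large random-walk Metropolis "jump" proposals allow escape between modes. The target here is a hypothetical bimodal Gaussian mixture chosen purely for illustration, and all step sizes and jump probabilities are illustrative assumptions.

```python
import numpy as np

def log_p(x):
    # log density (up to a constant) of 0.5*N(-4, 1) + 0.5*N(4, 1)
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

def grad_log_p(x):
    # gradient of log_p via responsibility-weighted component gradients
    a = np.exp(-0.5 * (x + 4.0) ** 2)
    b = np.exp(-0.5 * (x - 4.0) ** 2)
    w = a / (a + b)
    return w * (-(x + 4.0)) + (1.0 - w) * (-(x - 4.0))

def hybrid_sampler(n_steps=20000, eps=0.1, jump_scale=8.0, jump_prob=0.05, seed=0):
    """Interleave MALA steps with large Metropolis jump proposals."""
    rng = np.random.default_rng(seed)
    x = -4.0  # start in the left mode
    samples = np.empty(n_steps)
    for t in range(n_steps):
        if rng.random() < jump_prob:
            # Metropolis jump: wide symmetric proposal, plain MH accept/reject
            y = x + jump_scale * rng.standard_normal()
            if np.log(rng.random()) < log_p(y) - log_p(x):
                x = y
        else:
            # MALA step: Langevin proposal with Metropolis-Hastings correction
            mu_x = x + 0.5 * eps ** 2 * grad_log_p(x)
            y = mu_x + eps * rng.standard_normal()
            mu_y = y + 0.5 * eps ** 2 * grad_log_p(y)
            log_q_fwd = -0.5 * (y - mu_x) ** 2 / eps ** 2
            log_q_rev = -0.5 * (x - mu_y) ** 2 / eps ** 2
            if np.log(rng.random()) < log_p(y) - log_p(x) + log_q_rev - log_q_fwd:
                x = y
        samples[t] = x
    return samples

samples = hybrid_sampler()
# with jumps enabled, the chain should visit both modes near -4 and +4
```

With `jump_prob=0`, the chain above typically stays trapped in the mode it started in; the jump proposals are what restore mixing across modes, which is the calibration trade-off the abstract refers to.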