In this work, we revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. We consider overlooked results describing the ergodic convergence rates of numerical integration schemes, and we produce a novel analysis for the effect of mini-batches through the lens of differential operator splitting. In our analysis, the stochastic component of the proposed Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This allows us to derive interesting connections among different sampling schemes, including the original Hamiltonian Monte Carlo (HMC) algorithm, and explain their performance. We show that for a careful selection of numerical integrators, both errors vanish at a rate $\mathcal{O}(\eta^2)$, where $\eta$ is the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
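For context, a standard form of the Hamiltonian (second-order Langevin) dynamics that such samplers discretize is sketched below. This particular parameterization, with position $\theta$, momentum $r$, mass matrix $M$, friction $\gamma$, and potential $U(\theta) = -\log p(\theta \mid \mathcal{D})$, is a common convention and not necessarily the exact formulation analyzed in this work:
\[
\begin{aligned}
d\theta &= M^{-1} r \, dt, \\
dr &= -\nabla U(\theta) \, dt - \gamma M^{-1} r \, dt + \sqrt{2\gamma} \, dW_t,
\end{aligned}
\]
where $W_t$ is a standard Wiener process. In the stochastic-gradient setting, $\nabla U(\theta)$ is replaced by a mini-batch estimate $\nabla \tilde{U}(\theta)$, and a numerical integrator with step size $\eta$ introduces the discretization error discussed above.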