We revisit the theoretical properties of Hamiltonian stochastic differential equations (SDEs) for Bayesian posterior sampling, and we study the two types of errors that arise from numerical SDE simulation: the discretization error and the error due to noisy gradient estimates in the context of data subsampling. Our main result is a novel analysis of the effect of mini-batches through the lens of differential operator splitting, revising previous results in the literature. The stochastic component of a Hamiltonian SDE is decoupled from the gradient noise, for which we make no normality assumptions. This leads to the identification of a convergence bottleneck: when considering mini-batches, the best achievable error rate is $\mathcal{O}(\eta^2)$, where $\eta$ is the integrator step size. Our theoretical results are supported by an empirical study on a variety of regression and classification tasks for Bayesian neural networks.
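To make the setting concrete, the following is a minimal sketch of the kind of sampler the abstract refers to: a Hamiltonian SDE simulated with a symmetric (OBABO-style) operator splitting, where the friction-plus-noise Ornstein-Uhlenbeck part is solved exactly and is therefore decoupled from the mini-batch gradient noise, on which no normality assumption is placed. The toy model, the function names (`stoch_grad_U`, `sghmc_splitting`), and the parameter values ($\eta$, $\gamma$, batch size) are illustrative assumptions, not the paper's exact scheme or experiments.

```python
import numpy as np

# Hypothetical toy setup: 1-D Gaussian posterior from N observations,
# illustrating mini-batch gradient noise in a Hamiltonian SDE sampler.
rng = np.random.default_rng(0)
N, B = 1000, 32                      # dataset size and mini-batch size (assumed)
data = rng.normal(1.5, 1.0, size=N)

def stoch_grad_U(theta):
    """Unbiased mini-batch estimate of grad U(theta), with U the negative
    log-posterior. Per-example term: 0.5 * (theta - x_i)^2; flat prior."""
    batch = rng.choice(data, size=B, replace=False)
    return (N / B) * np.sum(theta - batch)

def sghmc_splitting(n_steps=5000, eta=1e-3, gamma=5.0):
    """One possible symmetric splitting of the Hamiltonian SDE:
    half-step Ornstein-Uhlenbeck (friction + injected noise), leapfrog on
    the Hamiltonian part using the stochastic gradient, half-step OU.
    The OU noise is injected independently of the gradient noise, so the
    two error sources stay decoupled, as discussed in the abstract."""
    theta, v = 0.0, 0.0
    samples = []
    c = np.exp(-gamma * eta / 2)         # exact OU contraction over eta/2
    s = np.sqrt(1.0 - c**2)              # matching OU noise scale (unit mass)
    for _ in range(n_steps):
        v = c * v + s * rng.normal()          # O: friction + injected noise
        v -= 0.5 * eta * stoch_grad_U(theta)  # B: half kick (noisy gradient)
        theta += eta * v                      # A: full drift
        v -= 0.5 * eta * stoch_grad_U(theta)  # B: half kick (noisy gradient)
        v = c * v + s * rng.normal()          # O: friction + injected noise
        samples.append(theta)
    return np.array(samples)

samples = sghmc_splitting()
# The posterior of this conjugate toy model is N(mean(data), 1/N); the
# sampler's long-run mean should match, while mini-batch noise biases the
# variance at finite step size, consistent with the O(eta^2) bottleneck.
print("sample mean %.3f vs posterior mean %.3f"
      % (samples[1000:].mean(), data.mean()))
```

In this sketch the OU half-steps are exact in law, so any residual bias comes from discretizing the Hamiltonian part and from the subsampled gradient entering the two kick steps; shrinking $\eta$ reduces both, but the mini-batch noise prevents improvement beyond second order in $\eta$.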