Structured Stochastic Gradient MCMC

Stochastic gradient Markov chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both the factorization and the functional form of the posterior. In this work, we propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form and allows practitioners to specify the exact dependencies the algorithm should respect or break. The approach relies on a new Langevin-type algorithm that operates on a modified energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies can be broken in a controlled way, allowing the chain to mix faster. This scheme can be further modified in a "dropout" manner, which improves scalability further. By implementing the scheme on a ResNet-20 architecture, we obtain better predictive likelihoods and larger effective sample sizes than full SGMCMC.
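The idea of averaging parts of the latent variables over earlier samples can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a two-dimensional correlated Gaussian target, treats each coordinate as its own parameter group, uses full (not stochastic) gradients, and approximates the average over earlier iterations with a single randomly chosen past sample per group. All names (`grad_energy`, `run_langevin`, `cross_correlation`) are hypothetical.

```python
import numpy as np


def grad_energy(theta, precision):
    """Gradient of the toy energy U(theta) = 0.5 * theta^T P theta."""
    return precision @ theta


def run_langevin(precision, n_steps, step, rng, structured):
    """Unadjusted Langevin chain; optionally breaks cross-group dependencies."""
    dim = precision.shape[0]
    theta = np.zeros(dim)
    samples = np.empty((n_steps, dim))
    for t in range(n_steps):
        if structured and t > 0:
            grad = np.empty(dim)
            for i in range(dim):
                # Keep group i at its current value, but replace all other
                # groups with a sample drawn from an earlier iteration.
                mixed = samples[rng.integers(t)].copy()
                mixed[i] = theta[i]
                grad[i] = grad_energy(mixed, precision)[i]
        else:
            grad = grad_energy(theta, precision)
        noise = rng.standard_normal(dim)
        theta = theta - step * grad + np.sqrt(2.0 * step) * noise
        samples[t] = theta
    return samples


def cross_correlation(samples, burn_in):
    """Sample correlation between the two coordinates after burn-in."""
    return np.corrcoef(samples[burn_in:].T)[0, 1]


# Toy target (an assumption): zero-mean Gaussian with correlation 0.9.
covariance = np.array([[1.0, 0.9], [0.9, 1.0]])
precision = np.linalg.inv(covariance)

rng = np.random.default_rng(0)
full = run_langevin(precision, 30_000, 0.01, rng, structured=False)
broken = run_langevin(precision, 30_000, 0.01, rng, structured=True)

corr_full = cross_correlation(full, burn_in=5_000)
corr_structured = cross_correlation(broken, burn_in=5_000)
print("full chain correlation:      ", corr_full)
print("structured chain correlation:", corr_structured)
```

Under these assumptions, the full chain should retain the strong correlation of the target, while the structured chain should produce nearly uncorrelated coordinates, since each coordinate only sees the other through earlier samples. The "dropout" variant mentioned in the abstract is not covered by this sketch.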

