Stochastic gradient Markov Chain Monte Carlo (SGMCMC) is considered the gold standard for Bayesian inference in large-scale models, such as Bayesian neural networks. Since practitioners face speed versus accuracy tradeoffs in these models, variational inference (VI) is often the preferable option. Unfortunately, VI makes strong assumptions on both the factorization and functional form of the posterior. In this work, we propose a new non-parametric variational approximation that makes no assumptions about the approximate posterior's functional form and allows practitioners to specify the exact dependencies the algorithm should respect or break. The approach relies on a new Langevin-type algorithm that operates on a modified energy function, where parts of the latent variables are averaged over samples from earlier iterations of the Markov chain. This way, statistical dependencies can be broken in a controlled way, allowing the chain to mix faster. The scheme can be further modified in a "dropout" manner, leading to even greater scalability. We evaluate our scheme with ResNet-20 on CIFAR-10, SVHN, and FMNIST. In all cases, we find improvements in convergence speed and/or final accuracy compared to SGMCMC and VI.
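For intuition, the following is a minimal NumPy sketch of how a Langevin-type chain on such a modified energy might look. The two-block split via `block_a`, the running-average estimator, the function name `structured_sgld`, and all hyperparameters are illustrative assumptions for this sketch, not the exact construction proposed in the paper.

```python
import numpy as np

def structured_sgld(grad_log_post, theta0, block_a, n_steps=1000,
                    step_size=1e-4, seed=0):
    """Sketch: Langevin-type chain on a modified energy in which, when
    updating one block of latent variables, the complementary block is
    replaced by its average over samples from earlier iterations of the
    chain, breaking the cross-block dependence in a controlled way.

    `block_a` is a boolean mask splitting the parameters into two blocks.
    This is an illustrative assumption; the paper's scheme lets the user
    specify which dependencies to respect or break.
    """
    rng = np.random.default_rng(seed)
    block_b = ~block_a
    theta = theta0.copy()
    mean = theta0.copy()          # running average over earlier samples
    samples = []
    for t in range(1, n_steps + 1):
        for mask in (block_a, block_b):
            # Gradient of the modified energy: the complementary block is
            # frozen at its average over earlier iterations of the chain.
            point = np.where(mask, theta, mean)
            g = grad_log_post(point)
            noise = rng.normal(scale=np.sqrt(2.0 * step_size),
                               size=theta.shape)
            theta = np.where(mask, theta + step_size * g + noise, theta)
        mean += (theta - mean) / t   # update the per-coordinate average
        samples.append(theta.copy())
    return samples
```

A "dropout"-style variant, as hinted at in the abstract, could randomly choose which coordinates are frozen at their running averages in each step rather than fixing the partition in advance.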