Bayesian methods hold significant promise for improving the uncertainty quantification ability and robustness of deep neural network models. Recent research has seen the investigation of a number of approximate Bayesian inference methods for deep neural networks, building on both the variational Bayesian and Markov chain Monte Carlo (MCMC) frameworks. A fundamental issue with MCMC methods is that the improvements they enable are obtained at the expense of increased computation time and model storage costs. In this paper, we investigate the potential of sparse network structures to flexibly trade off model storage costs and inference run time against predictive performance and uncertainty quantification ability. We use stochastic gradient MCMC methods as the core Bayesian inference method and consider a variety of approaches for selecting sparse network structures. Surprisingly, our results show that certain classes of randomly selected substructures can perform as well as substructures derived from state-of-the-art iterative pruning methods while drastically reducing model training times.
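As a concrete illustration of the core idea, the sketch below shows a single stochastic gradient Langevin dynamics (SGLD) update restricted to a fixed, randomly selected sparse substructure, i.e., only the weights kept by a binary mask are sampled while pruned weights stay at zero. This is a minimal sketch assuming PyTorch; the names `sgld_step` and `random_mask`, the step size, and the sparsity level are illustrative assumptions, not details taken from the paper.

```python
import torch

def sgld_step(params, grad_log_post, mask, lr=1e-5):
    """One SGLD update restricted to a fixed sparse substructure.

    params: flat tensor of network weights
    grad_log_post: stochastic gradient of the log posterior at params
    mask: binary tensor; 1 = weight is in the substructure, 0 = pruned
    """
    # Langevin discretization: step along the gradient plus Gaussian
    # noise with variance 2 * lr, applied only to unmasked weights.
    noise = torch.randn_like(params) * (2.0 * lr) ** 0.5
    return params + mask * (lr * grad_log_post + noise)

def random_mask(num_params, sparsity=0.9, generator=None):
    """Randomly select a substructure by keeping each weight
    independently with probability (1 - sparsity)."""
    keep = torch.rand(num_params, generator=generator) >= sparsity
    return keep.float()
```

A fixed random mask like this has no training-time selection cost, which is what allows it to cut overall training time relative to iterative pruning, where the mask itself must be learned over repeated train-prune cycles.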