Deep learning has been the engine powering many successes of data science. However, the deep neural network (DNN), the basic model of deep learning, is often heavily over-parameterized, causing many difficulties in training, prediction, and interpretation. We propose a frequentist-like method for learning sparse DNNs and justify its consistency under the Bayesian framework: the proposed method can learn a sparse DNN with at most $O(n/\log(n))$ connections and appealing theoretical guarantees such as posterior consistency, variable selection consistency, and asymptotically optimal generalization bounds. In particular, we establish posterior consistency for the sparse DNN with a mixture Gaussian prior, show that the structure of the sparse DNN can be consistently determined using a Laplace approximation-based marginal posterior inclusion probability approach, and use Bayesian evidence to elicit sparse DNNs learned by an optimization method, such as stochastic gradient descent, in multiple runs with different initializations. The proposed method is computationally more efficient than standard Bayesian methods for learning large-scale sparse DNNs. Numerical results indicate that the proposed method performs very well for large-scale network compression and high-dimensional nonlinear variable selection, both advancing interpretable machine learning.
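To make the ingredients described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of how a two-component mixture Gaussian prior can act as a penalty during SGD training and how a plug-in approximation to the marginal posterior inclusion probability could be thresholded to prune connections. The hyperparameters `lambda_n`, `sigma0`, `sigma1`, the toy network, and all function names are illustrative assumptions, and the Laplace approximation and evidence-based elicitation across multiple runs are omitted.

```python
import torch
import torch.nn as nn

def mixture_gaussian_log_prior(w, lambda_n=1e-4, sigma0=1e-4, sigma1=1.0):
    """Log-density of the two-component mixture Gaussian (spike-and-slab style) prior
    lambda_n * N(0, sigma1^2) + (1 - lambda_n) * N(0, sigma0^2), summed over all weights.
    Hyperparameter values are illustrative placeholders, not those used in the paper."""
    log_slab = torch.distributions.Normal(0.0, sigma1).log_prob(w) + torch.log(torch.tensor(lambda_n))
    log_spike = torch.distributions.Normal(0.0, sigma0).log_prob(w) + torch.log(torch.tensor(1.0 - lambda_n))
    return torch.logsumexp(torch.stack([log_slab, log_spike]), dim=0).sum()

def inclusion_probability(w, lambda_n=1e-4, sigma0=1e-4, sigma1=1.0):
    """Element-wise posterior probability that a weight comes from the slab (non-zero)
    component, evaluated at the estimated weights; this is a simplified plug-in version
    of the marginal posterior inclusion probability, without the Laplace approximation."""
    log_slab = torch.distributions.Normal(0.0, sigma1).log_prob(w) + torch.log(torch.tensor(lambda_n))
    log_spike = torch.distributions.Normal(0.0, sigma0).log_prob(w) + torch.log(torch.tensor(1.0 - lambda_n))
    return torch.sigmoid(log_slab - log_spike)

# Toy regression network and optimizer (architecture is only for illustration).
model = nn.Sequential(nn.Linear(1000, 50), nn.Tanh(), nn.Linear(50, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(x, y):
    """One SGD step on the negative log-posterior (up to an additive constant)."""
    opt.zero_grad()
    nll = nn.functional.mse_loss(model(x), y, reduction='sum')
    log_prior = sum(mixture_gaussian_log_prior(p) for p in model.parameters())
    loss = nll - log_prior
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with random data (shapes only illustrative):
x, y = torch.randn(32, 1000), torch.randn(32, 1)
print(train_step(x, y))
# Connections with inclusion probability below 0.5 would be pruned from the network.
print((inclusion_probability(model[0].weight) > 0.5).float().mean())
```

In the procedure summarized by the abstract, such thresholding of the inclusion probabilities determines the sparse network structure, and Bayesian evidence is then used to select among the sparse DNNs obtained from multiple SGD runs with different initializations; the sketch above shows only the prior-penalized training and the thresholding step.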