The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. When combined with the prior, this accurately predicts the posterior, measured for DNNs trained with stochastic gradient descent. This analysis reveals that structured data, combined with an intrinsic Occam's razor-like inductive bias towards (Kolmogorov) simple functions that is strong enough to counteract the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
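To make the Bayesian picture concrete, the following is a minimal toy sketch (not the authors' code) of the posterior over Boolean functions it describes: a prior P(f) estimated by sampling random network parameters, and, as a simplification of the error-spectrum likelihood, a 0-1 likelihood that keeps only functions consistent with the training data. All names, the network width and depth, and the weight scale sigma_w (the knob that drives the ordered/chaotic transition) are illustrative assumptions.

```python
# Toy illustration (assumed setup, not the authors' implementation):
# Bayesian posterior over Boolean functions expressed by a random DNN.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
n_inputs = 5  # Boolean functions on {0,1}^5
X = np.array([[int(b) for b in f"{i:0{n_inputs}b}"] for i in range(2**n_inputs)],
             dtype=float)

def random_dnn_function(sigma_w=1.0, width=32, depth=2):
    """Sample a random ReLU network and return the Boolean function it
    expresses on all 2^n inputs, encoded as a bit-string."""
    h, d_in = X, n_inputs
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(d_in), size=(d_in, width))
        h = np.maximum(h @ W, 0.0)
        d_in = width
    w_out = rng.normal(0.0, sigma_w / np.sqrt(d_in), size=d_in)
    return "".join(str(int(v > 0)) for v in h @ w_out)

# Estimate the prior P(f) by sampling parameters; sigma_w tunes this prior
# between the ordered (simple-function-biased) and chaotic regimes.
samples = [random_dnn_function(sigma_w=1.0) for _ in range(20000)]
prior = Counter(samples)

# Toy training data: the first m input-output pairs of a target function.
target = samples[0]
m = 16
train_idx = range(m)

# Posterior P(f | D) proportional to P(f) * 1[f matches the data]
# (0-1 likelihood, i.e. zero training error).
consistent = {f: c for f, c in prior.items()
              if all(f[i] == target[i] for i in train_idx)}
Z = sum(consistent.values())
posterior = {f: c / Z for f, c in consistent.items()}
print("top posterior functions:",
      sorted(posterior.items(), key=lambda kv: -kv[1])[:3])
```

In the paper's framing, it is this posterior, induced purely by the prior over functions and the data, that is compared against the empirical distribution of functions found by DNNs trained with stochastic gradient descent.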