The paper draws on statistical and differential-geometric arguments to obtain prior information about the learning capability of an artificial neural network on a given dataset. It considers a broad class of neural networks of general architecture performing simple least-squares regression with stochastic gradient descent (SGD), and analyzes the characteristics of the system at two critical epochs along the learning trajectory. During some epochs of the training phase, the system reaches equilibrium, with the generalization capability attaining a maximum. The system can also settle into localized, non-equilibrium states, which are characterized by the stabilization of the Hessian matrix. The paper proves that neural networks with higher generalization capability have a slower convergence rate, and it also discusses the relationship between generalization capability and the stability of the neural network. By relating principles from high-energy physics to the learning theory of neural networks, the paper establishes a variant of the Complexity-Action conjecture from an artificial neural network perspective.
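For concreteness, here is a minimal sketch (not from the paper) of the setup the abstract describes: a small one-hidden-layer network trained on least-squares regression with mini-batch SGD, while the largest eigenvalue of the full loss Hessian is tracked across epochs to watch for the stabilization associated above with localized non-equilibrium states. The network size, data, learning rate, and epoch counts are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 2)                               # toy regression inputs
y = X @ torch.tensor([[1.5], [-0.7]]) + 0.1 * torch.randn(64, 1)

n_hidden = 8
theta = 0.5 * torch.randn(2 * n_hidden + n_hidden)   # flattened parameters

def batch_loss(theta, Xb, yb):
    # Least-squares loss of a one-hidden-layer tanh network.
    w1 = theta[: 2 * n_hidden].view(2, n_hidden)
    w2 = theta[2 * n_hidden:].view(n_hidden, 1)
    return ((torch.tanh(Xb @ w1) @ w2 - yb) ** 2).mean()

lr, batch = 0.05, 16
for epoch in range(201):
    perm = torch.randperm(X.shape[0])
    for i in range(0, X.shape[0], batch):            # mini-batch SGD step
        idx = perm[i:i + batch]
        theta.requires_grad_(True)
        loss = batch_loss(theta, X[idx], y[idx])
        (grad,) = torch.autograd.grad(loss, theta)
        theta = (theta - lr * grad).detach()
    if epoch % 50 == 0:
        # Full-batch Hessian of the loss at the current parameters;
        # its largest eigenvalue is a standard sharpness proxy.
        H = torch.autograd.functional.hessian(lambda t: batch_loss(t, X, y), theta)
        lam = torch.linalg.eigvalsh(H).max().item()
        print(f"epoch {epoch:3d}  loss {batch_loss(theta, X, y).item():.4f}  "
              f"lambda_max(H) {lam:.4f}")
```

Forming the exact Hessian is only feasible for such tiny parameter counts; at realistic scale one would estimate its top eigenvalue with Hessian-vector products (e.g., power iteration) instead.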