We develop a framework for the analysis of deep neural networks and neural ODE models trained with stochastic gradient algorithms. We do so by identifying connections between control theory, deep learning and the theory of statistical sampling. We derive Pontryagin's optimality principle and study the corresponding gradient flow, in the form of Mean-Field Langevin dynamics (MFLD), for solving relaxed data-driven control problems. Subsequently, we establish uniform-in-time propagation of chaos for the time-discretised MFLD. We derive explicit convergence rates in terms of the learning rate, the number of particles/model parameters and the number of iterations of the gradient algorithm. In addition, we study the error arising when using a finite training data set, and thus provide quantitative bounds on the generalisation error. Crucially, the obtained rates are dimension-independent. This is possible by exploiting the regularity of the model with respect to the measure over the parameter space.
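For concreteness, the time-discretised MFLD over $N$ particles referred to above can be read as a noisy gradient step on an interacting particle system; the following display is an illustrative sketch in generic notation (the symbols $F$, $\eta$, $\sigma$ and $\xi^{i}_{k}$ are ours, not necessarily those used in the paper):
\[
X^{i}_{k+1} \;=\; X^{i}_{k} \;-\; \eta\,\nabla_x \frac{\delta F}{\delta m}\Big(\tfrac{1}{N}\textstyle\sum_{j=1}^{N}\delta_{X^{j}_{k}},\; X^{i}_{k}\Big) \;+\; \sqrt{2\sigma\eta}\,\xi^{i}_{k}, \qquad i = 1,\dots,N,
\]
where $\eta$ is the learning rate, $N$ the number of particles (model parameters), $\xi^{i}_{k}$ are i.i.d. standard Gaussian vectors, and $\frac{\delta F}{\delta m}$ denotes the flat derivative of the objective with respect to the measure; the convergence rates discussed above are expressed in terms of $\eta$, $N$ and the iteration count $k$.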