Using rough path techniques, we provide a priori estimates for the output of deep residual neural networks in terms of both the input data and the (trained) network weights. As trained network weights are typically very rough when seen as functions of the layer index, we propose to derive stability bounds in terms of the total $p$-variation of the trained weights for any $p\in[1,3]$. Unlike the $C^1$-theory underlying the neural ODE literature, our estimates remain bounded even in the limiting case of weights behaving like Brownian motions, as suggested in [arXiv:2105.12245]. Mathematically, we interpret residual neural networks as solutions to (rough) difference equations, and analyse them based on recent results on discrete-time signatures and rough path theory.
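To make the two central objects concrete, here is a minimal numerical sketch: a residual network read as a difference equation $x_{k+1} = x_k + f(x_k, W_k)$, and the total $p$-variation of the layer-indexed weight sequence, computed by an $O(n^2)$ dynamic program over partition points. The tanh residual map and the dynamic program are illustrative choices for exposition, not the paper's construction.

```python
import numpy as np

def resnet_forward(x, weights):
    """Residual network as a difference equation: x_{k+1} = x_k + tanh(W_k x_k).

    `weights` is a sequence of layer matrices W_0, ..., W_{n-1}; the residual
    map tanh(W x) is an illustrative choice of nonlinearity.
    """
    for W in weights:
        x = x + np.tanh(W @ x)
    return x

def p_variation(w, p):
    """Total p-variation of a discrete path w[0], ..., w[n-1].

    Returns ( sup_P sum_i |w_{t_{i+1}} - w_{t_i}|^p )^(1/p), where the sup
    runs over all partitions P of the index set. For a finite path this sup
    is attained and can be computed by dynamic programming:
    dp[j] = best sum over partitions of [0, j] that end at index j.
    """
    n = len(w)
    dp = np.zeros(n)
    for j in range(1, n):
        dp[j] = max(dp[i] + np.linalg.norm(w[j] - w[i]) ** p for i in range(j))
    return dp[-1] ** (1.0 / p)
```

For $p = 1$ this reduces to the usual total variation (sum of all increment sizes); for a monotone scalar path the $p$-variation for any $p$ is just the total range, attained by the two-point partition, while for an oscillating path smaller $p$ penalises the oscillations more heavily.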