We consider the neural ODE and optimal control perspective on supervised learning, with $\ell^1$-control penalties, in which, rather than minimizing only a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters all equal $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property} for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.
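For concreteness, a minimal sketch of the setting in illustrative notation (the symbols $f$, $\mathcal{E}$, $\mathbf{x}_0$, $\lambda$ are ours, not fixed by the abstract): denoting by $\mathcal{E}$ the empirical risk and by $\mathbf{x}_u$ the state of the neural ODE, the integrated-cost problem may be written as
\[
\inf_{u \in L^1(0,T)} \int_0^T \mathcal{E}\bigl(\mathbf{x}_u(t)\bigr)\,\mathrm{d}t \;+\; \lambda\,\|u\|_{L^1(0,T)},
\qquad
\dot{\mathbf{x}}_u(t) = f\bigl(\mathbf{x}_u(t),u(t)\bigr), \quad \mathbf{x}_u(0)=\mathbf{x}_0,
\]
and the vanishing result then asserts the existence of a stopping time $T^\ast\in(0,T)$ such that any optimal control $u^\ast$ satisfies $u^\ast(t)=0$ for a.e.\ $t\geq T^\ast$.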
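In the same illustrative notation, a forward-Euler discretization with step $\Delta t = T/N$,
\[
\mathbf{x}_{k+1} = \mathbf{x}_k + \Delta t\, f(\mathbf{x}_k, u_k), \qquad k=0,\dots,N-1,
\]
is a residual neural network with layer parameters $u_k$, and the ordered sparsity pattern reads $u_k = 0$ for all $k \geq k^\ast$. Schematically (the exponent and constant below are placeholders, not the precise statement), the turnpike-type stability estimate takes the form
\[
\mathcal{E}\bigl(\mathbf{x}_{u^\ast}(T)\bigr) \;\leq\; \frac{C}{T^{\alpha}}
\]
for some $C,\alpha>0$ independent of the time horizon $T$.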