We consider the continuous-time, neural ordinary differential equation (neural ODE) perspective of deep supervised learning, and study the impact of the final time horizon $T$ in training. We focus on a cost consisting of the integral of the empirical risk over the time interval, together with an $L^1$ parameter regularization. Under homogeneity assumptions on the dynamics (typical of ReLU activations), we prove that any global minimizer is sparse, in the sense that there exists a positive stopping time $T^*$ beyond which the optimal parameters vanish. Moreover, under appropriate interpolation assumptions on the neural ODE, we provide quantitative estimates of the stopping time $T^*$ and of the training error of the trajectories at the stopping time. The latter yields a quantitative approximation property for neural ODE flows with sparse parameters. In practical terms, a shorter time horizon in the training problem can be interpreted as considering a shallower residual neural network (ResNet), and since the optimal parameters are concentrated over a shorter time horizon, such a consideration may lower the computational cost of training without discarding relevant information.
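To fix ideas, the training cost described above can be written schematically as follows; this is a sketch, and the symbols $f$, $\ell$, $\lambda$ and the absence of a readout map are illustrative placeholders rather than the precise setting of the paper. Given training data $\{(x_i^0, y_i)\}_{i=1}^n$, parameters (controls) $u$, dynamics $f$, and a pointwise loss $\ell$, the cost takes the form
\[
J_T(u) \,=\, \int_0^T \frac{1}{n} \sum_{i=1}^{n} \ell\big(x_i(t), y_i\big)\,\mathrm{d}t \;+\; \lambda \int_0^T \|u(t)\|_{1}\,\mathrm{d}t,
\qquad \dot{x}_i(t) = f\big(u(t), x_i(t)\big), \quad x_i(0) = x_i^0.
\]
In this notation, sparsity of a global minimizer $u^*$ means $u^*(t) = 0$ for almost every $t \geq T^*$.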
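The ResNet interpretation in the last sentence can be made concrete by a forward-Euler discretization of the neural ODE. The following minimal PyTorch sketch trains such a discretized model with a Riemann sum of the integrated empirical risk plus an $L^1$ penalty; the toy data, ReLU dynamics, affine readout, initialization, optimizer, and all hyperparameters are illustrative assumptions, not the paper's experimental setup.
\begin{verbatim}
# Minimal sketch (assumptions, not the paper's setup): a forward-Euler
# discretization of  x'(t) = relu(W(t) x(t) + b(t))  yields a ResNet with
# N layers and step dt = T/N.
import torch

torch.manual_seed(0)
d, n, N, T, lam = 2, 64, 20, 10.0, 1e-2  # state dim, samples, layers, horizon, L1 weight
dt = T / N                               # Euler step = "depth per layer"

# toy binary labels from the sign of the first coordinate
X = torch.randn(n, d)
y = (X[:, 0] > 0).float().unsqueeze(1)

W = [(0.1 * torch.randn(d, d)).requires_grad_() for _ in range(N)]
b = [(0.1 * torch.randn(d)).requires_grad_() for _ in range(N)]
readout = torch.nn.Linear(d, 1)          # assumed affine readout for the risk

opt = torch.optim.Adam([*W, *b, *readout.parameters()], lr=1e-2)
for step in range(2000):
    x, risk = X, 0.0
    for k in range(N):
        x = x + dt * torch.relu(x @ W[k].T + b[k])   # one Euler step
        risk = risk + dt * torch.nn.functional.binary_cross_entropy_with_logits(
            readout(x), y)                           # Riemann sum of the risk
    l1 = dt * sum(Wk.abs().sum() + bk.abs().sum() for Wk, bk in zip(W, b))
    loss = risk + lam * l1
    opt.zero_grad(); loss.backward(); opt.step()

# per-layer parameter mass: the sparsity result suggests it should
# concentrate in the early layers (small t), i.e. before a stopping time
with torch.no_grad():
    mass = [float(Wk.abs().sum() + bk.abs().sum()) for Wk, bk in zip(W, b)]
    print([round(m, 3) for m in mass])
\end{verbatim}
Note that a smooth first-order method such as Adam only drives the late-layer parameters near zero; a proximal (ISTA-type) update on the $L^1$ term would set them exactly to zero, in line with the sparsity statement.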