Democratization of machine learning requires architectures that automatically adapt to new problems. Neural Differential Equations (NDEs) have emerged as a popular modeling framework by removing the need for ML practitioners to choose the number of layers in a recurrent model. While we can control the computational cost of standard architectures by choosing the number of layers, in NDEs the number of neural network evaluations in a forward pass depends on the number of steps taken by the adaptive ODE solver. But can we force the NDE to learn the version with the fewest steps without increasing the training cost? Current strategies to overcome slow prediction require high-order automatic differentiation, leading to significantly higher training time. We describe a novel regularization method that uses the internal cost heuristics of adaptive differential equation solvers, combined with discrete adjoint sensitivities, to guide the training process towards learning NDEs that are easier to solve. This approach opens up the black-box numerical analysis behind the differential equation solver's algorithm and directly uses its local error estimates and stiffness heuristics as cheap and accurate cost estimates. We incorporate our method without any change to the underlying NDE framework and show that it extends beyond Ordinary Differential Equations to accommodate Neural Stochastic Differential Equations. We demonstrate how our approach can halve the prediction time and, unlike other methods which can increase the training time by an order of magnitude, achieves a similar reduction in training time. Together this showcases how the knowledge embedded within state-of-the-art equation solvers can be used to enhance machine learning.
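To make the core idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: a toy adaptive solver built on an embedded Euler/Heun pair that exposes its accumulated local error estimates, which a training loop could then add to the loss as a regularization term. All names (`adaptive_solve`, the tolerances, the example dynamics) are hypothetical choices for this sketch.

```python
# Hedged sketch of the idea in the abstract: surface the adaptive solver's
# internal local error estimates as a cheap regularization signal, so that
# training can penalize dynamics that are expensive to integrate.

def adaptive_solve(f, y0, t0, t1, rtol=1e-3, atol=1e-6, h=0.1):
    """Integrate dy/dt = f(t, y) with an embedded Euler (1st order) /
    Heun (2nd order) pair.

    Returns the final state, the number of f evaluations (a proxy for
    prediction cost), and the sum of accepted local error estimates
    (the quantity a training loss could penalize).
    """
    t, y = t0, y0
    n_evals = 0
    reg = 0.0  # accumulated local error estimates
    while t < t1:
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        n_evals += 2
        y_low = y + h * k1                # 1st-order (Euler) step
        y_high = y + 0.5 * h * (k1 + k2)  # 2nd-order (Heun) step
        err = abs(y_high - y_low)         # embedded local error estimate
        tol = atol + rtol * max(abs(y), abs(y_high))
        if err <= tol:                    # accept the step
            t, y = t + h, y_high
            reg += err                    # expose the heuristic to the loss
        # Standard step-size controller for a 1st-order error estimate.
        h = 0.9 * h * (tol / max(err, 1e-12)) ** 0.5
    return y, n_evals, reg

# Stiffer dynamics force more steps and accrue a larger penalty, which is
# exactly the signal the regularizer uses.
slow = lambda t, y: -0.5 * y
fast = lambda t, y: -50.0 * y
_, evals_slow, reg_slow = adaptive_solve(slow, 1.0, 0.0, 1.0)
_, evals_fast, reg_fast = adaptive_solve(fast, 1.0, 0.0, 1.0)
# In training one would then minimize: loss = data_loss + lam * reg
```

In the paper's setting the same error and stiffness estimates already computed inside the solver are reused, so the penalty costs essentially nothing extra per step, and gradients flow through it via discrete adjoint sensitivities rather than higher-order automatic differentiation.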