Neural differential equations may be trained by backpropagating gradients via the adjoint method, which is itself another differential equation, typically solved using an adaptive-step-size numerical differential equation solver. A proposed step is accepted if its error, \emph{relative to some norm}, is sufficiently small; if not, it is rejected, the step is shrunk, and the process is repeated. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choices of norm (such as $L^2$) unnecessarily stringent. By replacing the norm with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and backpropagation is made faster. This requires only minor code modifications. Experiments on a wide range of tasks -- including time series, generative modeling, and physical control -- demonstrate a median improvement of 40\% fewer function evaluations. On some problems we see as much as 62\% fewer function evaluations, so that the overall training time is roughly halved.
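To make the mechanism concrete, the sketch below illustrates the standard scaled-error acceptance test used by adaptive solvers and how swapping the usual RMS norm for a seminorm relaxes that test. It is an illustrative simplification, not the paper's implementation: the helper names (\texttt{rms\_norm}, \texttt{make\_seminorm}, \texttt{accept\_step}) are hypothetical, and for simplicity the seminorm here simply ignores everything past the first \texttt{n\_state} components, standing in for the parameter-gradient channels of the augmented adjoint state whose local errors do not feed back into the rest of the solve.

\begin{verbatim}
import numpy as np

def rms_norm(x):
    # Standard choice: the error in *every* component of the augmented
    # state must be small for the step to be accepted.
    return np.sqrt(np.mean(x ** 2))

def make_seminorm(n_state):
    # Seminorm for the augmented adjoint state: measure error only on the
    # first n_state components (state and state-adjoint), ignoring the
    # remaining components, which merely accumulate parameter gradients
    # and do not influence the dynamics of the other components.
    def seminorm(x):
        return np.sqrt(np.mean(x[:n_state] ** 2))
    return seminorm

def accept_step(error_estimate, y_prev, y_new, rtol, atol, norm):
    # Error-ratio test used by adaptive solvers (e.g. Dormand--Prince):
    # scale the local error estimate by the tolerances and accept the step
    # if the chosen (semi)norm of the scaled error is at most 1.
    scale = atol + rtol * np.maximum(np.abs(y_prev), np.abs(y_new))
    return norm(error_estimate / scale) <= 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_state, n_params = 4, 1000
    y_prev = rng.normal(size=n_state + n_params)
    y_new = y_prev + 1e-3 * rng.normal(size=n_state + n_params)
    # Large local error in the parameter-gradient block, tiny error in the
    # state block: the full RMS norm rejects the step, the seminorm accepts.
    err = np.concatenate([1e-7 * rng.normal(size=n_state),
                          1e-2 * rng.normal(size=n_params)])
    print(accept_step(err, y_prev, y_new, 1e-3, 1e-6, rms_norm))            # False
    print(accept_step(err, y_prev, y_new, 1e-3, 1e-6, make_seminorm(n_state)))  # True
\end{verbatim}

In practice this is the ``minor code modification'' referred to above: some libraries (e.g. \texttt{torchdiffeq}) expose the choice of norm as an option to the adjoint solver, though the exact interface should be checked against the library's documentation.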