Neural differential equations may be trained by backpropagating gradients via the adjoint method, which is itself another differential equation, typically solved using an adaptive-step-size numerical differential equation solver. A proposed step is accepted if its error, \emph{relative to some norm}, is sufficiently small; if not, it is rejected, the step is shrunk, and the process is repeated. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choices of norm (such as $L^2$) unnecessarily stringent. By replacing the norm with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and backpropagation is made faster. This requires only minor code modifications. Experiments on a wide range of tasks -- including time series, generative modeling, and physical control -- demonstrate a median improvement of 40\% fewer function evaluations. On some problems we see as much as 62\% fewer function evaluations, so that the overall training time is roughly halved.
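To make the mechanism concrete, the sketch below illustrates the standard scaled-error acceptance test used by adaptive solvers and how swapping the usual RMS norm for a seminorm relaxes that test. It is an illustrative simplification, not the paper's implementation: the helper names (\texttt{rms\_norm}, \texttt{make\_seminorm}, \texttt{accept\_step}) are hypothetical, and for simplicity the seminorm here simply ignores everything past the first \texttt{n\_state} components, standing in for the parameter-gradient channels of the augmented adjoint state whose local errors do not feed back into the rest of the solve.

\begin{verbatim}
import numpy as np

def rms_norm(x):
    # Standard choice: the error in *every* component of the augmented
    # state must be small for the step to be accepted.
    return np.sqrt(np.mean(x ** 2))

def make_seminorm(n_state):
    # Seminorm for the augmented adjoint state: measure error only on the
    # first n_state components (state and state-adjoint), ignoring the
    # remaining components, which merely accumulate parameter gradients
    # and do not influence the dynamics of the other components.
    def seminorm(x):
        return np.sqrt(np.mean(x[:n_state] ** 2))
    return seminorm

def accept_step(error_estimate, y_prev, y_new, rtol, atol, norm):
    # Error-ratio test used by adaptive solvers (e.g. Dormand--Prince):
    # scale the local error estimate by the tolerances and accept the step
    # if the chosen (semi)norm of the scaled error is at most 1.
    scale = atol + rtol * np.maximum(np.abs(y_prev), np.abs(y_new))
    return norm(error_estimate / scale) <= 1.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_state, n_params = 4, 1000
    y_prev = rng.normal(size=n_state + n_params)
    y_new = y_prev + 1e-3 * rng.normal(size=n_state + n_params)
    # Large local error in the parameter-gradient block, tiny error in the
    # state block: the full RMS norm rejects the step, the seminorm accepts.
    err = np.concatenate([1e-7 * rng.normal(size=n_state),
                          1e-2 * rng.normal(size=n_params)])
    print(accept_step(err, y_prev, y_new, 1e-3, 1e-6, rms_norm))            # False
    print(accept_step(err, y_prev, y_new, 1e-3, 1e-6, make_seminorm(n_state)))  # True
\end{verbatim}

In practice this is the ``minor code modification'' referred to above: some libraries (e.g. \texttt{torchdiffeq}) expose the choice of norm as an option to the adjoint solver, though the exact interface should be checked against the library's documentation.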