A neural network model of a differential equation, namely the neural ODE, has enabled the learning of continuous-time dynamical systems and probabilistic distributions with high accuracy. The neural ODE uses the same network repeatedly during a numerical integration, so the backpropagation algorithm requires a memory footprint proportional to the number of uses times the network size. This is true even if a checkpointing scheme divides the computational graph into sub-graphs. Alternatively, the adjoint method obtains the gradient by a numerical integration backward in time with a minimal memory footprint; however, it suffers from numerical errors. This study proposes the symplectic adjoint method, which obtains the exact gradient (up to rounding error) with a memory footprint proportional to the number of uses plus the network size. The experimental results demonstrate that the symplectic adjoint method occupies the smallest memory footprint in most cases, runs faster in some cases, and is more robust to rounding errors than competing methods.
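To make the trade-off concrete, the following is a minimal sketch of the two baselines the abstract contrasts, assuming PyTorch and the torchdiffeq library (the reference implementation of neural ODEs); the ODEFunc class, the dimensions, and the toy loss are illustrative, and the symplectic adjoint method proposed here is not part of torchdiffeq.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint, odeint_adjoint

class ODEFunc(nn.Module):
    """Vector field f(t, z); the same network is evaluated at every solver step."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, z):
        return self.net(z)

func = ODEFunc()
z0 = torch.randn(16, 2, requires_grad=True)
t = torch.linspace(0.0, 1.0, 50)

# Backpropagation through the solver: exact gradient, but the computational
# graph stores activations for every network use, so memory grows as
# (number of uses) x (network size).
zT = odeint(func, z0, t)
zT[-1].pow(2).sum().backward()

# Adjoint method: solves an adjoint ODE backward in time, so memory stays
# near a single network use, but the backward integration introduces
# numerical error into the gradient.
func.zero_grad()
zT = odeint_adjoint(func, z0, t)
zT[-1].pow(2).sum().backward()
```

The symplectic adjoint method sits between these two: it also integrates backward in time, but a symplectic pairing of the forward and backward solvers recovers the exact gradient (up to rounding error) at a memory cost of the number of uses plus the network size.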