Reverse-mode automatic differentiation (AD) suffers from excessive space overhead, because intermediate computational states must be traced back for back-propagation. The traditional approach, called checkpointing, stores intermediate states on a global stack and restores them either by popping the stack or by re-computing. The overhead of stack manipulation and re-computation prevents general-purpose (i.e., not tensor-based) AD engines from meeting many industrial needs. Instead of checkpointing, we propose to trace back states with reverse computing, by designing and implementing a reversible programming eDSL in which a program can be executed bidirectionally without implicit stack operations. The absence of implicit stack operations makes programs compatible with existing compiler features, including existing optimization passes and compilation as GPU kernels. We implement AD for sparse matrix operations and several machine learning applications to show that our framework achieves state-of-the-art performance.
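To make the idea concrete, the following is a minimal, illustrative sketch (in Python, not the paper's actual eDSL) of how reversible execution replaces checkpointing: every instruction has an exact inverse, so the backward pass recovers earlier states by un-computing rather than by popping them from a stack. The function names `fwd` and `bwd` are hypothetical.

```python
def fwd(x, y):
    """Reversible instruction: y += x * x."""
    return x, y + x * x

def bwd(x, y):
    """Exact inverse of fwd: y -= x * x, recovering the prior state."""
    return x, y - x * x

# Forward pass: run the program; no intermediate state is saved.
x, y = fwd(3.0, 1.0)   # state is now (3.0, 10.0)

# Backward pass: un-compute the state while accumulating adjoints.
# For y += x*x, the chain rule contributes 2*x*(dL/dy) to x's adjoint.
gx, gy = 0.0, 1.0      # seed: dL/dy = 1
x, y = bwd(x, y)       # state restored to (3.0, 1.0) without any stack
gx += 2 * x * gy       # gradient computed from the recovered x
```

In a real reversible program every primitive is constrained to be invertible in this way, which is what lets the compiler treat the backward pass as ordinary code with no hidden runtime bookkeeping.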