We consider checkpointing strategies that minimize the number of recomputations needed when performing discrete adjoint computations using multistage time-stepping schemes, which requires computing several substeps within one complete time step. In this case we propose two algorithms that can generate optimal checkpointing schedules under weak assumptions. The first is an extension of the seminal Revolve algorithm adapted to multistage schemes. The second algorithm, named CAMS, is developed based on dynamic programming, and it requires the least number of recomputations when compared with other algorithms. The CAMS algorithm is made publicly available in a library with bindings to C and Python. Numerical results illustrate that the proposed algorithms can deliver up to two times the speedup compared with that of classical Revolve. Moreover, we discuss a tailored implementation of an adjoint computation that is arguably better suited for mature scientific computing libraries by avoiding the central control assumed by the original checkpointing strategy. The proposed algorithms have been adopted by the PETSc TSAdjoint library. Their performance has been demonstrated with a large-scale PDE-constrained optimization problem on a leadership-class supercomputer.
翻译:我们考虑采用使用多阶段时间步骤计划进行离散联合计算时所需的重复计算数量最小化的检查战略,使用多阶段时间步骤计划需要计算几个子步骤。 在这种情况下,我们提议两种算法,可以在薄弱假设下产生最佳的检查时间表。第一个是扩展适合多阶段计划的半循环算法。第二个算法称为CAMS,是根据动态程序设计出来的,与其他算法相比,它要求的重复计算数量最少。CAMS算法在一个与C和Python捆绑在一起的图书馆中公开提供。数字结果显示,提议的算法可以提供比古典循环系统速度高两倍的加速速度。此外,我们讨论一个适合成熟的科学计算图书馆的定制计算方法,即避免最初的检查战略所假设的中央控制。提议的算法已被PETSC TSAdcomitive 图书馆采用。其表现表现表现表现表现在领导阶级超级计算机上出现了大规模受限制的PDE节制的优化问题。