Automatic differentiation (AD) is a set of techniques that systematically applies the chain rule to compute the gradients of functions without requiring human intervention. Although the fundamentals of this technology were established decades ago, it is experiencing a renaissance because it plays a key role in efficiently computing gradients for backpropagation in machine learning algorithms. AD is also crucial for many applications in scientific computing, particularly emerging techniques that integrate machine learning models within scientific simulations and numerical schemes. Existing AD frameworks suffer from four main limitations: limited programming-language support, the need for code modifications to make programs AD-compatible, poor performance on scientific computing codes, and a naive store-all strategy for the forward-pass data required for gradient computation. These limitations force domain scientists to compute gradients manually for large problems. This work presents DaCe AD, a general, efficient automatic differentiation engine that requires no code modifications. DaCe AD uses a novel ILP-based algorithm to optimize the trade-off between storing and recomputing forward-pass data, achieving maximum performance within a given memory constraint. We showcase the generality of our method by applying it to NPBench, a suite of HPC benchmarks with diverse scientific computing patterns, where we outperform JAX, a Python framework with state-of-the-art general AD capabilities, by more than 92 times on average without requiring any code changes.
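The following is a minimal sketch, not DaCe AD itself, of the store-versus-recompute trade-off that the ILP formulation optimizes. It uses JAX's standard `jax.grad` and `jax.checkpoint` APIs; the `stage` function, loop count, and array size are illustrative assumptions. Plain reverse-mode AD keeps every forward-pass intermediate (the store-all strategy), whereas checkpointing a stage discards its intermediates and recomputes them during the backward pass, trading extra compute for lower peak memory.

```python
import jax
import jax.numpy as jnp


def stage(x):
    # An intermediate-heavy computation; each iteration produces
    # values that plain reverse-mode AD would store for the backward pass.
    for _ in range(4):
        x = jnp.sin(x) * jnp.cos(x)
    return x


def loss_store_all(x):
    # Default reverse mode: intermediates of both stages are kept in memory.
    return jnp.sum(stage(stage(x)))


def loss_recompute(x):
    # Checkpointed variant: the inner stage's intermediates are discarded
    # after the forward pass and recomputed during the backward pass.
    return jnp.sum(jax.checkpoint(stage)(stage(x)))


x = jnp.ones((1024,))
g1 = jax.grad(loss_store_all)(x)
g2 = jax.grad(loss_recompute)(x)  # same gradient, lower peak memory
```

In this sketch the choice of which stage to checkpoint is made by hand; the abstract's claim is that DaCe AD instead selects such decisions automatically via an ILP, maximizing performance under a given memory budget.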