We present a technique for applying (forward and) reverse-mode automatic differentiation (AD) on a non-recursive second-order functional array language that supports nested parallelism and is primarily aimed at efficient GPU execution. The key idea is to eliminate the need for a "tape" by relying on redundant execution to bring into each new scope all program variables that may be needed by the differentiated code. Efficient execution is enabled by the observation that perfectly-nested scopes do not introduce re-execution, and such perfect nests are produced by known compiler transformations, e.g., flattening. Our technique differentiates loops and bulk-parallel operators, such as map, reduce, histogram, scan, and scatter, by specific rewrite rules, and aggressively optimizes the resulting nested-parallel code. We report an experimental evaluation that compares against established AD solutions and demonstrates competitive performance on nine common benchmarks from recent applied AD literature.
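To make the reverse-mode setting concrete, below is a minimal sketch, not the paper's rewrite rules or language: adjoint (vector-Jacobian product) rules for two bulk-parallel operators, written over plain Haskell lists of `Double`s. The names `mapVJP` and `sumVJP` are hypothetical; the point is only that the adjoint of `map` can recompute derivatives from the primal inputs rather than read them off a tape, and the adjoint of a sum-reduction broadcasts the output adjoint.

```haskell
-- Sketch only: toy reverse-mode rules for map and reduce (+) over lists.

-- Adjoint of `map f`: multiply each output adjoint by f'(x), where f' is
-- recomputed from the primal input instead of being read from a tape.
mapVJP :: (Double -> Double)   -- derivative f' of the mapped function
       -> [Double]             -- primal input xs
       -> [Double]             -- output adjoint
       -> [Double]             -- input adjoint
mapVJP f' xs ysBar = zipWith (\x yBar -> yBar * f' x) xs ysBar

-- Adjoint of `reduce (+) 0`: broadcast the scalar output adjoint to all inputs.
sumVJP :: [Double] -> Double -> [Double]
sumVJP xs yBar = map (const yBar) xs

-- Example: the gradient of sum (map (^2) xs) at xs = [1,2,3] is [2,4,6].
main :: IO ()
main = do
  let xs    = [1, 2, 3] :: [Double]
      xsBar = mapVJP (\x -> 2 * x) xs (sumVJP xs 1)
  print xsBar   -- [2.0,4.0,6.0]
```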