Dedicated tensor accelerators demonstrate the importance of linear algebra in modern applications. Such accelerators have the potential for impressive performance gains, but they require programmers to rewrite code using vendor APIs, which is a barrier to wider adoption. Recent work attempts to overcome this by matching and replacing patterns within code, but such approaches are fragile and fail to cope with the diversity of real-world code. We develop ATC, a compiler that uses program synthesis to map regions of code to specific APIs. The mapping space that ATC explores is combinatorially large, so making it tractable requires new program classification, dynamic analysis, variable constraint generation and lexical distance matching techniques. We apply ATC to real-world tensor and linear algebra codes and evaluate it against four state-of-the-art approaches. ATC accelerates between 2.6x and 7x more programs than these approaches, leading to over an order of magnitude performance improvement.
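To make the mapping problem concrete, the following is a minimal sketch of the general idea only, not ATC's implementation, which the abstract does not detail. It assumes a hand-written matrix multiply as the user code and a hypothetical `api_gemm` function standing in for a vendor call. Candidate bindings of user variables to API parameters are ranked by a simple edit distance between names, used here as an assumed form of lexical distance matching, and each candidate is checked by running both versions on the same test input, a basic form of dynamic analysis. All names (`naive_gemm`, `api_gemm`, `find_binding`, `m_rows`, and so on) are illustrative.

```python
import itertools
import random


def naive_gemm(A, B, m, n, k):
    """Hand-written user code: C = A * B using explicit loops."""
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C


def api_gemm(M, N, K, A, B):
    """Hypothetical stand-in for a vendor GEMM call (not a real API)."""
    return [[sum(A[i][p] * B[p][j] for p in range(K)) for j in range(N)]
            for i in range(M)]


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_binding(user_vars, api_params, A, B, expected):
    """Try user-variable -> API-parameter bindings, lexically closest first.

    A binding is accepted only if the API call reproduces the output that
    the user code produced on the same input.
    """
    def cost(perm):
        return sum(edit_distance(v.lower(), p.lower())
                   for v, p in zip(perm, api_params))

    candidates = itertools.permutations(user_vars, len(api_params))
    for perm in sorted(candidates, key=cost):
        args = [user_vars[v] for v in perm]
        try:
            got = api_gemm(*args, A, B)
        except IndexError:
            continue  # dimensions inconsistent with the data: reject
        if got == expected:
            return dict(zip(api_params, perm))
    return None


# Integer test data keeps the output comparison exact.
random.seed(0)
user_vars = {"m_rows": 2, "n_cols": 3, "k_inner": 4}
A = [[random.randint(-9, 9) for _ in range(4)] for _ in range(2)]
B = [[random.randint(-9, 9) for _ in range(3)] for _ in range(4)]
expected = naive_gemm(A, B, user_vars["m_rows"], user_vars["n_cols"],
                      user_vars["k_inner"])

binding = find_binding(user_vars, ("M", "N", "K"), A, B, expected)
print(binding)  # {'M': 'm_rows', 'N': 'n_cols', 'K': 'k_inner'}
```

Even in this toy case there are six possible bindings for three scalar parameters. The count grows factorially with the number of variables and multiplies across candidate APIs, which illustrates why the abstract describes the search space as combinatorially large.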