Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration primitives, along with the emerging machine learning models, bring tremendous engineering challenges. In this paper, we present TensorIR, a compiler abstraction for optimizing programs with these tensor computation primitives. TensorIR generalizes the loop nest representation used in existing machine learning compilers to make tensor computation a first-class citizen. Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives. Experimental results show that TensorIR compilation automatically uses the tensor computation primitives for given hardware backends and delivers performance that is competitive with state-of-the-art hand-optimized systems across platforms.
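To make the idea of tensorization concrete, the following is a minimal illustrative sketch (not TensorIR's actual API): a matrix multiply written as a scalar loop nest, and the same computation with the innermost tile of loops replaced by a call to a stand-in "tensor intrinsic", playing the role of a hardware acceleration primitive. All function names here are hypothetical.

```python
def tensor_intrinsic_4x4(A_tile, B_tile, C_tile):
    """Stand-in for a hardware primitive: 4x4 tile multiply-accumulate."""
    for i in range(4):
        for j in range(4):
            for p in range(4):
                C_tile[i][j] += A_tile[i][p] * B_tile[p][j]

def matmul_loop_nest(A, B, n):
    # Plain scalar loop nest: C = A @ B, one element at a time.
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for p in range(n):
                C[i][j] += A[i][p] * B[p][j]
    return C

def matmul_tensorized(A, B, n, tile=4):
    # Outer loops walk over tiles; each tile-level multiply-accumulate is
    # delegated to the intrinsic instead of being expanded into scalar loops.
    C = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, n, tile):
                A_tile = [[A[i0 + i][p0 + p] for p in range(tile)]
                          for i in range(tile)]
                B_tile = [[B[p0 + p][j0 + j] for j in range(tile)]
                          for p in range(tile)]
                C_tile = [[C[i0 + i][j0 + j] for j in range(tile)]
                          for i in range(tile)]
                tensor_intrinsic_4x4(A_tile, B_tile, C_tile)
                for i in range(tile):
                    for j in range(tile):
                        C[i0 + i][j0 + j] = C_tile[i][j]
    return C

n = 8
A = [[(i * n + j) % 7 for j in range(n)] for i in range(n)]
B = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
assert matmul_loop_nest(A, B, n) == matmul_tensorized(A, B, n)
```

The compiler's task, as described in the abstract, is to discover rewrites like the one from `matmul_loop_nest` to `matmul_tensorized` automatically, mapping loop-nest regions onto whatever tensor primitives the target hardware provides.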