Deep neural networks (DNNs) have been widely deployed across many applications, and accelerators have emerged as an enabler of fast and efficient inference for these applications. However, to achieve high model coverage with high performance, each accelerator vendor must develop a full compiler stack to ingest, optimize, and execute DNNs. This poses significant challenges in developing and maintaining the software stack. In addition, vendors must continuously update their hardware and/or software to keep pace with the rapid evolution of DNN model architectures and operators. To address these issues, this paper proposes an open-source framework that enables users to concentrate solely on developing their proprietary code generation tools while reusing as many components as possible from existing deep learning compilers. Our framework provides users with flexible and easy-to-use interfaces for partitioning their models into segments that can be executed on "the best" processors, taking advantage of the powerful computation capability of accelerators. Our case study shows that the framework has been deployed in multiple commercial vendors' compiler stacks, each requiring only a few thousand lines of code.
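The abstract does not name the host compiler, but the partitioning interface it describes matches a TVM-style "Bring Your Own Codegen" (BYOC) flow. The sketch below illustrates how such a flow might look under that assumption; `my_accel` is a hypothetical codegen name introduced for illustration, not part of the paper's text.

```python
# Minimal sketch of graph partitioning for a proprietary codegen,
# assuming a TVM-style BYOC interface. "my_accel" is hypothetical.
import tvm
from tvm import relay


# Declare which operators the (hypothetical) accelerator codegen supports.
@tvm.ir.register_op_attr("nn.conv2d", "target.my_accel")
def _conv2d_supported(expr):
    # Claim every conv2d call; a real rule would inspect attributes
    # (e.g., data layout, kernel size) before accepting the op.
    return True


def partition_for_accel(mod):
    """Split a Relay module into accelerator segments and host fallback."""
    mod = relay.transform.AnnotateTarget("my_accel")(mod)  # tag supported ops
    mod = relay.transform.MergeCompilerRegions()(mod)      # grow contiguous regions
    mod = relay.transform.PartitionGraph()(mod)            # lift regions into functions
    return mod
```

With this style of interface, the vendor only supplies the operator predicates and the code generator for the partitioned functions; graph ingestion, generic optimization, and execution of the remaining segments are reused from the host compiler, which is consistent with the small per-vendor code footprint the abstract reports.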