Machine learning model deployment for training and execution has been an important topic in industry and academic research over the last decade. Much of the attention has focused on developing specific toolchains to support acceleration hardware. In this paper, we present IREE, a unified compiler and runtime stack with the explicit goal of scaling machine learning programs down to the smallest footprints for mobile and edge devices, while maintaining the ability to scale up to larger deployment targets. IREE adopts a compiler-based approach and optimizes for heterogeneous hardware accelerators through the MLIR compiler infrastructure, which provides the means to quickly design and implement multi-level compiler intermediate representations (IR). More specifically, this paper focuses on TinyIREE, a set of deployment options in IREE that accommodate the limited memory and computation resources of embedded systems and bare-metal platforms, while also demonstrating IREE's intuitive workflow, which generates workloads for different ISA extensions and ABIs through LLVM.