As deep learning becomes increasingly popular in mobile and embedded solutions, framework-specific network representations must be converted into executable code for these embedded platforms. This paper consists of two parts. The first part surveys and benchmarks the available open-source deep learning compiler toolchains, focusing on the capabilities and performance of each solution when targeting embedded devices and microcontrollers that are combined with a dedicated accelerator in a heterogeneous fashion. The second part explores the implementation and evaluation of a compilation flow for such a heterogeneous device, reusing one of the existing toolchains to demonstrate the steps necessary for hardware developers who plan to build a software flow for their own hardware.