DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea is to work at the operator view of DNNs, but to expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with a 9.3x speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
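
To make the idea of mathematical-property-based graph rewriting concrete, here is a minimal sketch in Python. It is not DNNFusion's implementation: the abstract does not list the rewrite rules, so the distributive rule used here is an assumption chosen for illustration, and the names `Var`, `Op`, `count_ops`, `rewrite_distributive`, and `evaluate` are all hypothetical. The sketch rewrites `a*b + a*c` into `a*(b + c)`, which computes the same value with one fewer operator and leaves a shorter chain of operators for a later fusion step.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A named input of the toy computation graph (hypothetical type)."""
    name: str


@dataclass(frozen=True)
class Op:
    """A binary operator node; kind is "add" or "mul" (hypothetical type)."""
    kind: str
    args: Tuple["Expr", "Expr"]


Expr = Union[Var, Op]


def count_ops(e: Expr) -> int:
    """Count operator nodes, used here as a crude evaluation-cost proxy."""
    if isinstance(e, Var):
        return 0
    return 1 + sum(count_ops(a) for a in e.args)


def rewrite_distributive(e: Expr) -> Expr:
    """Rewrite a*b + a*c into a*(b + c), bottom-up.

    Assumption: only a shared *left* factor is matched; a fuller rule set
    would also use commutativity to match the other operand orders.
    """
    if isinstance(e, Var):
        return e
    left, right = (rewrite_distributive(a) for a in e.args)
    if (
        e.kind == "add"
        and isinstance(left, Op) and left.kind == "mul"
        and isinstance(right, Op) and right.kind == "mul"
        and left.args[0] == right.args[0]
    ):
        shared = left.args[0]
        return Op("mul", (shared, Op("add", (left.args[1], right.args[1]))))
    return Op(e.kind, (left, right))


def evaluate(e: Expr, env: dict) -> float:
    """Evaluate the graph on scalar inputs to check the rewrite is sound."""
    if isinstance(e, Var):
        return env[e.name]
    x, y = (evaluate(a, env) for a in e.args)
    return x + y if e.kind == "add" else x * y


a, b, c = Var("a"), Var("b"), Var("c")
before = Op("add", (Op("mul", (a, b)), Op("mul", (a, c))))
after = rewrite_distributive(before)

env = {"a": 2.0, "b": 3.0, "c": 4.0}
print(count_ops(before), count_ops(after))          # 3 2
print(evaluate(before, env), evaluate(after, env))  # 14.0 14.0
```

The sketch uses scalars for brevity; in a DNN the same identity would apply to element-wise tensor operators, where removing one operator also removes one intermediate tensor.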

