Traditional deep learning compilers rely on heuristics for subgraph generation, which impose extra constraints on graph optimization, e.g., each subgraph can contain at most one complex operator. In this paper, we propose AGO, a graph optimization framework that supports arbitrary subgraph structures to boost the inference performance of deep models by removing such constraints. To create new optimization opportunities for complicated subgraphs, we propose intensive operator fusion, which can effectively stitch multiple complex operators together for better performance. Further, we design a graph partitioning scheme that allows an arbitrary structure within each subgraph while guaranteeing the acyclic property among all generated subgraphs. Additionally, to enable efficient performance tuning on complicated subgraphs, we devise a novel divide-and-conquer tuning mechanism to orchestrate different system components. Through extensive experiments on various neural networks and mobile devices, we show that our system can improve inference performance by up to 3.3x compared with state-of-the-art deep learning compilers.
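The abstract only names the acyclicity guarantee of the partitioning scheme; as a rough illustration of what such a guarantee involves, the Python sketch below checks whether fusing two subgraphs of an operator DAG would introduce a cycle in the subgraph-level graph. The data layout and the function `merge_creates_cycle` are hypothetical assumptions for illustration, not AGO's actual implementation.

```python
# Minimal sketch (NOT AGO's implementation): keeping a partition of an
# operator DAG acyclic when two subgraphs are fused. A fusion of subgraphs
# `a` and `b` is illegal if some path leaves the pair, passes through a
# third subgraph, and re-enters the pair.

def merge_creates_cycle(succ, group_of, a, b):
    """Return True if fusing subgraphs `a` and `b` would make the
    coarsened subgraph-level graph cyclic.

    succ:     dict mapping each operator to its successor operators (a DAG)
    group_of: dict mapping each operator to its current subgraph id
    """
    merged = {a, b}
    # Seed the search with every operator reached by an edge leaving a or b.
    frontier = [v for u, g in group_of.items() if g in merged
                  for v in succ.get(u, []) if group_of[v] not in merged]
    seen = set(frontier)
    while frontier:
        u = frontier.pop()
        if group_of[u] in merged:
            # We re-entered a or b after passing through a third subgraph,
            # so the fused subgraph would sit on a cycle.
            return True
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


# Toy example: a diamond op0 -> {op1, op2}, op1 -> op2, with op1 in its own
# subgraph. Fusing op0's and op2's subgraphs would create a cycle through op1.
succ = {"op0": ["op1", "op2"], "op1": ["op2"], "op2": []}
group_of = {"op0": "A", "op1": "C", "op2": "B"}
assert merge_creates_cycle(succ, group_of, "A", "B")
```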