Edge computing has emerged as a popular setting for model inference. However, inference on edge devices (e.g., multi-core DSPs and FPGAs) remains inefficient because highly optimized inference frameworks are lacking. Existing model inference frameworks are mainly developed in an operator-centric way, which provides insufficient acceleration for edge-based inference. Moreover, operator-centric frameworks incur significant costs for continuous development and maintenance. In this paper, we propose Xenos, which automatically conducts dataflow-centric optimization of the computation graph and accelerates inference in two dimensions. Vertically, Xenos develops an operator linking technique that improves data locality by restructuring the inter-operator dataflow. Horizontally, Xenos develops a DSP-aware operator split technique that enables higher parallelism across multiple DSP units. Our evaluation demonstrates the effectiveness of the vertical and horizontal dataflow optimizations, which reduce inference time by 21.2\%--84.9\% and 17.9\%--96.2\%, respectively. Xenos also outperforms the widely used TVM by 3.22$\times$--17.92$\times$. Finally, we extend Xenos to a distributed solution, d-Xenos, which employs multiple edge devices to jointly conduct the inference task and achieves a speedup of 3.68$\times$--3.78$\times$ over a single device.
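To make the horizontal dimension concrete, the following is a minimal sketch of the general idea behind splitting one operator across several compute units. It is not Xenos's implementation: the function names (`dense`, `split_rows`, `dense_split`) are hypothetical, a fully connected layer stands in for an arbitrary operator, and Python threads stand in for DSP units. The sketch only illustrates that the output rows of an operator can be partitioned into independent chunks whose results are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def dense(x, weight_rows):
    """Reference operator: y[i] = dot(weight_rows[i], x)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight_rows]


def split_rows(weight_rows, num_units):
    """Partition output rows into contiguous, near-equal chunks (hypothetical policy)."""
    base, extra = divmod(len(weight_rows), num_units)
    chunks, start = [], 0
    for unit in range(num_units):
        size = base + (1 if unit < extra else 0)
        if size:  # skip empty chunks when there are more units than rows
            chunks.append(weight_rows[start:start + size])
        start += size
    return chunks


def dense_split(x, weight_rows, num_units):
    """Run each chunk as an independent task; threads stand in for DSP units."""
    chunks = split_rows(weight_rows, num_units)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        parts = list(pool.map(lambda chunk: dense(x, chunk), chunks))
    # Concatenate partial outputs in the original row order.
    return [y for part in parts for y in part]


x = [1.0, 2.0, 3.0]
weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 0, -1]]
assert dense_split(x, weights, num_units=2) == dense(x, weights)
```

Because each chunk reads the same input and writes a disjoint slice of the output, the split version produces the same result as the unsplit operator. A DSP-aware policy would presumably also account for per-unit memory and data placement, which this sketch does not model.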