Deep neural networks with large model sizes achieve state-of-the-art results for tasks in computer vision (CV) and natural language processing (NLP). However, these large-scale models are too compute- or memory-intensive for resource-constrained edge devices. Prior works on parallel and distributed execution primarily focus on training -- rather than inference -- using homogeneous accelerators in data centers. We propose EdgePipe, a distributed framework for edge systems that uses pipeline parallelism to both speed up inference and enable running larger (and more accurate) models that otherwise cannot fit on single edge devices. EdgePipe achieves these results by using an optimal partition strategy that considers heterogeneity in compute, memory, and network bandwidth. Our empirical evaluation demonstrates that EdgePipe achieves $10.59\times$ and $11.88\times$ speedup using 16 edge devices for the ViT-Large and ViT-Huge models, respectively, with no accuracy loss. Similarly, EdgePipe improves ViT-Huge throughput by $3.93\times$ over a 4-node baseline when using 16 edge devices, none of which can hold the model in memory on its own. Finally, we show up to $4.16\times$ throughput improvement over the state-of-the-art PipeDream when using a heterogeneous set of devices.
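The paper's actual partition algorithm is more involved, but the core idea of a heterogeneity-aware pipeline split can be illustrated with a minimal dynamic-programming sketch: divide a chain of model layers into contiguous stages, one stage per device, so that the bottleneck stage time is minimized subject to each device's memory capacity. All numbers and the `partition` helper below are invented for illustration, and the cost model is simplified to compute-only stage time plus a memory cap (the paper's model also accounts for network bandwidth).

```python
# Hypothetical sketch of a heterogeneity-aware pipeline partition.
# Devices are visited in a fixed order; each gets one contiguous stage.
# Pipeline throughput is limited by the slowest stage, so we minimize
# the bottleneck stage time under per-device memory limits.
from functools import lru_cache

def partition(layer_flops, layer_mem, devices):
    """devices: list of (flops_per_sec, mem_capacity) tuples.
    Returns (bottleneck_time, cut_points); cut_points is None if infeasible."""
    n, d = len(layer_flops), len(devices)

    @lru_cache(maxsize=None)
    def best(i, k):
        # Best bottleneck assigning layers i..n-1 to devices k..d-1.
        if i == n:
            return (0.0, ())          # all layers placed
        if k == d:
            return (float("inf"), None)  # layers left but no devices
        speed, cap = devices[k]
        best_t, best_cuts = float("inf"), None
        flops = mem = 0.0
        for j in range(i, n):          # stage = layers i..j on device k
            flops += layer_flops[j]
            mem += layer_mem[j]
            if mem > cap:
                break                  # memory limit exceeded; stop growing
            rest_t, rest_cuts = best(j + 1, k + 1)
            if rest_cuts is None:
                continue
            t = max(flops / speed, rest_t)  # pipeline bottleneck
            if t < best_t:
                best_t, best_cuts = t, (j + 1,) + rest_cuts
        return (best_t, best_cuts)

    return best(0, 0)

# Made-up example: 4 identical layers, one fast device and one slow one.
devices = [(2.0, 4.0), (1.0, 4.0)]   # (flops/sec, memory capacity)
t, cuts = partition([1, 1, 1, 1], [1, 1, 1, 1], devices)
# The fast device takes three layers, the slow one takes one.
```

With a compute-only cost model the heterogeneity shows up directly: the device twice as fast receives three of the four layers, balancing the two stage times at 1.5 and 1.0.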