This paper studies inference acceleration for distributed convolutional neural networks (CNNs) in collaborative edge computing. To preserve inference accuracy under task partitioning, we account for the receptive field when performing segment-based partitioning. To maximize parallelization between the communication and computing processes, and thereby minimize the total inference time of a task, we design a novel task collaboration scheme, named HALP, in which the overlapping zones of the sub-tasks assigned to secondary edge servers (ESs) are executed on the host ES. We further extend HALP to the multi-task scenario. Experimental results show that HALP accelerates CNN inference on VGG-16 by 1.7-2.0x for a single task and by 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier, outperforming the state-of-the-art scheme MoDNN. Moreover, we evaluate service reliability under time-variant channels and show that HALP is an effective solution for ensuring high service reliability under strict service deadlines.
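To illustrate the receptive-field consideration in segment-based partitioning, the following is a minimal sketch (not the paper's implementation): it accumulates the receptive field of a stack of (kernel, stride) layers and pads each row segment with the overlap (the "overlapping zone") needed so segments can be inferred independently. The truncated VGG-16-style layer list and the helper names are illustrative assumptions.

```python
# Illustrative sketch of receptive-field-aware segment partitioning.
# Layer specs are (kernel_size, stride); the list is a truncated
# VGG-16-style stack, assumed here for demonstration only.
VGG16_LAYERS = [(3, 1)] * 2 + [(2, 2)] + [(3, 1)] * 2 + [(2, 2)]

def receptive_field(layers):
    """Accumulate receptive field r and jump j over (kernel, stride) layers."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j  # each layer widens the field by (k-1) input jumps
        j *= s            # stride compounds the spacing between outputs
    return r

def partition_rows(height, num_segments, layers):
    """Split input rows into segments, each padded with the halo
    (overlapping zone) required for independent inference."""
    r = receptive_field(layers)
    halo = r // 2                  # context rows needed on each side
    base = height // num_segments
    segments = []
    for i in range(num_segments):
        lo = max(0, i * base - halo)
        hi = min(height, (i + 1) * base + halo)
        segments.append((lo, hi))
    return segments

# Example: 224-row input split across 4 secondary ESs.
print(partition_rows(224, 4, VGG16_LAYERS))
```

In HALP, the computation corresponding to these overlapping zones would be handled on the host ES rather than duplicated on the secondary ESs, which is what enables the communication/computation overlap described above.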