This paper studies inference acceleration for distributed convolutional neural networks (CNNs) in a collaborative edge computing network. To avoid inference accuracy loss when partitioning inference tasks, we propose receptive field-based segmentation (RFS). To reduce computation time and communication overhead, we propose a novel collaborative edge computing scheme that uses fused-layer parallelization to partition a CNN model into multiple blocks of convolutional layers. In this scheme, the collaborating edge servers (ESs) need to exchange only a small fraction of the sub-outputs after computing each fused block. In addition, to find the optimal partitioning of a CNN model into multiple blocks, we design a dynamic programming algorithm, termed dynamic programming for fused-layer parallelization (DPFP). Experimental results show that DPFP accelerates inference of VGG-16 by up to 73% compared with the pre-trained model, outperforming the existing work MoDNN in all tested scenarios. Moreover, we evaluate the service reliability of DPFP under time-variant channels, showing that DPFP is an effective solution for ensuring high service reliability under a strict service deadline.
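To make the optimal-substructure idea behind layer-fusion partitioning concrete, the sketch below shows a generic interval dynamic program that splits a chain of convolutional layers into consecutive fused blocks so as to minimize total cost. This is a minimal illustration, not the paper's actual DPFP implementation: the cost model `block_cost` and the toy numbers (`compute`, `comm`, `overlap`) are hypothetical stand-ins for the measured computation time and sub-output communication overhead the paper would use.

```python
def dpfp(num_layers, block_cost):
    """
    Minimal dynamic-programming sketch for fused-layer partitioning.

    num_layers: number of convolutional layers in the model.
    block_cost(i, j): hypothetical cost model returning the computation
        time of fusing layers i..j (inclusive) plus the communication
        overhead of exchanging sub-outputs once after the block.
    Returns (total_cost, blocks), where blocks is a list of (i, j) spans.
    """
    # best[k] = minimal total cost of partitioning layers 0..k-1
    best = [0.0] + [float("inf")] * num_layers
    cut = [0] * (num_layers + 1)  # cut[k]: start index of the last block ending at layer k-1
    for k in range(1, num_layers + 1):
        for i in range(k):  # candidate last block covers layers i..k-1
            cost = best[i] + block_cost(i, k - 1)
            if cost < best[k]:
                best[k], cut[k] = cost, i
    # Recover the optimal block boundaries by walking the cut pointers back.
    blocks, k = [], num_layers
    while k > 0:
        blocks.append((cut[k], k - 1))
        k = cut[k]
    return best[num_layers], blocks[::-1]


if __name__ == "__main__":
    # Toy usage with an illustrative cost model (all numbers hypothetical):
    # fusing more layers saves one communication round per merged layer,
    # but redundant overlap computation grows with block depth.
    compute = [4.0, 6.0, 5.0, 7.0, 3.0]  # per-layer compute time
    comm = 10.0                          # fixed sub-output exchange cost per block
    overlap = 0.5                        # redundant-computation penalty per extra fused layer

    def block_cost(i, j):
        return sum(compute[i:j + 1]) * (1 + overlap * (j - i)) + comm

    total, blocks = dpfp(len(compute), block_cost)
    print(total, blocks)
```

The O(N^2) loop structure reflects the key property stated in the abstract: once the last fused block is fixed, the best partitioning of the preceding layers is an independent subproblem, so the optimum over all block layouts can be found without enumerating the exponentially many partitions.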