Recent research in artificial intelligence has produced versatile convolutional neural networks (CNNs) with diverse structures and substantially improved the accuracy of many intelligent applications. However, CNN inference imposes a heavy computation burden on mobile devices, while uploading large volumes of raw data to the cloud incurs significant network latency. Motivated by the spatial independence of the convolution operation, we propose the pipeline cooperation (PICO) framework to accelerate CNN inference using multiple heterogeneous mobile devices. PICO partitions both the CNN and the mobile devices into several stages and combines them into an inference pipeline. PICO faces three main challenges: (1) parallelizing the convolution operation introduces redundant computation; (2) partitioning is greatly complicated because many CNNs are structured as directed acyclic graphs (DAGs); (3) the mobile devices have diverse computing resources. To address these issues, we propose a two-step optimization based on an in-depth analysis: we first orchestrate the DAG into sequential pieces, then divide these pieces and the devices into stages. The optimization goal is to minimize the redundant computation introduced by partitioning while maximizing pipeline throughput. In our experiments with $2 \sim 8$ Raspberry Pi devices, PICO improves throughput by $1.8 \sim 6.8 \times$ under different CPU frequencies.
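To make the second optimization step concrete, the sketch below shows one way contiguous sequential pieces could be assigned to heterogeneous devices so that the slowest pipeline stage (the throughput bottleneck) is as fast as possible. The function `partition_pipeline`, its inputs `piece_costs` and `device_speeds`, and the dynamic program are illustrative assumptions for exposition, not PICO's actual algorithm.

```python
# A minimal sketch, assuming per-piece workload estimates and relative
# device speeds are known. Pipeline throughput is the reciprocal of the
# bottleneck stage time, so we minimize the maximum stage time.

from functools import lru_cache

def partition_pipeline(piece_costs, device_speeds):
    """Split pieces[0..n) into len(device_speeds) contiguous stages.

    piece_costs[i]   -- estimated work (e.g., FLOPs) of sequential piece i
    device_speeds[j] -- relative speed of device j (work units per second)

    Returns (bottleneck stage time, cut points ending each stage).
    """
    n, m = len(piece_costs), len(device_speeds)
    prefix = [0.0]
    for c in piece_costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Minimal bottleneck time when pieces[i:] run on devices[j:].
        if j == m - 1:  # last device takes all remaining pieces
            return (prefix[n] - prefix[i]) / device_speeds[j], (n,)
        result = (float("inf"), ())
        # Leave at least one piece for each remaining device.
        for k in range(i + 1, n - (m - j - 2)):
            stage = (prefix[k] - prefix[i]) / device_speeds[j]
            rest, cuts = best(k, j + 1)
            result = min(result, (max(stage, rest), (k,) + cuts))
        return result

    return best(0, 0)

if __name__ == "__main__":
    # Four pieces with uneven costs, three heterogeneous devices.
    time, cuts = partition_pipeline([4.0, 2.0, 6.0, 3.0], [1.0, 2.0, 1.5])
    print(f"bottleneck stage time: {time:.2f}, cut points: {cuts}")
```

Under these assumptions the search runs in $O(n^2 m)$ time; it ignores the redundant computation and inter-device communication that the first step (orchestrating the DAG into sequential pieces) must account for in the full framework.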