Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs). To alleviate their excessive computational demands, developers have traditionally resorted to cloud offloading, incurring high infrastructure costs and a strong dependence on networking conditions. At the other end, the emergence of powerful SoCs is gradually enabling on-device execution. Nonetheless, low- and mid-tier platforms still struggle to run state-of-the-art CNNs at sufficient performance. In this paper, we present DynO, a distributed inference framework that combines the best of both worlds to address several challenges, such as device heterogeneity, varying bandwidth and multi-objective requirements. The key components that enable this are its novel CNN-specific data packing method, which exploits the variability of precision needs across different parts of the CNN when onloading computation, and its novel scheduler, which jointly tunes the partition point and the transferred data precision at run time to adapt inference to its execution environment. Quantitative evaluation shows that DynO outperforms the current state-of-the-art, improving throughput by over an order of magnitude compared to device-only execution and by up to 7.9x over competing CNN offloading systems, with up to 60x less data transferred.
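To make the joint decision described above more concrete, the following is a minimal, purely illustrative sketch (not DynO's actual scheduler) of how a partition point and a transfer bit-width might be selected together from profiled costs and the current bandwidth; all names (`Split`, `pick_plan`, `accuracy_drop`, the profiled fields) are hypothetical placeholders rather than the paper's API.

```python
# Illustrative sketch only: pick a CNN partition point and activation bit-width
# jointly, minimising estimated end-to-end latency under an accuracy budget.
# Profiled costs and the accuracy model are assumed to be supplied externally.

from dataclasses import dataclass

@dataclass
class Split:
    layer: int          # partition point (offload after this layer)
    device_ms: float    # profiled on-device time up to this layer
    server_ms: float    # profiled server time for the remaining layers
    feature_bytes: int  # intermediate activation size at full (32-bit) precision

def pick_plan(candidate_splits, bitwidths, bandwidth_bps, accuracy_drop, max_drop=0.01):
    """Return the (split, bitwidth) pair with the lowest estimated latency
    whose estimated accuracy degradation stays within max_drop."""
    best, best_latency = None, float("inf")
    for s in candidate_splits:
        for b in bitwidths:                       # e.g. 32, 16, 8, 4 bits
            if accuracy_drop(s.layer, b) > max_drop:
                continue                          # too aggressive at this layer
            tx_bytes = s.feature_bytes * b / 32   # packed activation size
            tx_ms = 1e3 * 8 * tx_bytes / bandwidth_bps
            latency = s.device_ms + tx_ms + s.server_ms
            if latency < best_latency:
                best, best_latency = (s, b), latency
    return best, best_latency
```

In this sketch, re-invoking `pick_plan` whenever the measured bandwidth changes captures, in a simplified way, the run-time adaptation the abstract refers to.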