Recent breakthroughs in deep learning (DL) have led to the emergence of many intelligent mobile applications and services, but they also pose unprecedented computing challenges for resource-constrained mobile devices. This paper builds a collaborative deep inference system between a resource-constrained mobile device and a powerful edge server, aiming to combine the strengths of on-device processing and computation offloading. The basic idea is to partition a deep neural network (DNN) into a front-end part running on the mobile device and a back-end part running on the edge server; the key challenge is locating the optimal partition point that minimizes the end-to-end inference delay. Unlike existing DNN partitioning approaches, which rely heavily on a dedicated offline profiling stage to search for the optimal partition point, our system has a built-in online learning module, called Autodidactic Neurosurgeon (ANS), that automatically learns the optimal partition point on the fly. ANS can therefore closely follow changes in the system environment by generating new knowledge for adaptive decision making. The core of ANS is a novel contextual bandit learning algorithm, called $\mu$LinUCB, which not only has a provable theoretical guarantee on learning performance but is also ultra-lightweight, making it easy to implement in real-world systems. We implement our system on a video stream object detection testbed to validate the design of ANS and evaluate its performance. The experiments show that ANS significantly outperforms state-of-the-art benchmarks in tracking system changes and reducing the end-to-end inference delay.
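The partitioning problem and the bandit-style learning described in the abstract can be illustrated with a small sketch. The abstract does not give the delay model or the details of $\mu$LinUCB, so everything below is an assumption made for illustration: the delay is taken to be the sum of on-device time, transmission time of the intermediate output, and server time; the learner is the standard LinUCB rule adapted to minimization (a lower confidence bound), not the paper's $\mu$LinUCB; and the names `best_partition` and `LinUCBMin` are hypothetical.

```python
import math


def best_partition(device_ms, server_ms, out_bytes, input_bytes, bytes_per_ms):
    """Exhaustively pick the partition point with the smallest assumed delay.

    Partition point p runs layers [0, p) on the device and [p, n) on the server.
    p == 0 offloads everything (the raw input is sent); p == n stays on-device.
    """
    n = len(device_ms)
    best_p, best_delay = None, math.inf
    for p in range(n + 1):
        front = sum(device_ms[:p])   # on-device compute time
        back = sum(server_ms[p:])    # edge-server compute time
        if p == n:
            tx = 0.0                 # nothing is transmitted
        else:
            payload = input_bytes if p == 0 else out_bytes[p - 1]
            tx = payload / bytes_per_ms
        delay = front + tx + back
        if delay < best_delay:
            best_p, best_delay = p, delay
    return best_p, best_delay


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBMin:
    """Standard LinUCB with one shared linear model, adapted to minimize delay.

    Assumption: the delay of a partition point is roughly linear in its
    context vector x (for example, payload size and back-end workload).
    """

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        # Inverse of A = I + sum(x x^T), maintained incrementally.
        self.A_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def theta(self):
        # Ridge-regression estimate of the linear delay model.
        return [_dot(row, self.b) for row in self.A_inv]

    def select(self, contexts):
        """Return the index of the context with the lowest optimistic delay."""
        theta = self.theta()

        def lower_bound(x):
            a_inv_x = [_dot(row, x) for row in self.A_inv]
            width = self.alpha * math.sqrt(max(_dot(x, a_inv_x), 0.0))
            return _dot(theta, x) - width

        return min(range(len(contexts)), key=lambda i: lower_bound(contexts[i]))

    def update(self, x, delay):
        """Fold in one observed delay using the Sherman-Morrison identity."""
        a_inv_x = [_dot(row, x) for row in self.A_inv]
        denom = 1.0 + _dot(x, a_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + delay * xi for bi, xi in zip(self.b, x)]


# Offline view: all per-layer costs are known in advance (made-up numbers).
print(best_partition([10, 20, 30], [1, 2, 3], [4000, 1000, 100], 8000, 100))
# → (2, 43.0)
```

`best_partition` corresponds to the offline setting, where per-layer costs and bandwidth are profiled ahead of time. `LinUCBMin` hints at the online alternative: on each inference, call `select` with one context vector per candidate partition point, run the chosen partition, and pass the measured delay to `update`, so the estimate can follow changes in bandwidth or server load without a separate profiling stage.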