Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition. However, continuously executing the entire DNN on the mobile device can quickly deplete its battery. Although offloading tasks to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network load, and edge server load can lead to significant delays in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce both bandwidth usage and energy consumption. Another approach, called early exiting (EE), trains models to present multiple "exits" earlier in the architecture, each providing increasingly higher accuracy. The trade-off between accuracy and delay can thus be tuned according to the current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by outlining a set of compelling research challenges.
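To make the two strategies concrete, the following is a minimal PyTorch sketch (not taken from any of the surveyed papers) of splitting a DNN into head and tail sub-models and attaching an auxiliary early-exit classifier. The backbone choice, split index, and confidence threshold are illustrative assumptions, not values prescribed by the survey.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

full_model = resnet18(weights=None)          # any backbone works; pretrained weights omitted for brevity
layers = list(full_model.children())

# Split computing: the head runs on the mobile device, the tail on the edge server.
split_point = 5                              # illustrative split index (after ResNet-18's layer1)
head = nn.Sequential(*layers[:split_point])  # on-device; its output tensor would be sent over the network
tail = nn.Sequential(*layers[split_point:-1], nn.Flatten(), layers[-1])  # executed at the edge

# Early exiting: a lightweight classifier on the intermediate tensor.
# If its prediction is confident enough, the tail (and the transmission) is skipped.
early_exit = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1000))

def infer(x, threshold=0.9):                 # threshold is an assumed tuning knob; x is a single-image batch
    z = head(x)                              # on-device computation
    probs = torch.softmax(early_exit(z), dim=1)
    conf, pred = probs.max(dim=1)
    if conf.item() >= threshold:             # confident: exit early, saving bandwidth and energy
        return pred
    return tail(z).argmax(dim=1)             # otherwise offload z and finish inference on the edge server
```

In practice, the early-exit branch is trained jointly with (or after) the backbone, and the threshold sets the accuracy/delay trade-off mentioned above: a lower threshold exits more often, trading accuracy for latency and energy savings.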