Deep neural networks (DNNs) sustain high performance in today's data processing applications. DNN inference is resource-intensive and thus difficult to run on a mobile device. An alternative is to offload DNN inference to a cloud server. However, this approach requires heavy raw data transmission between the mobile device and the cloud server, which is unsuitable for mission-critical and privacy-sensitive applications such as autopilot. To solve this problem, recent advances deliver DNN services using the edge computing paradigm. Existing approaches split a DNN into two parts and deploy the two partitions to computation nodes at two edge computing tiers. Nonetheless, these methods overlook the collaborative use of device, edge, and cloud computation resources. Besides, previous algorithms require re-partitioning the whole DNN to adapt to computation resource changes and network dynamics. Moreover, for resource-demanding convolutional layers, prior works do not provide a parallel processing strategy at the edge side that avoids loss of accuracy. To tackle these issues, we propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss. The proposed system introduces a heuristic horizontal partition algorithm to split a DNN into three parts. The algorithm can partially adjust the partitions at run time according to processing time and network conditions. At the edge side, a vertical separation module separates feature maps into tiles that can be run independently on different edge nodes in parallel. Extensive quantitative evaluation on five popular DNNs illustrates that D3 outperforms state-of-the-art counterparts by up to 3.4 times in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68 times.
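The claim that feature-map tiles can be processed on different edge nodes without any accuracy loss rests on a standard observation: a convolution's output columns depend only on a bounded "halo" of input columns, so tiles that overlap by the kernel width minus one reproduce the full result exactly. The sketch below illustrates this idea only; it is not the paper's implementation, and the function names (`conv2d_valid`, `tiled_conv2d`) and the column-wise split are illustrative assumptions.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 2-D 'valid' convolution (reference result)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def tiled_conv2d(x, w, n_tiles):
    """Split the output columns across tiles; each tile's input
    carries a halo of (kw - 1) extra columns, so stitching the
    per-tile results reproduces the full convolution exactly."""
    kw = w.shape[1]
    out_w = x.shape[1] - kw + 1
    bounds = np.linspace(0, out_w, n_tiles + 1, dtype=int)
    parts = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        tile = x[:, a:b + kw - 1]          # input slice incl. halo
        parts.append(conv2d_valid(tile, w))  # could run on its own node
    return np.concatenate(parts, axis=1)
```

Because the overlap covers the kernel's full receptive field, the concatenated tile outputs match the monolithic convolution bit-for-bit, which is what makes tile-parallel edge inference lossless.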