Deep neural networks (DNNs) have been widely used in various video analytics tasks, which demand real-time responses. Due to the limited processing power of mobile devices, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up edge-server DNN processing for multiple clients. In particular, we observe that batching multiple DNN requests significantly reduces processing time. Based on this observation, we first design a novel scheduling algorithm that exploits the batching benefits of all requests running the same DNN. This is compelling since there are only a handful of DNNs and many requests tend to use the same one. Our algorithms are general and can support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs, with or without shared layers. Finally, we develop a collaborative approach that further improves performance by adaptively processing some of the requests, or portions of them, locally at the clients. This is especially useful when the network and/or server is congested. Our implementation demonstrates the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and constant inter-arrivals).