Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, in which multiple tasks are processed concurrently. We focus on a novel scenario in which energy-constrained mobile devices offload inference tasks to an edge server equipped with a GPU. Each inference task is partitioned into sub-tasks for finer-grained offloading and scheduling, and we investigate the problem of minimizing user energy consumption under inference latency constraints. To deal with the coupling between offloading and scheduling introduced by concurrent batch processing, we first consider an offline problem with a constant edge inference latency and an identical latency constraint for all users. We prove that it is optimal to optimize the offloading policy of each user independently and to aggregate all identical sub-tasks into one batch, which motivates the independent partitioning and same sub-task aggregating (IP-SSA) algorithm. Further, the optimal grouping (OG) algorithm is proposed to group tasks optimally when the latency constraints differ. Finally, when future task arrivals cannot be precisely predicted, a deep deterministic policy gradient (DDPG) agent is trained to call OG. Experiments show that IP-SSA reduces user energy consumption by up to 94.9\% in the offline setting, while DDPG-OG outperforms DDPG-IP-SSA by up to 8.92\% in the online setting.
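The offline idea described above (partition each user's task independently, then batch identical sub-tasks) can be illustrated with a minimal Python sketch. It is not the paper's IP-SSA implementation: the function names `choose_partition` and `aggregate_same_subtasks`, the per-sub-task energy and latency lists, and the additive latency model are all assumptions made for illustration. The only elements taken from the abstract are the constant edge inference latency and the single shared deadline.

```python
def choose_partition(local_energy, local_time, upload_energy, upload_time,
                     edge_time, deadline):
    """Pick the partition point p for one user (hypothetical model).

    Sub-tasks 0..p-1 run on the device, sub-tasks p..n-1 run on the edge.
    upload_energy[p] / upload_time[p] are the cost of sending the
    intermediate data at point p (index n means nothing is uploaded).
    Returns the feasible p with the lowest user energy, or None.
    """
    n = len(local_energy)
    best = None
    for p in range(n + 1):
        energy = sum(local_energy[:p]) + upload_energy[p]
        # Assumed constant edge latency per sub-task, independent of batch size.
        latency = sum(local_time[:p]) + upload_time[p] + sum(edge_time[p:])
        if latency <= deadline and (best is None or energy < best[0]):
            best = (energy, p)
    return None if best is None else best[1]


def aggregate_same_subtasks(partitions, n_subtasks):
    """Group users by sub-task index: one batch per sub-task.

    partitions maps user id -> partition point p; a user contributes to
    the batch of every sub-task k with k >= p.
    """
    batches = {k: [] for k in range(n_subtasks)}
    for user, p in partitions.items():
        for k in range(p, n_subtasks):
            batches[k].append(user)
    return batches


# Two users with the same deadline but different uplink speeds.
common = dict(local_energy=[5, 5], local_time=[1, 1],
              upload_energy=[1, 2, 0], edge_time=[1, 1], deadline=3)
partitions = {
    "a": choose_partition(upload_time=[1, 1, 0], **common),
    "b": choose_partition(upload_time=[5, 1, 0], **common),
}
batches = aggregate_same_subtasks(partitions, n_subtasks=2)
```

In this toy instance, user `a` can offload everything, while user `b` has a slow uplink for the raw input and so runs the first sub-task locally. Both users then share the batch for the second sub-task.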