Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure for deploying such applications, as moving computation close to the data sources enables stringent latency and throughput requirements to be met. However, the constrained nature of edge networks poses several additional challenges to the management of inference workloads: edge clusters cannot provide unlimited processing power to DNN models, and a trade-off between network and processing time must often be considered to satisfy end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference queries on DNN models in edge networks at short timescales (i.e., a few milliseconds). By means of simulations, we analyze several policies under the realistic network settings and workloads of a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to network conditions and workloads. We therefore design ASET, a Reinforcement Learning based scheduling algorithm that adapts its decisions according to the system conditions. Our results show that ASET effectively provides the best performance compared to static policies when scheduling over a distributed pool of edge resources.