QoS-Aware Placement of Deep Learning Services on the Edge with Multiple Service Implementations

From arXiv. Accepted for publication at the 30th International Conference on Computer Communications and Networks (ICCCN 2021). This manuscript contains a complete proof of a theorem referenced in the ICCCN manuscript.

Mobile edge computing pushes computationally intensive services closer to the user, reducing delay through physical proximity. This has led many to consider deploying deep learning models on the edge, an approach commonly known as edge intelligence (EI). EI services can have many model implementations that provide different QoS. For instance, one model can perform inference faster than another (thus reducing latency) while achieving lower accuracy when evaluated. In this paper, we study joint service placement and model scheduling of EI services with the goal of maximizing Quality-of-Service (QoS) for end users, where EI services have multiple implementations to serve user requests, each with varying costs and QoS benefits. We cast the problem as an integer linear program and prove that it is NP-hard. We then prove that the objective is equivalent to maximizing a monotone increasing, submodular set function and thus can be solved greedily while maintaining a (1-1/e)-approximation guarantee. We then propose two greedy algorithms: one that theoretically guarantees this approximation and another that empirically matches its performance with greater efficiency. Finally, we thoroughly evaluate the proposed algorithms for making placement and scheduling decisions in both synthetic and real-world scenarios against the optimal solution and several baselines. In the real-world case, we consider real machine learning models using the ImageNet 2012 dataset for requests. Our numerical experiments show that our more efficient greedy algorithm approximates the optimal solution with a 0.904 approximation ratio on average, while the next closest baseline achieves a 0.607 approximation ratio on average.
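
The greedy idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or formulation: the abstract does not state the constraints, so the sketch assumes a simple cardinality budget (at most `budget` implementations placed) and a toy objective in which each user receives the best QoS among the placed implementations. That objective is monotone and submodular, and for such functions under a cardinality budget the classical greedy rule is known to achieve a (1-1/e) approximation. The names `total_qos`, `greedy_placement`, `example_qos`, and all numbers are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy data: example_qos[user][implementation] is the QoS that
# user would receive if that model implementation were placed on the edge.
example_qos: Dict[str, Dict[str, float]] = {
    "u1": {"fast": 0.6, "accurate": 0.9},
    "u2": {"fast": 0.8, "accurate": 0.3},
    "u3": {"tiny": 0.5},
}


def total_qos(placed: Set[str], qos: Dict[str, Dict[str, float]]) -> float:
    """Sum over users of the best QoS available among placed implementations.

    A user with no usable placed implementation contributes 0.
    """
    return sum(
        max((row.get(impl, 0.0) for impl in placed), default=0.0)
        for row in qos.values()
    )


def greedy_placement(
    qos: Dict[str, Dict[str, float]], candidates: List[str], budget: int
) -> List[str]:
    """Repeatedly place the implementation with the largest marginal QoS gain.

    Assumption: the only constraint is a cardinality budget. Ties are broken
    by the order of `candidates`.
    """
    placed: Set[str] = set()
    order: List[str] = []
    for _ in range(budget):
        base = total_qos(placed, qos)
        best, best_gain = None, 0.0
        for impl in candidates:
            if impl in placed:
                continue
            gain = total_qos(placed | {impl}, qos) - base
            if gain > best_gain:
                best, best_gain = impl, gain
        if best is None:  # no remaining candidate improves the objective
            break
        placed.add(best)
        order.append(best)
    return order


if __name__ == "__main__":
    chosen = greedy_placement(example_qos, ["fast", "accurate", "tiny"], budget=2)
    print(chosen, total_qos(set(chosen), example_qos))
```

In this toy instance the first pick is the implementation that helps the most users overall, and the second pick is judged only by what it adds on top of the first, which is the diminishing-returns behavior that submodularity formalizes.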
