With the emergence of edge computing, the problem of offloading jobs between an Edge Device (ED) and an Edge Server (ES) has received significant attention. Motivated by the fact that an increasing number of applications use Machine Learning (ML) inference, we study the problem of offloading inference jobs by considering the following novel aspects: 1) in contrast to a typical computational job, the processing time of an inference job depends on the size of the ML model, and 2) recently proposed Deep Neural Networks (DNNs) for resource-constrained devices provide the choice of scaling the model size. We formulate an assignment problem with the aim of maximizing the total inference accuracy of n data samples available at the ED, subject to a time constraint T on the makespan. We propose an approximation algorithm, AMR2, and prove that it results in a makespan of at most 2T and achieves a total accuracy that is lower than the optimal total accuracy by a small constant. As a proof of concept, we implemented AMR2 on a Raspberry Pi equipped with MobileNet, connected to a server equipped with ResNet, and studied the total accuracy and makespan performance of AMR2 for an image classification application.
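To make the formulation concrete, the following is a minimal sketch of one plausible statement of the assignment problem; the notation ($x_{ij}$, $a_j$, $t_j$, $t^{\mathrm{tx}}$, $\mathcal{M}_{\mathrm{ED}}$, $\mathcal{M}_{\mathrm{ES}}$) is assumed here for illustration, and the paper's exact variables and constraints may differ.

\[
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n} \sum_{j \in \mathcal{M}} a_j \, x_{ij} \\
\text{s.t.}\quad & \sum_{j \in \mathcal{M}} x_{ij} = 1, \qquad i = 1, \dots, n, \\
& \max\Bigl\{ \textstyle\sum_{i}\sum_{j \in \mathcal{M}_{\mathrm{ED}}} t_j \, x_{ij},\ \ \textstyle\sum_{i}\sum_{j \in \mathcal{M}_{\mathrm{ES}}} \bigl(t^{\mathrm{tx}} + t_j\bigr) x_{ij} \Bigr\} \le T, \\
& x_{ij} \in \{0, 1\},
\end{aligned}
\]

where $x_{ij} = 1$ assigns sample $i$ to model $j$, $a_j$ and $t_j$ are the accuracy and per-sample processing time of model $j$ (which grow with model size), $\mathcal{M} = \mathcal{M}_{\mathrm{ED}} \cup \mathcal{M}_{\mathrm{ES}}$ partitions the model choices between the device and the server, $t^{\mathrm{tx}}$ is the per-sample transmission time, and the outer $\max$ reflects that the ED and ES process their assigned samples in parallel, so the makespan is the later of the two completion times.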