The recent breakthrough in artificial intelligence (AI), especially deep neural networks (DNNs), has affected every branch of science and technology. In particular, edge AI has been envisioned as a major application scenario for providing DNN-based services at edge devices. This article presents effective methods for edge inference at resource-constrained devices. It focuses on device-edge co-inference, assisted by an edge computing server, and investigates a critical trade-off between the computation cost of the on-device model and the communication cost of forwarding the intermediate feature to the edge server. A three-step framework is proposed for efficient inference: (1) model split point selection, which determines the on-device model; (2) communication-aware model compression, which simultaneously reduces the on-device computation and the resulting communication overhead; and (3) task-oriented encoding of the intermediate feature, which further reduces the communication overhead. Experiments demonstrate that the proposed framework achieves a better computation-communication trade-off than baseline methods and significantly reduces inference latency.
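To make the pipeline concrete, the following Python/PyTorch snippet is a minimal sketch, not the paper's implementation: it splits a ResNet-18 at an arbitrarily chosen point (step 1) and stands in for the task-oriented feature encoding of step 3 with plain 8-bit uniform quantization; the communication-aware model compression of step 2 is omitted for brevity. The model choice, split point, and quantization scheme are all assumptions for illustration.

```python
# Illustrative device-edge co-inference sketch (assumptions: ResNet-18,
# split after layer2, 8-bit uniform quantization as a stand-in encoder).
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None).eval()

# Step 1: choose a split point -- here, after layer2 (an assumed choice).
device_part = nn.Sequential(
    model.conv1, model.bn1, model.relu, model.maxpool,
    model.layer1, model.layer2,
)
server_part = nn.Sequential(
    model.layer3, model.layer4, model.avgpool,
    nn.Flatten(1), model.fc,
)

# Step 3 (simplified): quantize the intermediate feature to 8 bits
# before "transmission"; a task-oriented encoder would go here instead.
def encode(feature: torch.Tensor):
    lo, hi = feature.min(), feature.max()
    scale = (hi - lo).clamp(min=1e-8) / 255.0
    q = ((feature - lo) / scale).round().to(torch.uint8)
    return q, lo.item(), scale.item()

def decode(q: torch.Tensor, lo: float, scale: float) -> torch.Tensor:
    return q.float() * scale + lo

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)             # dummy input image
    feature = device_part(x)                    # on-device computation
    q, lo, scale = encode(feature)              # compress before uplink
    print(f"payload: {q.numel()} bytes vs {feature.numel() * 4} fp32 bytes")
    logits = server_part(decode(q, lo, scale))  # server-side computation
    print("predicted class:", logits.argmax(1).item())
```

Even this naive 8-bit encoding cuts the uplink payload to a quarter of the raw fp32 feature size, which is the communication-cost lever the framework's split-point selection and task-oriented encoding are designed to optimize jointly.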