Lightweight neural networks trade predictive strength for fast inference. Conversely, large deep neural networks achieve low prediction error but incur prolonged inference times and high energy consumption on resource-constrained devices. This trade-off is unacceptable for latency-sensitive and performance-critical applications. Offloading inference tasks to a server is equally unsatisfactory: high-dimensional data competing for limited bandwidth inevitably congests the network while valuable client-side resources sit idle. This work demonstrates why existing methods cannot adequately address the need for high-performance inference in mobile edge computing. We then show how to overcome these limitations by introducing a novel training method that reduces bandwidth consumption in Machine-to-Machine communication and a generalizable design heuristic for resource-conscious compression models. We extensively evaluate our method against a wide range of baselines on latency and compressive strength in an environment with asymmetric resource distribution between edge devices and servers. Despite deploying a lightweight, edge-oriented encoder, our method achieves considerably better compression rates than the baselines.