There has been much interest in deploying deep learning algorithms on low-power devices, including smartphones, drones, and medical sensors. However, full-scale deep neural networks are often too resource-intensive in terms of energy and storage. As a result, the bulk of the machine learning workload is often carried out on an edge server, to which the data is compressed and transmitted. However, compressing data (such as images) leads to transmitting information irrelevant to the supervised task. Another popular approach is to split the deep network between the device and the server while compressing intermediate features. To date, however, such split computing strategies have barely outperformed the aforementioned naive data compression baselines due to their inefficient approaches to feature compression. This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently. Our supervised compression approach uses a teacher model and a student model with a stochastic bottleneck and a learnable prior for entropy coding (Entropic Student). We compare our approach to various neural image and feature compression baselines in three vision tasks and find that it achieves better supervised rate-distortion performance while also maintaining lower end-to-end latency. We furthermore show that the learned feature representations can be tuned to serve multiple downstream tasks.
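To make the approach concrete, below is a minimal training-step sketch in the spirit of the abstract: a student encoder with a stochastic (additive-uniform-noise) bottleneck, a learnable factorized Gaussian prior for estimating the bit rate, and a knowledge-distillation distortion term against a frozen teacher's intermediate features. All names, layer sizes, and the particular choice of prior (StochasticBottleneck, training_step, latent_ch, beta, the Gaussian entropy model) are illustrative assumptions, not the paper's exact architecture or objective.

```python
# Hedged sketch of a supervised rate-distortion training step: rate of the
# compressed bottleneck plus a feature-distillation distortion term.
# Architecture sizes and the factorized Gaussian prior are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticBottleneck(nn.Module):
    """Student encoder with an additive-uniform-noise quantization proxy and a
    learnable per-channel Gaussian prior used to estimate the latent's bit rate."""

    def __init__(self, in_ch=3, latent_ch=24):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_ch, 4, stride=2, padding=1),
        )
        # Learnable prior parameters (assumed factorized Gaussian per channel).
        self.prior_mean = nn.Parameter(torch.zeros(latent_ch))
        self.prior_log_scale = nn.Parameter(torch.zeros(latent_ch))

    def forward(self, x):
        z = self.encoder(x)
        # Training: simulate quantization with uniform noise (stochastic bottleneck).
        # Inference: hard rounding, as in standard neural compression practice.
        z_hat = z + torch.empty_like(z).uniform_(-0.5, 0.5) if self.training else torch.round(z)
        # Rate estimate: -log2 P(z_hat) under the learned prior, with the probability
        # of each unit-width quantization bin computed from the Gaussian CDF.
        mean = self.prior_mean.view(1, -1, 1, 1)
        scale = self.prior_log_scale.exp().view(1, -1, 1, 1)
        prior = torch.distributions.Normal(mean, scale)
        likelihood = (prior.cdf(z_hat + 0.5) - prior.cdf(z_hat - 0.5)).clamp_min(1e-9)
        bits = -torch.log2(likelihood).sum(dim=(1, 2, 3)).mean()
        return z_hat, bits


def training_step(bottleneck, decoder, teacher_head, x, beta=1.0):
    """One supervised rate-distortion step: rate + beta * feature-distillation loss."""
    z_hat, bits = bottleneck(x)
    student_feat = decoder(z_hat)
    with torch.no_grad():
        teacher_feat = teacher_head(x)  # frozen teacher's intermediate features
    distortion = F.mse_loss(student_feat, teacher_feat)
    return bits + beta * distortion


# Usage with hypothetical shapes: latents and distilled features at 1/4 resolution.
bottleneck = StochasticBottleneck()
decoder = nn.Conv2d(24, 128, 3, padding=1)
teacher_head = nn.Sequential(  # stand-in for the frozen teacher's early layers
    nn.Conv2d(3, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(128, 128, 4, stride=2, padding=1),
)
loss = training_step(bottleneck, decoder, teacher_head, torch.randn(2, 3, 64, 64))
loss.backward()
```

In this sketch, beta trades off the transmitted rate against how faithfully the student's decoded features match the teacher's, which is one plausible way to realize the supervised rate-distortion trade-off described above.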