In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device, and the remainder of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer, while keeping complexity low enough for edge devices and requiring no retraining. We also present a modified entropy-constrained quantizer design algorithm optimized for clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating point activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. Compared to HEVC, the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding a layer's activations in split neural networks for edge/cloud applications.
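To make the core idea concrete, the following minimal Python/NumPy sketch clips a split layer's activations and codes them with a coarse uniform quantizer, estimating the rate as the empirical entropy of the quantizer indices. The uniform quantizer, the clip_max and n_levels parameters, and the ideal-entropy rate estimate are illustrative assumptions; this is not the paper's actual codec or its entropy-constrained quantizer design algorithm.

```python
import numpy as np

def lightweight_compress(activations: np.ndarray, clip_max: float, n_levels: int):
    """Clip, uniformly quantize, and estimate the coded rate of a split
    layer's activation tensor. Illustrative stand-in only: the uniform
    quantizer and ideal-entropy rate estimate are assumptions, not the
    paper's entropy-constrained design."""
    # Clip activations to a bounded range (e.g., post-ReLU tensors are >= 0).
    clipped = np.clip(activations, 0.0, clip_max)

    # Map each value to one of n_levels uniform quantizer bins.
    step = clip_max / (n_levels - 1)
    indices = np.round(clipped / step).astype(np.int32)

    # Dequantize for the cloud-side remainder of the network.
    reconstructed = indices.astype(np.float32) * step

    # Estimate rate in bits per element as the empirical entropy of the
    # quantizer indices (what an ideal entropy coder would achieve).
    counts = np.bincount(indices.ravel(), minlength=n_levels)
    probs = counts[counts > 0] / indices.size
    bits_per_element = float(-(probs * np.log2(probs)).sum())

    return reconstructed, bits_per_element

# Example: a 4-level (2-bit) quantizer on random ReLU-like activations.
act = np.abs(np.random.randn(1, 56, 56, 256)).astype(np.float32)
rec, rate = lightweight_compress(act, clip_max=4.0, n_levels=4)
print(f"rate: {rate:.2f} bits/element, "
      f"max abs error: {np.abs(np.clip(act, 0, 4.0) - rec).max():.3f}")
```

Because quantizer indices of typical activations are highly skewed toward low values, the empirical entropy (and hence the achievable rate) falls well below the nominal log2(n_levels) bits, which is how sub-1-bit rates of the kind reported in the abstract become possible.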