Model compression is a ubiquitous tool that brings the capabilities of modern deep learning to edge devices with power and latency constraints. The goal of model compression is to take a large reference neural network and output a smaller, less expensive compressed network that is functionally equivalent to the reference. Compression typically involves pruning and/or quantization, followed by retraining to maintain the reference accuracy. However, it has been observed that compression can lead to a considerable mismatch between the labels produced by the reference and compressed models, resulting in bias and unreliability. To combat this, we present a framework that uses a teacher-student learning paradigm to better preserve labels. We investigate the role of additional terms in the loss function and show how to automatically tune the associated parameters. We demonstrate the effectiveness of our approach both quantitatively and qualitatively on multiple compression schemes and accuracy-recovery algorithms, using a set of 8 different real-world network architectures. We obtain a significant reduction of up to 4.1X in the number of label mismatches between the compressed and reference models, and up to 5.7X on inputs where the reference model makes the correct prediction.
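To make the teacher-student idea concrete, the sketch below shows one common way such a loss can be assembled: a hard-label cross-entropy term combined with a soft-label KL term that pulls the compressed (student) model toward the reference (teacher) model's predictions. This is a minimal illustration under assumed conventions, not the paper's exact formulation; the function name `label_preserving_loss` and the parameters `alpha` (mixing weight) and `T` (temperature) are hypothetical stand-ins for the additional loss terms and automatically tuned parameters described above.

```python
# Minimal sketch of a teacher-student (distillation-style) loss for
# label-preserving compression. NOTE: an illustrative assumption, not
# the paper's published loss; `alpha` and `T` are hypothetical knobs.
import torch
import torch.nn.functional as F

def label_preserving_loss(student_logits, teacher_logits, targets,
                          alpha=0.5, T=2.0):
    # Standard task loss against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between the temperature-softened
    # teacher and student distributions. The T**2 factor keeps the
    # gradient scale comparable across temperatures (Hinton et al., 2015).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)
    # `alpha` trades off label fidelity to the teacher against raw
    # task accuracy; the paper tunes such parameters automatically.
    return (1.0 - alpha) * hard_loss + alpha * soft_loss
```

In this formulation, increasing `alpha` penalizes the student more heavily for disagreeing with the teacher, which is the mechanism by which label mismatches between the compressed and reference models can be reduced during retraining.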