With the recent demand for deploying neural network models on mobile and edge devices, it is desirable to improve a model's generalizability to unseen test data and to enhance its robustness under fixed-point quantization for efficient deployment. Minimizing the training loss, however, provides few guarantees on generalization or quantization performance. In this work, we improve generalization and quantization performance simultaneously by theoretically unifying them under the framework of improving the model's robustness against bounded weight perturbations and minimizing the eigenvalues of the Hessian matrix with respect to the model weights. We therefore propose HERO, a Hessian-enhanced robust optimization method that minimizes the Hessian eigenvalues through a gradient-based training process, improving generalization and quantization performance at the same time. HERO enables up to a 3.8% gain in test accuracy, up to 30% higher accuracy under 80% training-label perturbation, and the best post-training quantization accuracy across a wide range of precisions, including a >10% accuracy improvement over SGD-trained models, for common model architectures on various datasets.
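To make the bounded-weight-perturbation idea concrete, below is a minimal sketch of a sharpness-aware training step in PyTorch. This is not the paper's exact HERO algorithm: the function name `hero_style_step`, the perturbation radius `rho`, and the SAM-style two-pass update are illustrative assumptions. The step perturbs the weights along the normalized gradient direction (a bounded weight perturbation) and then descends using the gradient taken at the perturbed point, which implicitly penalizes directions in which the Hessian eigenvalues are large.

```python
import torch

def hero_style_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One hypothetical sharpness-aware training step (SAM-style sketch).

    Perturbs the weights inside an L2 ball of radius `rho` along the
    gradient direction, then updates with the gradient evaluated at the
    perturbed point. Costs two forward/backward passes per batch.
    """
    optimizer.zero_grad()

    # First pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]
    grads = [p.grad.detach().clone() for p in params]
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))

    # Ascent step: move to the worst-case point inside the rho-ball.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(rho * g / (grad_norm + 1e-12))

    # Second pass: gradient at the perturbed weights.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then apply the update computed at the
    # perturbed point to the original weights.
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(rho * g / (grad_norm + 1e-12))
    optimizer.step()
    optimizer.zero_grad()

    # Loss from the first (unperturbed) pass, for logging.
    return loss.item()
```

A training loop would call `hero_style_step(model, loss_fn, x, y, optimizer)` once per batch in place of a plain SGD step, trading roughly 2x the per-step compute for the flatter minima that the abstract associates with better generalization and quantization robustness.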