When deploying pre-trained neural network models in real-world applications, model consumers often target resource-constrained platforms such as mobile and smart devices. They typically apply pruning techniques to reduce the size and complexity of the model, producing a lighter one that consumes fewer resources. Nonetheless, most existing pruning methods are proposed under the premise that the pruned model can be fine-tuned or even retrained on the original training data. This may be unrealistic in practice, as data controllers are often reluctant to share their original data with model consumers. In this work, we study neural network pruning in a data-free context, aiming to yield lightweight models that are not only accurate in prediction but also robust against undesired inputs in open-world deployments. Since fine-tuning and retraining, which could otherwise fix mis-pruned units, are unavailable, we replace the traditional aggressive one-shot strategy with a conservative one that treats pruning as a progressive process. We propose a pruning method based on stochastic optimization that uses robustness-related metrics to guide the pruning process. Our method is implemented as a Python program and evaluated through a series of experiments on diverse neural network models. The experimental results show that it significantly outperforms existing one-shot data-free pruning approaches in terms of both robustness preservation and accuracy.
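To make the idea of progressive, metric-guided pruning concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes a hypothetical noise-agreement proxy (`robustness_proxy`) as the robustness-related metric and uses random candidate sampling as the stochastic search, pruning a small fraction of weights per step instead of one shot.

```python
import numpy as np

rng = np.random.default_rng(0)

def robustness_proxy(w, x, eps=0.1):
    # Hypothetical proxy metric: agreement between the layer's decisions on
    # clean vs. noise-perturbed inputs (higher agreement ~ more robust).
    clean = np.argmax(x @ w, axis=1)
    noisy = np.argmax((x + eps * rng.standard_normal(x.shape)) @ w, axis=1)
    return np.mean(clean == noisy)

def progressive_prune(w, x, target_sparsity=0.5, step=0.05, candidates=8):
    """Prune a weight matrix in small increments; at each step, sample
    several candidate masks (stochastic search) and keep the one with the
    best proxy robustness score -- no training data or fine-tuning used."""
    w = w.copy()
    while np.mean(w == 0) < target_sparsity:
        k = int(step * w.size)              # weights to remove this step
        alive = np.flatnonzero(w)           # indices still unpruned
        best_score, best_trial = -np.inf, None
        for _ in range(candidates):
            drop = rng.choice(alive, size=min(k, alive.size), replace=False)
            trial = w.copy()
            trial.flat[drop] = 0.0
            score = robustness_proxy(trial, x)
            if score > best_score:
                best_score, best_trial = score, trial
        w = best_trial                      # commit the most robust candidate
    return w

# Toy usage: a random 20->5 linear layer scored on synthetic inputs.
w0 = rng.standard_normal((20, 5))
x0 = rng.standard_normal((64, 20))
pruned = progressive_prune(w0, x0)
print(f"sparsity={np.mean(pruned == 0):.2f}, proxy={robustness_proxy(pruned, x0):.2f}")
```

The conservative design shows up in the small `step` and the commit-the-best-candidate loop: a mis-pruned unit can be avoided before it is locked in, which matters precisely because no retraining is available to repair it afterwards.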