Neural network pruning is a popular technique used to reduce the inference costs of modern, potentially overparameterized, networks. Starting from a pre-trained network, the process is as follows: remove redundant parameters, retrain, and repeat while maintaining the same test accuracy. The result is a model that is a fraction of the size of the original with comparable predictive performance (test accuracy). Here, we reassess and evaluate whether the use of test accuracy alone as the terminating condition is sufficient to ensure that the resulting model performs well across a wide spectrum of "harder" metrics, such as generalization to out-of-distribution data and resilience to noise. Across evaluations on varying architectures and data sets, we find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks. These results call into question the extent of \emph{genuine} overparameterization in deep learning and raise concerns about the practicability of deploying pruned networks, specifically in the context of safety-critical systems, unless they are widely evaluated beyond test accuracy to reliably predict their performance. Our code is available at https://github.com/lucaslie/torchprune.
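The prune-retrain loop described above can be made concrete with a short sketch. The following is a minimal, hypothetical illustration using PyTorch's built-in `torch.nn.utils.prune` utilities rather than the torchprune library itself; the helper names (`train_fn`, `eval_fn`, `baseline_acc`, `step`, `tolerance`) and the exact stopping rule are assumptions for illustration, not the paper's procedure.

```python
import torch
import torch.nn.utils.prune as prune

def iterative_prune(model, train_fn, eval_fn, baseline_acc,
                    step=0.2, tolerance=0.005):
    """Illustrative prune-retrain loop (hypothetical helpers):
    remove the smallest-magnitude weights, retrain, and repeat
    while test accuracy stays within `tolerance` of the baseline."""
    # Collect prunable (module, parameter-name) pairs.
    params = [(m, "weight") for m in model.modules()
              if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
    while True:
        # Globally remove a fraction `step` of the remaining weights
        # with the smallest L1 magnitude.
        prune.global_unstructured(
            params, pruning_method=prune.L1Unstructured, amount=step)
        train_fn(model)  # retrain (fine-tune) the pruned network
        # Terminate once test accuracy falls below the baseline.
        if eval_fn(model) < baseline_acc - tolerance:
            return model
```

In practice one would typically also roll back the final pruning step once the accuracy threshold is crossed; the sketch omits this for brevity.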