We revisit the performance of the classic gradual magnitude pruning (GMP) baseline for large language models, focusing on the standard BERT benchmark across a range of popular tasks. Despite existing evidence in the literature that GMP performs poorly, we show that a simple and general variant, which we call GMP*, can match and sometimes outperform more complex state-of-the-art methods. Our results provide a simple yet strong baseline for future work, highlight the importance of parameter tuning for baselines, and even improve the performance of the state-of-the-art second-order pruning method in this setting.
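To make the baseline concrete, the following is a minimal sketch of the gradual magnitude pruning idea the abstract refers to: weights of smallest magnitude are progressively zeroed out according to a sparsity schedule, here the commonly used cubic schedule of Zhu & Gupta (2017). This is an illustrative assumption, not the paper's exact GMP* recipe; the function names, default sparsity targets, and step arguments are hypothetical.

```python
import numpy as np


def cubic_sparsity_schedule(step, start_step, end_step,
                            initial_sparsity=0.0, final_sparsity=0.9):
    """Cubic sparsity ramp (Zhu & Gupta, 2017), a common choice for GMP.

    Sparsity increases from initial_sparsity at start_step to
    final_sparsity at end_step, changing fastest early in training.
    """
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of entries."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights
    # Threshold at the k-th smallest absolute value; ties may prune slightly more.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask


# Example usage: prune a random weight matrix at the schedule's midpoint.
w = np.random.randn(768, 768)
s = cubic_sparsity_schedule(step=5000, start_step=0, end_step=10000)
w_pruned = magnitude_prune(w, s)
print(f"target sparsity {s:.2f}, actual sparsity {np.mean(w_pruned == 0):.2f}")
```

In practice GMP interleaves such pruning steps with continued fine-tuning so the remaining weights can recover accuracy; the paper's GMP* variant concerns careful tuning of this schedule and the accompanying hyperparameters.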