Adapting pre-trained neural models to downstream tasks has become the standard practice for obtaining high-quality models. In this work, we propose a novel model adaptation paradigm, adapting by pruning, which prunes neural connections in the pre-trained model to optimise performance on the target task; all remaining connections keep their pre-trained weights unchanged. We formulate adapting-by-pruning as an optimisation problem with a differentiable loss and propose an efficient algorithm to prune the model. We prove that the algorithm is near-optimal under standard assumptions and apply it to adapt BERT to several GLUE tasks. Results suggest that our method can prune up to 50% of the weights in BERT while yielding performance comparable to the fully fine-tuned model. We also compare our method with other state-of-the-art pruning methods and study the topological differences between the sub-networks they obtain.
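To make the setting concrete, the sketch below shows one common way to learn a connection mask over frozen pre-trained weights in PyTorch: each connection gets a real-valued score, and a differentiable straight-through relaxation turns the scores into a hard 0/1 mask. The class name `PrunedLinear`, the score initialisation, and the particular relaxation are illustrative assumptions, not the algorithm proposed in the paper.

```python
import torch
import torch.nn as nn


class PrunedLinear(nn.Module):
    """Linear layer whose pre-trained weights are frozen; only a binary
    mask over the connections is learned (illustrative sketch)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Freeze the original pre-trained weights: they are never updated.
        self.weight = nn.Parameter(pretrained.weight.detach().clone(),
                                   requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(),
                                  requires_grad=False)
                     if pretrained.bias is not None else None)
        # Real-valued scores; their value decides which connections survive.
        # Initialised slightly positive so the full model is kept at first.
        self.scores = nn.Parameter(torch.full_like(self.weight, 0.1))

    def forward(self, x):
        # Hard 0/1 mask in the forward pass, sigmoid gradient in the
        # backward pass (straight-through estimator).
        soft = torch.sigmoid(self.scores)
        mask = (soft > 0.5).float()
        mask = mask + soft - soft.detach()
        return nn.functional.linear(x, self.weight * mask, self.bias)
```

Wrapping the linear layers of a pre-trained model (e.g. BERT) with such a module and training only the `scores` parameters against the task loss captures the adapt-by-pruning setting at a high level; the paper's actual optimisation procedure and near-optimality guarantee are developed in the following sections.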