Large Language Models have become the core architecture on which most modern natural language processing (NLP) systems are built. These models consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To reduce this cost, recent work has explored structured and unstructured pruning, quantization, and distillation to improve inference speed and shrink model size. This paper studies how models pruned with Gradual Unstructured Magnitude Pruning transfer between domains and tasks. Our experiments show that models pruned during general-domain masked-language-model pretraining can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT matches the quality of BioBERT with only 10\% of the parameters.
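For readers unfamiliar with the pruning method named above, the PyTorch sketch below illustrates the general idea behind gradual magnitude pruning: periodically zeroing the lowest-magnitude weights while a cubic schedule ramps sparsity toward a target over training. This is a minimal sketch, not the paper's implementation; the function names, the 90\% target, and the loop structure are illustrative assumptions.

```python
import torch

def sparsity_at_step(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Cubic sparsity schedule commonly used for gradual pruning:
    sparsity ramps from initial_sparsity to final_sparsity over training."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - frac) ** 3

def magnitude_prune_(weight, sparsity):
    """Zero out the smallest-magnitude entries of `weight` in place until the
    tensor reaches the target sparsity; return the binary mask that was applied."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    weight.mul_(mask)
    return mask

# Hypothetical usage inside a masked-language-model pretraining loop:
# for step in range(total_steps):
#     loss = masked_lm_loss(model, batch)          # assumed training objective
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
#     target = sparsity_at_step(step, total_steps, final_sparsity=0.9)
#     for module in model.modules():
#         if isinstance(module, torch.nn.Linear):  # prune dense layers only
#             magnitude_prune_(module.weight.data, target)
```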