Large Language Models have become the core architecture upon which most modern natural language processing (NLP) systems build. These models consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To make using these models less costly, recent work has explored structured and unstructured pruning, quantization, and distillation to improve inference speed and decrease model size. This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks. Our experiments show that models pruned during pretraining on general-domain masked language modeling can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10\% of the parameters.
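As a rough illustration of the technique named above, the sketch below shows one common way Gradual Unstructured Magnitude Pruning is realized: sparsity is ramped up over pretraining steps (here with the cubic schedule of Zhu \& Gupta, 2017) and, at each pruning event, the smallest-magnitude weights are zeroed. The helper names, schedule parameters, and pruning interval are illustrative assumptions, not the paper's exact implementation.

\begin{verbatim}
# Minimal sketch of Gradual Unstructured Magnitude Pruning (GMP).
# Schedule, helper names, and hyperparameters are illustrative assumptions.
import torch


def target_sparsity(step, start_step, end_step, final_sparsity,
                    initial_sparsity=0.0):
    """Cubic sparsity ramp between start_step and end_step."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** 3


def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude weights in place; return the binary mask."""
    num_to_prune = int(sparsity * weight.numel())
    if num_to_prune == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(num_to_prune).values
    mask = (weight.abs() > threshold).float()
    weight.mul_(mask)
    return mask


# Usage: inside the pretraining loop, prune every few hundred steps; the
# returned mask would normally be reapplied after each optimizer step so
# pruned weights stay at zero between pruning events.
layer = torch.nn.Linear(768, 768)
for step in range(0, 10_000, 500):
    s = target_sparsity(step, start_step=1_000, end_step=8_000, final_sparsity=0.9)
    mask = magnitude_prune_(layer.weight.data, s)
\end{verbatim}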