The dominant paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, conventional fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example, deploying many independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to full fine-tuning. LoRA performs on-par or better than fine-tuning in model quality on both GPT-2 and GPT-3, despite having fewer trainable parameters, a higher training throughput, and no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release our implementation for GPT-2 at https://github.com/microsoft/LoRA.
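To make the mechanism concrete, the following is a minimal PyTorch sketch of a LoRA-style linear layer: the pre-trained weight is frozen, and a trainable product of two small matrices (a rank decomposition) is added to its output. The class name LoRALinear and the rank/alpha hyperparameters are illustrative assumptions for this sketch, not the API of the released microsoft/LoRA package.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A."""

        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            # Freeze the pre-trained weights; only A and B are trained.
            for p in self.base.parameters():
                p.requires_grad = False
            # A and B together hold rank * (in + out) parameters,
            # far fewer than in_features * out_features for small rank.
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Frozen pre-trained path plus the scaled low-rank adaptation path.
            return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

    # Usage: wrap, e.g., an attention projection of a pre-trained Transformer.
    layer = LoRALinear(nn.Linear(768, 768), rank=8)
    out = layer(torch.randn(2, 10, 768))

Because B is initialized to zero, the adapted layer starts out identical to the pre-trained one, and since the update B @ A can be merged into the frozen weight after training, the adapted model incurs no additional inference latency.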