The increasing size of language models has spurred great research interest in parameter-efficient fine-tuning (e.g., Adapter, LoRA, and prompt tuning), which freezes the pre-trained model and injects a small number of trainable parameters for multiple downstream tasks. To further improve the efficiency of fine-tuning, we propose a framework that integrates LoRA with structured layer pruning. In addition, we create two de-identified medical report summarization datasets based on MIMIC-IV-Note. We validate the integrated framework on these two datasets and on two medical dialogue datasets. By tuning 0.6% of the original model's parameters and pruning over 30% of the Transformer layers, the framework speeds up the training phase by 100% and reduces GPU memory usage by 50%, while preserving over 92% of generation quality on free-text sequence-to-sequence tasks.
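To make the combination of LoRA and structured layer pruning concrete, the following is a minimal parameter-counting sketch, not the framework's actual implementation. It assumes a hypothetical 12-layer model with hidden size 768, rank-8 LoRA updates on four square projection matrices per layer, and a pruning rule that drops every third layer. All of these choices, and the names `LoRALinear`, `prune_layers`, and `summarize`, are illustrative assumptions. Because the count covers only those projection matrices, the resulting trainable fraction is not expected to match the 0.6% reported above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoRALinear:
    """Parameter bookkeeping for one frozen d_out x d_in weight plus a rank-r LoRA update."""
    d_in: int
    d_out: int
    rank: int

    @property
    def frozen_params(self) -> int:
        # The pre-trained weight W stays frozen during fine-tuning.
        return self.d_in * self.d_out

    @property
    def trainable_params(self) -> int:
        # LoRA trains two low-rank factors: B (d_out x r) and A (r x d_in).
        return self.rank * (self.d_in + self.d_out)


def prune_layers(num_layers: int, drop_every: int) -> List[int]:
    """Return indices of the layers kept when every `drop_every`-th layer is removed.

    This uniform rule is an assumption for illustration only.
    """
    return [i for i in range(num_layers) if (i + 1) % drop_every != 0]


def summarize(num_layers: int = 12, d_model: int = 768, rank: int = 8,
              drop_every: int = 3, mats_per_layer: int = 4) -> Dict[str, object]:
    """Count frozen and trainable parameters over the layers that survive pruning."""
    kept = prune_layers(num_layers, drop_every)
    proj = LoRALinear(d_model, d_model, rank)
    frozen = len(kept) * mats_per_layer * proj.frozen_params
    trainable = len(kept) * mats_per_layer * proj.trainable_params
    return {
        "kept_layers": kept,
        "pruned_fraction": 1 - len(kept) / num_layers,
        "trainable_fraction": trainable / (frozen + trainable),
    }


if __name__ == "__main__":
    stats = summarize()
    print(stats["kept_layers"])
    print(round(stats["pruned_fraction"], 3), round(stats["trainable_fraction"], 4))
```

Under these assumed settings, one third of the layers are removed and only the low-rank factors of the remaining layers are trained, which illustrates why the two techniques compose: pruning shrinks the frozen backbone, while LoRA keeps the trainable part small.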