With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-intensive. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and only introduce learnable truncated-SVD modules (so-called LoRA blocks) into the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of the LoRA blocks, we have to re-train them from scratch); second, optimizing their rank requires an exhaustive and costly search. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on natural language understanding (the GLUE benchmark) and natural language generation tasks (E2E, DART, and WebNLG) using pretrained models of different sizes, such as RoBERTa and GPT. Our results show that with DyLoRA we can train dynamic, search-free models at least 4 to 7 times faster (depending on the task) than LoRA without significantly compromising performance. Moreover, our models perform consistently well over a much larger range of ranks compared to LoRA.
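To make the idea concrete, below is a minimal sketch of a DyLoRA-style linear layer in PyTorch, written under our own assumptions rather than taken from the authors' implementation: the low-rank factors are allocated for the maximum rank, and at each training step a rank b is sampled from [r_min, r_max] so that only the first b components are updated, which implicitly orders the learned directions and lets the adapter be truncated to any rank in that range at inference time. The class name, initialization, and scaling choice are illustrative assumptions.

```python
# Hypothetical DyLoRA-style linear layer (sketch, not the authors' code).
import torch
import torch.nn as nn


class DyLoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r_min=1, r_max=8, alpha=16):
        super().__init__()
        # Frozen "pretrained" weight (stands in for the original layer weight).
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)
        self.weight.requires_grad_(False)
        self.r_min, self.r_max = r_min, r_max
        self.scaling = alpha / r_max  # scaling choice is an assumption
        # Low-rank factors sized for the maximum rank.
        self.lora_A = nn.Parameter(torch.randn(r_max, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r_max))

    def forward(self, x, rank=None):
        if rank is None:
            if self.training:
                # Sample a rank for this step; truncating to the first `rank`
                # components is what orders the learned representation.
                rank = int(torch.randint(self.r_min, self.r_max + 1, (1,)))
            else:
                rank = self.r_max
        A = self.lora_A[:rank, :]   # (rank, in_features)
        B = self.lora_B[:, :rank]   # (out_features, rank)
        return x @ self.weight.T + self.scaling * (x @ A.T @ B.T)
```

At inference, passing an explicit `rank` argument truncates the adapter to that rank without any retraining, which is the search-free behavior the abstract describes.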