In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to greater model depth. Unlike prior work that shares all parameters or introduces extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition reorganizes and factorizes a parameter matrix into two parts: a central tensor that contains the bulk of the information and auxiliary tensors that hold only a small proportion of the parameters. Based on this decomposition, our architecture shares the central tensor across all layers to reduce the model size, while keeping layer-specific auxiliary tensors (together with adapters) to enhance adaptation flexibility. To improve model training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate the effectiveness of the proposed model in reducing model size while achieving highly competitive performance.
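As a rough illustration only (not the paper's implementation), the sketch below factorizes a weight matrix into an MPO-style (tensor-train) chain of local tensors via successive truncated SVDs; the middle core carries most of the parameters and plays the role of the central tensor, while the small boundary cores correspond to the auxiliary tensors. The function name `mpo_decompose`, the mode shapes, and the bond ranks are illustrative assumptions.

```python
import numpy as np

def mpo_decompose(W, in_shape, out_shape, ranks):
    """Sketch: factor W (prod(in_shape) x prod(out_shape)) into a chain of
    local tensors (MPO / tensor-train format) via successive truncated SVDs.
    The middle core is the parameter-heavy "central tensor"; the boundary
    cores are the small "auxiliary tensors". Shapes/ranks are illustrative."""
    n = len(in_shape)
    # Reorder W into a tensor whose modes are paired as (in_k, out_k).
    T = W.reshape(*in_shape, *out_shape)
    perm = [x for k in range(n) for x in (k, n + k)]
    T = T.transpose(perm)

    cores, bond = [], 1
    for k in range(n - 1):
        # Peel off the k-th local tensor with a truncated SVD.
        T = T.reshape(bond * in_shape[k] * out_shape[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(ranks[k], len(S))
        cores.append(U[:, :r].reshape(bond, in_shape[k], out_shape[k], r))
        T = np.diag(S[:r]) @ Vt[:r]
        bond = r
    cores.append(T.reshape(bond, in_shape[-1], out_shape[-1], 1))
    return cores

# Example: a 768 x 3072 feed-forward weight (shapes chosen for illustration).
W = np.random.randn(768, 3072)
cores = mpo_decompose(W, in_shape=(8, 12, 8), out_shape=(8, 48, 8), ranks=(16, 16))
central, auxiliary = cores[1], [cores[0], cores[2]]
# central holds ~147K parameters; each auxiliary core holds ~1K, so sharing
# `central` across layers while keeping per-layer `auxiliary` tensors is cheap.
```

In this sketch, cross-layer sharing would amount to reusing the single `central` core in every layer and learning only the small per-layer auxiliary cores, which conveys the intuition behind the parameter-sharing architecture described above.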