Self-attention suffers from quadratic memory requirements with respect to the sequence length. In this work, we propose sequence parallelism, a memory-efficient parallelism method that breaks the input sequence length limitation and allows efficient training with longer sequences on GPUs. Our approach is compatible with most existing parallelism techniques. More importantly, we no longer require a single device to hold the whole sequence. Specifically, we split the input sequence into multiple chunks and feed each chunk to its corresponding device (i.e., GPU). To compute the attention output, we integrate ring-style communication with the self-attention calculation and propose Ring Self-Attention (RSA). Experiments show that sequence parallelism performs well when scaling with both batch size and sequence length. Compared with tensor parallelism, our approach achieves a $13.7\times$ larger maximum batch size and a $3.0\times$ longer sequence length when scaling up to 64 NVIDIA P100 GPUs.
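As an illustration of the ring-style idea described above, the following is a minimal single-process sketch (our own interpretation under stated assumptions, not the authors' implementation): each "device" holds one chunk of Q, K, and V along the sequence dimension, and the K/V chunks are rotated around a ring so every query chunk eventually attends to the full sequence without any device materializing it all at once. The function name `ring_self_attention` and the in-memory lists standing in for devices are hypothetical.

```python
# Hypothetical single-process sketch of ring-style self-attention.
# Lists of arrays stand in for per-device chunks; real implementations
# would exchange K/V chunks between GPUs with point-to-point communication.
import numpy as np

def ring_self_attention(q_chunks, k_chunks, v_chunks):
    """q_chunks, k_chunks, v_chunks: lists of [chunk_len, d] arrays, one per device."""
    n = len(q_chunks)                      # number of devices in the ring
    d = q_chunks[0].shape[-1]
    scores = [[] for _ in range(n)]        # raw attention logits gathered per device
    values = [[] for _ in range(n)]        # matching value chunks per device
    k_ring, v_ring = list(k_chunks), list(v_chunks)
    for step in range(n):
        for dev in range(n):
            # Each device scores its local queries against the K chunk it
            # currently holds and records the matching V chunk.
            scores[dev].append(q_chunks[dev] @ k_ring[dev].T / np.sqrt(d))
            values[dev].append(v_ring[dev])
        # Pass K and V chunks one position around the ring.
        k_ring = k_ring[-1:] + k_ring[:-1]
        v_ring = v_ring[-1:] + v_ring[:-1]
    outputs = []
    for dev in range(n):
        logits = np.concatenate(scores[dev], axis=-1)            # [chunk_len, seq_len]
        probs = np.exp(logits - logits.max(-1, keepdims=True))
        probs /= probs.sum(-1, keepdims=True)                    # softmax over the full sequence
        outputs.append(probs @ np.concatenate(values[dev], 0))   # [chunk_len, d]
    return outputs
```

In this sketch each device only ever stores its own query chunk plus one key/value chunk at a time, which is the source of the memory savings the abstract claims.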