ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs - Zhuanzhi Papers

October 6, 2022

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu

From arXiv, in submission

The Transformer has been the cornerstone model of Natural Language Processing (NLP) over the past decade. Despite its great success in Deep Learning (DL) applications, the ever-growing parameter space of transformer models increases the demand for accelerating them. In addition, NLP problems commonly involve variable-length sequences, since the number of words varies from sentence to sentence. Existing DL frameworks need to pad variable-length sequences to the maximal length, which leads to significant memory and computational overhead. In this paper, we present ByteTransformer, a high-performance transformer boosted for variable-length inputs. We propose a zero-padding algorithm that frees the whole transformer from redundant computations on useless padded tokens. Besides this algorithmic-level optimization, we provide architecture-aware optimizations for the transformer's functional modules, especially the performance-critical multi-head attention (MHA). Experimental results on an NVIDIA A100 GPU with variable-length sequence inputs validate that our fused MHA (FMHA) outperforms the standard PyTorch MHA by 6.13x. The end-to-end performance of ByteTransformer for a standard BERT transformer model surpasses state-of-the-art Transformer frameworks, such as PyTorch JIT, TensorFlow XLA, Tencent TurboTransformer and NVIDIA FasterTransformer, by 87%, 131%, 138% and 46%, respectively.
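The abstract does not spell out how the zero-padding algorithm works, so the following is only a minimal sketch of the general idea of skipping padded tokens, not the authors' implementation. It assumes a batch stored as padded rows plus a list of true lengths; the helper names `pack` and `unpack`, the toy token values, and the doubling step (a stand-in for any per-token layer) are all hypothetical.

```python
from itertools import accumulate


def pack(padded, lengths):
    """Drop padded positions; return the flat valid tokens and per-sequence offsets."""
    # offsets[i] is where sequence i starts in the packed list (prefix sum of lengths).
    offsets = [0] + list(accumulate(lengths))
    packed = [tok for row, n in zip(padded, lengths) for tok in row[:n]]
    return packed, offsets


def unpack(packed, offsets, max_len, pad_value=0):
    """Restore the padded [batch, max_len] layout from the packed tokens."""
    rows = []
    for start, end in zip(offsets, offsets[1:]):
        seq = packed[start:end]
        rows.append(seq + [pad_value] * (max_len - len(seq)))
    return rows


# Three sequences of lengths 3, 1 and 2, padded with 0 to max_len = 4.
padded = [[5, 6, 7, 0], [8, 0, 0, 0], [9, 10, 0, 0]]
lengths = [3, 1, 2]

packed, offsets = pack(padded, lengths)

# A per-token operation (hypothetical stand-in for a feed-forward layer)
# now touches 6 valid tokens instead of all 12 padded positions.
doubled = [2 * t for t in packed]

restored = unpack(doubled, offsets, max_len=4)
```

In this toy batch half of the padded positions carry no information, so any token-wise computation on the packed layout does half the work. Attention is the harder case because it mixes tokens within a sequence, which is why the offsets are kept alongside the packed tokens.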
