DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation - 专知 (Zhuanzhi) Papers

September 22, 2022

DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation


Seongmin Hong, Seungjae Moon, Junsoo Kim, Sungjae Lee, Minsub Kim, Dongsoo Lee, Joo-Young Kim

From arXiv. Extension of the Hot Chips 2022 work; accepted at MICRO 2022.

Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which needs the processing of a large input context in the summarization stage, followed by the generation stage that produces a single word at a time. The conventional platforms such as GPU are specialized for the parallel processing of large inputs in the summarization stage, but their performance significantly degrades in the generation stage due to its sequential characteristic. Therefore, an efficient hardware platform is required to address the high latency caused by the sequential characteristic of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both summarization and generation stages. DFX uses model parallelism and optimized dataflow that is model-and-hardware-aware for fast simultaneous workload execution among devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves 5.58x speedup and 3.99x energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21x more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters.
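The two-stage execution and the model parallelism described in the abstract can be illustrated with a small toy sketch. It is not DFX's dataflow or a GPT-2 implementation: the abstract does not say how the model is partitioned, so column-wise sharding of one weight matrix is an assumption, and `split_columns`, `parallel_matvec`, `decoder_pass`, and `generate` are hypothetical names made up for illustration. Only the device count of four comes from the abstract.

```python
import math

NUM_DEVICES = 4  # the abstract reports four FPGAs; everything else is a toy assumption


def split_columns(weight, num_devices=NUM_DEVICES):
    """Split a row-major matrix into column shards, one shard per device.

    Assumes the column count is divisible by num_devices.
    """
    step = len(weight[0]) // num_devices
    return [[row[i * step:(i + 1) * step] for row in weight]
            for i in range(num_devices)]


def matvec(x, matrix):
    """Multiply row vector x by a row-major matrix."""
    return [sum(x[k] * matrix[k][j] for k in range(len(x)))
            for j in range(len(matrix[0]))]


def parallel_matvec(x, shards):
    """Each 'device' computes its slice of the output; slices are concatenated."""
    out = []
    for shard in shards:  # on real hardware these would run simultaneously
        out.extend(matvec(x, shard))
    return out


def decoder_pass(x, shards):
    """Hypothetical stand-in for one pass through the model."""
    return [math.tanh(v) for v in parallel_matvec(x, shards)]


def generate(prompt, shards, num_new_tokens):
    # Summarization stage: prompt tokens are independent of each other in
    # this toy, so they could all be processed in one parallel batch.
    context = [decoder_pass(token, shards) for token in prompt]
    state = [sum(col) / len(context) for col in zip(*context)]

    # Generation stage: each step consumes the previous step's output,
    # so the loop is inherently sequential.
    outputs = []
    for _ in range(num_new_tokens):
        state = decoder_pass(state, shards)
        outputs.append(state)
    return outputs
```

In this sketch the sharded product matches the unsharded one, which is what allows the work to be spread across devices, while the generation loop cannot be parallelized across steps because each step depends on the one before it.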

