Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works focus on only certain metrics while searching for the best-performing transformer architecture and ignore the rest. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures on a diverse set of edge devices. We use this profiler in conjunction with the proposed co-design technique to obtain best-performing models that achieve high accuracy on the given task while minimizing latency, energy consumption, and peak power draw, thus enabling edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran; it searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. The obtained transformer model is 2.8$\times$ smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Running inference with it on the selected edge device yields 15.0% lower latency, 10.0$\times$ lower energy consumption, and 10.8$\times$ lower peak power draw compared to an off-the-shelf GPU.
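To make the co-design objective concrete, the sketch below shows one plausible way to scalarize accuracy against profiled hardware measures when ranking (model, device) pairs. It is a minimal illustration, not the paper's actual formulation: the `Profile` structure, `codesign_score` function, weights, normalization bounds, and all numbers are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """Hardware measurements for one (model, device) pair, as a
    ProTran-style profiler might report them (fields are assumed)."""
    accuracy: float      # task accuracy (e.g., GLUE score scaled to [0, 1])
    latency_ms: float    # inference latency in milliseconds
    energy_mj: float     # energy per inference in millijoules
    peak_power_w: float  # peak power draw in watts

def codesign_score(p: Profile,
                   w_acc: float = 1.0,
                   w_lat: float = 0.3,
                   w_en: float = 0.3,
                   w_pow: float = 0.3,
                   bounds: tuple = (500.0, 250.0, 10.0)) -> float:
    """Scalarized co-design objective: reward accuracy, penalize
    normalized latency, energy, and peak power. The weights and
    normalization bounds here are illustrative, not from the paper."""
    lat_b, en_b, pow_b = bounds
    return (w_acc * p.accuracy
            - w_lat * p.latency_ms / lat_b
            - w_en * p.energy_mj / en_b
            - w_pow * p.peak_power_w / pow_b)

# Pick the best (model, device) pair among profiled candidates
# (candidate names and measurements are made up for illustration).
candidates = {
    ("bert-base", "raspberry-pi-4"): Profile(0.79, 410.0, 220.0, 6.1),
    ("searched-small", "jetson-nano"): Profile(0.80, 95.0, 40.0, 4.8),
}
best = max(candidates, key=lambda pair: codesign_score(candidates[pair]))
print("best (model, device) pair:", best)
```

Under these assumed weights, a model with slightly higher accuracy and much lower latency, energy, and peak power dominates, which mirrors the trade-off the abstract describes.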