Binarized Neural Networks (BNNs) significantly reduce computation and memory demands compared to full-precision NNs by binarizing weights and activations. On a heterogeneous multiprocessor platform consisting of a CPU and a GPU, the device on which each BNN layer executes can affect inference performance, i.e., accuracy and latency. Such a heterogeneous HW platform is typically available to execute BNN workloads; however, using it effectively requires an efficient strategy for mapping the BNN workload onto the devices. In this work, we propose a framework that generates efficient BNN layer-to-device mappings (i.e., a suitable parallel configuration for each layer of the model) for execution platforms composed of a CPU and a CUDA-capable GPU. We evaluate the proposed framework with two BNN architectures on two well-known datasets, Fashion-MNIST and CIFAR-10, using three hardware platforms with different characteristics. The results show that, compared to a fully-parallelized GPU implementation, the configurations generated by our framework are up to 2x, 2.6x, and 11.8x faster on the three tested hardware platforms, respectively.
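To make the notion of a layer-to-device mapping concrete, the sketch below shows one simple way such a mapping could be chosen: a small dynamic program that, given per-layer latencies measured on each device and a cost for moving activations between devices, assigns each layer to the device minimizing cumulative latency. This is a minimal illustration under assumed numbers, not the paper's actual framework; the latencies, the `transfer_ms` cost, and the `map_layers` helper are all hypothetical.

```python
# Minimal sketch of a layer-to-device mapper (illustrative only; the
# latencies and transfer cost are made-up assumptions, not measurements
# from the proposed framework).

# Hypothetical measured latency (ms) of each BNN layer on each device.
layer_latency = [
    {"cpu": 0.9, "gpu": 0.4},  # conv1
    {"cpu": 2.1, "gpu": 0.6},  # conv2
    {"cpu": 0.3, "gpu": 0.5},  # fc1 (small layer: CPU may win)
]
transfer_ms = 0.2  # assumed cost of moving activations CPU<->GPU


def map_layers(latency, transfer):
    """Pick a device per layer by dynamic programming, charging the
    transfer cost whenever consecutive layers run on different devices."""
    # best[d] = (cumulative latency ending on device d, device sequence)
    best = {d: (latency[0][d], [d]) for d in ("cpu", "gpu")}
    for lat in latency[1:]:
        nxt = {}
        for d in ("cpu", "gpu"):
            # Stay on the same device, or pay the transfer cost to switch.
            cost, path = min(
                (best[p][0] + (0 if p == d else transfer), best[p][1])
                for p in ("cpu", "gpu")
            )
            nxt[d] = (cost + lat[d], path + [d])
        best = nxt
    return min(best.values())


total, mapping = map_layers(layer_latency, transfer_ms)
print(f"mapping={mapping}, total latency={total:.2f} ms")
```

Even this toy version captures why a per-layer mapping can beat a fully-parallelized GPU implementation: small layers whose kernel-launch and transfer overheads dominate can be cheaper on the CPU, while large layers stay on the GPU.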