Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs


March 8, 2021


Alberto Parravicini, Luca Giuseppe Cellamare, Marco Siracusa, Marco Domenico Santambrogio

From arXiv. To appear in the Proceedings of the 58th Design Automation Conference (DAC).

Top-K SpMV is a key component of similarity-search on sparse embeddings. This sparse workload does not perform well on general-purpose NUMA systems that employ traditional caching strategies. Instead, modern FPGA accelerator cards have a few tricks up their sleeve. We introduce a Top-K SpMV FPGA design that leverages reduced precision and a novel packet-wise CSR matrix compression, enabling custom data layouts and delivering bandwidth efficiency often unreachable even in architectures with higher peak bandwidth. With HBM-based boards, we are 100x faster than a multi-threaded CPU implementation and 2x faster than a GPU with 20% higher bandwidth, with 14.2x higher power-efficiency.
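For readers unfamiliar with the operation, the following is a minimal software sketch of what Top-K SpMV computes: multiply a sparse matrix stored in CSR form by a dense query vector, then keep only the K rows with the largest scores. It is a plain reference illustration, not the FPGA design described in the abstract; the reduced-precision arithmetic and the packet-wise CSR compression are not modelled here because the abstract gives no details on them. The function name `topk_spmv` and the toy data are assumptions made for illustration.

```python
import heapq


def topk_spmv(indptr, indices, data, x, k):
    """Return the k (score, row) pairs with the largest dot products A[row] . x.

    The sparse matrix A is given in standard CSR form:
    indptr[r]..indptr[r+1] delimits the non-zeros of row r,
    indices holds their column ids and data their values.
    """
    scores = []
    for row in range(len(indptr) - 1):
        acc = 0.0
        for j in range(indptr[row], indptr[row + 1]):
            acc += data[j] * x[indices[j]]
        scores.append((acc, row))
    # Keep only the k best-scoring rows, highest score first.
    return heapq.nlargest(k, scores)


# Toy 4 x 3 sparse matrix (hypothetical data):
# [[1, 0, 2],
#  [0, 3, 0],
#  [0, 0, 0],
#  [4, 0, 5]]
indptr = [0, 2, 3, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 0.5, 2.0]  # dense query embedding

top2 = topk_spmv(indptr, indices, data, x, k=2)
print(top2)  # [(14.0, 3), (5.0, 0)]
```

In a similarity-search setting, each matrix row would be one sparse embedding and `x` the query, so the returned row ids are the K most similar items under the dot product.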

