MicroRec: Accelerating Deep Recommendation Systems to Microseconds by Hardware and Data Structure Solutions - Zhuanzhi Papers


Inference · Reducible · FPGA · Engineering · Recommender Systems

October 12, 2020

MicroRec: Accelerating Deep Recommendation Systems to Microseconds by Hardware and Data Structure Solutions


Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso

From arXiv; under submission

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound because of the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency, because a recommendation for a user must be produced within tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared to the optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves a 13.8~14.7x speedup on embedding lookup alone and a 2.5~5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation, while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems.
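
To make point (1) concrete, the following is a minimal sketch of one way a data-structure change can cut the number of embedding lookups: merging two small tables into a single table indexed by the pair of original indices (a Cartesian-product merge). The abstract does not describe the exact restructuring, so the merge strategy, the function names (`merge_tables`, `lookup_separate`, `lookup_merged`), and the toy values are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def merge_tables(table_a, table_b):
    """Cartesian-product merge (assumed technique, for illustration only).

    Row i * len(table_b) + j of the result holds table_a[i] concatenated
    with table_b[j].
    """
    n_b = len(table_b)
    merged = [a + b for a, b in product(table_a, table_b)]
    return merged, n_b


def lookup_separate(tables, indices):
    """Baseline: one random memory access per table, results concatenated."""
    out = []
    for table, idx in zip(tables, indices):
        out.extend(table[idx])
    return out


def lookup_merged(merged, n_b, i, j):
    """One access into the merged table replaces two separate accesses."""
    return merged[i * n_b + j]


# Toy embedding tables with made-up values: 3 rows x 2 dims, 2 rows x 1 dim.
table_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
table_b = [[1.0], [2.0]]

merged, n_b = merge_tables(table_a, table_b)

separate = lookup_separate([table_a, table_b], [2, 1])  # two accesses
combined = lookup_merged(merged, n_b, 2, 1)             # one access

print(separate == combined)
```

The trade-off in this sketch is storage: the merged table has `len(table_a) * len(table_b)` rows, so a merge of this kind only pays off when the tables involved are small.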

