Near Data Acceleration with Concurrent Host Access - Zhuanzhi Papers

November 30, 2020

Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez

Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host and the NDAs in a way that permits both regular memory access by some applications and accelerating others with an NDA, avoids copying data, enables collaborative processing, and simultaneously offers high performance for both host and NDA. We identify and solve new challenges in this context: mitigating row-locality interference from host to NDAs, reducing read/write-turnaround overhead caused by fine-grain interleaving of host and NDA requests, architecting a memory layout that supports the locality required for NDAs and sophisticated address interleaving for host performance, and supporting both packetized and traditional memory interfaces. We demonstrate our approach in a simulated system that consists of a multi-core CPU and NDA-enabled DDR4 memory modules. We show that our mechanisms enable effective and efficient concurrent access using a set of microbenchmarks, and then demonstrate the potential of the system for the important stochastic variance-reduced gradient (SVRG) algorithm.
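The abstract names SVRG as its case study. As a reference point for readers unfamiliar with the algorithm, here is a minimal, generic SVRG sketch for least-squares regression in plain Python. It is not the authors' NDA implementation: the function names (`dot`, `grad_i`, `svrg`), the least-squares objective, and the hyperparameters are illustrative assumptions. Its structure hints at why SVRG is a plausible target for near-data processing. Each outer iteration starts with a full-gradient pass that streams over every sample, and an inner loop follows that touches one sample at a time.

```python
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def grad_i(w: Sequence[float], x_i: Sequence[float], y_i: float) -> Vector:
    """Gradient of the per-sample loss 0.5 * (w·x_i - y_i)**2 with respect to w."""
    residual = dot(w, x_i) - y_i
    return [residual * x for x in x_i]


def svrg(X: Sequence[Sequence[float]], y: Sequence[float], lr: float = 0.02,
         outer_iters: int = 40, inner_steps: int = None, seed: int = 0) -> Vector:
    """Minimize (1/n) * sum_i 0.5 * (w·x_i - y_i)**2 with SVRG.

    Illustrative sketch only: the last inner iterate becomes the next snapshot.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = inner_steps if inner_steps is not None else 2 * n
    w_snap = [0.0] * d
    for _ in range(outer_iters):
        # Full-gradient pass at the snapshot: reads every sample once
        # (the bandwidth-heavy phase).
        mu = [0.0] * d
        for x_i, y_i in zip(X, y):
            mu = [acc + g / n for acc, g in zip(mu, grad_i(w_snap, x_i, y_i))]
        # Inner loop: cheap single-sample steps.
        w = list(w_snap)
        for _ in range(m):
            i = rng.randrange(n)
            g_cur = grad_i(w, X[i], y[i])
            g_old = grad_i(w_snap, X[i], y[i])
            # Variance-reduced gradient: an unbiased estimate of the full
            # gradient at w whose variance shrinks as w and w_snap converge.
            w = [w_j - lr * (a - b + c) for w_j, a, b, c in zip(w, g_cur, g_old, mu)]
        w_snap = w
    return w_snap


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [0.5, 2.0]]
    y = [2.1, -0.9, 1.0, 3.2, -1.1]
    print(svrg(X, y, inner_steps=100, seed=1))
```

Using the last inner iterate as the next snapshot is a common practical choice. Variants that pick a random inner iterate instead also exist. Unlike plain SGD with a constant step size, this loop converges to the exact least-squares minimizer even when the data are noisy.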
