A Method for Hiding the Increased Non-Volatile Cache Read Latency - 专知 Papers


December 20, 2021

A Method for Hiding the Increased Non-Volatile Cache Read Latency


Apostolos Kokolis, Namrata Mantri, Shrikanth Ganapathy, Josep Torrellas, John Kalamatianos

from arxiv, 14 pages, 15 figures

The increasing memory demands of workloads are putting high pressure on Last Level Caches (LLCs). Unfortunately, there is limited opportunity to increase the capacity of LLCs due to the area and power requirements of the underlying SRAM technology. Emerging Non-Volatile Memory (NVM) technologies promise a feasible alternative to SRAM for LLCs due to their higher area density. However, NVMs have substantially higher read and write latencies, which offset their area density benefit. Although researchers have proposed methods to tolerate NVM's increased write latency, little emphasis has been placed on reducing the critical NVM read latency. To address this problem, this paper proposes Cloak. Cloak exploits data reuse in the LLC at the page level to hide NVM read latency. Specifically, on certain L1 TLB misses to a page, Cloak transfers LLC-resident data belonging to the page from the LLC NVM array to a set of small SRAM Page Buffers that will service subsequent requests to this page. Further, to enable the high-bandwidth, low-latency transfer of lines of a page to the page buffers, Cloak uses an LLC layout that accelerates the discovery of LLC-resident cache lines from the page. We evaluate Cloak with full-system simulations of a 4-core processor across 14 workloads. We find that, on average, Cloak outperforms an SRAM LLC by 23.8% and an NVM-only LLC by 8.9%, in both cases with negligible additional area. Further, Cloak's energy-delay-squared product (ED^2) is 39.9% and 17.5% lower, respectively, than that of these designs.
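The page-buffer idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the class name `ToyCloak`, the latency values (30 and 10 cycles), the LRU replacement of page buffers, and the choice to copy a page on every L1 TLB miss (the abstract says only "certain" misses) are all hypothetical. Writes and coherence between the NVM array and the buffers are not modeled.

```python
from collections import OrderedDict


class ToyCloak:
    """Toy model: an NVM LLC fronted by a few SRAM page buffers (all parameters hypothetical)."""

    def __init__(self, num_buffers=4, nvm_read=30, sram_read=10):
        self.num_buffers = num_buffers   # assumed number of SRAM page buffers
        self.nvm_read = nvm_read         # assumed NVM read latency, in cycles
        self.sram_read = sram_read       # assumed page-buffer read latency, in cycles
        self.llc = {}                    # page -> set of line indices resident in the NVM LLC
        self.buffers = OrderedDict()     # page -> set of buffered lines, kept in LRU order

    def fill(self, page, line):
        """Record that a cache line of `page` is resident in the NVM LLC."""
        self.llc.setdefault(page, set()).add(line)

    def tlb_miss(self, page):
        """On an L1 TLB miss, copy the page's LLC-resident lines into a page buffer."""
        lines = self.llc.get(page)
        if not lines:
            return                       # nothing resident, nothing to transfer
        self.buffers[page] = set(lines)
        self.buffers.move_to_end(page)
        while len(self.buffers) > self.num_buffers:
            self.buffers.popitem(last=False)   # evict the least recently used buffer

    def read(self, page, line):
        """Return the modeled read latency, or None on an LLC miss."""
        if line in self.buffers.get(page, ()):
            self.buffers.move_to_end(page)
            return self.sram_read        # serviced by an SRAM page buffer
        if line in self.llc.get(page, ()):
            return self.nvm_read         # serviced by the slower NVM array
        return None
```

In this model, a read to a page whose lines were copied on an earlier TLB miss pays the assumed SRAM latency instead of the assumed NVM latency, which is the reuse effect the abstract describes at the page level.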

