The increasing memory demands of workloads are putting high pressure on Last Level Caches (LLCs). Unfortunately, there is limited opportunity to increase LLC capacity because of the area and power requirements of the underlying SRAM technology. Emerging Non-Volatile Memory (NVM) technologies promise a feasible alternative to SRAM for LLCs thanks to their higher area density. However, NVMs have substantially higher read and write latencies, which offset their area density benefit. Although researchers have proposed methods to tolerate NVM's increased write latency, little emphasis has been placed on reducing the critical NVM read latency. To address this problem, this paper proposes Cloak. Cloak exploits data reuse in the LLC at the page level to hide NVM read latency. Specifically, on certain L1 TLB misses to a page, Cloak transfers the LLC-resident data belonging to that page from the LLC NVM array to a set of small SRAM Page Buffers, which then service subsequent requests to the page. Further, to enable high-bandwidth, low-latency transfer of a page's lines to the page buffers, Cloak uses an LLC layout that accelerates the discovery of the page's LLC-resident cache lines. We evaluate Cloak with full-system simulations of a 4-core processor across 14 workloads. We find that, on average, Cloak outperforms an SRAM LLC by 23.8% and an NVM-only LLC by 8.9%, in both cases with negligible additional area. Further, Cloak's energy-delay-squared product (ED^2) is 39.9% and 17.5% lower than that of the SRAM and NVM-only designs, respectively.
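
The page-buffer mechanism described above can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the paper's design. The class `ToyCloakLLC`, the 4 KiB page and 64 B line sizes, the cycle counts, the LRU replacement of page buffers, and the choice to trigger a transfer on every L1 TLB miss are all hypothetical. The abstract says only that "certain" misses trigger a transfer, and it gives no latencies or replacement policy. Writes, coherence, and lines filled after a page is buffered are ignored.

```python
from collections import OrderedDict

PAGE_SIZE = 4096        # bytes per page (assumed)
LINE_SIZE = 64          # bytes per cache line (assumed)
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE
NVM_READ_CYCLES = 40    # hypothetical NVM array read latency
SRAM_BUF_CYCLES = 10    # hypothetical SRAM page-buffer read latency


class ToyCloakLLC:
    """Toy model: an NVM LLC array fronted by a few SRAM page buffers."""

    def __init__(self, num_page_buffers=4):
        self.nvm_lines = set()        # line numbers resident in the NVM array
        self.buffers = OrderedDict()  # page number -> buffered line numbers (LRU order)
        self.num_page_buffers = num_page_buffers

    def fill(self, addr):
        """Install the line containing addr in the NVM array."""
        self.nvm_lines.add(addr // LINE_SIZE)

    def on_l1_tlb_miss(self, addr):
        """Copy the page's LLC-resident lines into a page buffer.

        Assumption: every L1 TLB miss triggers a transfer.
        """
        page = addr // PAGE_SIZE
        first = page * LINES_PER_PAGE
        resident = {line for line in range(first, first + LINES_PER_PAGE)
                    if line in self.nvm_lines}
        if not resident:
            return  # nothing from this page is in the LLC
        self.buffers[page] = resident
        self.buffers.move_to_end(page)
        if len(self.buffers) > self.num_page_buffers:
            self.buffers.popitem(last=False)  # evict the least recently used page

    def read(self, addr):
        """Return the modeled read latency in cycles, or None on an LLC miss."""
        line, page = addr // LINE_SIZE, addr // PAGE_SIZE
        buffered = self.buffers.get(page)
        if buffered is not None and line in buffered:
            self.buffers.move_to_end(page)
            return SRAM_BUF_CYCLES
        if line in self.nvm_lines:
            return NVM_READ_CYCLES
        return None
```

In this model, a read to a line whose page has been buffered is served at the SRAM latency, while the same read before the TLB miss (or after the page buffer is evicted) pays the NVM array latency.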