Stencil computation is one of the most widely used kernels in scientific applications, ranging from large-scale weather prediction to solving partial differential equations. Stencil computations are characterized by three properties: (1) low arithmetic intensity, (2) limited temporal data reuse, and (3) regular and predictable data access patterns. As a result, stencil computations are typically bandwidth-bound workloads that benefit only marginally from the deep cache hierarchy of modern CPUs. In this work, we propose Casper, a near-cache accelerator consisting of specialized stencil compute units connected to the last-level cache (LLC) of a traditional CPU. Casper is based on two key ideas: (1) avoiding the cost of moving rarely reused data through the cache hierarchy, and (2) exploiting the regularity of the data accesses and the inherent parallelism of the stencil computation to increase overall performance. With minimal changes to the LLC address decoding logic and data placement, Casper performs stencil computations at the peak bandwidth of the LLC. We show that, by tightly coupling lightweight stencil compute units to the LLC, Casper improves the performance of stencil kernels by 1.65x on average while reducing energy consumption by 35%, compared to a commercial high-performance multi-core processor. Moreover, Casper provides a 37x improvement in performance-per-area compared to a state-of-the-art GPU.
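To make the three properties above concrete, the following is a minimal sketch of a 1D 3-point averaging (Jacobi-style) stencil in plain Python. It is an illustration only and not Casper's implementation; the function names `jacobi_1d` and `arithmetic_intensity`, the choice of stencil, and the per-point traffic estimate (one 8-byte read and one 8-byte write per point under streaming access) are all assumptions made for this example.

```python
def jacobi_1d(grid, steps):
    """Apply a 3-point averaging stencil `steps` times.

    Boundary values (first and last element) are held fixed.
    Hypothetical helper for illustration; not taken from the paper.
    """
    cur = list(grid)
    for _ in range(steps):
        nxt = cur[:]  # separate output buffer, so each sweep reads only old values
        for i in range(1, len(cur) - 1):
            # Regular, predictable accesses: i-1, i, i+1
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def arithmetic_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations per byte of memory traffic."""
    return flops_per_point / bytes_per_point


# Assumed cost model: 2 additions + 1 division = 3 flops per point,
# and 16 bytes of traffic per point (one double read, one double written).
ai = arithmetic_intensity(3, 16)

if __name__ == "__main__":
    print(jacobi_1d([0.0, 3.0, 0.0], 1))
    print(ai)
```

Under this assumed cost model the kernel performs well under one floating-point operation per byte moved, and each input value is needed by only three neighboring outputs within a sweep, which is consistent with the bandwidth-bound behavior described above.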