The increasing prevalence and growing size of data in modern applications have led to high computation costs in traditional processor-centric computing systems. Moving large volumes of data between memory devices (e.g., DRAM) and computing elements (e.g., CPUs, GPUs) across bandwidth-limited memory channels can consume more than 60% of the total energy in modern systems. To mitigate these costs, the processing-in-memory (PIM) paradigm moves computation closer to where the data resides, reducing (and in some cases eliminating) the need to move data between memory and the processor. There are two main approaches to PIM: (1) processing-near-memory (PnM), where PIM logic is added to the same die as memory or to the logic layer of 3D-stacked memory; and (2) processing-using-memory (PuM), which uses the operational principles of memory cells to perform computation. Many works from academia and industry have shown the benefits of PnM and PuM for a wide range of workloads from different domains. However, fully adopting PIM in commercial systems remains very challenging due to the lack of tools and system support for PIM architectures across the computer architecture stack, including: (i) workload characterization methodologies and benchmark suites targeting PIM architectures; (ii) frameworks that can facilitate the implementation of complex operations and algorithms using the underlying PIM primitives; (iii) compiler support and compiler optimizations targeting PIM architectures; (iv) operating system support for PIM-aware virtual memory, memory management, data allocation, and data mapping; and (v) efficient data coherence and consistency mechanisms. Our goal in this work is to provide tools and system support for PnM and PuM architectures, aiming to ease the adoption of PIM in current and future systems.