SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM

Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, João Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gómez Luna, Onur Mutlu

From arXiv. This is an extended version of the paper that appeared at ASPLOS 2021.

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex operations. In this paper, we propose SIMDRAM, a flexible general-purpose processing-using-DRAM framework that (1) enables the efficient implementation of complex operations, and (2) provides a flexible mechanism to support the implementation of arbitrary user-defined operations. The SIMDRAM framework comprises three key steps. The first step builds an efficient MAJ/NOT representation of a given desired operation. The second step allocates DRAM rows that are reserved for computation to the operation's input and output operands, and generates the required sequence of DRAM commands to perform the MAJ/NOT implementation of the desired operation in DRAM. The third step uses the SIMDRAM control unit located inside the memory controller to manage the computation of the operation from start to end, by executing the DRAM commands generated in the second step of the framework. We design the hardware and ISA support for the SIMDRAM framework to (1) address key system integration challenges, and (2) allow programmers to employ new SIMDRAM operations without hardware changes. We evaluate SIMDRAM for reliability, area overhead, throughput, and energy efficiency using a wide range of operations and seven real-world applications to demonstrate SIMDRAM's generality. Using 16 DRAM banks, SIMDRAM provides (1) 88x and 5.8x the throughput, and 257x and 31x the energy efficiency, of a CPU and a high-end GPU, respectively, over 16 operations; and (2) 21x and 2.1x the performance of the CPU and GPU, respectively, over seven real-world applications. SIMDRAM incurs an area overhead of only 0.2% in a high-end CPU.
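The idea behind the first step, expressing an operation using only majority (MAJ) and NOT, and the bit-serial SIMD execution it enables can be illustrated in software. The sketch below is a minimal model under stated assumptions, not SIMDRAM's implementation: each DRAM row is modeled as a Python int whose bits are the columns (SIMD lanes); operands are stored vertically, so row i holds bit i of every element; and the in-DRAM MAJ primitive is modeled as a bitwise majority over three whole rows. The full adder used here (carry = MAJ(A, B, C); sum = MAJ(NOT carry, MAJ(A, B, NOT C), C)) is one standard MAJ/NOT decomposition and not necessarily the exact circuit SIMDRAM's synthesis step produces. All function names and the lane count are hypothetical.

```python
from typing import List

LANES = 8  # hypothetical lane (column) count; real DRAM rows are far wider
MASK = (1 << LANES) - 1


def maj(a: int, b: int, c: int) -> int:
    """Bitwise 3-input majority applied to every lane of three rows at once."""
    return (a & b) | (b & c) | (a & c)


def not_row(a: int) -> int:
    """Bitwise NOT of one row, restricted to the modeled lanes."""
    return ~a & MASK


def to_vertical(values: List[int], width: int) -> List[int]:
    """Transpose per-lane integers into bit-slice rows (row i = bit i of each lane)."""
    rows = []
    for bit in range(width):
        row = 0
        for lane, v in enumerate(values):
            row |= ((v >> bit) & 1) << lane
        rows.append(row)
    return rows


def from_vertical(rows: List[int], lanes: int) -> List[int]:
    """Inverse transpose: rebuild one integer per lane from bit-slice rows."""
    values = [0] * lanes
    for bit, row in enumerate(rows):
        for lane in range(lanes):
            values[lane] |= ((row >> lane) & 1) << bit
    return values


def bit_serial_add(a_rows: List[int], b_rows: List[int]) -> List[int]:
    """Add two vertically stored operands, LSB first, using only MAJ and NOT.

    Each loop iteration handles one bit position for all lanes in parallel.
    """
    carry = 0  # all-zero carry row
    out = []
    for a, b in zip(a_rows, b_rows):
        cout = maj(a, b, carry)
        tmp = maj(a, b, not_row(carry))
        out.append(maj(not_row(cout), tmp, carry))  # sum bit for this position
        carry = cout
    out.append(carry)  # final carry-out becomes the extra most significant row
    return out


if __name__ == "__main__":
    xs = [3, 5, 7, 0, 15, 9, 1, 12]
    ys = [4, 5, 8, 0, 15, 6, 14, 3]
    sums = bit_serial_add(to_vertical(xs, 4), to_vertical(ys, 4))
    print(from_vertical(sums, LANES))  # [7, 10, 15, 0, 30, 15, 15, 15]
```

In this model the loop runs once per bit position, so latency grows with operand width, while every lane is updated by the same row-wide MAJ/NOT steps. That is the bit-serial SIMD trade-off the framework's title refers to: many narrow, identical steps applied across an entire row in parallel.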
