This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture that efficiently performs multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights. First, a bit-line (BL) charge-sharing technique is employed for low-cost and reliable digital-to-analog conversion of the 4-bit input activations in the proposed SRAM CIM, where charge-domain analog computing provides variation-tolerant and linear MAC outputs. The 16 local arrays are also exploited to implement the analog multiplication unit (AMU), which simultaneously produces 16 multiplication results between 4-bit input activations and 1-bit weights. To reduce the hardware cost of the analog-to-digital converter (ADC) without sacrificing DNN accuracy, hardware-aware system simulations are performed to decide the ADC bit resolution and the number of activated rows in the proposed CIM macro. In addition, for the ADC operation, AMU-based reference columns are used to generate the ADC reference voltages, with which a low-cost 4-bit coarse-fine flash ADC has been designed. A 256×80 P-8T SRAM CIM macro implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67% on the CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of 50.07 TOPS/W.
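To make the arithmetic behind the 4-bit by 1-bit products concrete, the following is a minimal Python sketch of how a MAC with 8-bit weights can be decomposed into weight bit planes. It models only the integer arithmetic, not the analog circuit. Several points are assumptions for illustration and are not stated in the abstract: the weights are treated as unsigned, the partial sums of the bit planes are recombined by a digital shift-and-add, and the function names `mac_reference` and `mac_bit_planes` are hypothetical.

```python
import random


def mac_reference(acts, weights):
    """Plain integer MAC: sum of activation * weight."""
    return sum(a * w for a, w in zip(acts, weights))


def mac_bit_planes(acts, weights, weight_bits=8):
    """MAC computed one weight bit plane at a time.

    Each plane only needs products of a 4-bit activation with a
    1-bit weight. Unsigned weights and a digital shift-and-add
    recombination are assumptions made for this sketch.
    """
    total = 0
    for b in range(weight_bits):
        # Extract bit b of every weight (values are 0 or 1).
        plane = [(w >> b) & 1 for w in weights]
        # Accumulate the 4-bit x 1-bit products for this plane.
        partial = sum(a * wb for a, wb in zip(acts, plane))
        # Weight the partial sum by the significance of bit b.
        total += partial << b
    return total


random.seed(0)
acts = [random.randrange(16) for _ in range(16)]      # 4-bit activations
weights = [random.randrange(256) for _ in range(16)]  # 8-bit weights
assert mac_bit_planes(acts, weights) == mac_reference(acts, weights)
```

In this idealized model the bit-plane result matches the direct MAC exactly. In the actual macro each partial sum would pass through the 4-bit ADC, so quantization error would enter at that step, which this sketch does not model.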