Deep neural networks (DNNs) have proved effective in various areas such as classification, image processing, video segmentation, and speech recognition. Accelerator-in-memory (AiM) architectures are a promising solution for efficiently accelerating DNNs, as they avoid the memory bottleneck of the traditional von Neumann architecture. Since the main memory in many systems is DRAM, a highly parallel multiply-accumulate (MAC) array within the DRAM can maximize the benefit of AiM by reducing both the distance and the amount of data movement between the processor and the main memory. This paper presents an analog MAC-array-based AiM architecture named MAC-DO. In contrast with previous in-DRAM accelerators, MAC-DO allows an entire DRAM array to participate in MAC computations simultaneously without idle cells, leading to higher throughput and energy efficiency. This improvement is made possible by a new analog computation method based on charge steering. In addition, MAC-DO innately supports multi-bit MACs with good linearity, and it remains compatible with current 1T1C DRAM technology without any modification of the DRAM cell or array. A MAC-DO array accelerates matrix multiplications based on output-stationary mapping and thus supports most of the computations performed in DNNs. Our evaluation using transistor-level simulation shows that a test MAC-DO array with 16 × 16 MAC-DO cells achieves 188.7 TOPS/W and 97.07% Top-1 accuracy on the MNIST dataset without retraining.
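To illustrate the output-stationary mapping mentioned above, the following is a minimal software sketch (not the MAC-DO analog circuit itself): each grid position (i, j) corresponds to one cell that keeps its partial sum stationary while input operands stream past over K steps. All names here are illustrative, not from the paper.

```python
def output_stationary_matmul(A, B):
    """Sketch of output-stationary mapping for C = A @ B.

    Each (i, j) position models one MAC cell that holds its own
    accumulator; operands are streamed in over K time steps and the
    partial sum never leaves the cell until the computation finishes.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0] * N for _ in range(M)]  # one stationary accumulator per cell
    for k in range(K):               # step k: broadcast A[:, k] and B[k, :]
        for i in range(M):
            for j in range(N):
                # cell (i, j) performs exactly one MAC per step,
                # so all M*N cells are busy simultaneously (no idle cells)
                C[i][j] += A[i][k] * B[k][j]
    return C
```

In this dataflow, every cell of the array performs a MAC on every step, which mirrors the paper's claim that the entire array participates in computation without idle cells.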