With the rapidly growing use of Convolutional Neural Networks (CNNs) in real-world applications related to machine learning and Artificial Intelligence (AI), several hardware accelerator designs for CNN inference and training have been proposed recently. In this paper, we present ATRIA, a novel bit-pArallel sTochastic aRithmetic based In-DRAM Accelerator for energy-efficient and high-speed inference of CNNs. ATRIA employs lightweight modifications in DRAM cell arrays to implement bit-parallel stochastic-arithmetic-based acceleration of multiply-accumulate (MAC) operations inside DRAM. ATRIA significantly improves the latency, throughput, and efficiency of processing CNN inferences by performing 16 MAC operations in only five consecutive memory operation cycles. We mapped the inference tasks of four benchmark CNNs onto ATRIA to compare its performance with five state-of-the-art in-DRAM CNN accelerators from prior work. The results of our analysis show that ATRIA exhibits only a 3.5% drop in CNN inference accuracy while still achieving improvements of up to 3.2x in frames-per-second (FPS) and up to 10x in efficiency (FPS/W/mm2), compared to the best-performing in-DRAM accelerator from prior work.
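To make the arithmetic concrete: in unipolar stochastic computing, a value p in [0, 1] is encoded as a random bitstream whose bits are 1 with probability p, so multiplication reduces to a bitwise AND of two streams and the product is recovered as the stream's mean. The sketch below is a minimal software illustration of a stochastic MAC under these standard assumptions; it is not ATRIA's in-DRAM circuit, and all function names are hypothetical.

```python
import random

def to_stream(p, n, rng):
    # Unipolar encoding (assumed): each bit is 1 with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(a_bits, b_bits):
    # For independent unipolar streams, bitwise AND yields a stream
    # whose mean estimates the product a * b.
    return [x & y for x, y in zip(a_bits, b_bits)]

def sc_mac(pairs, n=4096, seed=0):
    # Stochastic multiply-accumulate: sum the estimated products.
    rng = random.Random(seed)
    total = 0.0
    for a, b in pairs:
        prod = sc_multiply(to_stream(a, n, rng), to_stream(b, n, rng))
        total += sum(prod) / n
    return total
```

Because each product is only an estimate, its error shrinks roughly as 1/sqrt(n) with stream length n; for example, `sc_mac([(0.5, 0.5), (0.25, 0.8)])` should land close to the exact value 0.45. This accuracy/length trade-off is the source of the small inference-accuracy drop that stochastic-arithmetic accelerators such as ATRIA report.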