Deep Neural Networks (DNNs) have recently attracted significant interest for a plethora of applications such as image and video analytics, language translation, and medical diagnosis. On a von Neumann hardware architecture, high memory bandwidth is required to keep up with the needs of data-intensive DNN applications, as the majority of the data resides in main memory. Processing in memory can therefore provide a promising solution to the memory-wall bottleneck for ML workloads. In this work, we propose a DRAM-based processing-in-memory (PIM) multiplication primitive coupled with intra-bank accumulation to accelerate matrix-vector operations in ML workloads. We further propose a processing-in-memory DRAM bank architecture, data mapping, and dataflow based on the proposed primitive. System evaluations on networks such as AlexNet, VGG16, and ResNet18 show that the proposed architecture, mapping, and dataflow can provide up to 23x and 6.5x benefits over a GPU and an ideal conventional (non-PIM) baseline architecture with infinite compute bandwidth, respectively.
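To make the core idea concrete, the following is a minimal functional sketch (not the paper's hardware implementation): a matrix-vector product is partitioned column-wise across a hypothetical set of DRAM banks, each bank accumulates its partial products locally (intra-bank accumulation), and a final reduction combines the per-bank partials. The bank count and column-slice mapping here are illustrative assumptions.

```python
import numpy as np

NUM_BANKS = 4  # illustrative assumption, not a value from the paper

def pim_matvec(M, v, num_banks=NUM_BANKS):
    """Sketch of a banked matrix-vector product: each 'bank' owns a
    slice of the columns and accumulates its partial result locally."""
    rows, cols = M.shape
    # Data mapping: each bank holds a contiguous slice of columns.
    col_slices = np.array_split(np.arange(cols), num_banks)
    partials = []
    for bank_cols in col_slices:
        # Intra-bank accumulation: multiply-and-sum inside the bank.
        partials.append(M[:, bank_cols] @ v[bank_cols])
    # Cross-bank reduction yields the full output vector.
    return np.sum(partials, axis=0)

M = np.arange(12, dtype=float).reshape(3, 4)
v = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(pim_matvec(M, v), M @ v))  # True
```

The column-wise split mirrors why intra-bank accumulation pays off: each bank can reduce its own partial products before any data crosses the bank boundary, shrinking the traffic that the memory interface must carry.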