Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound by either computation or memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures in terms of NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU on a general matrix-vector multiplication kernel when the GPU requires memory oversubscription; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x, respectively, over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for a NN model depends on the model's distinct attributes, due to the inherent design choices of each architecture.
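To make the compute-bound versus memory-bound distinction concrete, the short C sketch below (an illustrative example of ours, not code from UPMEM, Mensa, or SIMDRAM) shows a plain general matrix-vector multiplication (GEMV) kernel of the kind referenced above. Each element of A is loaded from memory exactly once and used in a single multiply-accumulate, giving roughly 2 floating-point operations per 4-byte element loaded (about 0.5 FLOP/byte); at such low arithmetic intensity, performance is limited by memory bandwidth rather than compute throughput, which is why placing computation near or within memory helps.

```c
#include <stddef.h>

/* Illustrative GEMV kernel: y = A * x, with A of size m x n.
 * Every A[i][j] is read once and used for one multiply-accumulate,
 * so data reuse is minimal and the kernel is memory-bandwidth-bound. */
void gemv(size_t m, size_t n,
          const float A[m][n], const float x[n], float y[m])
{
    for (size_t i = 0; i < m; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++)
            acc += A[i][j] * x[j];  /* 1 load of A[i][j], 2 FLOPs */
        y[i] = acc;
    }
}
```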