Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bounded either by computation or by memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures in terms of NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication (GEMV) kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x, respectively, over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for a given NN model depends on the model's distinct attributes, due to the inherent design choices embedded in each PIM architecture.
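The GEMV result above stems from the kernel's low arithmetic intensity. As an illustrative sketch (not code from the paper, and with a function name and signature of our own choosing), the following C kernel for y = A*x shows why GEMV is memory-bound: each element of A is streamed from memory exactly once and used in a single multiply-accumulate, so performance is limited by memory bandwidth rather than by compute throughput, which is precisely the regime where PIM helps.

```c
#include <stddef.h>

/* Minimal GEMV kernel (y = A * x), for illustration only.
 * Each 4-byte element of A is loaded once and used for exactly
 * 2 FLOPs (one multiply, one add), giving an arithmetic intensity
 * of ~0.5 FLOP/byte -- far too low to saturate modern compute
 * units, so the kernel is bound by memory bandwidth. */
void gemv(const float *A, const float *x, float *y,
          size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < cols; j++)
            acc += A[i * cols + j] * x[j];  /* one load of A per MAC */
        y[i] = acc;
    }
}
```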