Processing-in-memory (PIM) seeks to eliminate data transfer between computation and memory by using devices that support both storage and logic. Stateful logic techniques such as IMPLY, MAGIC, and FELIX can perform logic gates within memristive crossbar arrays with massive parallelism. Multiplication via stateful logic is an active field of research because of its wide implications. Recently, RIME became the state-of-the-art algorithm for stateful single-row multiplication by using memristive partitions, reducing the latency of the previous state-of-the-art by 5.1x. In this paper, we begin by proposing novel partition-based computation techniques for broadcasting and shifting data. Then, we design an in-memory multiplication algorithm based on the carry-save add-shift (CSAS) technique. Finally, we detail specific logic optimizations to the algorithm that further reduce latency. These contributions constitute MultPIM, a multiplier that reduces the state-of-the-art time complexity from quadratic to linear-log. For 32-bit numbers, MultPIM improves latency by an additional 3.8x over RIME while also slightly reducing area overhead. Furthermore, we optimize MultPIM for full-precision matrix-vector multiplication and demonstrate a 22.0x latency improvement over FloatPIM matrix-vector multiplication.
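To make the carry-save add-shift (CSAS) technique named above concrete, the following is a minimal software sketch of a generic CSAS multiplier for unsigned N-bit operands. It is a bit-level model of the textbook technique only, not the MultPIM in-memory algorithm: it does not model memristive partitions, stateful gates, or the logic optimizations mentioned in the abstract. The names `full_adder` and `csas_multiply` are hypothetical and chosen for illustration. Each step uses one multiplier bit in every adder cell (a broadcast) and moves every sum bit one cell toward the least significant position (a shift), which are the two data movements the abstract refers to.

```python
def full_adder(x, y, z):
    """One-bit full adder: returns (sum, carry)."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


def csas_multiply(a, b, n):
    """Multiply two unsigned n-bit integers with a carry-save add-shift model.

    Illustrative sketch only; names and structure are assumptions,
    not taken from the paper.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n bits")

    a_bits = [(a >> j) & 1 for j in range(n)]
    s = [0] * n  # sum bit held by each adder cell
    c = [0] * n  # carry bit held by each adder cell (kept in place, not rippled)
    product = 0

    # n steps consume the bits of b; n further steps flush the saved carries.
    for i in range(2 * n):
        b_i = (b >> i) & 1  # broadcast to every cell; zero once i >= n
        new_s = [0] * n
        new_c = [0] * n
        for j in range(n):
            # Shift: cell j takes the previous sum bit of cell j + 1.
            shifted_in = s[j + 1] if j + 1 < n else 0
            new_s[j], new_c[j] = full_adder(a_bits[j] & b_i, shifted_in, c[j])
        s, c = new_s, new_c
        # The sum bit of cell 0 is final: it is bit i of the product.
        product |= s[0] << i
    return product


if __name__ == "__main__":
    print(csas_multiply(3, 3, 2))
    print(csas_multiply(0xFFFFFFFF, 0xFFFFFFFF, 32) == 0xFFFFFFFF * 0xFFFFFFFF)
```

In this model every cell updates independently within a step, because carries are saved locally instead of propagating along the row. The model runs for 2N steps in total; how each step maps onto in-memory gate cycles is outside the scope of this sketch.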