Digital memristive processing-in-memory overcomes the memory wall through a fundamental storage device capable of stateful logic within crossbar arrays. Dynamically dividing the crossbar arrays by adding memristive partitions further increases parallelism, thereby overcoming an inherent trade-off in memristive processing-in-memory. The algorithmic topology of partitions is unique, and was recently exploited to accelerate multiplication (11x with 32 partitions) and sorting (14x with 16 partitions). Yet the physical implementation of memristive partitions, including the peripheral decoders and the control messages, has never been considered, and it may render partitions impractical. This paper addresses that challenge with several novel techniques, presenting efficient, practical designs of memristive partitions. We begin by formalizing the algorithmic properties of memristive partitions into serial, parallel, and semi-parallel operations. Peripheral overhead is addressed via a novel half-gate technique that enables efficient decoding with negligible overhead. Control overhead is addressed by carefully reducing the operation set of memristive partitions, using techniques such as shared indices and pattern generators, with negligible performance impact. Ultimately, these efficient practical solutions, combined with the vast algorithmic potential, may revolutionize digital memristive processing-in-memory.
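To make the serial, parallel, and semi-parallel distinction concrete, the following is a minimal functional sketch in Python. It is not the paper's formalization: the taxonomy used here (one gate per cycle is serial, one gate confined to each partition is parallel, several concurrent gates spanning disjoint groups of partitions is semi-parallel) is an assumption, and the names `Gate`, `span`, `classify`, and `run_nor` are hypothetical. The model treats one crossbar row as a list of bits and ignores electrical details such as output initialization.

```python
from typing import List, Tuple

# Hypothetical encoding of one stateful NOR gate in a crossbar row:
# (input column a, input column b, output column).
Gate = Tuple[int, int, int]


def span(gate: Gate, part_size: int) -> Tuple[int, int]:
    """Return the inclusive range of partition indices that a gate touches."""
    parts = [col // part_size for col in gate]
    return min(parts), max(parts)


def classify(gates: List[Gate], part_size: int) -> str:
    """Classify the gates issued in one cycle (assumed taxonomy, see lead-in)."""
    if not gates:
        raise ValueError("no gates given")
    spans = sorted(span(g, part_size) for g in gates)
    # Concurrent gates must occupy disjoint intervals of partitions.
    for (_, prev_hi), (next_lo, _) in zip(spans, spans[1:]):
        if next_lo <= prev_hi:
            raise ValueError("concurrent gates overlap in partitions")
    if len(gates) == 1:
        return "serial"
    if all(lo == hi for lo, hi in spans):
        return "parallel"
    return "semi-parallel"


def run_nor(row: List[int], gates: List[Gate]) -> None:
    """Apply all gates of one cycle to a row, in place.

    Functional model only: every output is computed from the row state
    before the cycle, then written back.
    """
    results = [(out, int(not (row[a] or row[b]))) for a, b, out in gates]
    for out, value in results:
        row[out] = value


if __name__ == "__main__":
    row = [0, 0, 1, 1, 1, 0, 1, 1]          # 8 columns
    cycle = [(0, 1, 2), (4, 5, 6)]          # one gate in each 4-column partition
    print(classify(cycle, part_size=4))
    run_nor(row, cycle)
    print(row)
```

With 4-column partitions the two gates above each stay inside their own partition, so the cycle counts as parallel under this sketch's assumptions. Shrinking the partitions to 2 columns makes each gate span two adjacent partitions, which the sketch labels semi-parallel.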