Processing-in-memory (PIM) solutions vastly accelerate systems by reducing data transfer between computation and memory. Memristors possess a unique property that enables storage and logic within the same device, which is exploited in the memristive Memory Processing Unit (mMPU). The mMPU expands fundamental stateful logic techniques, such as IMPLY, MAGIC, and FELIX, to high-throughput parallel logic and arithmetic operations within the memory. Unfortunately, memristive processing-in-memory is highly vulnerable to soft errors, and its massive parallelism is not compatible with traditional reliability techniques, such as error-correcting codes (ECC). In this paper, we discuss reliability techniques that efficiently support the mMPU by utilizing the same principles as the mMPU computation. We detail ECC techniques that build on the unique properties of the mMPU to efficiently utilize the massive parallelism. Furthermore, we present novel solutions for efficiently implementing triple modular redundancy (TMR). The short-term and long-term reliability of large-scale applications, such as neural-network acceleration, is evaluated. The analysis clearly demonstrates the importance of high-throughput reliability mechanisms for memristive processing-in-memory.