Digital processing-in-memory (PIM) architectures mitigate the memory wall problem by facilitating parallel bitwise operations directly within the memory. Recent works have demonstrated their algorithmic potential for accelerating data-intensive applications; however, there remains a significant gap in the programming model and microarchitectural design. This is further exacerbated by aspects unique to memristive PIM such as partitions and operations across both directions of the memory array. To address this gap, this paper provides an end-to-end architectural integration of digital memristive PIM from a high-level Python library for tensor operations (similar to NumPy and PyTorch) to the low-level microarchitectural design. We begin by proposing an efficient microarchitecture and instruction set architecture (ISA) that bridge the gap between the low-level control periphery and an abstraction of PIM parallelism. We subsequently propose a PIM development library that converts high-level Python to ISA instructions and a PIM driver that translates ISA instructions into PIM micro-operations. We evaluate PyPIM via a cycle-accurate simulator on a wide variety of benchmarks that both demonstrate the versatility of the Python library and the performance compared to theoretical PIM bounds. Overall, PyPIM drastically simplifies the development of PIM applications and enables the conversion of existing tensor-oriented Python programs to PIM with ease.
翻译:暂无翻译