Data movement between main memory and the processor is a significant contributor to the execution time and energy consumption of memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM), which enables computation inside the memory chip. However, existing PiM architectures often lack support for complex operations, since supporting these operations increases design complexity, chip area, and power consumption. We introduce pLUTo (processing-in-memory with lookup table [LUT] operations), a new DRAM substrate that leverages the high area density of DRAM to enable massively parallel storage and querying of LUTs. The use of LUTs enables the efficient execution of complex operations in memory, which has been a long-standing challenge in the domain of PiM. When running a state-of-the-art binary neural network in a single DRAM subarray, pLUTo outperforms the baseline CPU and GPU implementations by $33\times$ and $8\times$, respectively, while simultaneously achieving energy savings of $110\times$ and $80\times$.
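The core idea behind LUT-based computation can be illustrated in software: precompute the result of a function for every possible input once, then replace each evaluation with a single table read, which makes many independent queries trivially parallel. The sketch below is only a software analogy and does not model pLUTo's DRAM-level mechanism. The names `build_lut`, `query_lut`, `POPCOUNT8`, and `binary_dot` are hypothetical, and the choice of an 8-bit popcount (the core of XNOR-popcount in binary neural networks) is an illustrative assumption.

```python
from typing import Callable, List


def build_lut(f: Callable[[int], int], bits: int) -> List[int]:
    """Precompute f(x) for all 2**bits possible inputs (hypothetical helper)."""
    return [f(x) for x in range(1 << bits)]


def query_lut(lut: List[int], inputs: List[int]) -> List[int]:
    """Answer many independent queries; each one is a single indexed read."""
    return [lut[x] for x in inputs]


# Assumed example table: population count of every 8-bit value.
POPCOUNT8 = build_lut(lambda x: bin(x).count("1"), 8)


def binary_dot(a: int, b: int) -> int:
    """Dot product of two 8-element {-1,+1} vectors packed as bits (1 -> +1, 0 -> -1).

    XNOR marks positions where the signs agree, and the popcount of that mask
    is obtained by a table lookup instead of a loop over bits:
    dot = matches - mismatches = 2 * matches - 8.
    """
    matches = POPCOUNT8[~(a ^ b) & 0xFF]
    return 2 * matches - 8


if __name__ == "__main__":
    print(query_lut(POPCOUNT8, [0, 1, 255, 0b1010]))  # [0, 1, 8, 2]
    print(binary_dot(0b10101010, 0b10100000))         # 4
```

Because every query touches the same precomputed table, the cost of an arbitrarily complex function is moved into table construction, which is the trade-off that makes storing LUTs in dense DRAM attractive.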