Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems have an inherent disadvantage: their hardware is more constrained than that of conventional processors (CPU, GPU), due to the difficulty and cost of building processing elements near or inside the memory. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. To support transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and perform a thorough evaluation of TransPimLib's methods in terms of performance and accuracy, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.