Processing-in-memory (PIM) promises to alleviate the data movement bottleneck in modern computing systems. However, current real-world PIM systems share an inherent disadvantage: their hardware is more constrained than that of conventional processors (CPUs, GPUs), because building processing elements near or inside memory is difficult and costly. As a result, general-purpose PIM architectures support fairly limited instruction sets and struggle to execute complex operations such as transcendental functions and other hard-to-calculate operations (e.g., square root). These operations are particularly important for some modern workloads, e.g., activation functions in machine learning applications. To support transcendental (and other hard-to-calculate) functions in general-purpose PIM systems, we present \emph{TransPimLib}, a library that provides CORDIC-based and LUT-based methods for trigonometric functions, hyperbolic functions, exponentiation, logarithm, square root, etc. We develop an implementation of TransPimLib for the UPMEM PIM architecture and thoroughly evaluate the performance and accuracy of TransPimLib's methods, using microbenchmarks and three full workloads (Blackscholes, Sigmoid, Softmax). We open-source all our code and datasets at~\url{https://github.com/CMU-SAFARI/transpimlib}.
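To make the two method families named above concrete, the following is a minimal illustrative sketch of the generic techniques, not TransPimLib's code or API. It shows (i) rotation-mode CORDIC for sine and cosine, whose inner loop uses only integer shifts, additions, and a small arctangent table, and (ii) a lookup table (LUT) with linear interpolation. All names and parameters (`cordic_sin_cos`, `lut_sin`, `FRAC_BITS`, `ITERATIONS`, `LUT_SIZE`) are hypothetical and chosen for illustration. The sketch assumes the tables are precomputed on a host with floating-point support, and it is written in Python for readability rather than in the language of any particular PIM toolchain.

```python
import math

# --- Assumed fixed-point format (hypothetical, for illustration only) ---
FRAC_BITS = 28            # number of fractional bits
ITERATIONS = 24           # number of CORDIC micro-rotations
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

# Table of arctan(2^-i) in fixed point, precomputed once (e.g., on a host).
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# CORDIC gain compensation: K = prod_i 1 / sqrt(1 + 2^(-2i)).
_gain = 1.0
for _i in range(ITERATIONS):
    _gain /= math.sqrt(1.0 + 2.0 ** (-2 * _i))
GAIN_FIXED = round(_gain * ONE)


def cordic_sin_cos(angle):
    """Return (sin, cos) of `angle` (radians, in [-pi/2, pi/2]).

    Rotation-mode CORDIC: the loop body uses only integer shifts,
    additions, subtractions, and one table lookup per iteration.
    """
    if not -math.pi / 2 <= angle <= math.pi / 2:
        raise ValueError("angle must lie in [-pi/2, pi/2]")
    x = GAIN_FIXED            # start at (K, 0) so the final vector has unit length
    y = 0
    z = round(angle * ONE)    # residual angle still to be rotated
    for i in range(ITERATIONS):
        if z >= 0:            # rotate counter-clockwise by arctan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:                 # rotate clockwise by arctan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y / ONE, x / ONE


# --- LUT-based sine with linear interpolation (hypothetical parameters) ---
LUT_SIZE = 1024
LUT_STEP = (math.pi / 2) / LUT_SIZE
SIN_LUT = [math.sin(k * LUT_STEP) for k in range(LUT_SIZE + 1)]


def lut_sin(angle):
    """Return sin(angle) for angle in [0, pi/2] using table lookup
    plus linear interpolation between the two neighbouring entries."""
    if not 0.0 <= angle <= math.pi / 2:
        raise ValueError("angle must lie in [0, pi/2]")
    pos = angle / LUT_STEP
    k = min(int(pos), LUT_SIZE - 1)   # clamp so that k + 1 stays in range
    frac = pos - k
    return SIN_LUT[k] + frac * (SIN_LUT[k + 1] - SIN_LUT[k])
```

The sketch illustrates the trade-off between the two approaches: CORDIC needs only a table with one entry per iteration but performs a loop of dependent steps, whereas the LUT answers with one lookup and one interpolation at the cost of a larger table. How TransPimLib actually balances these choices is described in the paper and the repository, not here.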