Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture

Many modern workloads, such as neural networks, databases, and graph processing, are fundamentally memory-bound. For such workloads, the data movement between main memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of main memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new 3D-stacked memory technologies that integrate memory with a logic layer where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly available real-world PIM architecture. This paper provides the first comprehensive analysis of this architecture. We make two key contributions. First, we conduct an experimental characterization of the UPMEM-based PIM system using microbenchmarks to assess various architecture limits such as compute throughput and memory bandwidth, yielding new insights. Second, we present PrIM, a benchmark suite of 16 workloads from different application domains (e.g., linear algebra, databases, graph processing, neural networks, bioinformatics).
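
To make the notion of a memory-bound workload concrete, the following minimal sketch applies the roofline model, a standard bound that the abstract does not name and that is brought in here only as an illustration. The function names and the machine numbers (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are hypothetical and are not measurements of the UPMEM system or of any real CPU.

```python
# Roofline-style sketch: when is a workload limited by memory bandwidth?
# All numbers below are hypothetical placeholders, not measured values.

def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Upper bound on throughput: the lower of the compute peak and
    (operations per byte) * (bytes per second from main memory)."""
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def is_memory_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """True when the bandwidth ceiling lies below the compute ceiling."""
    return arithmetic_intensity * bandwidth_gbs < peak_gflops


PEAK_GFLOPS = 1000.0    # hypothetical compute peak
BANDWIDTH_GBS = 100.0   # hypothetical main-memory bandwidth

# float32 vector addition c[i] = a[i] + b[i]:
# 1 operation per 12 bytes moved (two 4-byte loads, one 4-byte store).
vector_add_intensity = 1 / 12

if __name__ == "__main__":
    bound = attainable_gflops(vector_add_intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"vector add bound: {bound:.2f} GFLOP/s")
    print("memory-bound:",
          is_memory_bound(vector_add_intensity, PEAK_GFLOPS, BANDWIDTH_GBS))
```

Under these assumed numbers, a kernel with so little data reuse can reach only a small fraction of the compute peak, which is the situation the abstract describes: the cost of each main memory access is not amortized over enough computation.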
