Today's computing systems require moving data back and forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major bottleneck for system performance and energy consumption. One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory. However, naively employing PIM to accelerate data-intensive workloads can lead to sub-optimal performance due to the many design constraints PIM substrates impose. Therefore, many recent works co-design specialized PIM accelerators and algorithms to improve performance and reduce energy consumption for (i) applications from various application domains and (ii) various computing environments, including cloud systems, mobile systems, and edge devices. We showcase the benefits of co-designing algorithms and hardware in a way that efficiently takes advantage of the PIM paradigm for two modern data-intensive applications: (1) machine learning inference models for edge devices and (2) hybrid transactional/analytical processing databases for cloud systems. We follow a two-step approach in our system design. In the first step, we extensively analyze the computation and memory access patterns of each application to gain insights into its hardware/software requirements and the major sources of performance and energy bottlenecks in processor-centric systems. In the second step, we leverage the insights from the first step to co-design algorithms and hardware accelerators that enable high-performance and energy-efficient data-centric architectures for each application.