DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Data movement between the CPU and main memory is a first-order obstacle to improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.
