As high-performance computing (HPC) moves toward the exascale era, huge amounts of data must be analyzed, manipulated, and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, either the computation must be distributed across several compute nodes, or some portion of the data objects must be stored on a non-volatile block device such as a hard disk drive (HDD) or a solid-state drive (SSD). Optane DataCenter Persistent Memory Module (DCPMM), a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots and can therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM to improve the performance of HPC scientific computations. To this end, we configured an HPC system such that DCPMM can service the I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restart), and expand applications' main memory. We focus on changing the scientific codes as little as possible, while allowing them to access the non-volatile memory (NVM) transparently, as if it were persistent storage. Our results show that DCPMM allows scientific applications to fully exploit node locality by providing them with sufficiently large main memory. Moreover, it can serve as a high-performance replacement for persistent storage. Thus, DCPMM has the potential to replace standard HDD and SSD storage devices in HPC architectures and to enable a more efficient platform for modern supercomputing applications.