Persistent memory (PM) technologies enable programs to recover to a consistent state in the event of a failure. To ensure this crash-consistent behavior, programs must enforce persist ordering using mechanisms such as logging and checkpointing, which introduce additional data movement. Emerging near-data processing (NDP) architectures can effectively reduce this data movement overhead. In this work, we propose NearPM, a near-data processor that supports accelerable primitives in crash-consistent programs. Using these primitives, NearPM accelerates commonly used crash consistency mechanisms: logging, checkpointing, and shadow paging. NearPM further reduces the synchronization overhead between the NDP and the CPU needed to guarantee persist ordering by moving ordering handling near memory. We ensure correct persist ordering between the CPU and NDP devices, as well as among multiple NDP devices, with Partitioned Persist Ordering (PPO). We prototype NearPM on an FPGA platform. NearPM executes the data-intensive operations of crash consistency mechanisms with correct ordering guarantees while the rest of the program runs on the CPU. We evaluate nine PM workloads, each of which supports three crash consistency mechanisms: logging, checkpointing, and shadow paging. Overall, NearPM achieves a 4.3-9.8X speedup in the NDP-offloaded operations and a 1.22-1.35X speedup in end-to-end execution.
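To make the persist-ordering requirement concrete, the following is a minimal sketch of undo logging on a toy model of persistent memory. It is not NearPM's interface or implementation. `ToyPM`, `logged_update`, `recover`, and the `LOG` address are hypothetical names introduced only for illustration, and `persist` stands in for a cache-line flush followed by a fence. The sketch shows why logging adds data movement (the old value is copied into the log before every update) and why the log entry must be persisted before the in-place update.

```python
class ToyPM:
    """Toy persistent memory: writes sit in a volatile cache until persisted."""

    def __init__(self):
        self.durable = {}  # contents that survive a crash
        self.cache = {}    # pending writes, lost on a crash

    def write(self, addr, value):
        self.cache[addr] = value

    def read(self, addr):
        return self.cache.get(addr, self.durable.get(addr))

    def persist(self, addr):
        # Stand-in for a cache-line flush followed by a fence.
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)

    def crash(self):
        # A failure discards everything that was not persisted.
        self.cache.clear()


LOG = "log"  # hypothetical address of a single-entry undo log


def logged_update(pm, addr, new_value, crash_before_commit=False):
    # 1. Copy the old value into the log and persist it FIRST.
    #    This copy is the extra data movement that logging introduces.
    pm.write(LOG, (addr, pm.read(addr)))
    pm.persist(LOG)
    # 2. Only then update the data in place and persist it.
    pm.write(addr, new_value)
    pm.persist(addr)
    if crash_before_commit:
        pm.crash()  # simulate a failure before the commit step
        return
    # 3. Commit by invalidating the log entry.
    pm.write(LOG, None)
    pm.persist(LOG)


def recover(pm):
    # A valid log entry means the update did not commit: roll it back.
    entry = pm.read(LOG)
    if entry is not None:
        addr, old_value = entry
        pm.write(addr, old_value)
        pm.persist(addr)
        pm.write(LOG, None)
        pm.persist(LOG)
```

In this toy model, calling `logged_update(pm, "x", 3, crash_before_commit=True)` followed by `recover(pm)` restores the previous value of `"x"`, because the log entry reached durable storage before the data was overwritten. Reversing steps 1 and 2 would break that guarantee, which is the kind of ordering constraint that must hold whether the copy is executed on the CPU or offloaded near memory.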