The emerging CXL.mem standard provides a new type of byte-addressable remote memory with a variety of memory types and hierarchies. With CXL.mem, multiple layers of memory (e.g., local DRAM and CXL-attached remote memory at different locations) are exposed to operating systems and user applications, bringing new challenges and research opportunities. Unfortunately, because CXL.mem devices are not commercially available, it is difficult for researchers to conduct systems research that uses CXL.mem. In this paper, we present our ongoing work, CXLMemSim, a fast and lightweight CXL.mem simulator for performance characterization. CXLMemSim uses a performance model driven by performance monitoring events, which are supported by most commodity processors. Specifically, CXLMemSim attaches to an existing, unmodified program and divides its execution into multiple epochs; when an epoch finishes, CXLMemSim collects the performance monitoring events and calculates the simulated execution time of the epoch from them. With this approach, CXLMemSim avoids the performance overhead of a full-system simulator (e.g., Gem5) and allows the memory hierarchy and latency to be easily adjusted, enabling research such as memory scheduling for complex applications. Our preliminary evaluation shows that CXLMemSim slows down the execution of the attached program by 4.41x on average for real-world applications.
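The epoch-based idea can be illustrated with a toy latency-injection model. This is a minimal sketch under stated assumptions, not CXLMemSim's actual performance model. It assumes that each epoch's simulated time equals its native time plus, for every last-level-cache miss assumed to be served by a CXL tier, that tier's extra latency, optionally divided by a memory-level-parallelism factor. The names `EpochCounters`, `MemoryTier`, `simulated_epoch_ns` and `simulate`, the fraction-based placement, and all latency and counter values are hypothetical. A real tool would read counters from the hardware PMU (e.g., through Linux `perf_event_open`) at each epoch boundary instead of using synthetic values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EpochCounters:
    """Hypothetical per-epoch PMU readings; field names are illustrative."""
    native_ns: float  # time the epoch actually took running on local DRAM
    llc_misses: int   # last-level-cache misses counted during the epoch


@dataclass
class MemoryTier:
    """One simulated remote tier (e.g., a CXL.mem device behind a switch)."""
    name: str
    extra_latency_ns: float  # assumed added latency relative to local DRAM


# A placement maps each remote tier to the fraction of LLC misses it serves;
# the remainder is assumed to hit local DRAM at no extra cost.
Placement = List[Tuple[MemoryTier, float]]


def simulated_epoch_ns(c: EpochCounters, placement: Placement, mlp: float = 1.0) -> float:
    """Native epoch time plus injected delay for misses routed to remote tiers.

    mlp is an assumed memory-level-parallelism factor: overlapping misses
    hide part of the added latency, so the total delay is divided by it.
    """
    delay = sum(c.llc_misses * frac * tier.extra_latency_ns for tier, frac in placement)
    return c.native_ns + delay / mlp


def simulate(epochs: List[EpochCounters], placement: Placement, mlp: float = 1.0) -> float:
    """Total simulated runtime: the per-epoch model summed over all epochs."""
    return sum(simulated_epoch_ns(e, placement, mlp) for e in epochs)


if __name__ == "__main__":
    # Synthetic counters for two epochs (illustrative, not measured data).
    epochs = [EpochCounters(1_000_000, 1_000), EpochCounters(2_000_000, 4_000)]
    near = MemoryTier("cxl-near", 100.0)
    far = MemoryTier("cxl-far", 250.0)
    native = sum(e.native_ns for e in epochs)
    # Changing the hierarchy only means passing a different placement.
    for placement in ([(near, 0.5)], [(near, 0.25), (far, 0.25)]):
        sim = simulate(epochs, placement)
        names = [t.name for t, _ in placement]
        print(f"{names}: simulated/native runtime = {sim / native:.3f}")
```

Because the model is evaluated only at epoch boundaries from aggregate counters, swapping in a different hierarchy or latency is a change of input data rather than a re-run under a cycle-level simulator. Note that the ratio printed here is the modeled effect of CXL latency on the application, which is distinct from the simulator's own runtime overhead reported above.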