Cleaning up the Mess

A MICRO 2024 best paper runner-up publication (the Mess paper) with all three artifact badges awarded (including "Reproducible") proposes a new benchmark to evaluate real and simulated memory system performance. In this paper, we demonstrate that the Ramulator 2.0 simulation results reported in the Mess paper are incorrect and, at the time of the publication of the Mess paper, irreproducible. We find that the authors of the Mess paper made multiple trivial human errors in both the configuration and usage of the simulators. We show that, when Ramulator 2.0 is configured correctly, its simulated memory system performance actually resembles real system characteristics well, and thus a key claimed contribution of the Mess paper is factually incorrect. We also identify that the DAMOV simulation results in the Mess paper use wrong simulation statistics that are unrelated to the simulated DRAM performance. Moreover, the Mess paper's artifact repository lacks the necessary sources to fully reproduce all the Mess paper's results. Our work corrects the Mess paper's errors regarding Ramulator 2.0 and identifies important issues in the Mess paper's memory simulator evaluation methodology. We emphasize the importance of both carefully and rigorously validating simulation results and contacting simulator authors and developers, in true open source spirit, to ensure these simulators are used with correct configurations and as intended. We encourage the computer architecture community to correct the Mess paper's errors. This is necessary to prevent the propagation of inaccurate and misleading results, and to maintain the reliability of the scientific record. Our investigation also opens up questions about the integrity of the review and artifact evaluation processes. To aid future work, our source code and scripts are openly available at https://github.com/CMU-SAFARI/ramulator2/tree/mess.
