Tools that predict the throughput of basic blocks on a specific microarchitecture are useful for optimizing software performance and for building optimizing compilers. Several such tools have been proposed in recent work; however, their predictions have been shown to be relatively inaccurate. In this paper, we identify the most important causes of these inaccuracies. To a significant degree, they stem from elements and parameters of the pipelines of recent CPUs that previous tools do not take into account, primarily because the necessary details are often undocumented. We build more precise models of the relevant components by reverse engineering them with microbenchmarks. Based on these models, we develop a simulator that predicts the throughput of basic blocks and also provides insights into how the code is executed. Our tool supports all Intel Core microarchitecture generations released in the last decade. We evaluate it on an improved version of the BHive benchmark suite. On many recent microarchitectures, its predictions are more accurate than those of state-of-the-art tools by more than an order of magnitude.