Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or CQA, can guide optimizing compilers and aid manual software optimization. However, their utility depends heavily on the accuracy of their predictions. The average error of existing models compared to measurements on the actual hardware has been shown to lie between 9% and 36%. But how good is this? To answer this question, we propose an extremely simple analytical throughput model that may serve as a baseline. Surprisingly, this model is already competitive with the state of the art, indicating that there is significant potential for improvement. To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline model that supports all Intel Core microarchitecture generations released between 2011 and 2021. We evaluate our predictor on an improved version of the BHive benchmark suite and show that its predictions are usually within 1% of measurement results, improving upon prior models by roughly an order of magnitude. The experimental evaluation also demonstrates that several microarchitectural details considered rather insignificant in previous work are in fact essential for accurate prediction. Our throughput predictor is available as open source at https://github.com/andreas-abel/uiCA.
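The abstract quotes average errors relative to hardware measurements without defining the metric. As an illustration only, the sketch below assumes a mean absolute percentage error over per-block throughputs; the function name and the sample values are hypothetical and are not taken from the paper or from BHive.

```python
def mean_abs_pct_error(predicted, measured):
    """Mean absolute percentage error of predictions relative to measurements.

    Assumption: throughputs are given in cycles per iteration and every
    measured value is positive.
    """
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    relative_errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical throughputs for three basic blocks (illustrative values only).
measured = [2.0, 4.0, 1.0]
predicted = [2.2, 3.0, 1.0]
error = mean_abs_pct_error(predicted, measured)
```

Under this assumed definition, a predictor that is "within 1% of measurement results" would have a relative error of at most 0.01 for the block in question.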