Performance models that statically predict the steady-state throughput of basic blocks on particular microarchitectures, such as IACA, Ithemal, llvm-mca, OSACA, or DiffTune, can guide optimizing compilers and aid manual software optimization. However, their utility depends heavily on the accuracy of their predictions. The average error of existing models relative to measurements on actual hardware has been shown to lie between 9% and 36%. But how good is this? To answer this question, we propose an extremely simple analytical throughput model that may serve as a baseline. Surprisingly, this model is already competitive with the state of the art, indicating that there is significant potential for improvement. To explore this potential, we develop a simulation-based throughput predictor. To this end, we propose a detailed parametric pipeline model that supports all Intel Core microarchitecture generations released between 2011 and 2021. We evaluate our predictor on an improved version of the BHive benchmark suite and show that its predictions are usually within 1% of measurement results, improving upon prior models by roughly an order of magnitude. The experimental evaluation also demonstrates that several microarchitectural details considered rather insignificant in previous work are in fact essential for accurate prediction. Our throughput predictor is available as open source at https://github.com/andreas-abel/uiCA.