Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures

Tools that predict the throughput of basic blocks on a specific microarchitecture are useful for optimizing software performance and for building optimizing compilers. Several such tools have been proposed in recent work. However, the accuracy of their predictions has been shown to be relatively low. In this paper, we identify the most important factors behind these inaccuracies. To a significant degree, they are due to elements and parameters of the pipelines of recent CPUs that previous tools do not take into account, primarily because the necessary details are often undocumented. We build more precise models of the relevant components by reverse engineering them using microbenchmarks. Based on these models, we develop a simulator for predicting the throughput of basic blocks. In addition to predicting the throughput, our simulator also provides insights into how the code is executed. Our tool supports all Intel Core microarchitecture generations released in the last decade. We evaluate it on an improved version of the BHive benchmark suite. On many recent microarchitectures, its predictions are more accurate than the predictions of state-of-the-art tools by more than an order of magnitude.

