From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking

In a super-scalar architecture, the scheduler dynamically assigns micro-operations ($\mu$ops) to execution ports. The port mapping of an architecture describes how each instruction decomposes into $\mu$ops and lists, for each $\mu$op, the set of ports it can be mapped to. Compilers and performance debugging tools use it to characterize the throughput of a sequence of instructions executed repeatedly as the core of a loop. This paper introduces a dual, equivalent representation: the resource mapping of an architecture is an abstract model in which, to be executed, an instruction must use a set of abstract resources, each representing a combination of execution ports. For a given architecture, finding a port mapping is an important but difficult problem. Building a resource mapping is more tractable and yields a simpler, equivalent model. This paper describes PALMED, a tool that automatically builds a resource mapping for pipelined, super-scalar, out-of-order CPU architectures. PALMED does not require hardware performance counters and relies solely on runtime measurements. We evaluate the pertinence of our dual representation for throughput modeling by extracting a representative set of basic blocks from the compiled binaries of the SPEC CPU 2017 benchmarks~\cite{SPECCPU2017}. We compare the throughput predicted by existing machine models to that produced by PALMED and find accuracy comparable to state-of-the-art tools, achieving a sub-10\% mean square error rate on this workload on Intel's Skylake microarchitecture.
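
To make the duality between a port mapping and a resource mapping concrete, the following minimal sketch derives a resource mapping from a small port mapping and uses it to predict steady-state throughput. Everything in it is an assumption made for illustration: the three ports, the instruction names, their decompositions, and the helper names `build_resource_mapping` and `cycles_per_iteration` are hypothetical and are not measured Skylake data. The sketch also runs in the opposite direction from PALMED, which infers the resource mapping from runtime measurements without knowing the port mapping. Each abstract resource is taken to be a non-empty set of ports that can retire one micro-operation per port per cycle, and the predicted cost of a loop body is the load on its most heavily used resource.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy machine with three execution ports (not real Skylake data).
PORTS = ("p0", "p1", "p2")

# Hypothetical port mapping: instruction -> one set of admissible ports per uop.
PORT_MAPPING = {
    "ADD": [frozenset({"p0", "p1"})],                      # one uop, p0 or p1
    "MUL": [frozenset({"p0"})],                            # one uop, p0 only
    "STORE": [frozenset({"p0", "p1"}), frozenset({"p2"})], # two uops
}


def build_resource_mapping(port_mapping, ports):
    """Turn a port mapping into per-instruction usage of abstract resources.

    One abstract resource is created per non-empty subset of ports. An
    instruction uses a resource in proportion to the number of its uops that
    can only run on ports of that subset, divided by the subset size.
    """
    resources = [
        frozenset(combo)
        for size in range(1, len(ports) + 1)
        for combo in combinations(ports, size)
    ]
    mapping = {}
    for instr, uops in port_mapping.items():
        usage = {}
        for resource in resources:
            confined = sum(1 for uop_ports in uops if uop_ports <= resource)
            if confined:
                usage[resource] = Fraction(confined, len(resource))
        mapping[instr] = usage
    return mapping


def cycles_per_iteration(kernel, resource_mapping):
    """Predict cycles per loop iteration as the load of the busiest resource.

    `kernel` maps an instruction name to its number of occurrences.
    """
    load = {}
    for instr, count in kernel.items():
        for resource, usage in resource_mapping[instr].items():
            load[resource] = load.get(resource, Fraction(0)) + count * usage
    return max(load.values(), default=Fraction(0))


RESOURCE_MAPPING = build_resource_mapping(PORT_MAPPING, PORTS)

if __name__ == "__main__":
    # Two ADDs and one MUL share p0 and p1: three uops over two ports.
    print(cycles_per_iteration({"ADD": 2, "MUL": 1}, RESOURCE_MAPPING))  # 3/2
```

In this toy setting, a kernel with two `MUL` and one `ADD` is bounded by the single-port resource `{p0}` at 2 cycles per iteration, whereas two `ADD` and one `MUL` are bounded by the combined resource `{p0, p1}` at 3/2 cycles. Once the resource mapping is known, the prediction needs only a sum and a maximum, with no search over assignments of micro-operations to ports.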
