L1 instruction (L1-I) cache misses are a source of performance bottlenecks. Sequential prefetchers are a simple way to mitigate this problem; however, prior work has shown that they leave considerable potential uncovered. This observation has motivated many researchers to develop more advanced instruction prefetchers. In 2011, Proactive Instruction Fetch (PIF) showed that a hardware prefetcher could effectively eliminate all instruction-cache misses. However, its enormous storage cost makes it impractical. Consequently, reducing the storage cost has been the main research focus in instruction prefetching over the past decade. Several instruction prefetchers, including RDIP and Shotgun, were proposed to offer PIF-level performance with significantly lower storage overhead. However, our findings show that there is a considerable performance gap between these proposals and PIF. Although these proposals use different prefetching mechanisms, the gap is largely due not to the mechanism but to insufficient storage. Prior proposals suffer from one or both of the following shortcomings: (1) they need a large number of metadata records to cover the prefetching potential, and (2) each record has a high storage cost. The first problem causes metadata misses, and the second prevents the prefetcher from storing enough records within reasonably sized storage.
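As a rough illustration of why a sequential prefetcher leaves some misses uncovered, the toy sketch below counts demand misses with and without next-line prefetching. It is not taken from any of the prefetchers named above: the function name `count_misses`, the example trace, and the simplified cache model (unbounded capacity, prefetches that always arrive in time) are all assumptions made for illustration.

```python
def count_misses(trace, prefetch_next_line=False):
    """Count demand misses for a trace of instruction-block addresses.

    Toy model (assumption): the cache is an unbounded set, and a prefetched
    block is available before the next access (no timing, no eviction).
    """
    cached = set()
    misses = 0
    for block in trace:
        if block not in cached:
            misses += 1
            cached.add(block)
        if prefetch_next_line:
            # Sequential (next-line) prefetch: bring in the following block.
            cached.add(block + 1)
    return misses


# Hypothetical trace: a sequential run, a jump to a distant function,
# then a return to the block after the original run.
trace = [100, 101, 102, 103, 500, 501, 502, 104, 105]

baseline = count_misses(trace)
sequential = count_misses(trace, prefetch_next_line=True)
print(baseline, sequential)
```

In this toy trace the sequential prefetcher covers the straight-line blocks but still misses on the jump target, which is the kind of discontinuity that more advanced instruction prefetchers try to cover with recorded metadata.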