Over the years, processor throughput has steadily increased. However, memory throughput has not kept pace, leading to the well-known memory wall problem, which in turn widens the gap between effective and theoretical peak processor performance. To cope with this, there has been an abundance of work on data/instruction prefetcher designs. Broadly, prefetchers predict future data/instruction address accesses and proactively fetch the corresponding data/instructions into the memory hierarchy, with the goal of lowering data/instruction access latency. To this end, one or more prefetchers are deployed at each level of the memory hierarchy, but typically each prefetcher is designed in isolation, without comprehensively accounting for the other prefetchers in the system. As a result, individual prefetchers do not always complement each other, which leads to lower average performance gains and/or many negative outliers. In this work, we propose Puppeteer, a hardware prefetcher manager that uses a suite of random forest regressors to determine, at runtime, which prefetcher should be ON at each level of the memory hierarchy, such that the prefetchers complement each other and data/instruction access latency is reduced. Compared to a design with no prefetchers, Puppeteer improves IPC by 46.0% in 1-Core (1C), 25.8% in 4-Core (4C), and 11.9% in 8-Core (8C) processors on average across traces generated from the SPEC2017, SPEC2006, and Cloud suites, with only ~10KB of hardware overhead. Moreover, compared to the state-of-the-art, Puppeteer reduces the number of negative outliers by over 89% and the performance loss of the worst-case negative outlier from 25% to only 5%.
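To make the selection mechanism concrete, the following is a minimal Python sketch of the idea described above: one small regressor per candidate prefetcher configuration at each cache level predicts expected performance from hardware event counters, and the highest-scoring configuration is switched ON for the next program phase. The level names, candidate configurations, feature counts, synthetic training data, and the IPC-like target are illustrative assumptions for this sketch only; they are not the authors' hardware design, which realizes the regressors in dedicated on-chip logic.

```python
# Illustrative sketch only: per-level prefetcher selection with random forest
# regressors trained on synthetic data. All names and numbers are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical candidate prefetcher configurations per cache level.
CONFIGS = {
    "L1D": ["off", "next_line", "ip_stride"],
    "L2":  ["off", "streamer", "spp"],
    "LLC": ["off", "next_line"],
}

rng = np.random.default_rng(0)

def make_regressors(n_features=8, n_samples=256):
    """Train one small regressor per (level, config) pair.

    Here the training data is synthetic; in a real system the regressors
    would be trained offline on profiled traces of event counters vs. IPC.
    """
    models = {}
    for level, configs in CONFIGS.items():
        for cfg in configs:
            X = rng.random((n_samples, n_features))  # event-counter features
            y = X @ rng.random(n_features) + rng.normal(0, 0.05, n_samples)  # IPC proxy
            models[(level, cfg)] = RandomForestRegressor(
                n_estimators=10, max_depth=4, random_state=0
            ).fit(X, y)
    return models

def choose_prefetchers(models, counters):
    """Per level, pick the configuration whose regressor predicts the highest IPC."""
    choice = {}
    for level, configs in CONFIGS.items():
        preds = {
            cfg: models[(level, cfg)].predict(counters.reshape(1, -1))[0]
            for cfg in configs
        }
        choice[level] = max(preds, key=preds.get)
    return choice

if __name__ == "__main__":
    models = make_regressors()
    counters = rng.random(8)  # event-counter snapshot for the current phase
    print(choose_prefetchers(models, counters))  # e.g. {'L1D': 'ip_stride', ...}
```

The key design point the sketch captures is that the manager scores whole per-level configurations rather than tuning each prefetcher in isolation, which is what allows the chosen prefetchers to complement one another.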