Many hardware structures in today's high-performance out-of-order processors do not scale efficiently. To address this, several solutions have been proposed that build execution schedules in an energy-efficient manner. One such solution is the issue time prediction processor, which uses data-flow dependencies and predefined instruction latencies to predict the issue times of repeated instructions. In this work, we aim to improve the accuracy of such processors, and consequently their performance, in an energy-efficient way. We accomplish this by exploiting two key observations. First, memory accesses often take longer to arrive than the static, predefined access latency that these systems assume. Second, these memory access delays often repeat across iterations of the same code, which allows us to predict the arrival time of these accesses. We introduce a new processor microarchitecture that replaces a complex reservation-station-based scheduler with an efficient, scalable alternative. Our proposed scheduling technique tracks the real-time delays of loads to accurately predict instruction issue times, and uses a reordering mechanism to prioritize instructions based on that prediction, achieving performance close to that of an out-of-order processor. To accomplish this in an energy-efficient manner, we introduce: (1) an instruction delay learning mechanism that monitors repeated load instructions and learns their latest delay, (2) an issue time predictor that uses the learned delays and data-flow dependencies to predict instruction issue times, and (3) priority queues that reorder instructions based on their predicted issue times. Together, these components allow our processor to achieve 86.2% of the performance of a traditional out-of-order processor, higher than previous efficient scheduler proposals, while consuming 30% less power.
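The interplay of the three components can be illustrated with a small software model. The sketch below is a simplified, hypothetical Python model, not the authors' microarchitecture: the names (`Instr`, `DelayLearner`, `predict_issue_times`, `issue_order`), the per-PC delay table, the static latency values, and the assumption of unlimited issue width with no structural hazards are all choices made for illustration. It shows how a learned load delay, added to the static latency, propagates through data-flow dependencies into later predicted issue times, and how a priority queue then orders instructions by those predictions.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumed static latencies in cycles; illustrative values, not from the paper.
STATIC_LATENCY = {"alu": 1, "mul": 3, "load": 4}


@dataclass(frozen=True)
class Instr:
    pc: int                      # program counter; indexes the (hypothetical) delay table
    op: str                      # "alu", "mul" or "load"
    dest: Optional[int]          # destination register, or None
    srcs: Tuple[int, ...] = ()   # source registers


class DelayLearner:
    """Remembers the latest extra delay (beyond the static latency) seen per load PC."""

    def __init__(self) -> None:
        self.last_delay: Dict[int, int] = {}

    def observe(self, pc: int, actual_latency: int) -> None:
        # Overwrite with the most recent observation; clamp at zero.
        self.last_delay[pc] = max(0, actual_latency - STATIC_LATENCY["load"])

    def predicted_latency(self, instr: Instr) -> int:
        latency = STATIC_LATENCY[instr.op]
        if instr.op == "load":
            latency += self.last_delay.get(instr.pc, 0)
        return latency


def predict_issue_times(block: List[Instr], learner: DelayLearner,
                        dispatch_cycle: int = 0) -> List[int]:
    """Predict each instruction's issue cycle from data-flow dependencies
    (unlimited issue width and no structural hazards are assumed)."""
    ready_at: Dict[int, int] = {}  # register -> cycle its value is predicted ready
    times = []
    for instr in block:
        issue = max([dispatch_cycle] +
                    [ready_at.get(r, dispatch_cycle) for r in instr.srcs])
        times.append(issue)
        if instr.dest is not None:
            ready_at[instr.dest] = issue + learner.predicted_latency(instr)
    return times


def issue_order(times: List[int]) -> List[int]:
    """Drain a priority queue keyed on (predicted issue time, program order)."""
    heap = [(t, idx) for idx, t in enumerate(times)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(times))]


if __name__ == "__main__":
    block = [
        Instr(0x10, "load", 1, (0,)),  # r1 = load [r0]
        Instr(0x14, "alu", 2, (1,)),   # r2 = f(r1), depends on the load
        Instr(0x18, "mul", 3, (4,)),   # r3 = g(r4), independent chain
        Instr(0x1C, "mul", 5, (3,)),   # r5 = g(r3)
        Instr(0x20, "alu", 6, (5,)),   # r6 = f(r5)
    ]
    learner = DelayLearner()
    t = predict_issue_times(block, learner)
    print(t, issue_order(t))  # [0, 4, 0, 3, 6] [0, 2, 3, 1, 4]
    learner.observe(0x10, 20)  # the load took 20 cycles this iteration
    t = predict_issue_times(block, learner)
    print(t, issue_order(t))  # [0, 20, 0, 3, 6] [0, 2, 3, 4, 1]
```

In this toy run, once the load at `0x10` is observed taking 20 cycles, the instruction that depends on it (index 1) is moved behind the independent multiply chain instead of being scheduled as if the load would return after its static latency. A hardware design would additionally have to bound table and queue sizes and respect issue width; the model deliberately ignores both.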