Industry has gradually moved towards application-specific hardware accelerators to attain higher efficiency. While this paradigm shift is already showing promising results, designers must spend considerable manual effort and run a large number of time-consuming simulations to find accelerators that speed up multiple target applications while obeying design constraints. Moreover, such a "simulation-driven" approach must be re-run from scratch every time the set of target applications or the design constraints change. An alternative paradigm is a "data-driven", offline approach that uses logged simulation data to architect hardware accelerators without needing any form of simulation. Such an approach not only removes the need to run time-consuming simulations, but also enables data reuse and applies even when the set of target applications changes. In this paper, we develop such a data-driven offline optimization method for designing hardware accelerators, dubbed PRIME, that enjoys all of these properties. Our approach learns a conservative, robust estimate of the desired cost function, makes use of infeasible points, and optimizes the design against this estimate without any additional simulator queries during optimization. PRIME architects accelerators tailored towards both single and multiple applications, improving performance over state-of-the-art simulation-driven methods by about 1.54x and 1.20x, while reducing the required total simulation time by 93% and 99%, respectively. In addition, PRIME architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.
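To make the idea of optimizing against a conservative estimate learned from logged data more concrete, the following is a toy sketch only, not PRIME's actual model or training objective. It assumes a hypothetical two-parameter design space, a synthetic latency function, a made-up feasibility constraint, and a simple quadratic-feature surrogate fit in closed form. The surrogate is pushed to predict higher cost on logged infeasible designs through a penalty weight `alpha` (an assumed name), and the design is then chosen by minimizing the surrogate over a candidate grid, with no simulator calls.

```python
import numpy as np

def features(x):
    """Quadratic features [1, x, x^2] for designs x of shape (n, d)."""
    x = np.atleast_2d(x)
    return np.hstack([np.ones((x.shape[0], 1)), x, x ** 2])

def fit_surrogate(x_feas, y_feas, x_neg, alpha=0.5, ridge=1e-3):
    """Fit weights w of a surrogate f(x) = features(x) @ w by minimizing

        mean((f(x_feas) - y_feas)^2) + ridge * |w|^2 - alpha * mean(f(x_neg))

    The last term raises the predicted cost on x_neg (here: infeasible
    designs), which makes the estimate conservative there. The objective is
    quadratic in w, so it has the closed-form solution computed below.
    """
    phi = features(x_feas)
    n, k = phi.shape
    a = phi.T @ phi / n + ridge * np.eye(k)
    b = phi.T @ y_feas / n + 0.5 * alpha * features(x_neg).mean(axis=0)
    return np.linalg.solve(a, b)

def predict(w, x):
    return features(x) @ w

# Hypothetical logged dataset: normalized design parameters and latencies.
rng = np.random.default_rng(0)
designs = rng.uniform(0.0, 1.0, size=(200, 2))
latency = ((designs - 0.3) ** 2).sum(axis=1) + 0.01 * rng.normal(size=200)
feasible = designs.sum(axis=1) <= 1.2  # made-up stand-in for a design constraint

x_feas, y_feas = designs[feasible], latency[feasible]
x_infeas = designs[~feasible]

# Plain regression (alpha=0) versus the conservative surrogate (alpha>0).
w_plain = fit_surrogate(x_feas, y_feas, x_infeas, alpha=0.0)
w_cons = fit_surrogate(x_feas, y_feas, x_infeas, alpha=0.5)

# Optimize against the learned estimate only: no simulator is queried.
axis = np.linspace(0.0, 1.0, 21)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)
best = grid[np.argmin(predict(w_cons, grid))]
print("selected design:", best)
```

In this sketch the conservative surrogate predicts a higher average cost on the infeasible designs than the plain regression does, so the search over the grid is steered away from regions the logged data marks as invalid. A real system would use a far richer model and design space; the closed-form fit is chosen here only to keep the example short and deterministic.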