Modern Systems-on-Chip (SoCs) almost invariably require accelerators to achieve energy efficiency and high performance on specific tasks that are not necessarily well suited to execution on standard processing units. Given the broad range of applications and the need for specialization, SoC design has become considerably more challenging. In this paper, we put forward the concept of G-GPU, a general-purpose GPU-like accelerator that is not application-specific but still delivers benefits in energy efficiency and throughput. Furthermore, we have identified a gap for such accelerators in ASIC, for which no known automated generation platform or tool exists. Our solution, called GPUPlanner, is an open-source generator of accelerators, from RTL to GDSII, that addresses this gap. Our analysis shows that the automatically generated G-GPU designs are remarkably efficient when compared against the popular RISC-V CPU architecture, with speed-ups of up to 223 times in raw performance and up to 11 times when performance is derated by area. These results are achieved through a design space exploration of the GPU-like accelerators, in which the memory hierarchy is divided in a smart fashion and the logic is pipelined on demand. Finally, tapeout-ready layouts of the G-GPU in 65nm CMOS are presented.