The rapid adoption of Field Programmable Gate Arrays (FPGAs) has accelerated research on hardware implementations of Deep Neural Networks (DNNs). Among DNN processors, domain-specific architectures such as Google's Tensor Processing Unit (TPU) have outperformed conventional GPUs. However, implementations of TPUs in reconfigurable hardware should emphasize energy savings to meet green computing requirements. Voltage scaling, a popular approach to saving energy, is delicate in FPGAs because it can cause timing failures if it is not applied appropriately. In this work, we present an ultra-low-power FPGA implementation of a TPU for edge applications. We divide the systolic array of the TPU into different FPGA partitions, and each partition runs its FPGA cores at a different near-threshold computing (NTC) biasing voltage. The biasing voltage for each partition is first estimated by the proposed offline schemes and then calibrated further by the proposed online scheme. We study the partitioning of the FPGA using four clustering algorithms based on the slack values of the design paths. To avoid the timing failures caused by NTC, paths with higher slack are placed in lower-voltage partitions and paths with lower slack are placed in higher-voltage partitions. The proposed architecture is simulated for an Artix-7 FPGA using the Vivado Design Suite and a Python tool. The simulation results substantiate the feasibility of a voltage-scaled TPU on FPGAs and confirm its power efficiency.
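The slack-based placement rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the four clustering algorithms, so plain one-dimensional k-means is used here as a stand-in, and the function names, slack values, and voltage levels are all hypothetical.

```python
from statistics import mean


def cluster_slacks(slacks, k, iters=50):
    """Group 1-D slack values into k clusters with plain Lloyd's k-means.

    k-means is an assumed stand-in; the abstract does not say which
    clustering algorithms the authors evaluated.
    """
    ordered = sorted(slacks)
    # Deterministic initialisation: spread the centroids across the sorted slacks.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(slacks)
    for _ in range(iters):
        # Assign each path to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in slacks]
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(slacks, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def assign_voltages(slacks, voltages):
    """Map each path to a partition voltage: more slack -> lower voltage."""
    k = len(voltages)
    labels, centroids = cluster_slacks(slacks, k)
    # Rank clusters from smallest to largest mean slack.
    by_slack = sorted(range(k), key=lambda c: centroids[c])
    # The smallest-slack cluster gets the highest voltage, the largest-slack the lowest.
    high_to_low = sorted(voltages, reverse=True)
    cluster_voltage = {c: high_to_low[rank] for rank, c in enumerate(by_slack)}
    return [cluster_voltage[lab] for lab in labels]


# Hypothetical path slacks (ns) and hypothetical partition voltages (V).
path_slacks_ns = [0.2, 0.3, 0.25, 1.1, 1.0, 1.2, 2.4, 2.5, 2.6]
partition_voltages = [0.5, 0.6, 0.7]
path_voltages = assign_voltages(path_slacks_ns, partition_voltages)
```

In this toy input, the three tightest paths land in the highest-voltage partition and the three most relaxed paths in the lowest-voltage one. In a real flow the slack values would come from a timing report, and the voltages would be those produced by the offline estimation and online calibration steps.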