Non-uniform performance and power consumption across the processing elements (PEs) of heterogeneous SoCs increase the computational complexity of the task scheduling problem compared to homogeneous architectures. As heterogeneity grows in both the number and the types of PEs, the latency of a software-based scheduler increases, which makes it necessary to deploy the scheduler as an overlay processor in hardware so that scheduling decisions can be made rapidly and real-life applications can be deployed on heterogeneous SoCs. In this study, we present the design trade-offs involved in implementing and deploying the runtime variant of the heterogeneous earliest finish time algorithm (HEFT_RT) on an FPGA. We conduct performance evaluations on an SoC configuration emulated on the Xilinx Zynq ZCU102 platform. In a runtime environment, we demonstrate the hardware-based HEFT_RT's ability to make scheduling decisions with an average latency of 9.144 ns, process 26.7% more tasks per second than its software counterpart, and reduce scheduling latency by up to a factor of 183, based on workloads composed of a mixture of dynamically arriving real-life signal processing applications.
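To make the scheduling rule concrete, the following is a minimal software sketch of a generic earliest-finish-time assignment over a list of ready tasks. It is not the hardware design evaluated in this study: the names `PE`, `schedule_ready_tasks`, and `exec_time`, the ordering of tasks by average execution time, and all timing values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PE:
    """A processing element; available_at is when it next becomes free."""
    name: str
    available_at: float = 0.0


def schedule_ready_tasks(ready, exec_time, pes, now=0.0):
    """Assign each ready task to the PE that yields the earliest finish time.

    ready:     list of task names that are ready to run
    exec_time: task -> {pe_name: execution time}; a missing PE name means
               the task cannot run on that PE (hypothetical profile table)
    pes:       list of PE objects, updated in place
    """
    def average_time(task):
        times = exec_time[task].values()
        return sum(times) / len(times)

    assignments = {}
    # Assumption: longer tasks (by average execution time) are placed first.
    for task in sorted(ready, key=average_time, reverse=True):
        best_pe, best_finish = None, float("inf")
        for pe in pes:
            cost = exec_time[task].get(pe.name)
            if cost is None:
                continue  # this PE type does not support the task
            finish = max(now, pe.available_at) + cost
            if finish < best_finish:
                best_pe, best_finish = pe, finish
        if best_pe is None:
            raise ValueError(f"no PE supports task {task!r}")
        best_pe.available_at = best_finish  # reserve the PE until then
        assignments[task] = (best_pe.name, best_finish)
    return assignments


# Hypothetical example: one CPU core and one FFT accelerator.
pes = [PE("cpu0"), PE("fft_acc")]
exec_time = {
    "fft": {"cpu0": 100.0, "fft_acc": 20.0},
    "fir": {"cpu0": 40.0},
    "ifft": {"cpu0": 110.0, "fft_acc": 25.0},
}
assignments = schedule_ready_tasks(["fft", "fir", "ifft"], exec_time, pes)
```

In this example both FFT-type tasks land on the accelerator even though the second one has to wait for the first, because waiting still finishes earlier than running on the CPU. The inner loop over PEs is the part whose cost grows with the number and types of PEs when it runs in software.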