Performant, Multi-objective Scheduling of Highly Interleaved Task Graphs on Heterogeneous System on Chip Devices

From arXiv: 15 pages, 2 pages of appendix, 14 figures including appendix. Accepted for publication in IEEE Transactions on Parallel and Distributed Systems.

Performance-, power-, and energy-aware scheduling techniques play an essential role in optimally utilizing processing elements (PEs) of heterogeneous systems. List schedulers, a class of low-complexity static schedulers, have commonly been used in static execution scenarios. However, list schedulers are not suitable for runtime decision making, particularly when multiple concurrent applications are interleaved dynamically. For such cases, the static task execution times and expectation of idle PEs assumed by list schedulers lead to inefficient system utilization and poor performance. To address this problem, we present techniques for optimizing execution of list scheduling algorithms in dynamic runtime scenarios via a family of algorithms inspired by the well-known heterogeneous earliest finish time (HEFT) list scheduler. Using dynamically arriving, realistic workload scenarios simulated in an open-source discrete event heterogeneous SoC simulator, we exhaustively evaluate each of the proposed algorithms across two SoCs modeled after the Xilinx Zynq UltraScale+ ZCU102 and O-Droid XU3 development boards. Altogether, depending on the chosen variant in this family of algorithms, we are able to achieve up to a 39% improvement in execution time, up to a 7.24x algorithmic speedup, or up to a 30% improvement in energy consumption compared to the baseline HEFT implementation.
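For readers unfamiliar with the baseline, the following is a minimal Python sketch of a static HEFT-style list scheduler. It is not the authors' implementation and does not include any of the runtime variants the abstract refers to. The function name `heft` and the input layout (`succ`, `cost`, `comm`) are assumptions made for illustration. The sketch also simplifies HEFT in one respect: each task is appended after the last task on a PE, rather than searching for an idle gap with the insertion-based policy. All execution times are assumed to be positive, so that sorting by upward rank yields a valid topological order.

```python
from functools import lru_cache


def heft(succ, cost, comm):
    """Simplified static HEFT sketch (hypothetical helper, not the paper's code).

    succ: dict mapping task -> list of successor tasks (the task graph)
    cost: dict mapping task -> list of execution times, one entry per PE
    comm: dict mapping (u, v) -> transfer time when u and v run on different PEs
    Returns (priority order, {task: (pe, start, finish)}).
    """
    tasks = list(cost)
    n_pe = len(next(iter(cost.values())))

    # Build predecessor lists from the successor map.
    pred = {t: [] for t in tasks}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    @lru_cache(maxsize=None)
    def rank_u(t):
        # Upward rank: average execution time plus the longest
        # (communication + rank) path to an exit task.
        avg = sum(cost[t]) / n_pe
        tail = max((comm.get((t, s), 0) + rank_u(s) for s in succ.get(t, [])),
                   default=0)
        return avg + tail

    # Phase 1: prioritize tasks by decreasing upward rank.
    order = sorted(tasks, key=rank_u, reverse=True)

    # Phase 2: assign each task to the PE that gives the earliest finish time.
    pe_free = [0.0] * n_pe  # assumption: append-only, no insertion into idle gaps
    placed = {}
    for t in order:
        best = None
        for p in range(n_pe):
            # Data from a predecessor on another PE arrives after comm delay.
            ready = max((placed[u][2] + (0 if placed[u][0] == p
                                         else comm.get((u, t), 0))
                         for u in pred[t]), default=0.0)
            start = max(ready, pe_free[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placed[t] = best
        pe_free[best[0]] = best[2]
    return order, placed
```

The sketch also makes the two assumptions criticized in the abstract visible: `cost` holds fixed, known execution times, and `pe_free` starts at zero on every PE, meaning all PEs are expected to be idle when scheduling begins. Neither assumption holds when several applications arrive and interleave at runtime.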
