HPC users aim to reduce their applications' execution times, with little regard for system utilization. In contrast, HPC operators favor increasing both the number of applications executed per unit of time and system utilization. This difference in preferences has led to the following operational model: applications execute on exclusively allocated computing resources for a specified time and are assumed to use those resources efficiently. In many cases this model is inefficient, because applications may not fully utilize their allocated resources. The inefficiency increases application execution time and decreases system utilization. In this work, we propose a resourceful coordination approach (RCA) that enables cooperation between the currently independent batch-level and application-level schedulers. RCA enables application schedulers to share their allocated but idle computing resources with other applications through the batch system. We assess the proposed approach with the Effective System Performance (ESP) benchmark. The results show that RCA increased system utilization by up to 12.6% and decreased system makespan by the same percentage, without affecting application performance.
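The sharing idea described above can be illustrated with a toy bookkeeping model. This is a minimal sketch under stated assumptions, not the RCA implementation: the class `ToyBatchSystem` and its methods `allocate`, `offer_idle`, `reclaim`, and `release` are hypothetical names, cores are treated as interchangeable counters, and an application can take back offered cores only while they are not held by another application.

```python
class ToyBatchSystem:
    """Hypothetical batch-level scheduler that tracks core counts only."""

    def __init__(self, total_cores: int) -> None:
        self.free = total_cores   # cores the batch system can hand out now
        self.allocated = {}       # app -> cores exclusively allocated to it
        self.lent = {}            # app -> allocated cores it has offered back

    def allocate(self, app: str, cores: int) -> None:
        """Give `app` an exclusive allocation taken from the free pool."""
        if cores > self.free:
            raise ValueError("not enough free cores")
        self.free -= cores
        self.allocated[app] = cores
        self.lent[app] = 0

    def offer_idle(self, app: str, cores: int) -> None:
        """Application-level scheduler reports allocated but idle cores."""
        if cores > self.allocated[app] - self.lent[app]:
            raise ValueError("cannot offer more cores than the app still holds")
        self.lent[app] += cores
        self.free += cores        # idle cores become usable by other apps

    def reclaim(self, app: str, cores: int) -> int:
        """Take back offered cores that are currently free; return the count."""
        got = min(cores, self.lent[app], self.free)
        self.lent[app] -= got
        self.free -= got
        return got

    def release(self, app: str) -> None:
        """End the allocation; cores still held by `app` return to the pool."""
        self.free += self.allocated.pop(app) - self.lent.pop(app)
```

In this sketch, an application that holds 16 cores and offers 4 of them lets the batch system start a 4-core job that would otherwise wait. A later `reclaim` returns 0 until that job calls `release`, which is one simple policy among several possible ones.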