Modern ARM-based servers such as ThunderX and ThunderX2 offer a tremendous amount of parallelism by providing dozens or even hundreds of processors. However, exploiting these computing resources for reuse-heavy, data-dependent workloads is a major challenge because cache resources are shared. In particular, because cache-miss penalties are severe, schedulers must co-locate processes conservatively to avoid cache conflicts, and these conservative co-location decisions lower resource utilization. To address these challenges, in this paper we explore the utility of predictive analysis of applications' execution to dynamically forecast resource-heavy workload regions and to improve the efficiency of resource management through new proactive methods. Our approach relies on the compiler to insert "beacons" in the application at strategic program points to periodically produce and/or update the attributes of anticipated resource-intensive program regions. The compiler classifies loops in programs based on the predictability of their execution time and inserts different types of beacons at their entry and exit points. The precision of the information carried by beacons varies with the analyzability of the loops, and the scheduler uses performance counters to fine-tune co-location decisions. The information produced by beacons in multiple processes is aggregated and analyzed by the proactive scheduler so that it can respond to the anticipated workload requirements. For throughput environments, we develop a framework that demonstrates high-quality predictions and improves throughput over Linux's Completely Fair Scheduler (CFS) by 1.4x on average (up to 4.7x) on ThunderX and by 1.9x on average (up to 5.2x) on ThunderX2 servers, on consolidated workloads across 45 benchmarks.
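To make the beacon idea concrete, the following is a minimal sketch of how a scheduler might aggregate beacon information from several processes and group them for co-location. It is not the framework described above: the `Beacon` fields, the `ProactiveScheduler` class, and the first-fit-decreasing grouping rule are all hypothetical choices made for illustration, and the fine-tuning with performance counters is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Beacon:
    """Hypothetical beacon payload; the field names are assumptions."""
    pid: int
    footprint_kb: int              # anticipated cache footprint of the region
    predicted_ms: Optional[float]  # None if the loop's time is not predictable


class ProactiveScheduler:
    """Toy scheduler that aggregates the latest beacon from each process."""

    def __init__(self, cache_kb: int) -> None:
        self.cache_kb = cache_kb            # assumed shared-cache capacity
        self.active: Dict[int, Beacon] = {}

    def on_entry(self, beacon: Beacon) -> None:
        # A beacon fired at loop entry announces the upcoming region.
        self.active[beacon.pid] = beacon

    def on_exit(self, pid: int) -> None:
        # A beacon fired at loop exit retires the announced region.
        self.active.pop(pid, None)

    def co_locate(self) -> List[List[int]]:
        """Group pids so each group's summed footprint fits in the cache.

        Uses first-fit decreasing, a simple bin-packing heuristic chosen
        here only for illustration.
        """
        groups: List[List[int]] = []
        loads: List[int] = []
        ordered = sorted(self.active.values(),
                         key=lambda b: b.footprint_kb, reverse=True)
        for b in ordered:
            for i, load in enumerate(loads):
                if load + b.footprint_kb <= self.cache_kb:
                    groups[i].append(b.pid)
                    loads[i] += b.footprint_kb
                    break
            else:
                # No existing group has room, so open a new one.
                groups.append([b.pid])
                loads.append(b.footprint_kb)
        return groups


if __name__ == "__main__":
    sched = ProactiveScheduler(cache_kb=1024)
    for pid, kb in [(1, 700), (2, 600), (3, 300), (4, 400)]:
        sched.on_entry(Beacon(pid=pid, footprint_kb=kb, predicted_ms=None))
    print(sched.co_locate())
```

In this toy model, processes whose announced footprints together exceed the assumed cache capacity are placed in different groups, which mirrors the goal of avoiding cache conflicts without leaving capacity idle.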