Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Because these highly parallel applications are computationally intensive, they must be accelerated by graphics processing units (GPUs) to meet stringent timing constraints. However, despite the wide adoption of GPUs, efficiently scheduling multiple GPU applications while providing rigorous real-time guarantees remains a challenge. In this paper, we propose RTGPU, which schedules the execution of multiple GPU applications in real time so that they meet hard deadlines. Each GPU application can have multiple CPU execution and memory copy segments, as well as GPU kernels. We start with a model that explicitly accounts for the CPU and memory copy segments of these applications. We then use the GPU architecture to develop a precise timing model for the GPU kernels, and we leverage a technique known as persistent threads to implement fine-grained kernel scheduling, with performance improved through interleaved execution. Next, we propose a general method for scheduling parallel GPU applications in real time. Finally, to schedule multiple parallel GPU applications, we propose a practical real-time scheduling algorithm that combines federated scheduling and grid search (for GPU kernel segments) with uniprocessor fixed-priority scheduling (for multiple CPU and memory copy segments). Comprehensive validation and evaluation on a real NVIDIA GTX1080Ti GPU system show that our approach provides better schedulability than previous work and gives real-time guarantees that multiple GPU applications meet their hard deadlines.
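The abstract names uniprocessor fixed-priority scheduling for the CPU and memory copy segments but does not state the analysis itself. As a generic illustration only, the sketch below implements the classic response-time analysis for independent periodic tasks with implicit deadlines on one processor. It is not RTGPU's analysis, which would also have to account for the GPU kernel segments; the function name `response_time`, the `(wcet, period)` task tuples, and the example task set are all assumptions made for this sketch.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def response_time(tasks, i, max_iter=1000):
    """Worst-case response time of tasks[i] under fixed-priority scheduling.

    tasks: list of (wcet, period) integer tuples, highest priority first.
           Deadlines are assumed equal to periods (hypothetical model).
    Returns the response time, or None if the deadline is exceeded
    (or the iteration does not converge within max_iter steps).
    """
    wcet_i, period_i = tasks[i]
    r = wcet_i
    for _ in range(max_iter):
        # Interference from every higher-priority task released in [0, r).
        interference = sum(ceil_div(r, t_j) * c_j for c_j, t_j in tasks[:i])
        r_next = wcet_i + interference
        if r_next > period_i:
            return None  # deadline (= period) would be missed
        if r_next == r:
            return r  # fixed point reached
        r = r_next
    return None


def is_schedulable(tasks):
    """True if every task meets its deadline under the analysis above."""
    return all(response_time(tasks, i) is not None for i in range(len(tasks)))


# Hypothetical task set: (wcet, period), highest priority first.
example = [(1, 4), (2, 6), (3, 12)]
times = [response_time(example, i) for i in range(len(example))]
```

The iteration starts from the task's own execution time and adds interference from higher-priority tasks until the value stops changing, so a task set is accepted only if every task's fixed point stays within its period.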