Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control often exhibit dynamic behavior at several levels: the task, the model, and the layers (or ML operators) within a model. Such dynamic behavior poses new challenges for the system software in an ML system because, unlike in traditional ML workloads, the overall system load is unpredictable. In addition, real-time processing requires meeting deadlines, and multi-model workloads involve highly heterogeneous models. Because RTMM workloads often run on resource-constrained devices (e.g., a VR headset), developing an effective scheduler is an important research problem. We therefore propose a new scheduler, SDRM3, that effectively handles the various forms of dynamicity in RTMM-style workloads on multi-accelerator systems. SDRM3 quantifies the unique requirements of RTMM workloads and uses the resulting scores to drive scheduling decisions, taking into account the current system load and other inference jobs on different models and input frames. SDRM3 has tunable parameters that adapt quickly to dynamic workload changes through a gradient descent-like online optimization, which typically converges within five steps for new workloads. We also propose a method that exploits model-level dynamicity based on a Supernet: it dynamically selects a suitable sub-network of the Supernet according to the system load, trading off scheduling effectiveness against model performance (e.g., accuracy). In our evaluation on five realistic RTMM workload scenarios, SDRM3 reduces the overall UXCost, an energy-delay-product (EDP)-equivalent metric for real-time applications defined in the paper, by 37.7% and 53.2% on geometric mean (up to 97.6% and 97.1%) compared to state-of-the-art baselines, which demonstrates the efficacy of our scheduling methodology.
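To make the idea of load-aware sub-network selection concrete, the following is a minimal sketch and not the SDRM3 algorithm itself. It assumes each sub-network of a Supernet has a profiled latency and accuracy, and it picks the most accurate variant that still fits the time left before the deadline after accounting for work already queued. The names `SubNet`, `select_subnet`, `slack_ms`, and `queued_work_ms`, as well as all numbers, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SubNet:
    """Hypothetical description of one sub-network drawn from a Supernet."""
    name: str
    latency_ms: float  # assumed: profiled latency on the target accelerator
    accuracy: float    # assumed: validation accuracy of this variant


def select_subnet(subnets: Sequence[SubNet],
                  slack_ms: float,
                  queued_work_ms: float) -> SubNet:
    """Pick the most accurate sub-network that fits the remaining budget.

    slack_ms is the time until the deadline; queued_work_ms is a simple
    stand-in for the current system load (an assumption of this sketch).
    """
    budget = slack_ms - queued_work_ms
    feasible = [s for s in subnets if s.latency_ms <= budget]
    if feasible:
        return max(feasible, key=lambda s: s.accuracy)
    # Nothing fits: fall back to the fastest variant to limit the miss.
    return min(subnets, key=lambda s: s.latency_ms)


# Illustrative values only; they are not taken from the paper.
SUBNETS = [
    SubNet("small", latency_ms=4.0, accuracy=0.70),
    SubNet("medium", latency_ms=8.0, accuracy=0.76),
    SubNet("large", latency_ms=15.0, accuracy=0.80),
]

if __name__ == "__main__":
    for load in (5.0, 22.0, 32.0):
        chosen = select_subnet(SUBNETS, slack_ms=33.0, queued_work_ms=load)
        print(f"queued work {load:4.1f} ms -> {chosen.name}")
```

Under light load the sketch chooses the largest variant, and it degrades to smaller ones as queued work consumes the deadline slack. A real scheduler would also weigh energy and the jobs of other models, which this example deliberately leaves out.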