Real-time multi-model multi-task (MMMT) workloads, a new form of deep learning inference workload, are emerging in application areas such as extended reality (XR) to support metaverse use cases. These workloads combine user interactivity with computationally complex machine learning (ML) activities. Compared to standard ML applications, they present unique difficulties and constraints. Real-time MMMT workloads impose heterogeneity and concurrency requirements on future ML systems and devices, necessitating the development of new capabilities. This paper begins with a discussion of the characteristics of these real-time MMMT ML workloads and presents an ontology for evaluating the performance of future ML hardware for XR systems. Next, we present XRBench, a collection of MMMT ML tasks, models, and usage scenarios that execute these models in three representative ways for XR use cases: cascaded, concurrent, and cascaded-concurrent. Finally, we emphasize the need for new metrics that properly capture these requirements. We hope that our work will stimulate research and lead to the development of a new generation of ML systems for XR use cases.