Lowering costs by driving high utilization across deep learning workloads is a crucial lever for cloud providers. We present Singularity, Microsoft's globally distributed scheduling service for highly efficient and reliable execution of deep learning training and inference workloads. At the heart of Singularity is a novel, workload-aware scheduler that can transparently preempt and elastically scale deep learning workloads to drive high utilization without impacting their correctness or performance, across a global fleet of AI accelerators (e.g., GPUs, FPGAs). All jobs in Singularity are preemptible, migratable, and dynamically resizable (elastic) by default: a live job can be dynamically and transparently (a) preempted and migrated to a different set of nodes, cluster, data center, or region and resumed exactly from the point where the execution was preempted, and (b) resized (i.e., elastically scaled up/down) on a varying set of accelerators of a given type. Our mechanisms are transparent in that they do not require the user to make any changes to their code or to use any custom libraries that may limit flexibility. Additionally, our approach significantly improves the reliability of deep learning workloads. We show that the resulting efficiency and reliability gains with Singularity are achieved with negligible impact on steady-state performance. Finally, our design approach is agnostic of DNN architectures and handles a variety of parallelism strategies (e.g., data/pipeline/model parallelism).