Building and maintaining large AI fleets that efficiently support fast-growing deep learning (DL) workloads is an active research topic for modern cloud infrastructure providers. Generating accurate benchmarks plays an essential role in the design and evaluation of rapidly evolving software and hardware solutions in this area. Two fundamental challenges in making this process scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the fleet into the benchmarks. To overcome these issues, we propose Mystique, an accurate and scalable framework for production AI benchmark generation. It leverages the PyTorch execution graph (EG), a new feature that captures the runtime information of AI models as a graph at operator granularity, together with operator metadata. By sourcing EG traces from the fleet, we can build AI benchmarks that are portable and representative. Mystique is scalable because its data collection is lightweight in both runtime overhead and user instrumentation effort. It is also adaptive, as the expressiveness and composability of the EG format allow flexible user control over benchmark creation. We evaluate our methodology on several production AI workloads and show that benchmarks generated with Mystique closely resemble the original AI models in both execution time and system-level metrics. We also showcase the portability of the generated benchmarks across platforms and demonstrate several use cases enabled by the fine-grained composability of the execution graph.