The abundance of complex artificial intelligence (AI) algorithms and of available high-performance computing (HPC) power is driving rapid development of AI components in both hardware and software. Existing HPC and AI benchmarks do not cover the variety of heterogeneous systems while also providing a simple yet comprehensive measurement of cross-stack performance. To address these challenges, we propose an end-to-end benchmark suite that uses automated machine learning (AutoML) as a representative AI application. Its extreme computational cost and scalability make AutoML a desirable workload for benchmarking AI-HPC systems. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and customizability on diverse systems. The primary metric for quantifying system performance is floating-point operations per second (FLOPS), which is measured with a systematic, analytical approach. We verify the benchmark's stability at discrete timestamps on machines of different types and scales, equipped with up to 400 AI accelerators. Our evaluation shows that the benchmark achieves near-optimal speedup and that its scores scale linearly with the number of machines, reflecting the overall computing power available for AI. The source code, specifications, and detailed procedures are publicly accessible on GitHub.
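As a rough illustration of how an analytically derived FLOPS score and a linear-scaling check might be computed, the following Python sketch assumes that each completed training trial reports an analytically counted number of floating-point operations per sample and the number of samples it processed. The record layout, function names, and numbers are hypothetical and are not taken from the benchmark's source code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TrialRecord:
    """One completed training trial (hypothetical record layout)."""
    ops_per_sample: float   # analytically counted floating-point operations per sample
    samples_processed: int  # number of samples the trial processed


def achieved_flops(trials: Iterable[TrialRecord], elapsed_seconds: float) -> float:
    """Total analytically counted operations divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    total_ops = sum(t.ops_per_sample * t.samples_processed for t in trials)
    return total_ops / elapsed_seconds


def scaling_efficiency(score_small: float, nodes_small: int,
                       score_large: float, nodes_large: int) -> float:
    """Ratio of the measured score to the score expected under ideal linear scaling."""
    ideal = score_small * (nodes_large / nodes_small)
    return score_large / ideal


# Illustrative numbers only; they are not measurements from the paper.
trials = [TrialRecord(2e9, 1000), TrialRecord(4e9, 500)]
score = achieved_flops(trials, elapsed_seconds=100.0)
efficiency = scaling_efficiency(1e12, 1, 7.6e12, 8)
```

Under these assumptions, an efficiency close to 1.0 would correspond to the near-linear scaling described above; counting operations analytically rather than reading hardware counters keeps the score comparable across different accelerator types.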