The plethora of complex artificial intelligence (AI) algorithms and the availability of high performance computing (HPC) power are stimulating the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems is emerging rapidly. The de facto HPC benchmark, LINPACK, cannot reflect AI computing power or I/O performance because it lacks a representative workload, while current popular AI benchmarks such as MLPerf have fixed problem sizes and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning (AutoML), which not only represents real AI scenarios but also auto-adaptively scales to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use operations per second (OPS), measured in an analytical and systematic way, as the major metric to quantify AI performance. We perform evaluations on various systems to verify the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 accelerators (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With its flexible workload and single metric, our benchmark can easily scale to and rank AI-HPC systems.
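As a quick sanity check on the reported figures, the per-accelerator throughput implied by the two measurements can be derived by simple division. This is a minimal sketch using only numbers stated above; the function name is illustrative, not part of the benchmark suite.

```python
def per_device_tops(total_tera_ops: float, num_devices: int) -> float:
    """Average Tera-OPS contributed by each accelerator in a run."""
    return total_tera_ops / num_devices

# 4 nodes, 32 NVIDIA Tesla T4: 56.1 Tera-OPS total
t4_per_device = per_device_tops(56.1, 32)

# 512 nodes, 4096 Huawei Ascend 910: 194.53 Peta-OPS = 194530 Tera-OPS total
ascend_per_device = per_device_tops(194.53e3, 4096)

print(f"T4:         {t4_per_device:.2f} Tera-OPS/device")
print(f"Ascend 910: {ascend_per_device:.2f} Tera-OPS/device")
```

Because the two clusters use different accelerators, these per-device numbers are not directly comparable across systems; the near-linear weak scalability claim concerns scaling within one system.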