AIPerf: Automated machine learning as an AI-HPC benchmark

The plethora of complex artificial intelligence (AI) algorithms and available high performance computing (HPC) power stimulates the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems is emerging rapidly. The de facto HPC benchmark, LINPACK, cannot reflect AI computing power and I/O performance because it lacks a representative workload. Popular AI benchmarks such as MLPerf have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite based on automated machine learning (AutoML), which not only represents real AI scenarios but also adapts automatically to machines of various scales. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use operations per second (OPS), measured with an analytical and systematic approach, as the major metric to quantify AI performance. We perform evaluations on various systems to verify the benchmark's stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, our benchmark can scale to and rank AI-HPC systems easily.
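To make the single-metric idea concrete, here is a minimal sketch of how the reported aggregate OPS figures translate into per-accelerator throughput, and how a weak-scaling efficiency could be computed on one type of hardware. The function names `per_device_ops` and `weak_scaling_efficiency` are illustrative assumptions and not part of AIPerf. The numbers in the efficiency example are made up purely to show the arithmetic. The two measured systems use different accelerators, so their per-device figures should not be read as a scaling comparison.

```python
TERA = 1e12
PETA = 1e15


def per_device_ops(total_ops: float, devices: int) -> float:
    """Average operations per second delivered by one accelerator."""
    if devices <= 0:
        raise ValueError("devices must be positive")
    return total_ops / devices


def weak_scaling_efficiency(base_ops: float, base_devices: int,
                            scaled_ops: float, scaled_devices: int) -> float:
    """Ratio of per-device throughput at the larger scale to the baseline.

    A value of 1.0 means perfectly linear weak scaling; this is only
    meaningful when both runs use the same kind of accelerator.
    """
    base = per_device_ops(base_ops, base_devices)
    scaled = per_device_ops(scaled_ops, scaled_devices)
    return scaled / base


# Aggregate figures quoted in the abstract.
t4_per_device = per_device_ops(56.1 * TERA, 32)
ascend_per_device = per_device_ops(194.53 * PETA, 4096)

print(f"Tesla T4:   {t4_per_device / TERA:.2f} Tera-OPS per device")
print(f"Ascend 910: {ascend_per_device / TERA:.2f} Tera-OPS per device")

# Hypothetical numbers (NOT from the paper) to illustrate the efficiency formula:
# doubling from 8 to 16 devices while aggregate OPS grows from 100 to 190.
example_efficiency = weak_scaling_efficiency(100.0, 8, 190.0, 16)
print(f"Example weak-scaling efficiency: {example_efficiency:.2f}")
```

Dividing by device count is a deliberate simplification: it ignores node topology and interconnect, which is exactly what a cross-stack benchmark is meant to expose in the aggregate number.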

