Novel artificial intelligence (AI) technologies have accelerated scientific research in fields such as cosmology, physics, and bioinformatics, inevitably becoming a significant category of workloads on high performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications in order to evaluate the AI performance of HPC systems under a predefined problem size, in terms of datasets and AI models. Because their problem sizes cannot scale, such static AI benchmarks are inadequate for understanding the performance trend of evolving AI applications on HPC systems, in particular scientific AI applications on large-scale systems. In this paper, we propose SAIH, a scalable evaluation methodology for analyzing the AI performance trend of HPC systems by scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model scale continuously, we can investigate the trend and range of AI performance on HPC systems and further diagnose system bottlenecks. To validate our methodology, we augment a cosmological AI application to evaluate a real GPU-equipped HPC system as a case study of SAIH.
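The abstract describes the scaling idea only at a high level. As a rough, hypothetical illustration (not the paper's implementation), the sketch below grows a toy model and a synthetic dataset together and records training throughput at each scale; plotting such measurements against the scale factor is the kind of trend from which system bottlenecks could be diagnosed. The helper names (`build_model`, `measure_throughput`) and the MLP stand-in for the cosmological model are assumptions for illustration only.

```python
# Hypothetical sketch of SAIH-style problem-size scaling: grow the model
# (width) and the data (sample count) together, and measure throughput at
# each scale. The real study augments a cosmological AI application instead.
import time
import torch
import torch.nn as nn

def build_model(width: int) -> nn.Module:
    # Model-size axis: widen a simple MLP (a stand-in for the real network).
    return nn.Sequential(nn.Linear(1024, width), nn.ReLU(), nn.Linear(width, 10))

def measure_throughput(model: nn.Module, n_samples: int, batch_size: int = 256) -> float:
    # Data-size axis: synthetic batches stand in for an augmented dataset.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    start = time.perf_counter()
    for _ in range(n_samples // batch_size):
        x = torch.randn(batch_size, 1024, device=device)
        y = torch.randint(0, 10, (batch_size,), device=device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if device == "cuda":
        torch.cuda.synchronize()  # include pending GPU work in the timing
    return n_samples / (time.perf_counter() - start)  # samples per second

for scale in (1, 2, 4, 8):  # augment data and model size in lockstep
    model = build_model(width=512 * scale)
    tput = measure_throughput(model, n_samples=4096 * scale)
    print(f"scale={scale}: {tput:.0f} samples/s")  # a flattening curve hints at a bottleneck
```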