No area of computing is hungrier for performance than High Performance Computing (HPC), whose demands continue to drive advances in processor performance and the adoption of accelerators, as well as in memory, storage, and networking technologies. A key feature of the past decade of Intel processor dominance has been the extensive adoption of GPUs as coprocessors, whilst more recent developments have seen the increased availability of a number of alternative CPUs, including novel ARM-based chips. This paper analyses the performance and scalability of a state-of-the-art Computational Fluid Dynamics (CFD) code on three HPC cluster systems equipped with AMD EPYC-Rome (EPYC, 4096 cores), ARM-based Marvell ThunderX2 (TX2, 8192 cores) and Intel Skylake (SKL, 8000 cores) processors. Three benchmark cases are designed with increasing computation-to-communication ratio and numerical complexity, namely lid-driven cavity flow, Taylor-Green vortex and a travelling solitary wave using the level-set method, discretised with either $4^{th}$-order central differences or a $5^{th}$-order WENO scheme. Our results show that the EPYC cluster delivers the best code performance for all the setups under consideration. In the first two benchmarks, the SKL cluster achieves faster computing times than the TX2 system, whilst in the solitary wave simulations the TX2 cluster achieves good scalability and performance similar to that of the EPYC system, both improving on that obtained with the SKL cluster. These results suggest that, while the Intel SKL cores deliver the best strong scalability, the associated cluster performance is lower than that of the EPYC system. The TX2 cluster performance is promising considering its recent addition to the HPC portfolio.