High-performance computing (HPC) clusters are widely used in-house at scientific and academic research institutions. For some users, the transition from running analyses on a single workstation to running them on a complex, multi-tenanted cluster, usually with some degree of parallelism, can be challenging, if not bewildering, especially for those whose role is not predominantly computational. More experienced users, on the other hand, can still benefit from pointers on how to get the best from their use of HPC. This Ten Simple Rules guide aims to help you identify ways to improve your utilisation of HPC and avoid common pitfalls that can negatively impact other users; following it will also help ease the load (pun intended) on your HPC sysadmin. It is intended to provide technical advice common to HPC platforms that use schedulers such as LSF, Slurm, PBS/Torque, SGE, LoadLeveler and YARN, the scheduler used with the Hadoop/Spark platform.