The recent successes and widespread application of compute-intensive machine learning and data analytics methods have been boosting the use of the Python programming language on HPC systems. While Python offers many advantages to its users, it was not designed with a focus on multi-user environments or parallel programming, which makes it quite challenging to maintain stable and secure Python workflows on an HPC system. In this paper, we analyze the key problems caused by the use of Python on HPC clusters and sketch appropriate workarounds for efficiently maintaining multi-user Python software environments, securing and restricting the resources of Python jobs, and containing Python processes, with a focus on Deep Learning applications running on GPU clusters.