Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as NumPy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on systems without a requisite loss in performance. While high-performance libraries often provide adequate performance within a node, distributed computing is required to scale Python across nodes and make it genuinely competitive in large-scale high-performance computing. Many frameworks, such as Charm4Py, DaCe, Dask, Legate Numpy, mpi4py, and Ray, scale Python across nodes. However, little is known about these frameworks' relative strengths and weaknesses, leaving practitioners and scientists without enough information about which frameworks are suitable for their requirements. In this paper, we seek to narrow this knowledge gap by studying the relative performance of two such frameworks: Charm4Py and mpi4py. We perform a comparative performance analysis of Charm4Py and mpi4py using CPU- and GPU-based microbenchmarks and other representative mini-apps for scientific computing.