Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python support in High Performance Computing (HPC) has skyrocketed. However, the Python language itself does not necessarily offer high performance. In this work, we present a workflow that retains Python's high productivity while achieving portable performance across different architectures. The workflow's key features are HPC-oriented language extensions and a set of automatic optimizations powered by a data-centric intermediate representation. We show performance results and scaling across CPU, GPU, FPGA, and the Piz Daint supercomputer (up to 23,328 cores), with 2.47x and 3.75x speedups over previous-best solutions, first-ever Xilinx and Intel FPGA results of annotated Python, and up to 93.16% scaling efficiency on 512 nodes.
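To make the described workflow concrete, the following minimal sketch shows what "annotated Python" driven by a data-centric intermediate representation can look like. It assumes the DaCe framework's Python frontend (the @dace.program decorator and symbolic sizes), which matches the approach described here; the function name matmul_add and the symbol N are illustrative only, not taken from this work.

    # Minimal sketch, assuming the DaCe Python frontend; names are illustrative.
    import numpy as np
    import dace

    N = dace.symbol('N')  # symbolic size, resolved when the program is called

    @dace.program  # annotation: the function is parsed into a data-centric IR
    def matmul_add(A: dace.float64[N, N], B: dace.float64[N, N],
                   C: dace.float64[N, N]):
        # NumPy-style operations are lowered to dataflow and optimized
        # automatically for the target architecture (CPU, GPU, or FPGA).
        C[:] = A @ B + C

    if __name__ == '__main__':
        a = np.random.rand(128, 128)
        b = np.random.rand(128, 128)
        c = np.zeros((128, 128))
        matmul_add(a, b, c)  # JIT-compiles and runs the optimized program

The point of the sketch is that the source remains ordinary NumPy code; the annotations only expose types and symbolic shapes so that the optimizations can be applied without rewriting the algorithm per architecture.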