In recent years, Python has become more prominent in the scientific community and is now used for simulations, machine learning, and data analysis. All these tasks profit from the additional compute power offered by parallelism and offloading. In the domain of High Performance Computing (HPC), we can look back on decades of experience exploiting different levels of parallelism on the core, node, or inter-node level, as well as utilising accelerators. By using performance analysis tools to investigate all these levels of parallelism, we can tune applications for unprecedented performance. Unfortunately, standard Python performance analysis tools cannot cope with highly parallel programs. Since developing such tools is complex and error-prone, we demonstrate an easy-to-use solution based on an existing tool infrastructure for performance analysis. In this paper, we describe how to apply the established instrumentation framework \scorep to trace Python applications. We finish with a study of the overhead that users can expect when instrumenting their applications.