Alchemist is a system that allows Apache Spark to achieve better performance by interfacing with HPC libraries for large-scale distributed computations. In this paper, we highlight some recent developments in Alchemist that are of interest to Cray users and the scientific community in general. We discuss our experience porting Alchemist to container images and deploying it on Cray XC (using Shifter) and CS (using Singularity) series supercomputers and on a local Kubernetes cluster. Newly developed interfaces for Python, Dask, and PySpark enable the use of Alchemist with additional data analysis frameworks. We also briefly discuss the combination of Alchemist with RLlib, an increasingly popular library for reinforcement learning, and consider the benefits of leveraging HPC simulations in reinforcement learning. Finally, since data transfer between the client applications and Alchemist is the main overhead Alchemist encounters, we give a qualitative assessment of these transfer times with respect to different factors.