The evolution of High-Performance Computing (HPC) platforms enables the design and execution of progressively larger and more complex workflow applications on these systems. The complexity comes not only from the number of elements that compose the workflows but also from the type of computations they perform. While traditional HPC workflows target the simulation and modelling of physical phenomena, current needs additionally require data analytics (DA) and artificial intelligence (AI) tasks. However, the development of these workflows is hampered by the lack of programming models and environments that support the integration of HPC, DA, and AI, as well as by the lack of tools to easily deploy and execute the workflows on HPC systems. To progress in this direction, this paper presents use cases where complex workflows are required and investigates the main issues to be addressed for HPC/DA/AI convergence. Based on this study, the paper identifies the challenges that a new workflow platform for managing complex workflows must address. Finally, it proposes a development approach for such a platform that addresses these challenges in two directions: first, by defining a software stack that provides the functionalities to manage these complex workflows; and second, by proposing the HPC Workflow as a Service (HPCWaaS) paradigm, which leverages the software stack to facilitate the reusability of complex workflows in federated HPC infrastructures. The proposals presented in this work are subject to study and development as part of the EuroHPC eFlows4HPC project.