Scientific workflows have become integral tools across a broad range of scientific computing use cases. Scientific discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from the execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, the need to process large-scale datasets produced by instruments at the edge, the intensification of near real-time data processing, support for long-term experiment campaigns, and the emergence of quantum computing as an adjunct to HPC have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge to the cloud to HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities. Further, to accelerate science, these systems must also implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, and enable the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit, which took place on November 29 and 30, 2022.