The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019 to coordinate work being performed at LBNL to support this model, and to provide a coherent and comprehensive set of science requirements to drive existing and new work. A key component of the project was a set of in-depth engagements with eight science teams that represent challenging use cases across the DOE Office of Science. By the close of the project, we had met our goal: our science application engagements demonstrated automated pipelines that analyze data from remote facilities at large scale, without routine human intervention. In several cases, we have gone beyond demonstrations and now provide production-level services. To achieve this goal, the Superfacility team developed tools, infrastructure, and policies for near-real-time computing support, dynamic high-performance networking, data management and movement, API-driven automation, HPC-scale notebooks via Jupyter, authentication using federated identity, and container-based edge services. The lessons we learned during this project provide a valuable model for future large, complex, cross-disciplinary collaborations. There is a pressing need for a coherent computing infrastructure across national facilities, and LBNL's Superfacility project offers a model for tackling the hardware, software, policy, and service challenges that will arise across multiple science domains.